OpenAI Releases Reinforcement Fine-Tuning (RFT) on o4-mini: A Step Toward Custom Model Optimization

OpenAI has introduced Reinforcement Fine-Tuning (RFT) on its o4-mini reasoning model, a new customization technique tailored to expert-level tasks. RFT is built on reinforcement learning principles, allowing organizations to define custom objectives and reward functions and giving them fine-grained control over how a model improves, well beyond what standard supervised fine-tuning offers.
At its core, RFT is designed to help developers push models closer to the ideal behavior of real-world applications by teaching them not only what to output, but why that output is preferred in a particular domain.
What Is Reinforcement Fine-Tuning?
Reinforcement fine-tuning applies reinforcement learning principles to the fine-tuning of language models. Rather than relying solely on labeled examples, developers provide a task-specific grader: a function that evaluates and scores model outputs against custom criteria. The model is then trained to optimize for this reward signal, gradually learning to produce responses that align with the desired behavior.
This approach is particularly valuable for nuanced or subjective tasks where ground truth is hard to define. For example, you may not have labeled data for “the best way to explain a medical concept”, but you can write a function that scores clarity, correctness, and completeness, and let the model learn accordingly.
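To make the idea concrete, here is a minimal grader sketch in Python. The function name, heuristics, and weights are illustrative assumptions rather than anything from OpenAI's API; a production grader would encode real domain criteria as described in OpenAI's RFT guide, but the contract is the same: take a model response, return a score between 0 and 1.

```python
# Hypothetical grader sketch: scores a model's medical explanation on a 0-1 scale.
# The heuristics below are illustrative stand-ins for real domain criteria.
import re


def grade_medical_explanation(response: str, required_terms: list[str]) -> float:
    """Return a reward in [0, 1] combining coverage, readability, and length checks."""
    text = response.strip()
    if not text:
        return 0.0

    # Coverage: fraction of required clinical terms actually mentioned.
    mentioned = sum(1 for term in required_terms if term.lower() in text.lower())
    coverage = mentioned / max(len(required_terms), 1)

    # Readability proxy: penalize very long sentences.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    readability = 1.0 if avg_len <= 25 else max(0.0, 1.0 - (avg_len - 25) / 25)

    # Completeness proxy: reward answers that are neither trivially short nor bloated.
    word_count = len(text.split())
    completeness = 1.0 if 50 <= word_count <= 400 else 0.5

    return round(0.5 * coverage + 0.3 * readability + 0.2 * completeness, 3)


# Example usage:
# grade_medical_explanation("Metformin lowers blood glucose by ...", ["metformin", "glucose"])
```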
Why o4-mini?
OpenAI’s o4-mini is a compact reasoning model released in April 2025, optimized for both text and image inputs. It is part of OpenAI’s latest generation of multi-task models and is particularly strong at structured reasoning and chain-of-thought style prompts.
By enabling RFT on o4-mini, OpenAI gives developers access to a lightweight yet capable foundation that can be precisely tuned for high-stakes, domain-specific reasoning tasks while remaining computationally efficient and fast enough for real-time applications.
Applied Use Cases: What Developers Are Building with RFT
Several early adopters have already demonstrated the practical potential of RFT on o4-mini:
- Accordance AI built a custom tax analysis model that uses a rule-based grader to enforce compliance logic, improving accuracy by 39%.
- Ambience Healthcare used RFT to improve medical coding accuracy, lifting ICD-10 assignment performance by 12 points relative to physician-written labels.
- Harvey, a legal AI startup, fine-tuned the model to extract citations from legal documents, improving F1 by 20% and matching GPT-4o on performance at lower latency.
- Runloop trained the model to generate valid Stripe API snippets, gaining 12% using AST validation and syntax-aware grading.
- Milo, a scheduling assistant, improved output quality on complex calendar prompts by 25 points.
- SafetyKit increased content moderation accuracy from 86% to 90% F1 by enforcing granular policy compliance through custom grading functions.
These examples underscore RFT’s strength in aligning models with specialized objectives, whether that involves legal reasoning, medical understanding, code synthesis, or policy enforcement.
How to Use RFT on o4-mini
Getting started with reinforcement fine-tuning involves four key components (an end-to-end sketch follows the list):
- Design a grading function: Developers define a grader that evaluates model outputs. It returns a score between 0 and 1 and can encode task-specific preferences such as correctness, format, or tone.
- Prepare the dataset: A high-quality prompt dataset is crucial. OpenAI recommends using examples that reflect the diversity and difficulty of the target task.
- Launch a training job: Through OpenAI’s fine-tuning API or dashboard, users can start an RFT run with adjustable configurations and performance tracking.
- Evaluate and iterate: Developers monitor reward progression, evaluate checkpoints, and refine the grading logic to maximize performance.
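The sketch below strings these steps together with the OpenAI Python SDK. The general flow (upload a JSONL prompt file, create a fine-tuning job with a reinforcement method and grader configuration, then poll for status) follows the RFT guide, but the exact field names inside the method payload and the model snapshot name are assumptions here; treat OpenAI's documentation as authoritative.

```python
# Minimal sketch of an RFT workflow with the OpenAI Python SDK.
# NOTE: the shape of the "method" payload (grader type, template variables)
# and the model snapshot name are illustrative assumptions, not a verified
# schema; consult OpenAI's RFT guide for the authoritative format.
from openai import OpenAI

client = OpenAI()

# Step 2: upload a JSONL dataset of prompts (one {"messages": [...]} object per line).
training_file = client.files.create(
    file=open("rft_prompts.jsonl", "rb"),
    purpose="fine-tune",
)

# Step 3: launch the RFT job, attaching a grader (step 1) as part of the method config.
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",  # assumed snapshot name
    training_file=training_file.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": {
                "type": "score_model",  # assumption: a model-based grader
                "name": "clarity_grader",
                "model": "gpt-4o",      # grader token usage is billed at inference rates
                "input": [
                    {
                        "role": "user",
                        "content": "Rate the response for clarity, correctness, and "
                                   "completeness on a 0-1 scale: {{ sample.output_text }}",
                    }
                ],
            },
        },
    },
)

# Step 4: monitor reward progression and checkpoints, then refine the grader and re-run.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```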
Comprehensive documentation and examples are available through OpenAI’s RFT guide.
Availability and Pricing
RFT is currently available to verified organizations. Training is billed at $100 per hour of active training time. If a hosted OpenAI model is used to run the grader (such as GPT-4o), token usage for those calls is billed separately at standard inference rates.
As an incentive, OpenAI offers a 50% discount on training costs for organizations that agree to share their datasets for research and model improvement purposes.
A Leap Forward in Model Customization
Reinforcement fine-tuning represents a shift in how foundation models are adapted to specific needs. Rather than merely replicating labeled outputs, RFT lets models internalize feedback loops that reflect the goals and constraints of real-world applications. For organizations working on complex workflows where precision and alignment matter, this new capability opens a critical path toward reliable and effective AI deployment.
With RFT now available on the o4-mini reasoning model, OpenAI is giving developers tools to fine-tune not just language, but reasoning itself.
Check out the detailed documentation here. Also, don’t forget to follow us on Twitter.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.