
Researchers at the National University of Singapore introduce "Thinkless," an adaptive framework that reduces unnecessary reasoning by up to 90%

The effectiveness of language models depends on their ability to carry out step-by-step reasoning similar to humans. However, these reasoning chains are resource-intensive and wasted on simple problems that do not require careful calculation. A core challenge is that the models lack awareness of task complexity: even for queries that can be answered directly, they often default to detailed reasoning. This inflates token usage, extends response time, and increases system latency and memory consumption. There is therefore a pressing need to equip language models with a mechanism for deciding autonomously whether to reason at length or to answer concisely.

Current approaches to this problem rely either on manually designed heuristics or on prompt engineering to switch between short and long responses. Some methods route queries to separate models based on an estimate of their complexity. However, such external routing systems often lack insight into the target model's actual capabilities and cannot make optimal decisions. Other techniques fine-tune models with static prompt cues such as "reasoning on/off," but these rely on fixed rules rather than a dynamic understanding of the task. Despite some improvements, these approaches fall short of fully autonomous, context-sensitive control within a single model.

Researchers at the National University of Singapore have introduced a new framework called Thinkless that gives a language model the ability to decide dynamically between short-form and long-form reasoning. The framework is trained with reinforcement learning and introduces two special control tokens: `<short>` for concise answers and `<think>` for detailed reasoning. Using a new algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), training is separated into two concerns: selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from collapsing into one-dimensional behavior and allows it to tailor adaptive reasoning to each query.
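The inference-time behavior can be sketched as follows. This is a toy illustration, not the actual model: the control-token names `<short>` and `<think>` follow the paper, but `policy_score` here is a hypothetical stand-in for the learned probability that a query needs long-form reasoning.

```python
def select_mode(prompt: str, policy_score: float) -> str:
    """Return the control token the policy would emit first.

    policy_score is a stand-in for the model's learned probability
    that this prompt requires long-form reasoning.
    """
    return "<think>" if policy_score >= 0.5 else "<short>"


def answer(prompt: str, policy_score: float) -> str:
    """Dispatch generation based on the emitted control token."""
    mode = select_mode(prompt, policy_score)
    if mode == "<short>":
        # Concise, direct answer with no intermediate reasoning.
        return mode + " direct answer"
    # Full chain-of-thought generation before the final answer.
    return mode + " step-by-step reasoning, then the answer"
```

The key design point is that the mode decision is made by the model itself, as the first token it generates, rather than by an external router.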

The method involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, the model is trained on outputs from two expert models, one specializing in brief responses and the other in detailed reasoning. This phase establishes a strong link between each control token and its corresponding response format. The reinforcement-learning phase then refines the model's ability to decide which reasoning mode to use. DeGRPO decomposes learning into two separate objectives: one for training the control token and one for improving the response tokens. This avoids the gradient imbalance of earlier formulations, in which longer responses overwhelm the learning signal and cause a collapse in reasoning diversity. By balancing the updates to the `<short>` and `<think>` tokens, Thinkless learns stably across both response types.
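The decoupling idea can be shown with a schematic loss computation. This is a simplified sketch, not the paper's exact objective: the `alpha` weight and the simple length normalization are assumptions, but they illustrate why separating the control-token term keeps it from being drowned out by long responses.

```python
def degrpo_loss(control_logprob, response_logprobs, advantage, alpha=1.0):
    """Schematic DeGRPO-style policy-gradient loss.

    The control-token term is kept separate from the length-normalized
    response term, so the mode-selection signal has the same weight
    whether the sampled response is 10 tokens or 1,000 tokens long.
    """
    # Mode-selection objective: one term for the single control token.
    control_term = -advantage * control_logprob
    # Response objective: averaged over tokens so length does not
    # inflate its contribution relative to the control term.
    n = max(len(response_logprobs), 1)
    response_term = -advantage * sum(response_logprobs) / n
    return control_term + alpha * response_term
```

Under a naive joint objective (summing all token log-probabilities together), a 1,000-token chain of thought would contribute roughly 1,000 times more gradient than the single control token; the decoupled form above keeps the two signals balanced.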

In evaluations, Thinkless significantly reduced long-form reasoning while maintaining high accuracy. On the Minerva Algebra benchmark, the model used the `<think>` mode in only 25.88% of cases while achieving 94.59% accuracy, whereas conventional reasoning models invoke extended chains of thought far more often. On the AIME 2024 dataset, Thinkless achieved 27.33% accuracy with a 100% usage rate of the `<think>` mode, showing that it preserves full reasoning when it is genuinely needed. On the GSM8K dataset, it used `<think>` only 13.31% of the time yet still achieved 84.18% accuracy. These results reflect the model's ability to match its depth of reasoning to the complexity of each query, cutting unnecessary token generation by up to 90% on some tasks.
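A quick back-of-the-envelope calculation shows how a low `<think>` usage rate translates into token savings. The 13.31% usage rate is the reported GSM8K figure; the average chain lengths below are illustrative assumptions, not numbers from the paper.

```python
# Reported GSM8K usage rate of the long <think> mode.
p_think = 0.1331

# Assumed average generation lengths (illustrative only).
L_long, L_short = 1000, 100

# A model that always reasons at length pays L_long per query;
# the adaptive model mixes long and short generations.
always_think = L_long
adaptive = p_think * L_long + (1 - p_think) * L_short
reduction = 1 - adaptive / always_think  # fraction of tokens saved
```

With these assumed lengths, the expected cost per query drops from 1,000 tokens to about 220, roughly a 78% reduction; the actual savings on a given task depend on the real distribution of chain lengths.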

Overall, this study by researchers at the National University of Singapore offers a compelling solution to the inefficiency of uniform reasoning in large language models. By giving the model a mechanism for judging task complexity and adjusting its inference strategy accordingly, Thinkless optimizes both accuracy and efficiency. The approach balances reasoning depth against response accuracy without relying on fixed rules, providing a data-driven path toward smarter language-model behavior.


Check out the paper and GitHub page. All credit for this research goes to the researchers on the project.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is researching applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.
