Salesforce AI research introduces new benchmarks, guardrails and model architectures to enhance trusted and capable AI agents

Salesforce AI Research outlines a comprehensive roadmap for building smarter, reliable and versatile AI agents. Recent initiatives focus on addressing the fundamental limitations in current AI systems, especially their inconsistent task performance, lack of robustness, and the challenges of adapting to complex enterprise workflows. By introducing new benchmarks, model architectures and security mechanisms, Salesforce is building a multi-layer framework to responsibly scale the proxy system.
Solve “jagged intelligence” through targeted benchmarks
One of the main challenges highlighted in this study is the Salesforce terminology Jagged intelligence: Unstable behavior of AI agents across similar complexity tasks. To systematically diagnose and reduce this problem, the team introduced Simple Benchmark. The dataset contains 225 direct, reasoning-oriented questions that humans answer with almost perfect consistency, but still lack the language model. The goal is to reveal the gap in the capabilities of models that cross seemingly unified problems, especially in real-world inference schemes.
Supplement simple ContextualJudgeBenchwhich evaluates the agent’s ability to maintain accuracy and loyalty in context-specific answers. This benchmark not only emphasizes factual correctness, but also the agency’s ability to recognize when to abstain, an important feature of trust-sensitive applications such as the legal, financial and health care sectors.
Enhance security and robustness through trust mechanisms
Recognizing the importance of AI reliability in enterprise settings, Salesforce is expanding its Trust layer With new safeguards. this sfr-guard The model family has been trained in open domain and domain specificity (CRM) data to detect timely injection, toxic output, and hallucinatory content. These models serve as dynamic filters to support real-time inference of context modest functionality.
Another component, Crmarenais a simulation-based evaluation suite designed to test proxy performance under conditions that mimic real CRM workflows. This ensures that AI agents can be promoted outside of training prompts and operate predictably in various enterprise tasks.
Professional model family of reasoning and action
To support more structured target-guided behaviors in agencies, Salesforce has introduced two new model series: XLAM and Tacos.
this Xlam (Extended Language and Action Models) Connection is optimized for tool use, multi-turn interactions and function calls. These models vary in size (from 1B to 200B+ parameters) and are built to support enterprise-level deployments where integration with APIs and internal knowledge sources is critical.
Tacos (idea and action chain optimization) Models designed to improve agent planning capabilities. By explicitly modeling intermediate inference steps and corresponding actions, Taco enhances the ability of the agent to decompose complex targets into sequences of operations. This structure is especially important with use cases such as document automation, analysis and decision support systems.
Operation agents through Agentforce
These functions are being unified agentSalesforce’s platform construction and deployment of autonomous agents. The platform includes a codeless Agent BuilderThis allows developers and domain experts to specify proxy behaviors and constraints in natural language. Integration with the broader Salesforce ecosystem ensures that agents can access customer data, invoke workflows and maintain auditing.
A Valoir study found that teams using AgentForce can be 16 times faster, while improving operational accuracy by up to 75% compared to traditional software methods. Importantly, agent agents are embedded in the Salesforce Trust layer, thus inheriting the security and compliance capabilities required in the enterprise context.
in conclusion
Salesforce’s research agenda reflects a shift toward more deliberate architecturally aware AI development. By combining targeted assessments, fine-grained security models, and dedicated building structures for reasoning and action, the company is laying the foundation for the next generation of agency systems. These advancements are not only technical, but also structural – emphasizing reliability, adaptability and aligning with the nuanced needs of enterprise software.
Check Technical details. Also, don’t forget to follow us twitter And join us Telegram Channel and LinkedIn GrOUP. Don’t forget to join us 90K+ ml reddit.
🔥 [Register Now] Minicon Agesic AI Virtual Conference: Free Registration + Certificate of Attendance + 4-hour Short Event (May 21, 9am-1pm) + Hands-On the Workshop
Asif Razzaq is CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, ASIF is committed to harnessing the potential of artificial intelligence to achieve social benefits. His recent effort is to launch Marktechpost, an artificial intelligence media platform that has an in-depth coverage of machine learning and deep learning news that can sound both technically, both through technical voices and be understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
