
LLMs Benchmarked in Production

20 August 2025 · By Pulse24 desk

Inclusion AI and Ant Group have collaborated to create Inclusion Arena, a new leaderboard that evaluates large language models (LLMs) using data from real-world production applications. The approach aims to assess LLM performance more accurately than traditional lab benchmarks. The platform collects user feedback within the natural workflow of applications, so that diverse use cases contribute to a richer, more representative dataset. All user feedback is anonymised to protect privacy.

Inclusion Arena seeks to address the limitations of conventional metrics by capturing user-driven insights. By open-sourcing the collected feedback data, the initiative aims to benefit the entire AI community, fostering a collaborative environment where application developers play a key role in shaping the future of AI. This real-world evaluation offers valuable insights for developers to build and improve LLMs, accelerating the development of more capable and reliable AI solutions.

The platform offers a streamlined integration process for incorporating the evaluation module into applications. This allows developers to gain specific insights into how different models perform within their application's context. The initiative promotes community-driven progress, enabling application developers to contribute to a growing ecosystem for learning and iterating on AI models.

Source · venturebeat.com
AI-processed content may differ from the original.
Published 19 August 2025