Inclusion AI and Ant Group have collaborated to create Inclusion Arena, a new leaderboard for evaluating large language models (LLMs) using data from real-world, production applications. The approach aims to assess LLM performance more accurately than traditional lab benchmarks can. The platform collects user feedback within the natural workflow of applications, so that a diverse range of use cases contributes to a richer, more representative dataset. All user feedback is anonymised to protect user privacy.
Inclusion Arena seeks to address the limitations of conventional metrics by capturing user-driven insights. By open-sourcing the collected feedback data, the initiative aims to benefit the entire AI community, fostering a collaborative environment in which application developers play a key role in shaping the future of AI. This real-world evaluation gives developers concrete signals for building and improving LLMs, accelerating the development of more capable and reliable AI solutions.
The platform offers a streamlined process for integrating its evaluation module into applications, giving developers specific insight into how different models perform within their own application's context; a hypothetical sketch of such an integration appears below. The initiative promotes community-driven progress, enabling application developers to contribute to a growing ecosystem for learning about and iterating on AI models.
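The article does not detail the integration itself, but the general flow it describes can be illustrated with a minimal sketch: the application captures a user's reaction to a model response inside its normal workflow, attaches no user identity, and hands the anonymised record to the evaluation module. Every class, field, and function name below is an illustrative assumption, not Inclusion Arena's actual SDK or schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# All names here are hypothetical; Inclusion Arena's real API is not described in the article.

@dataclass
class FeedbackEvent:
    """One anonymised piece of in-app user feedback about a model response."""
    event_id: str                                    # random ID, no user identity attached
    model_id: str                                    # which model produced the response
    prompt: str                                      # the user's original request
    response: str                                    # the model's reply shown in the app
    user_rating: str                                 # e.g. "preferred" or "rejected"
    app_context: dict = field(default_factory=dict)  # e.g. {"feature": "summarisation"}


def capture_feedback(model_id: str, prompt: str, response: str,
                     user_rating: str, app_context: Optional[dict] = None) -> FeedbackEvent:
    """Build an anonymised feedback record inside the application's normal workflow."""
    return FeedbackEvent(
        event_id=str(uuid.uuid4()),   # fresh random identifier per event, nothing traceable to the user
        model_id=model_id,
        prompt=prompt,
        response=response,
        user_rating=user_rating,
        app_context=app_context or {},
    )


def submit_feedback(event: FeedbackEvent) -> None:
    """Placeholder for handing the record to the evaluation module (real endpoint unknown)."""
    # A real integration would call the platform's ingestion API here.
    print(f"Would submit event {event.event_id} for model {event.model_id}")


if __name__ == "__main__":
    event = capture_feedback(
        model_id="model-x",
        prompt="Summarise this meeting transcript.",
        response="Here is a short summary...",
        user_rating="preferred",
        app_context={"feature": "summarisation"},
    )
    submit_feedback(event)
```

The design choice worth noting is that the record carries a random event identifier and application context rather than any user identity, which is how feedback could remain useful for model comparison while staying anonymised.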