AI4Bharat Launches Indic LLM Arena, Revolutionizing Indian Language AI Evaluation
Crowd-sourced public utility creates trusted benchmarks, ensuring AI effectively serves India's diverse languages and cultural contexts.
November 11, 2025

In a significant step towards fostering a more inclusive and contextually relevant artificial intelligence ecosystem, the AI4Bharat research lab at IIT Madras has launched the 'Indic LLM Arena'.[1] This new platform serves as a public, crowd-sourced leaderboard designed to benchmark the performance of Large Language Models (LLMs) specifically on the diverse and complex spectrum of Indian languages and cultural nuances.[2][1] The initiative, supported by Google Cloud, aims to address a critical gap in the global AI landscape, where evaluation metrics have been overwhelmingly dominated by English-centric benchmarks and Western cultural contexts.[2] By providing a neutral and transparent evaluation mechanism, AI4Bharat intends for the Arena to function as a "public utility," empowering developers, enterprises, and the general public to make informed decisions and contribute to the development of AI that truly serves the Indian population.[2][1]
The launch of the Indic LLM Arena is a direct response to the inadequacy of existing global leaderboards for evaluating AI models in the Indian context.[2] These established benchmarks often fail to capture the linguistic realities of India, where communication is frequently characterized by code-switching—the fluid mixing of languages such as 'Hinglish' (Hindi-English) or 'Tanglish' (Tamil-English) within a single conversation.[2][3][1] An AI model's proficiency in perfect English is of little consequence if it cannot comprehend a query from a farmer in rural Maharashtra or provide a culturally appropriate response to a user in Sikkim.[2] The Indic LLM Arena is engineered to assess models on three foundational pillars crucial to the Indian experience: language, context, and safety.[2][1] This involves evaluating a model's ability to handle multilingual inputs, understand local contexts and cultural sensitivities, and adhere to fairness norms relevant to Indian society.[1] This initiative aligns with the broader IndiaAI Mission, which seeks to accelerate the country's sovereign AI efforts by creating trusted benchmarks for both domestic and international LLMs.[1]
The operational framework of the Indic LLM Arena is rooted in a "human-in-the-loop" system that uses crowdsourcing to generate practical evaluations.[1] The process is designed for accessibility and ease of use. A user can enter a prompt by typing, by voice, or through transliteration, in any Indian language or a mix of languages.[2][3][1] The platform then presents two responses generated by two anonymous AI models, designated simply as "Model A" and "Model B."[2] The user acts as a judge, voting for the response they deem superior or declaring a tie.[2] Over thousands of these anonymous "battles," the platform aggregates the human judgments and applies the Bradley-Terry model, a standard statistical method for estimating relative strength from pairwise comparisons, to rank the models on real-world Indian prompts.[2] This methodology is intended to make the resulting leaderboard reflect a model's practical usefulness from a user's perspective, rather than just its performance on standardized academic tests.
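To make the ranking step concrete, here is a minimal Python sketch of how Bradley-Terry strengths can be estimated from pairwise votes. It is an illustration only, not the Arena's actual implementation, which the article does not describe. The function name `bradley_terry`, the `(model_a, model_b, outcome)` record format, and the choice to count a tie as half a win for each side are all assumptions made for this example. The fitting loop uses the standard minorization-maximization update for the Bradley-Terry model.

```python
from collections import defaultdict


def bradley_terry(battles, iterations=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from pairwise votes.

    battles: iterable of (model_a, model_b, outcome), where outcome is
    "a", "b", or "tie". Counting a tie as half a win for each side is an
    assumption of this sketch, not something the article specifies.
    Returns a dict mapping model name to a positive strength score.
    """
    wins = defaultdict(float)    # total (possibly fractional) wins per model
    games = defaultdict(float)   # number of battles per unordered pair
    models = set()

    for a, b, outcome in battles:
        models.update((a, b))
        games[tuple(sorted((a, b)))] += 1.0
        if outcome == "a":
            wins[a] += 1.0
        elif outcome == "b":
            wins[b] += 1.0
        elif outcome == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")

    if not models:
        return {}

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Sum n_ij / (p_i + p_j) over every opponent j that m has faced.
            denom = 0.0
            for (x, y), n in games.items():
                if m == x:
                    denom += n / (strength[m] + strength[y])
                elif m == y:
                    denom += n / (strength[m] + strength[x])
            updated[m] = wins[m] / denom if denom > 0 else strength[m]

        # Rescale so the average strength stays at 1; only ratios matter.
        total = sum(updated.values())
        updated = {m: s * len(models) / total for m, s in updated.items()}

        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


if __name__ == "__main__":
    votes = [("A", "B", "a")] * 3 + [("A", "B", "b")]
    scores = bradley_terry(votes)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 3))
```

In the small usage example, model A wins three of four battles against model B, so the fitted strength of A comes out at about three times that of B. Under the Bradley-Terry model, the predicted probability that one model is preferred over another is its strength divided by the sum of the two strengths, which is what allows a leaderboard to be built from votes that only ever compare two models at a time.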
AI4Bharat emphasizes that the Indic LLM Arena is more than a mere leaderboard; it is a foundational public utility designed to catalyze the entire Indian AI ecosystem.[2][1] For developers and researchers, it offers an invaluable, neutral ground for benchmarking their models against others, specifically on Indic use-cases, thereby fostering a more focused and rapid innovation cycle.[2] Enterprises can leverage the data and rankings to make informed decisions about which AI models to adopt, mitigating risks and accelerating the deployment of solutions that can effectively serve their customer base.[2] Furthermore, the platform empowers the public by allowing them to actively participate in defining what constitutes "good" AI for India, ensuring that the technology's benefits are not confined to English speakers and that the resulting digital public goods are accessible and useful for all Indians.[2]
Looking ahead, AI4Bharat has outlined a phased roadmap for the expansion of the Indic LLM Arena's capabilities. The initial phase, which is currently live, focuses on text-based inputs across multiple Indian languages and code-mixed scenarios.[2] The second phase will see the platform expand to evaluate omni-models, incorporating vision and audio capabilities to address image-based and voice-based interactions.[2][3] The third phase will introduce more complex, agentic tasks, such as handling large documents, integrating with web searches, and using other advanced workflows.[2][3] This roadmap, combined with the commitment to keeping the platform's resources open-source, signals a long-term vision of a comprehensive and evolving standard for AI evaluation in India, one intended to guide the development of more inclusive AI.[2][1]