Amazon shuts down internal AI leaderboard after employees game metrics and spike cloud costs

Workers gamed an internal leaderboard with useless AI loops, inflating cloud costs and exposing the perils of forced tech adoption.

May 29, 2026

Amazon has officially shut down KiroRank, an internal leaderboard designed to track and rank employees’ usage of artificial intelligence tools, following revelations that workers were systematically gaming the system[1][2]. The decision highlights an unexpected challenge for the tech giant: rather than fostering genuine innovation, the gamified ranking system incentivized employees to run redundant, low-value AI tasks to inflate their metrics, a practice internally dubbed tokenmaxxing[3][4]. By deploying autonomous software agents to perform needless operations, workers drove up compute usage and substantially increased Amazon's cloud infrastructure costs[1][5]. The shutdown of the leaderboard underscores the complex and often expensive hurdles that major technology firms face as they rush to integrate generative artificial intelligence into everyday operations[1][6].
The issue began with Amazon’s aggressive push to make artificial intelligence a cornerstone of its internal operations[7]. To accelerate adoption, the company set an internal target for more than 80 percent of its software developers to use generative AI tools on a weekly basis[8][9]. As part of this effort, Amazon widely deployed MeshClaw, an in-house agentic AI tool designed to let employees create autonomous software bots[8][10]. These agents could connect to standard workplace programs to perform repetitive tasks, such as triaging emails, drafting communications, and interacting with workplace chat software like Slack[10][9]. To monitor progress and encourage compliance, Amazon launched the KiroRank dashboard on its internal Kiro developer platform[1][3]. KiroRank scored employees based on their overall AI activity, primarily measuring raw token consumption, where tokens are the units of text processed by language models[8][9].
While the tool was designed to motivate staff, it quickly created a highly competitive atmosphere[7]. Although Amazon executives assured employees that token consumption statistics would not be used in formal performance reviews, many workers remained skeptical[7][9]. Fearing that managers were quietly monitoring the data, and seeking to demonstrate compliance with company-wide AI mandates, some employees began utilizing MeshClaw to automate completely unnecessary tasks[8][7]. Rather than driving productivity, the setup allowed workers to easily generate high volumes of background activity, such as triggering endless AI-driven email triage cycles or setting up loops of agents communicating with one another[1][10]. This artificial inflation of usage scores quickly spread across teams, exposing the pitfalls of focusing on the volume of AI use rather than its utility[10][4].
The widespread manipulation of the system eventually forced senior leadership to intervene as the financial impact of tokenmaxxing became impossible to ignore[3][6]. Dave Treadwell, an Amazon senior vice-president, addressed employees directly, confirming that the KiroRank leaderboard had been taken offline[1][3]. Treadwell explained to staff that while the tracking service had been built with good intentions to encourage technology adoption, it had inadvertently led to significant, unnecessary infrastructure costs[5][6]. In his message, he issued a direct plea to the workforce, instructing employees not to use artificial intelligence just for the sake of using it[3][2]. The situation served as a classic corporate demonstration of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure[10][4].
This challenge is not unique to Amazon. Other Silicon Valley giants, including Meta, have reportedly experienced similar patterns where internal pressure to adopt AI resulted in employees artificially inflating their usage scores to satisfy managerial expectations[2]. When companies emphasize adoption metrics over actual value, the human instinct to optimize for metrics inevitably takes over[10]. For Amazon, which prides itself on operational efficiency, the realization that its own highly paid developers were dedicating computational power to useless AI loops represented both a cultural and financial embarrassment.
In response to the leaderboard debacle, Amazon is shifting its approach to how it measures and encourages artificial intelligence integration[3][2]. Instead of tracking raw token counts, the company has introduced a new metric known as normalized deployments[3][4]. This metric focuses on the quality and utility of AI engagement, specifically tracking instances where AI-generated code is successfully deployed and utilized in actual, productive work[3][2]. By tying metrics directly to output rather than raw input or computational consumption, Amazon hopes to curb wasteful activity while continuing to advance its developmental goals[3][4]. This shift reflects a broader, industry-wide realization that raw API calls are a poor proxy for genuine productivity gains[10][4].
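The difference between the two approaches can be illustrated with a minimal sketch. Everything in it is hypothetical: the names, the figures, and the ranking functions are invented for illustration, and it does not attempt the normalization step, since Amazon's actual formula has not been described. It only shows how ranking by raw token consumption rewards an idle agent loop, while ranking by deployed changes does not.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    name: str
    tokens: int    # raw tokens consumed (hypothetical figure)
    deployed: int  # AI-assisted changes that reached production (hypothetical)


def rank_by_tokens(rows):
    """Order names by raw token consumption, highest first."""
    return [r.name for r in sorted(rows, key=lambda r: r.tokens, reverse=True)]


def rank_by_deployments(rows):
    """Order names by count of deployed AI-assisted changes, highest first."""
    return [r.name for r in sorted(rows, key=lambda r: r.deployed, reverse=True)]


# Invented sample data: one user runs agents in a loop and ships nothing.
team = [
    Usage("agent_loop_runner", tokens=9_000_000, deployed=0),
    Usage("steady_shipper", tokens=400_000, deployed=12),
    Usage("occasional_user", tokens=50_000, deployed=3),
]

token_order = rank_by_tokens(team)
deploy_order = rank_by_deployments(team)

print("By tokens:     ", token_order)
print("By deployments:", deploy_order)
```

Under the token-based ranking the loop runner tops the board despite shipping nothing; under the deployment-based ranking it falls to last place. An output-based metric can still be gamed in other ways, for example by splitting one change into many small deployments, which is presumably part of what a normalization step would need to address.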
The financial stakes of managing these costs are massive[6]. Amazon's capital expenditure is projected to hit approximately $200 billion, with the vast majority of those resources allocated to expanding data centers and securing the advanced hardware required to run complex AI models[1][2]. Because running generative AI models is highly resource-intensive, every needless query or automated loop contributes directly to the company's electricity and server hosting bills[6][4]. At a time when Amazon has executed sweeping layoffs and cost-cutting measures across various divisions to fund its AI infrastructure buildup, wasting expensive cloud compute on dummy tasks to climb an internal dashboard was both ironic and unsustainable[1].
Ultimately, the quiet demise of Amazon's AI leaderboard offers a vital lesson for the broader technology industry as it navigates the generative AI boom. While tech companies face immense pressure from Wall Street to prove they are rapidly adopting and implementing AI, forcing adoption through arbitrary metrics can easily backfire[7][9]. Simply using artificial intelligence does not automatically equate to business value, and when metrics are poorly aligned, employees will inevitably find ways to game the system at the company's expense[10]. As the initial hype around generative AI matures into a demand for measurable returns on investment, companies will need to move past superficial adoption targets and focus on building systems that reward genuine, cost-effective productivity rather than meaningless digital noise[10][4].

Sources