Arm Moves AI Inference to the Edge, Creating Pervasive Device Intelligence.
Decentralizing AI: Arm’s architectural shift pushes real-time intelligence onto billions of edge devices to curb data center energy use.
December 23, 2025

The semiconductor powerhouse Arm Holdings is strategically repositioning itself at the center of the artificial intelligence revolution, championing an architectural shift that moves core AI processing out of massive, centralized data centers and onto the physical devices that comprise the Internet of Things, a movement the company calls AI at the edge. The company's vision, articulated by executives such as Vince Jesaitis, head of global government affairs at Arm, sees the next frontier of intelligence defined by highly responsive, power-efficient, and context-aware local processing. Jesaitis noted that while the industry has long referred to connected devices as "smart," they are on the cusp of becoming "truly intelligent" as processing moves onto the device itself.[1]
This pivot is driven by both technological potential and a critical energy imperative. The current paradigm of cloud-centric AI, particularly for running inference tasks—the process of applying a trained AI model to new data—is proving economically and environmentally unsustainable at scale. Arm CEO Rene Haas has directly addressed the issue, arguing that the industry's reliance on multi-gigawatt data centers for every AI workload is heading for a sustainability wall.[2] This has forced a shift in focus: the compute-intensive process of training AI models will largely remain in the cloud, but the vast majority of inference—estimated at 50 to 60 percent of all AI computing—will be decentralized to local devices.[3] Deploying AI in these edge environments, from industrial sensors and autonomous vehicles to smartphones and earbuds, offers immediate benefits: lower power consumption, which shrinks the environmental footprint, and near-zero network latency, enabling instant translation or immediate triggering of safety controls.[1] For end users, this means AI assistants and other applications can operate with a speed and reliability that centralized compute simply cannot deliver.[4]
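The latency argument can be made concrete with a back-of-envelope sketch. The figures below are illustrative assumptions, not Arm benchmarks: a local model that is slower per inference but pays no network cost, versus a faster cloud accelerator sitting behind a network round trip.

```python
# Illustrative comparison of response time for on-device (edge) inference
# versus a cloud round trip. All millisecond figures are hypothetical
# placeholders chosen for the sketch, not measured numbers.

def response_time_ms(compute_ms: float, network_rtt_ms: float = 0.0) -> float:
    """Total latency = network round trip (if any) + model compute time."""
    return network_rtt_ms + compute_ms

# Edge: slower silicon (assumed 40 ms of compute) but no network hop.
edge = response_time_ms(compute_ms=40.0)

# Cloud: a faster accelerator (assumed 10 ms of compute) behind an assumed
# 80 ms round trip; the network dominates short, interactive requests.
cloud = response_time_ms(compute_ms=10.0, network_rtt_ms=80.0)

print(f"edge: {edge:.0f} ms, cloud: {cloud:.0f} ms")
```

Under these assumptions the on-device path wins despite weaker hardware, which is the core of the latency case for interactive workloads like translation or safety triggers.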
The foundation of this edge revolution is Arm's next-generation intellectual property, purpose-built to handle sophisticated AI workloads in constrained environments. Central to the strategy is the Armv9 Edge AI Platform, a complete compute platform designed to support intelligent IoT applications.[5] It integrates key components such as the Arm Cortex-A320 CPU and the Arm Ethos-U85 Neural Processing Unit (NPU), which together enable on-device AI models with over one billion parameters.[4][6] This hardware is paired with the Arm KleidiAI software development platform, which extends to the edge to provide compute libraries that optimize AI and machine learning frameworks such as llama.cpp and ExecuTorch. This software layer is critical: it can boost the performance of the new Cortex-A320 CPUs by up to 70% in certain scenarios, allowing lightweight large language models to run efficiently on devices.[5] The company's entire approach is rooted in architectural consistency, providing a unified compute platform that spans from the cloud all the way to the edge. This seamless, system-level design allows Arm's developer ecosystem—more than 22 million strong—to deploy AI models without the friction of fragmented software stacks, accelerating innovation and time-to-market.[7]
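One reason lightweight LLMs fit on constrained devices is low-precision arithmetic, the kind of workload that optimized Arm compute kernels accelerate. The sketch below is a generic NumPy illustration of symmetric int8 quantization, not KleidiAI's actual implementation; the shapes, random seed, and tolerance are assumptions made for the example.

```python
import numpy as np

# Generic sketch of symmetric per-tensor int8 quantization, the style of
# arithmetic that on-device inference kernels exploit to shrink memory and
# compute. Shapes and values are illustrative, not from any real model.

def quantize_int8(x: np.ndarray):
    """Map floats to int8 with a single scale factor (symmetric, per-tensor)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)   # mock activations
w = rng.standard_normal((64, 8)).astype(np.float32)   # mock weights

qa, sa = quantize_int8(a)
qw, sw = quantize_int8(w)

# Integer matmul accumulated in int32, then rescaled back to float.
y_int8 = (qa.astype(np.int32) @ qw.astype(np.int32)) * (sa * sw)
y_fp32 = a @ w

# The quantized result closely tracks the full-precision one.
err = np.abs(y_int8 - y_fp32).max()
print(f"max abs error: {err:.3f}")
```

The point of the sketch is the trade: eight-bit storage and integer multiply-accumulate in exchange for a small, bounded numerical error, which is what makes billion-parameter models plausible within edge power and memory budgets.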
Arm’s unparalleled market ubiquity gives it a decisive advantage in pushing AI to the edge. With more than 310 billion Arm-based chips shipped to date, the architecture already serves as the silent engine behind nearly all modern consumer electronics and countless industrial systems.[8][9] Looking ahead, the company expects over 100 billion Arm devices to be ready for AI by the end of next year, reinforcing its position as the backbone of a vast, distributed AI compute landscape.[10] This pervasive footprint is being strategically leveraged through programs like Arm Flexible Access, which now includes the Armv9 Edge AI Platform. The initiative lowers the barrier to entry for startups and smaller OEMs, offering low-cost or no-cost access to the latest technology and thereby democratizing the ability to innovate in edge AI.[4][6]
While the focus on edge AI is paramount, Arm maintains a critical presence in the high-end cloud compute market, ensuring a comprehensive, "all-of-the-above" strategy. The energy-efficient nature of its Reduced Instruction Set Computing (RISC) architecture is also redefining data center infrastructure, where major hyperscalers like Amazon Web Services (Graviton), Google Cloud (Axion), and Microsoft Azure (Cobalt) are increasingly deploying Arm-based custom chips for both training and complex inference workloads.[7][11] Furthermore, Arm is a key partner to industry leaders, with its Neoverse architecture underpinning AI superchips like Nvidia’s Grace.[7] This cloud-to-edge continuum is vital, as hybrid models—like the one demonstrated in a collaboration with Meta, where voice recognition in smart glasses happens locally on an Arm chip before a cloud query—will define many future AI applications.[2]
Beyond the technological architecture, the company’s vision encompasses the necessary policy and workforce infrastructure. Jesaitis’s role highlights the strategic engagement with global policymakers on issues like supply chain resilience, concentrated dependencies, and the need for an "AI-ready workforce."[1] He stresses that effective AI policy must be grounded in an understanding of hardware and systems design, making the case that physical constraints like energy availability and manufacturing capacity are key drivers of the industry’s renewed interest in efficiency and heterogeneous compute.[12] This integrated approach, which simultaneously pushes for architectural and policy solutions, positions Arm as an essential enabler of the future intelligent ecosystem. The culmination of these efforts signals a profound market transformation where the AI utility that captivates consumers and drives industrial efficiency will increasingly reside not in a distant server farm, but locally on the device, ushering in an era of truly pervasive, real-time intelligence.[1][13]