OpenAI launches GPT-5.4 featuring native computer control and reasoning that outperforms human industry experts

Unifying reasoning and native computer-use, GPT-5.4 surpasses human benchmarks to automate complex professional workflows as a highly autonomous collaborator.

March 5, 2026

OpenAI has officially launched GPT-5.4, a flagship model that represents a significant technical milestone by unifying advanced reasoning, expert-level coding, and native computer-use capabilities into a single architecture.[1][2][3][4] Positioned as the company's most sophisticated frontier model for professional work to date, GPT-5.4 aims to move beyond simple text generation toward a more autonomous and agentic form of artificial intelligence. By integrating the high-tier programming capabilities previously reserved for specialized versions like the recent GPT-5.3 Codex with its mainline reasoning engine, OpenAI has created a versatile tool capable of planning and executing complex, multi-step tasks across various software environments.[1][5] This release is seen by industry analysts as a direct effort to consolidate the fragmented landscape of specialized AI models into a cohesive system that can act as a comprehensive digital collaborator.
The model is being introduced through two primary versions: GPT-5.4 Thinking and GPT-5.4 Pro.[1][3][6][7][8][4] Within the ChatGPT interface, the Thinking version is designed to provide users with a transparent look into the model's internal logic, allowing for more reliable outcomes on difficult prompts. In a notable departure from previous iterations, users can now view the model's reasoning plan in advance and intervene to adjust the trajectory of a task while it is in progress.[2][5] For high-demand industrial and developer applications, the Pro version offers maximum compute and higher precision, intended for tasks that require extreme accuracy in financial modeling, large-scale software engineering, and scientific research. This tiered approach reflects a broader industry shift toward distinguishing between everyday conversational utility and the rigorous demands of professional-grade workflows.
Performance benchmarks released alongside the model suggest a substantial leap in reliability and cognitive depth. On the GDPval benchmark, which evaluates the ability of AI agents to perform professional knowledge work across dozens of different occupations, GPT-5.4 achieved a success rate of 83 percent.[4][2][5] This is a 12-point improvement over the 71 percent recorded by its predecessor, GPT-5.2, and suggests the model can now match or exceed the performance of human industry experts in over four out of five professional scenarios.[4] Furthermore, OpenAI reports that GPT-5.4 is approximately 18 percent less likely to produce errors and 33 percent less likely to generate false claims compared to previous versions.[1] These gains in factual integrity are attributed to a more refined training process that emphasizes cross-referencing information and maintaining sustained context over longer interactions.
A defining feature of GPT-5.4 is its native computer-use capability, which allows the model to interact directly with operating systems and third-party software applications.[5][2] Unlike earlier methods that relied on brittle external plugins, this model can interpret visual information from a desktop environment and execute commands through virtual keyboard and mouse inputs. In the OSWorld-Verified benchmark, which tests an AI’s ability to navigate complex desktop tasks such as managing files or coordinating data across multiple applications, GPT-5.4 achieved a success rate of 75 percent. This figure is particularly striking as it surpasses the recorded human success rate of 72.4 percent on the same tests.[2] This advancement effectively transforms the model from a passive advisor into an active operator capable of automating administrative and technical workflows that once required constant human oversight.
The technical architecture of GPT-5.4 also introduces significant improvements to context management and efficiency. The model now supports a context window of up to one million tokens, a capacity that enables it to analyze entire codebases, massive document collections, or lengthy project histories in a single request.[8] To manage the high compute costs associated with such a large window, OpenAI introduced a new "Tool Search" feature in the API. This system allows the model to dynamically locate and retrieve specific tool definitions and documentation only when they are needed, rather than loading every possible instruction into the initial prompt. Internal testing indicates that this approach can reduce token consumption by nearly 50 percent for tool-heavy workflows, making complex agentic tasks more economically viable for developers despite an overall increase in base token pricing.
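The idea of retrieving tool definitions on demand, rather than placing all of them in the initial prompt, can be illustrated with a minimal Python sketch. This is not OpenAI's API: the `ToolSpec` registry, the `search_tools` keyword matcher, and the word-count token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool definition: a name plus its documentation."""
    name: str
    description: str


# Hypothetical registry; a real deployment could hold hundreds of tools.
REGISTRY = [
    ToolSpec("read_file", "Read a text file from disk and return its contents."),
    ToolSpec("write_file", "Write text to a file on disk, replacing old contents."),
    ToolSpec("send_email", "Send an email message to one or more recipients."),
    ToolSpec("query_database", "Run a read-only SQL query against the database."),
]


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def search_tools(query: str, registry: list[ToolSpec]) -> list[ToolSpec]:
    """Return tools whose name or description shares a word with the query."""
    words = set(query.lower().split())
    hits = []
    for tool in registry:
        haystack = set(tool.name.lower().split("_")) | set(
            tool.description.lower().rstrip(".").split()
        )
        if words & haystack:
            hits.append(tool)
    return hits


def prompt_cost(tools: list[ToolSpec]) -> int:
    """Estimated tokens spent describing the given tools in a prompt."""
    return sum(estimate_tokens(f"{t.name} {t.description}") for t in tools)


eager_cost = prompt_cost(REGISTRY)  # every definition loaded up front
lazy_tools = search_tools("email recipients", REGISTRY)
lazy_cost = prompt_cost(lazy_tools)  # only the definitions the task needs
```

In this toy registry only `send_email` matches the query, so `lazy_cost` is a fraction of `eager_cost`. The actual savings in any real system would depend on how many tools are registered and how precise the retrieval step is.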
For the coding community, the integration of GPT-5.3 Codex’s strengths into the mainline reasoning model simplifies the development lifecycle.[1] Developers no longer need to switch between a reasoning-optimized model for planning and a code-optimized model for implementation. GPT-5.4 is designed to handle the entire build-run-verify-fix loop natively. This includes writing the initial code, executing it in a secure environment, interpreting error logs, and autonomously applying patches. This consolidation is expected to accelerate the development of autonomous software agents that can maintain and upgrade their own codebases with minimal human intervention. Additionally, the model’s enhanced visual perception allows it to better understand complex diagrams, architectural blueprints, and user interface designs, further bridging the gap between abstract planning and functional software development.
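The build-run-verify-fix loop described above can be sketched in a few lines of Python. This is a simplified illustration and not how GPT-5.4 works internally: `propose_fix` is a hypothetical stand-in for a model call that knows a single hard-coded repair, and a plain subprocess stands in for the secure execution environment, which a real system would replace with a proper sandbox.

```python
import subprocess
import sys


def run_candidate(source: str) -> tuple[bool, str]:
    """Run source in a separate Python process; return (ok, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.returncode == 0, result.stderr


def propose_fix(source: str, error_log: str) -> str:
    """Hypothetical stand-in for a model call that patches the code.

    It only knows one repair: a misspelled `pritn` reported as a NameError.
    """
    if "NameError" in error_log and "pritn" in error_log:
        return source.replace("pritn", "print")
    return source


def build_run_verify_fix(source: str, max_attempts: int = 3) -> tuple[str, bool, int]:
    """Run, read the error log, patch, and retry, for at most max_attempts runs."""
    for attempt in range(1, max_attempts + 1):
        ok, error_log = run_candidate(source)
        if ok:
            return source, True, attempt
        patched = propose_fix(source, error_log)
        if patched == source:  # no new patch available; stop instead of looping
            return source, False, attempt
        source = patched
    return source, False, max_attempts


final_source, succeeded, attempts = build_run_verify_fix("pritn('hello')")
```

The attempt cap and the "no new patch" exit are design choices in this sketch: they keep the loop from retrying indefinitely when the fixer has nothing further to offer.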
The broader implications for the AI industry are profound, as GPT-5.4 places OpenAI in direct competition with the most advanced offerings from rivals like Anthropic and Google. By focusing on agentic workflows and computer operation, OpenAI is pivoting toward a future where AI is judged not just by its conversational fluency but by its "follow-through": the ability to consistently complete long-running tasks without losing focus or intent. This shift is also evident in the model’s "compaction" training, a technique that allows the AI to summarize and preserve critical context over thousands of steps, preventing the "forgetting" or drift that often plagues long-term digital agents. As businesses begin to integrate these capabilities into customer service, financial analysis, and legal research, the standard for what constitutes a "frontier model" is clearly moving toward active, multi-environment operation.
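The general idea behind compaction, keeping recent steps verbatim while folding older ones into a running summary, can be shown with an application-level Python sketch. It is only an analogy: the article describes compaction as a training technique, whereas this code manages a context buffer outside any model. The `CompactingContext` class and the `summarize` helper are hypothetical, and `summarize` simply keeps the text before the first colon of each entry.

```python
from dataclasses import dataclass, field


def summarize(entries: list[str]) -> str:
    """Hypothetical stand-in for a model-written summary.

    It keeps only the text before the first colon of each entry, which is
    enough to show the data flow but is not a realistic summarizer.
    """
    return " | ".join(e.split(":")[0] for e in entries)


@dataclass
class CompactingContext:
    """Keeps the newest entries verbatim and folds older ones into a summary."""
    max_recent: int = 3
    summary: str = ""
    recent: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.recent.append(entry)
        if len(self.recent) > self.max_recent:
            overflow = self.recent[: -self.max_recent]
            self.recent = self.recent[-self.max_recent:]
            # Fold the previous summary and the overflow into a new summary.
            parts = ([self.summary] if self.summary else []) + overflow
            self.summary = summarize(parts)

    def render(self) -> str:
        """Text that would be passed along as context for the next step."""
        header = [f"Summary so far: {self.summary}"] if self.summary else []
        return "\n".join(header + self.recent)


ctx = CompactingContext(max_recent=2)
for step in range(1, 6):
    ctx.add(f"step {step}: detailed log output for step {step}")
```

After five steps the buffer holds steps 4 and 5 in full, while steps 1 through 3 survive only as short labels in `ctx.summary`. The context therefore stays bounded however long the task runs.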
Safety and security remain central to the discussion surrounding such powerful capabilities. With its improved reasoning and computer-use skills, GPT-5.4 has been rated as "High Capability" in cybersecurity evaluations for the first time in a general-purpose model.[4] This designation reflects its ability to both identify vulnerabilities in code and assist in defensive hardening, though it also necessitates more rigorous guardrails to prevent misuse in automated hacking scenarios. OpenAI has stated that the model includes enhanced instruction alignment to ensure it adheres to safety protocols even during autonomous computer operations. As the model rolls out to enterprise and educational subscribers, the focus will likely shift to how these safety measures hold up under real-world pressure and how effectively the "Thinking" transparency helps human operators maintain control over their increasingly capable digital assistants.
Ultimately, the launch of GPT-5.4 Thinking and Pro signals the arrival of the "agentic era" in artificial intelligence. By successfully merging the disparate disciplines of logic, programming, and environmental interaction, OpenAI has delivered a tool that functions less like a search engine and more like a specialized employee. The ability to handle document-heavy and spreadsheet-heavy business workflows while simultaneously navigating software interfaces marks a turning point in how professionals will interact with technology. As this model becomes integrated into daily productivity suites and developer environments, the emphasis will shift from learning how to prompt an AI to learning how to manage an autonomous digital workforce. The success of GPT-5.4 will likely be measured by its ability to reliably execute the mundane and complex tasks that define modern professional life, potentially reshaping the workforce and the economy in the process.

Sources