OpenAI upgrades Deep Research to GPT-5.2 engine featuring targeted search and real-time tracking
OpenAI’s move to GPT-5.2 adds targeted search and real-time tracking, empowering professional workers despite ongoing reliability concerns.
February 10, 2026
OpenAI’s transition of its Deep Research capabilities to the GPT-5.2 engine represents a major milestone in the evolution of autonomous agents, marking a shift from specialized reasoning prototypes to a unified, production-grade intelligence layer.[1] Deep Research, which originally debuted on experimental reasoning models, has now been fully integrated into the flagship GPT-5.2 series, bringing significant gains in long-context processing and tool-calling reliability. Alongside the architectural upgrade, OpenAI has introduced two long-requested features: the ability to target specific websites for deep-dive investigations and a real-time tracking interface that lets users monitor the agent’s internal "trajectory" as it navigates the web.[1] While these updates enhance the tool’s utility for professional knowledge workers, the transition also highlights the persistent challenges of factuality and the potential for error compounding in multi-step autonomous workflows.[1]
The technical core of this update lies in the capabilities of GPT-5.2, which OpenAI has segmented into three distinct modes: Instant, Thinking, and Pro.[2] Deep Research primarily leverages the Pro and Thinking variants, which are optimized for high-effort reasoning and complex planning.[1] Unlike its predecessors, which often struggled to maintain coherence over hours-long research sessions, GPT-5.2 features an expanded 400,000-token context window and a 128,000-token output limit. This allows the model to ingest thousands of pages of source material, from dense legal filings to technical manuals, without losing track of the original user intent.[1] Internal benchmarks released by the company suggest that GPT-5.2 Thinking reduces hallucinations by approximately 30 percent compared to the previous GPT-5.1 model, specifically in tasks requiring the integration of information across massive document sets.[1]
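For readers who build on the API, a minimal sketch of what selecting the high-effort variant might look like with the official OpenAI Python SDK is shown below. The model identifier "gpt-5.2" and the mapping of the Thinking/Pro modes onto the `reasoning` effort setting are assumptions drawn from this article, not documented parameters.

```python
from openai import OpenAI

client = OpenAI()

# Sketch only: the model name "gpt-5.2" and the idea that the Thinking/Pro
# modes map onto the "reasoning" effort setting are assumptions based on
# this article, not documented API values.
response = client.responses.create(
    model="gpt-5.2",                 # assumed identifier for the new engine
    reasoning={"effort": "high"},    # high-effort, "Thinking"/"Pro"-style run
    input="Summarize the key risk factors across these three 10-K filings...",
)

print(response.output_text)
```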
One of the most transformative additions to the Deep Research suite is the targeted website search feature.[1] Previously, the agent functioned as a broad-spectrum web crawler, autonomously deciding which corners of the internet to explore based on general search engine results.[1] Now, users can explicitly instruct the model to confine its research to a specific list of domains or individual URLs.[1] For a financial analyst, this might mean pinning the research to the Securities and Exchange Commission’s database and reputable market data providers, while a legal professional could restrict the model to official government archives.[1] This level of granularity effectively transforms ChatGPT from a generalist search tool into a specialized research assistant capable of operating within strict "knowledge silos," thereby reducing the noise from irrelevant or low-quality web sources that often lead to superficial summaries.[1]
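As a rough illustration, a domain allowlist of this kind might be expressed as a filter on the hosted web search tool, as in the sketch below. The `filters`/`allowed_domains` shape is modeled on OpenAI’s existing web-search domain filtering and is an assumption here, as is the model name.

```python
from openai import OpenAI

client = OpenAI()

# Sketch only: confining the agent to explicit "knowledge silos". The
# "filters"/"allowed_domains" shape is modeled on OpenAI's web-search
# domain filtering and, like the model name, is an assumption here.
response = client.responses.create(
    model="gpt-5.2",
    tools=[{
        "type": "web_search",
        "filters": {"allowed_domains": ["sec.gov", "investor.example.com"]},
    }],
    input="Trace ACME Corp's reported quarterly revenue since 2024, with citations.",
)

print(response.output_text)
```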
To complement this increased control, OpenAI has introduced a real-time tracking system that visualizes the agent's research path. Because Deep Research queries can take anywhere from five to thirty minutes to complete, users were previously left in the dark until a final report was generated.[1] The new interface provides a live stream of the model's activities, showing which search queries it is executing, which links it has deemed relevant, and when it is backtracking to find better information.[1] This transparency is not just aesthetic; it serves as a critical feedback loop.[1] By watching the agent’s chain of thought in real time, users can spot early on whether the model has misunderstood a prompt or is drifting into an unproductive line of inquiry, allowing for more precise interventions in the research process.[1]
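Under the hood, a live trajectory of this kind maps naturally onto a streaming interface. The sketch below consumes streamed events from the Responses API and surfaces search activity as it arrives; the specific event type strings are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

# Sketch only: the exact event type strings emitted during Deep Research
# runs are assumptions; real names may differ.
stream = client.responses.create(
    model="gpt-5.2",                      # assumed identifier, as above
    tools=[{"type": "web_search"}],
    input="Survey recent EU rulings on data-privacy class actions.",
    stream=True,
)

for event in stream:
    if "web_search" in event.type:
        # Queries issued, pages opened, backtracking: the visible trajectory.
        print(f"[trajectory] {event.type}")
    elif event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)   # the report, as it is written
```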
However, the leap in intelligence provided by GPT-5.2 does not necessarily solve the problem of reliability, a point that both industry critics and early adopters have emphasized.[1] Despite the 30 percent reduction in hallucination rates, the "agentic" nature of Deep Research introduces a new kind of risk known as error compounding.[1] Because the model makes a series of autonomous decisions—where each subsequent step is based on the findings of the previous one—a single misunderstood fact early in the process can lead the model down a "rabbit hole" of increasingly inaccurate conclusions.[1] While a human researcher might notice a logical inconsistency and verify the source, an AI agent may treat a hallucinated premise as a foundational truth, building an entire 40-page report on a flawed data point. This risk is particularly acute in the "Pro" mode, where the model is encouraged to think harder and longer, potentially over-analyzing data to the point of inventing non-existent connections.[1]
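The arithmetic behind error compounding is worth making explicit. If each autonomous step is independently correct with probability p, the chance that an n-step chain contains no flawed premise is roughly p^n, and the sketch below shows how quickly that decays; the 97 percent per-step figure is an illustrative assumption, not a measured rate for GPT-5.2.

```python
# Illustrative only: the per-step accuracy is an assumed number, not a
# measurement of GPT-5.2. The point is how quickly small error rates compound.
per_step_accuracy = 0.97

for steps in (5, 20, 50):
    chain_reliability = per_step_accuracy ** steps
    print(f"{steps:>2} steps -> {chain_reliability:.0%} chance of no flawed premise")

# Roughly: 5 steps -> 86%, 20 steps -> 54%, 50 steps -> 22%. A long agentic
# run inherits every decision made along the way.
```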
Early feedback from the professional community also points to a noticeable shift in the model's "personality" following the upgrade to GPT-5.2.[1][3] Many users report that while the model is undeniably more capable at technical tasks like spreadsheet generation and complex coding, its output has become increasingly "robotic" and "corporate" in tone.[1] This appears to be an intentional trade-off: by prioritizing tight instruction following, structured formats, and rigorous citation, OpenAI has given up some of the conversational warmth and creative flexibility found in the GPT-4 and early GPT-5 eras.[1] For enterprise clients, this cold, analytical tone is a feature, ensuring that reports are ready for the boardroom; for casual researchers, it can make the AI feel more like a rigid database than a collaborative partner.
The competitive landscape for deep research agents has intensified alongside OpenAI’s release, with Google recently launching its Gemini Deep Research agent.[1] The two systems represent fundamentally different approaches to the same problem.[4][1][5] Google’s version leverages the company’s massive web index and cached pages to summarize information from thousands of sources simultaneously.[1] In contrast, OpenAI’s Deep Research on GPT-5.2 takes a "quality over quantity" approach, using its reasoning engine to navigate the live web as a human would, opening and reading specific pages one by one. This lets the OpenAI model work with real-time data and freshly published content that might not yet be indexed by search engines, though it does so at a higher computational cost and slower speed than Google’s index-driven summarization approach.[1]
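The difference is easiest to see as control flow. The toy sketch below contrasts summarizing a pre-fetched index in bulk with a budgeted, page-by-page agent loop that prefers primary sources; every name in it is a hypothetical stand-in, and a stubbed "web" keeps the script self-contained.

```python
# Toy contrast of the two strategies described above. The stubbed "web" and
# all helper names are hypothetical stand-ins, not any vendor's actual API.
WEB = {
    "a.example/news":   "Q3 revenue rose 4%.",
    "b.example/blog":   "Q3 revenue rose 40%!!!",
    "c.example/filing": "Q3 revenue: +4.1% (audited).",
}

def index_style_summary(pages: dict[str, str]) -> str:
    # Index-driven approach: summarize everything cached, all at once.
    return " | ".join(pages.values())

def agent_style_research(pages: dict[str, str], read_budget: int = 2) -> list[str]:
    # Agentic approach: open live pages one by one, primary sources first,
    # stopping when the budget of slow, careful reads is exhausted.
    ranked = sorted(pages, key=lambda url: "filing" not in url)
    return [f"{url}: {pages[url]}" for url in ranked[:read_budget]]

print(index_style_summary(WEB))
print(agent_style_research(WEB))
```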
As these tools become more integrated into the global economy, the question of their broader impact looms.[1] OpenAI leadership has suggested that the current iteration of Deep Research is already capable of performing a single-digit percentage of economically valuable knowledge work.[6][1] By automating the "grunt work" of data collection, synthesis, and report drafting, GPT-5.2 allows professionals to focus on high-level strategy and final verification. Yet, the burden of that verification remains firmly with the human user.[1] Industry experts warn that the polish and professional formatting of the Deep Research reports can create a "veneer of authority" that discourages users from double-checking the underlying citations.[1] As the AI industry moves toward even more autonomous "Agentic AI" systems, the divide between a model’s perceived capability and its actual reliability remains the most significant hurdle for widespread adoption in high-stakes environments.[1]
Ultimately, the deployment of GPT-5.2 within Deep Research suggests that OpenAI is moving toward a future where the AI is no longer just an answering machine, but an active participant in professional workflows.[1] The introduction of targeted search and real-time tracking addresses the two biggest pain points of autonomous research: lack of control and lack of transparency.[1] However, as the complexity of these models grows, so too does the difficulty of auditing their work.[1] While GPT-5.2 is undoubtedly the most powerful research tool OpenAI has ever released, its debut serves as a reminder that "more intelligence" does not always equate to "more truth." For now, the tool remains a high-powered engine that requires a human driver who is willing to look closely at the map, verify the directions, and take the wheel when the agent ventures into uncertain territory.[1]