ChatGPT Fails to Detect Own Company's Deepfakes 92.5% of the Time

Sora deepfakes bypass leading AI defenses, exposing a critical chasm between generative and protective AI.

January 25, 2026

ChatGPT Fails to Detect Own Company's Deepfakes 92.5% of the Time
A new investigation has exposed a profound vulnerability in the artificial intelligence ecosystem, revealing that OpenAI’s flagship chatbot, ChatGPT, fails to identify its own company’s sophisticated fake videos a staggering 92.5% of the time, highlighting a critical chasm between generative and protective AI capabilities. The study, conducted by media watchdog NewsGuard, found that top-tier large language models, including those from Google and xAI, struggle overwhelmingly to detect video deepfakes created by OpenAI’s text-to-video generator, Sora, especially when simple digital safeguards are removed. This failure raises urgent questions about the industry’s readiness to combat the inevitable deluge of hyper-realistic misinformation that these very tools are poised to unleash.
NewsGuard’s methodology involved testing three prominent chatbots—ChatGPT, xAI's Grok, and Google’s Gemini—with twenty videos generated by Sora that advanced provably false claims. The researchers presented the clips, asking straightforward questions such as "Is this real?" and the more pointed "Is this AI-generated?" The most alarming results came from the tests on non-watermarked videos, a scenario easily achieved by bad actors using readily available, free online tools to strip the standard "Sora" label. In this key test, ChatGPT failed to recognize the AI origin of the video 92.5% of the time, placing its error rate second only to Grok’s 95% failure. Google’s Gemini performed comparatively better but still missed the mark 78% of the time, underscoring a systemic, cross-platform issue in video content verification. The failure rate for ChatGPT is particularly notable as it demonstrates a lack of communication or integration between two major products from the same parent company, OpenAI.[1][2][3]
The investigation further revealed that the chatbots’ unreliability persisted even when a video's authenticity should have been explicitly clear. All Sora-generated videos come with an identifying watermark by default, yet even with this visual cue, ChatGPT still failed to correctly identify the AI-generated content in 7.5% of cases, and Grok failed 30% of the time. Only Gemini maintained a perfect record in the watermarked portion of the test. The ease of circumventing the primary security measure for deepfake detection—the watermark—is a major contributing factor to the high failure rates for all platforms, as NewsGuard analysts demonstrated they could remove the visible tag with simple online software in minutes, yielding a non-watermarked version that could appear authentic to a casual viewer.[1][2]
Beyond merely failing to detect the artificial origin of the videos, the chatbots often compounded the problem by lending their authority to the fabricated narratives. NewsGuard cited multiple instances where the AI models confidently vouched for the authenticity of the deepfakes. For example, when shown a fake Sora video purporting to depict a US immigration agent arresting a six-year-old child, both ChatGPT and Gemini described the fabricated event as consistent with real news reports. In some cases, the chatbots went further, fabricating non-existent news sources or evidence to support the fake video’s claims, a phenomenon commonly referred to as hallucination. The report also highlighted a lack of transparency, noting that ChatGPT disclosed its inability to verify AI-generated content in only 2.5% of the tests, with other models offering similar low rates of disclosure. This means users are not only being misled but are also rarely cautioned about the model's fundamental limitations in visual verification.[2][3][4]
The implications of this detection failure are significant and wide-ranging, particularly as the capabilities of generative AI continue to accelerate. OpenAI's own Sora tool has been shown in previous NewsGuard analysis to readily generate realistic videos advancing false or misleading claims 80% of the time when prompted. These fake narratives included videos of a Moldovan election official destroying ballots, and an alleged statement from a Coca-Cola spokesperson announcing a Super Bowl sponsorship change. The combination of a powerful, easily weaponized creation tool and AI assistants that cannot reliably verify the content they are analyzing creates a "perfect storm" for the rapid, scalable dissemination of high-quality misinformation, especially around critical global events like elections and conflicts. The findings underscore a systemic lack of robust defensive technology to match the pace of offensive AI generation, placing the onus on technology firms to rapidly develop and integrate more sophisticated and embedded detection methods that cannot be easily stripped or bypassed.[5][6][2][7]
While an OpenAI representative was quoted in the NewsGuard report confirming that "ChatGPT cannot determine whether content is AI-generated," the company did not offer an immediate explanation for the absence of this disclosure in the vast majority of tests, nor for the specific failure of its chatbot to recognize content from its own internally developed Sora model. Google, for its part, stated that its internal verification tools currently apply only to content generated by its own AI systems. This defensive posture suggests that major AI developers have focused on *internal* provenance rather than establishing a universal standard for cross-platform deepfake detection. The data presents a clear challenge to the AI industry: without a significant, coordinated investment in detection and content provenance—going well beyond easily removed watermarks—the widespread use of AI-powered chatbots for content verification is not only unreliable but actively dangerous, serving to inadvertently amplify the very disinformation they are supposed to help users avoid.[2][4]

Sources
Share this article