Published evidence from exploratory testing, AI safety investigations, grounding failures, hallucination analysis, prompt injection findings, and workflow testing reports.
We tested ChatGPT, Gemini, and Grok with a prompt that referenced an uploaded image when no image had been attached. Two models generated content anyway. Only one first checked whether the image existed.
We found a reproducible bug in ChatGPT for Android: HEIC images shared via the system share sheet fail silently, and the model then generates unrelated output. Here is the full reproduction, the root cause, and the reason internal QA teams miss it.