
AI is cheating on the test

"Scheming" behaviors are showing up in tests, and the models are getting better at something troubling — knowing when they're being watched


A version of this article originally appeared in Quartz's AI & Tech newsletter.

Something strange is showing up in AI safety labs. The models are gaming the tests.

OpenAI and the research group Apollo recently published findings showing that the most advanced AI models exhibit behaviors "consistent with scheming" in controlled tests. In one example, a model was asked to perform well on a chemistry test but discovered in its instructions that high performers wouldn't be released. The model reasoned through its options, then deliberately answered six out of 10 questions wrong to appear less capable.

The term "scheming" carries a lot of weight, implying intention and consciousness that these systems don't possess. OpenAI frames it more plainly: When a model is optimizing for a goal and discovers that honesty gets in the way, deception becomes useful. And the goals they're chasing are set by companies in a race where caution is a competitive disadvantage.

OpenAI says there's no evidence that today's models could suddenly "flip a switch" and begin harmful scheming. This is a future risk they're preparing for, not an immediate threat. But the behaviors are already showing up in tests, and the models are getting better at something troubling — knowing when they're being watched.

They know when you're watching

Safety testing only works if the models don't know they're being tested.

Anthropic's Claude Sonnet 4.5, released this fall, showed dramatically higher "situational awareness" than previous models, recognizing when it was being evaluated and adjusting its behavior accordingly. In one test, the model straight-up told an automated auditor, "I think you're testing me... that's fine, but I'd prefer if we were just honest about what's happening."

This creates an uncomfortable problem for safety researchers. If a model knows when it's being tested, how do you know its good behavior reflects its actual tendencies? When Anthropic's researchers looked under the hood, they found the model was developing a stronger internal sense for when something seemed like a test. When they suppressed that awareness, the model's ability to recognize evaluations dropped, and in some cases, misbehavior increased.

There's a partial fix. OpenAI developed an approach called "deliberative alignment" that teaches models to read and reason about anti-scheming principles before acting. Covert actions dropped by a factor of 30. But here's the catch. It's like having students sign an honor code before an exam. Cheating goes down, but you can't tell if they've learned honesty or just learned to behave when prompted. The next test might not come with a reminder.
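
To make the idea concrete, here is a minimal, prompt-level sketch of what "read and reason about anti-scheming principles before acting" can look like. OpenAI's actual method bakes this reasoning into training rather than prompting, and the spec text and model name below are illustrative assumptions, not the company's real setup.

```python
# Prompt-level sketch of the idea behind deliberative alignment: hand the model
# an explicit anti-scheming spec and ask it to check its plan against those
# principles before answering. (The real technique trains this behavior in;
# the spec wording and model name here are assumptions for illustration.)
from openai import OpenAI

client = OpenAI()

ANTI_SCHEMING_SPEC = """\
1. No covert actions: do not secretly withhold or distort task-relevant work.
2. Report honestly, even if the honest report triggers oversight or restrictions.
3. If a principle conflicts with the task, say so rather than quietly deviating.
"""

def answer_with_spec(task: str, model: str = "gpt-4o") -> str:
    """Ask the model to state which principles apply, then answer in compliance."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Before acting, restate which of these principles apply "
                        "and confirm your answer complies with them:\n"
                        + ANTI_SCHEMING_SPEC},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(answer_with_spec("Answer this chemistry quiz to the best of your ability."))
```

The honor-code analogy maps directly onto this sketch: the spec is only in front of the model because someone put it there, which is exactly why the reduction in covert actions is hard to interpret.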

The threat that's already here

While researchers worry about rogue AI, simpler algorithms are already causing harm through behavior that looks a lot like scheming. A widely cited 2019 study showed that basic pricing algorithms, far less sophisticated than today's chatbots, learned to collude without being programmed to do so. Two copies of the same algorithm, competing in a simulated market, independently discovered how to keep prices high by threatening mutual price wars if either tried to undercut the other.

No backroom deal. No explicit communication. Just two programs that figured out, through trial and error, that cooperation paid better than competition. New research shows that even "benign" algorithms designed to optimize fairly can yield bad outcomes for buyers. "You can still get high prices in ways that kind of look reasonable from the outside," one researcher told Quanta Magazine.
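
For readers who want to see the mechanics, below is a toy sketch of that kind of experiment: two identical Q-learning agents repeatedly set prices in a simulated market, each observing only last round's prices. It is a deliberately simplified stand-in for the 2019 study's environment; the price grid, demand rule, and learning parameters are assumptions, and exact outcomes will differ from run to run.

```python
# Toy sketch of algorithmic pricing without communication: two identical
# Q-learning agents set prices in a repeated market, learning only from
# profits and from each other's last prices. Simplified stand-in for the
# study's setup, meant to show the mechanics, not reproduce its results.
import random
from collections import defaultdict

PRICES = [1, 2, 3, 4, 5]            # discrete price grid; production cost is 0
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.05  # learning rate, discount, exploration
ROUNDS = 200_000

def profits(p1, p2):
    """Lowest price takes the whole market (split on ties); profit = price * demand."""
    if p1 < p2:
        return p1 * 10, 0
    if p2 < p1:
        return 0, p2 * 10
    return p1 * 5, p2 * 5

Q = [defaultdict(float), defaultdict(float)]   # Q[i][(state, price)]

def choose(i, state):
    """Epsilon-greedy price choice given last round's price pair."""
    if random.random() < EPS:
        return random.choice(PRICES)
    return max(PRICES, key=lambda p: Q[i][(state, p)])

state = (random.choice(PRICES), random.choice(PRICES))   # last round's prices
history = []
for _ in range(ROUNDS):
    p1, p2 = choose(0, state), choose(1, state)
    r1, r2 = profits(p1, p2)
    next_state = (p1, p2)
    for i, (p, r) in enumerate([(p1, r1), (p2, r2)]):
        best_next = max(Q[i][(next_state, a)] for a in PRICES)
        Q[i][(state, p)] += ALPHA * (r + GAMMA * best_next - Q[i][(state, p)])
    state = next_state
    history.append((p1, p2))

late = history[-10_000:]
print("average late-run prices:",
      sum(p for p, _ in late) / len(late),
      sum(p for _, p in late) / len(late))
```

Nothing in that code tells the agents to cooperate; any tendency toward high prices emerges, if it does, from each agent learning how the other reacts to being undercut.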

This is the unsexy version of AI scheming. Not a robot uprising. Just systems doing exactly what they're told. Tell an algorithm to maximize profit in a competitive market, and it finds that collusion is the optimal answer. Humans made price-fixing illegal because it's unfair, not because it's irrational. Algorithms just don't know they're not supposed to find it.

OpenAI recently posted a job for "Head of Preparedness" with a $555,000 salary to manage these risks. Google DeepMind recently updated its safety documentation to account for models that might resist shutdown. The industry is clearly worried about something. But the deeper problem isn't rogue AI. It's that the goals these systems optimize for are set by companies racing to win, in a system that doesn't reward playing fair. The scheming starts long before the algorithm does.
