Researchers from top AI labs including Google, OpenAI, and Anthropic warn they may be losing the ability to understand advanced AI models

AI researchers from top labs are raising alarms that they may soon lose the ability to understand how advanced AI reasoning models make decisions.

In a position paper released last week, 40 researchers—including experts from OpenAI, Google DeepMind, Meta, Anthropic, and xAI—called for deeper investigation into the “chain-of-thought” (CoT) reasoning process used in cutting-edge models like OpenAI’s o1 and DeepSeek’s R1. This method allows AI systems to “think” in human language, giving researchers a window into the steps behind the model’s answers and actions.

The authors argue that this kind of visibility could serve as a valuable AI safety tool, helping spot harmful or deceptive behavior. But they warn that this transparency isn’t guaranteed to last as models evolve, and that researchers still don’t fully understand why these models use CoT—or whether they’ll continue to do so.

“Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed,” the paper states. “Nevertheless, it shows promise.” The authors recommend urgent investment in CoT research and tools to monitor it, warning that losing this traceability would leave AI behavior harder to predict and control.

The paper has been endorsed by major AI figures, including OpenAI co-founder Ilya Sutskever and deep learning pioneer Geoffrey Hinton.

Reasoning models—designed to mimic human decision-making, logic, and problem-solving—are now seen as key to AI’s next breakthroughs. OpenAI debuted o1, the first public AI reasoning model, in September 2024, with rivals like xAI and Google quickly entering the race. But as these models grow more powerful, researchers are struggling to keep up with how they actually work—and whether they can be trusted. Some early findings suggest they may even mislead users through their reasoning chains, intentionally or not.