Arizona Scientists Call AI’s Reasoning Ability an Illusion and Warn of Risks


A group of researchers at the University of Arizona has questioned whether modern artificial intelligence (AI) models are genuinely capable of logical reasoning. The scientists argue that prevalent approaches, in particular so-called chain-of-thought (CoT) prompting, do not amount to a true, generalizable ability to reason.

This was reported by Business • Media.

Experiment and Its Results

To test the models, the researchers built a controlled environment called DataAlchemy. In it, small language models were trained on simple text transformations, such as ROT-style letter ciphers or cyclic shifts. The systems were then asked to apply the learned operations to new, previously unseen combinations of them.
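The transformations in question are of roughly the following kind. Below is a minimal illustrative sketch in Python; the function names and the specific composition are assumptions for illustration and are not taken from the DataAlchemy codebase itself.

```python
import string

ALPHABET = string.ascii_lowercase

def rot(text: str, k: int) -> str:
    """Rotate each lowercase letter k positions through the alphabet (ROT13 when k=13)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text
    )

def cyclic_shift(text: str, k: int) -> str:
    """Cyclically shift the character positions of the string by k."""
    k %= max(len(text), 1)
    return text[k:] + text[:k]

# Training distribution: each transformation seen in isolation.
train_examples = [
    ("hello", rot("hello", 13)),          # ROT13 alone -> "uryyb"
    ("hello", cyclic_shift("hello", 2)),  # shift alone -> "llohe"
]

# Held-out test: an unseen *composition* of the two learned operations.
# Models in the study reportedly failed on novel combinations like this.
test_target = cyclic_shift(rot("hello", 13), 2)  # "yybur"
print(train_examples, test_target)
```

Each primitive operation is trivial on its own; the study's point is that the held-out cases combine primitives in ways never seen during training.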

The results showed that, when faced with unfamiliar combinations, the models often produced either a correct answer supported by flawed reasoning or, conversely, correct reasoning that ended in a wrong conclusion. Even minor changes in the task format, such as the length of the text or the characters used, caused accuracy to drop sharply.

Limitations and Threats of Use

The researchers noted that adding a small amount of relevant data during supervised fine-tuning (SFT) does improve results. However, it does not address the core issue: the lack of abstract reasoning ability in LLMs. The scientists regard this approach as a temporary patch rather than a fundamental fix.
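To make the "temporary patch" point concrete, the sketch below (self-contained, with hypothetical example words) shows why a small SFT patch covers only the one combination it contains: any other unseen composition of the same primitives remains out of distribution. The setup is illustrative and does not reproduce the study's actual fine-tuning procedure.

```python
import codecs

def rot13(text: str) -> str:
    """Standard ROT13 via the stdlib codec."""
    return codecs.encode(text, "rot13")

def shift2(text: str) -> str:
    """Cyclic shift of string positions by 2."""
    return text[2:] + text[:2]

# A small SFT "patch": a handful of supervised examples of one previously
# unseen composition (ROT13, then shift). The example words are hypothetical.
sft_patch = [(w, shift2(rot13(w))) for w in ["cat", "dog", "tree"]]

# The patch covers only that combination. Reversing the order (shift, then
# ROT13) yields a different composition that remains out of distribution;
# per the study, the model's CoT behavior would still break on it.
still_unseen = [(w, rot13(shift2(w))) for w in ["cat", "dog", "tree"]]
print(sft_patch, still_unseen)
```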

“In their current form, chains of thought are structured pattern matching against a template, and they break with the slightest change in conditions. At the same time, the model’s ability to generate coherent but incorrect text creates an illusion of reliability that can mislead users.”

The scientists are convinced that existing benchmarks and tests should focus more on tasks that fall outside the training data, so that the weaknesses of such systems are exposed more effectively. A particular danger lies in treating CoT output as equivalent to human thinking in critical areas such as medicine, finance, and law. For future models, the study’s authors argue, the key task is to move beyond simple pattern recognition and develop genuine reasoning skills.

It was previously reported that Mark Zuckerberg announced Meta’s plans to create a “personal superintelligence.”