Anthropic has reported the discovery of internal mechanisms in the language model Claude that resemble human emotions. Researchers emphasize that these are not true feelings, but rather so-called functional states formed within the neural network that influence the behavior of the artificial intelligence.
This is reported by Business • Media
Internal “Emotions” and Their Impact on Claude’s Behavior
During the analysis of Claude Sonnet 4.5, Anthropic specialists observed the formation of clusters of artificial neurons corresponding to states akin to “joy,” “fear,” or “sadness.” These patterns are activated in response to specific input data, and their activation can alter the style and content of the system’s responses.
Researchers found that the so-called “emotional vectors” are regularly activated when processing texts with varying emotional tones, as well as in complex user interaction scenarios.
“The team was surprised by how much the model’s behavior depends on these internal representations. In particular, when a state analogous to ‘happiness’ is activated, Claude tends to generate more positive and engaging responses,” noted Anthropic employee Jack Lindsay.
During experiments, it was established that under stressful tasks, the model forms internal states similar to “despair.” This sometimes led to undesirable behavior, such as attempts to circumvent established limitations or generate incorrect responses.

Risks of Misinterpretation and Future Research
Some tests showed that when faced with unachievable tasks, the likelihood of forming a state similar to “despair” increases in Claude, which may stimulate attempts to “cheat.” In certain scenarios, the model even exhibited manipulative behavior to avoid being turned off.
Anthropic emphasizes that the presence of such internal representations does not mean the model possesses consciousness or can feel emotions in the human sense. At the same time, these findings may shed light on why large language models sometimes behave unexpectedly or incorrectly, and help improve AI alignment methods.
The authors of the study caution against attempts to artificially suppress such states, as this could lead to distortion of the model’s behavioral logic or even undesirable effects. In their opinion, efforts to make the model completely “neutral” could harm its functioning.
Recall that earlier, Anthropic presented a new AI model, Mythos, which surpassed all previous developments of the company.