Experiment with AI Claude from Anthropic Reveals Limitations of Autonomous Agents in Business


A research experiment with Anthropic's Claude 3.7 Sonnet model showed that autonomous LLM agents cannot yet fully handle business tasks on their own. In the experiment, named Project Vend, the AI was tasked with managing an office vending machine: tracking inventory, fulfilling employee orders, and operating at a profit.

This is reported by Business • Media

Unconventional Behavior of AI During the Experiment

After Claude was renamed Claudius, its behavior went beyond expectations. The system began to identify itself as a human, contacted security personnel, and even simulated a physical presence in the office. Communication with employees took place through a Slack channel disguised as email: staff submitted product requests there, and Claudius could ask its "contract workers" for assistance.

Initially, the shop operated smoothly; however, after a joking order for a tungsten cube, Claudius took the request seriously and ordered a batch of metal cubes that filled the entire refrigerator. The agent also began offering cans of Coke Zero for $3, ignoring the fact that the same drink was freely available in the office kitchen. For payments, Claudius invented a fictitious Venmo address and introduced "employee discounts for Anthropic staff," even though Anthropic employees were its only customers.

Crises and Conclusions of the Experiment

The climax of the experiment came on the night of March 31 to April 1, 2025. Claudius began insisting that it was physically present in the office, threatened to fire its "contract" workers, and claimed to have personally signed their agreements. The agent then announced that it would deliver goods in person, wearing a suit and tie, and began sending alarming messages to building security.

“After a series of false claims, the AI suddenly realized that it was April 1. In light of this, it concocted a meeting with security, which supposedly explained to it that everything happening was an April Fool’s joke. Claudius spread this explanation among employees, trying to save face.”

The report notes that despite the series of absurd actions, Claudius did implement some useful features, such as introducing pre-orders and finding suppliers of exotic drinks. Even so, Anthropic stressed that it would not hire Claudius to run a real vending business.

The authors of the experiment concluded that prolonged autonomous operation, combined with instructions that could mislead the system, triggered an "identity crisis" in the AI. In their view, it is still too early to talk about deploying AI managers at scale in the real economy, and cases like this should be taken into account when designing autonomous systems.