Despite significant progress in automating software development, artificial intelligence is still far from fully replacing programmers, especially when it comes to debugging and testing code. That is the finding of research from Microsoft Research, which introduced debug-gym, a specialized environment for testing and improving AI agents as they debug real codebases.
This is reported by Business • Media
Limitations of Modern AI Models in Debugging
Unlike popular tools such as GitHub Copilot, debug-gym gives AI models access to capabilities they previously lacked: setting breakpoints, navigating code, reading variable values, and creating tests. These capabilities let the models work more like a real developer, but even so they reach a success rate of only 48.4%, experts note.
“The fixes proposed by the debugging-capable coding agent, and then approved by the programmer, will be based on the context of the relevant codebase, program execution, and documentation, rather than relying solely on guesses based on previously reviewed training data.”
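To make the idea of an agent "reading variables" at a stopping point more concrete, here is a minimal sketch that drives Python's standard pdb debugger with a scripted list of commands and returns the transcript. It is only an illustration: it does not use or reproduce debug-gym's actual interface, and the names buggy_mean and run_debugger_commands are hypothetical, invented for this example.

```python
import io
import pdb


def buggy_mean(values):
    """Hypothetical buggy function: the divisor is off by one."""
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # bug: should divide by len(values)


def run_debugger_commands(func, args, commands):
    """Run func under pdb, feeding it scripted commands instead of a keyboard.

    Returns (transcript, result), where transcript is everything pdb printed.
    This stands in for an agent issuing debugger commands; it is an
    assumption made for illustration, not debug-gym's API.
    """
    stdin = io.StringIO("\n".join(commands) + "\n")
    stdout = io.StringIO()
    # readrc=False ignores any local .pdbrc; nosigint=True skips installing
    # a SIGINT handler when the "c" (continue) command is issued.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, nosigint=True, readrc=False)
    # runcall pauses on the first line of func, then processes the commands.
    result = debugger.runcall(func, *args)
    return stdout.getvalue(), result


if __name__ == "__main__":
    transcript, result = run_debugger_commands(
        buggy_mean,
        ([2, 4, 6],),
        ["p values", "p len(values) - 1", "c"],  # print two values, continue
    )
    print(transcript)
    print("returned:", result)
```

In this sketch the transcript shows the input list and the divisor the function will actually use, which is the kind of runtime evidence the quote contrasts with guessing from training data alone.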
Reasons for Limitations and Future Development Prospects
Researchers point out that the main reason AI performs poorly at debugging is a lack of training data that captures step-by-step testing and debugging sessions. In addition, the models have not yet been trained to use debugging tools effectively. A possible next step is a supporting model that gathers the necessary information for the main system.
The authors emphasize that the primary value of artificial intelligence lies in assisting humans, not replacing them. Even when generating code for specific tasks, the models can introduce vulnerabilities and produce unstable solutions. Full automation of the development process therefore remains out of reach for now.
The Role of Humans in Future Development
Experts conclude that the development of agent-based AI systems in programming is progressing, but the role of human developers remains irreplaceable, especially in complex tasks of analysis, interpretation, and error correction.
Shopify was previously reported to be planning to hire only specialists who cannot be replaced by artificial intelligence.