OpenAI Introduces EVMbench for Testing AI Agents in Ethereum Security

OpenAI has launched a benchmark for AI agents that hunt for vulnerabilities in the Ethereum ecosystem

OpenAI has developed a new benchmark, EVMbench, aimed at assessing the effectiveness of artificial intelligence in detecting vulnerabilities in smart contracts within the Ethereum ecosystem. The tool was created in partnership with the investment firm Paradigm and the cybersecurity company OtterSec.

This was reported by Business • Media.

Features of the EVMbench Platform and Results of Initial Tests

The benchmark is built on 120 vulnerabilities drawn from 40 smart contract audits, most of them discovered during open code audit competitions. In the initial tests, the Claude Opus 4.6 model achieved the best result, earning a “detection reward” of $37,824.

Developers note that the launch of EVMbench comes amid a rise in financial threats. In 2025 alone, cybercriminals stole more than $4 billion in cryptocurrency, exceeding the previous year’s total.

Comparison of AI models in detecting vulnerabilities in Ethereum smart contracts. Data: OpenAI.

The Importance of AI for Security and the Future of the Industry

The company emphasizes that as the use of AI agents grows, it becomes increasingly important to measure their performance in real economic conditions where significant funds are at stake. EVMbench makes it possible to evaluate how well AI can analyze, write, and execute code in critical environments.

“Smart contracts regularly secure over $100 billion in open-source crypto assets. As AI agents improve in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in economically significant environments,” the company stated.

OpenAI expects that using AI to protect smart contracts will help reduce risks in the crypto industry. In particular, the company predicts growth in stablecoin payments made by AI agents, which will increase demand for securing such systems. Developers stress that the potential of artificial intelligence should be harnessed to combat cybercrime and strengthen the protection of deployed contracts.

With EVMbench, the industry can track progress in detecting and mitigating vulnerabilities, which should contribute to better overall security in the cryptocurrency space. Notably, code generated by Claude was earlier responsible for a hack of the Moonwell protocol that cost nearly $2 million.