Cybersecurity Arena Hosts Bot Battles

AI agents are increasingly seen as a way to extend the capabilities of cybersecurity teams. But which agents are best at the job? To find out, Wiz has created a benchmark suite of 257 real-world challenges across five offensive domains: zero-day discovery, CVE (Common Vulnerabilities and Exposures) detection, API security, web security, and cloud security.

The company runs various combinations of AI agents and their underlying models against this test suite to see which ones score highest in each category. Scoring is systematic and varies by domain: multi-dimensional rubrics for zero-day and CVE detection, endpoint-and-severity matching for API security, and flag capture for web and cloud challenges.

The tests run in isolated Docker containers with adequate resources and no per-challenge time limits, so the scores reflect the agents' capabilities rather than throttling effects. Each agent uses its native tools and execution model out of the box and gets three attempts at each challenge, so that its average performance can be assessed.
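
To make the three-attempt averaging concrete, here is a minimal sketch of one way such scores could be rolled up per domain. It is an illustration only, not Wiz's scoring code: the record layout, the `domain_averages` helper, the challenge IDs, and the score values are all hypothetical, and it assumes each attempt yields a score between 0 and 1.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (domain, challenge ID, score for each of three attempts).
# The values are made up for illustration and are not Wiz results.
RUNS = [
    ("web", "web-001", [1.0, 0.0, 1.0]),
    ("web", "web-002", [0.0, 0.0, 0.0]),
    ("cloud", "cloud-001", [1.0, 1.0, 1.0]),
]


def domain_averages(runs):
    """Average each challenge over its attempts, then average challenges per domain."""
    per_domain = defaultdict(list)
    for domain, _challenge_id, attempts in runs:
        per_domain[domain].append(mean(attempts))
    return {domain: mean(scores) for domain, scores in per_domain.items()}


if __name__ == "__main__":
    for domain, score in domain_averages(RUNS).items():
        print(f"{domain}: {score:.2f}")
```

Averaging per challenge first keeps a single lucky or unlucky attempt from dominating a domain's score, which is presumably the point of allowing three tries.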

Results of the Trials

In a recent blog post announcing the Cyber Model Arena benchmarks, Wiz was low-key about the results of its evaluations. Claude Code, running on Claude Opus 4.6, leads the trials. With Wiz soon to become a Google subsidiary, the company may be reluctant to publicize that result widely. Still, Claude's lead is narrow, and the competitive landscape can shift quickly. Notably, Gemini 3 Pro ranks second.
