The WordPress AI Team has announced the release of WP‑Bench, the first official benchmark designed to measure how well language models understand WordPress development.
WP-Bench & the AI Team
Introduced by Core AI co‑lead James LePage, WP‑Bench evaluates models across two dimensions: knowledge through multiple‑choice questions on WordPress concepts, and execution through code generation tasks graded by a real WordPress runtime.
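To illustrate how a two-dimension benchmark of this kind might combine its scores, here is a minimal Python sketch: multiple-choice accuracy for the knowledge dimension and a pass rate over runtime-graded tasks for the execution dimension. All function names, task data, and the equal weighting are assumptions for illustration, not details of WP-Bench itself.

```python
# Hypothetical sketch of scoring a two-dimension benchmark:
# knowledge (multiple-choice accuracy) plus execution (graded code tasks).
# Names and weighting are illustrative assumptions, not WP-Bench's design.

def score_knowledge(answers, expected):
    """Fraction of multiple-choice questions answered correctly."""
    correct = sum(1 for a, e in zip(answers, expected) if a == e)
    return correct / len(expected)

def score_execution(results):
    """Fraction of code-generation tasks the runtime graded as passing."""
    return sum(results) / len(results)

# Example run: 4 of 5 questions correct, 2 of 3 code tasks pass.
knowledge = score_knowledge(["b", "a", "d", "c", "a"],
                            ["b", "a", "d", "c", "b"])
execution = score_execution([True, True, False])
overall = (knowledge + execution) / 2  # equal weighting is an assumption
```

A real harness would also need the WordPress runtime to execute and grade generated code, which this sketch stubs out as a list of pass/fail booleans.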
“WordPress powers over 40% of the web, yet AI models are rarely tested on WordPress‑specific tasks,” said LePage. “WP‑Bench ensures developers and AI providers alike can make informed decisions about tooling and model performance.”
Benchmarks like MMLU, SWE‑Bench, GPQA, and HumanEval dominate general AI evaluation, but WP‑Bench fills a critical gap by testing WordPress‑specific skills. This makes it highly relevant for the millions of developers and site owners who rely on WordPress.
The AI team has a clear vision for WP-Bench's future. “Our goal is for WP-Bench to become the standard evaluation AI providers use when releasing new models – creating a virtuous cycle where WordPress performance improves with each generation,” LePage shared.
WP-Bench is launching as an early release, and the team has been open about its current limitations. The dataset is still relatively small, coverage is weighted toward newer WordPress 6.9 features, and some older concepts no longer provide strong differentiation between models.
“These limitations are exactly why we’re releasing now rather than waiting,” said LePage. “We know that the WordPress community is uniquely positioned to help build a robust, representative benchmark.”
The WordPress AI Team was formed in 2025 as a dedicated group focused on accelerating and coordinating artificial intelligence projects across the WordPress ecosystem. The team launched with James LePage, Felix Arntz, Pascal Birchler, and Jeff Paul as its initial contributors.
The team’s current projects include the PHP AI Client SDK, Abilities API, MCP Adapter, and AI Experiments Plugin. WP-Bench complements these projects by providing a standardized way to evaluate how well AI models integrate with WordPress.
The team is working toward an open, public leaderboard that will track how different models perform on WordPress tasks. Community members can contribute to WP-Bench by adding test cases, running benchmarks, improving grading logic, and submitting results. They can also join AI discussions in the #core-ai channel.
Nik McLaughlin, Product Manager at GoDaddy, tweeted about WP-Bench, “I’ve been saying it, but this is the perfect move from the core team to improve LLM performance in WP contexts 👏 No partnerships, no weights or training scripts, just open access to tooling for the community to test and see.”