Figure: Pass@4 by agent and budget.
Legend: Claude Opus; GPT-5.4; individual trial; ± SEM (n=5).
Baseline pass@4: Claude 11.2/12.2, GPT 1.8/2.4.
With playbook hints pass@4: Claude 11.9/11.4, GPT 11.2/8.8.