Pass@4 by Agent and Budget
Legend: GPT-5.5; Claude Opus 4.8; Fable 5; individual trial; ± SEM.
Pass@4: Opus 4.8 26.6/31.0, GPT-5.5 9.7/6.5, Fable 5 37.7/56.2.
Claude/GPT: n=4 per budget. Fable 5: n=3 (8h) and n=3 (20h).