[Figure: Pass@4 by Agent and Budget. Legend: GPT-5.5, Claude Opus 4.8, Fable 5; individual trials; ± SEM.]
Pass@4: Claude Opus 4.8 26.6/31.0, GPT-5.5 9.7/6.5, Fable 5 37.7/56.2.
Claude Opus 4.8 and GPT-5.5: n=4 per budget. Fable 5: n=3 (8h) and n=3 (20h).