RT @MarioNawfal: 🚨GROK-2 OUTRUNS COMPETITION | BENCHMARK PERFORMANCE KEY SCORES

- Graduate-Level Science Knowledge (GPQA): 56.0% (GPT-4 T…