The Readiness Gap Simulator — clinicians.build

Apply a stress test

Each toggle perturbs the question without changing the medicine. Watch the red marker, the model's score under stress, drop away from its pristine benchmark score.

Chart: Accuracy on USMLE-style questions
Legend: ● benchmark, ● under stress
Readouts: Benchmark leader; Most robust under stress

Toggle a stress test above to begin. Right now every model is sitting on its benchmark score — the number that lands it in the press release.

The discovery: the model that tops the benchmark is rarely the one left standing under stress. Rankings reshuffle, and the gap is widest for the models that were "best" on paper — including the med-tuned specialist that overfit the test.
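The reshuffle described above comes down to simple arithmetic, sketched here in Python. This is a minimal illustration, not the simulator's actual code. The model names and accuracy figures are invented placeholders, the `ModelScore` type and helper functions are hypothetical, and "most robust" is assumed to mean the highest accuracy under stress (the page does not define it precisely).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    name: str
    benchmark: float  # accuracy on the unperturbed questions, 0..1
    stressed: float   # accuracy with stress toggles applied, 0..1

    @property
    def gap(self) -> float:
        """Readiness gap: how far accuracy falls under stress."""
        return self.benchmark - self.stressed


def benchmark_leader(scores):
    """Model with the highest benchmark accuracy."""
    return max(scores, key=lambda s: s.benchmark)


def most_robust(scores):
    """Model with the highest accuracy under stress (assumed definition)."""
    return max(scores, key=lambda s: s.stressed)


# Invented numbers for illustration only; not results from any real model.
SCORES = [
    ModelScore("model_a", benchmark=0.90, stressed=0.60),
    ModelScore("model_b", benchmark=0.85, stressed=0.75),
    ModelScore("model_c", benchmark=0.80, stressed=0.70),
]

for s in SCORES:
    print(f"{s.name}: benchmark={s.benchmark:.2f} "
          f"stressed={s.stressed:.2f} gap={s.gap:.2f}")

print("Benchmark leader:", benchmark_leader(SCORES).name)
print("Most robust under stress:", most_robust(SCORES).name)
```

With these placeholder values the benchmark leader and the most robust model are different entries, and the leader also shows the largest gap, which is the pattern the simulator is built to surface.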
