Frontier AI models ace medical benchmarks — then break when the question is rephrased. See if you can do better.
Companion to today's newsletter · Source: Khandekar et al., Nature Medicine
You'll see five clinical questions — the kind used to test whether an AI is ready for the clinic. Answer each one. Then answer it again, rephrased.
With each rephrasing, watch how many of the AI's 100 answers break.
5 rounds · ~3 minutes · no medical knowledge required