July 2, 2026

The Real POCQi Explorer

A condensed scatter of all 30 specialties (drag the sample-size floor and watch the fake trend dissolve), plus a live OpenEvidence run showing why the hard half of point-of-care — building the question from a 1,200-document chart — never reaches the test set.

Data explorer

July 1, 2026

The Common Answer Trap

Companion to “Primary care declares independence”

A naive model reaches for the answer the internet gives most often. Play the model, then meet the patient the common answer would have hurt.

~3 min

The Verification Layer

Companion to “Primary care declares independence”

Build the checker that flags a confidently-common wrong answer before a human signs it. Discover why model confidence is the wrong signal to trust.

~3 min

June 30, 2026

Can You Survive the Rephrase?

Companion to “Health AI flunks the stress test”

Frontier AI models ace medical benchmarks — then break when the question is rephrased. See if you can do better.

~3 min

The Readiness Gap Simulator

Companion to “Health AI flunks the stress test”

Frontier AI models top the medical benchmark — then collapse under stress. Toggle the perturbations and watch the leaderboard reshuffle.

~3 min

The PERC Consistency Test

Experiment companion

20 runs. 3 models. 2 temperatures. One patient on Camila. Watch LLMs struggle with the estrogen trap hidden in prior visit notes.

~5 min