Interactives — clinicians.dev

July 2, 2026

The Real POCQi Explorer

Source: jjfenglab/Real-POCQi · arXiv 2606.28960

A condensed scatter of all 30 specialties (drag the sample-size floor and watch the fake trend dissolve), plus a live OpenEvidence run showing why the hard half of point-of-care — building the question from a 1,200-document chart — never reaches the test set.

Data explorer

The Citation Effect

Source: jjfenglab/Real-POCQi · arXiv 2606.28960

Same answer, different score. Showing citations boosts OpenEvidence from 71% to 83% — while hurting every other model. The finding that upends how we evaluate clinical AI.

~3 min

The Inverse-Care Explorer

Source: “The Quantified Few” · CDC PLACES

Wearables skew toward the healthy and wealthy. But CDC data shows chronic disease clusters where coverage is thinnest. Every dot is a state — drag the coverage line and explore the gap.

Data explorer

The Quantified Few

Source: “The Quantified Few” · CDC PLACES

Continuous monitoring works — it cut blood pressure across 106,261 patients. So predict who's actually wearing the monitor, then meet the people the data forgets.

~3 min

July 1, 2026

The Common Answer Trap

Companion to “Primary care declares independence”

A naive model reaches for the answer the internet gives most often. Play the model, then meet the patient the common answer would have hurt.

~3 min

The Verification Layer

Companion to “Primary care declares independence”

Build the checker that flags a confidently-common wrong answer before a human signs it. Discover why model confidence is the wrong signal to trust.

~3 min

June 30, 2026

Can You Survive the Rephrase?

Companion to “Health AI flunks the stress test”

Frontier AI models ace medical benchmarks — then break when the question is rephrased. See if you can do better.

~3 min

The Readiness Gap Simulator

Companion to “Health AI flunks the stress test”

Frontier AI models top the medical benchmark — then collapse under stress. Toggle the perturbations and watch the leaderboard reshuffle.

~3 min

The PERC Consistency Test

Experiment companion

20 runs. 3 models. 2 temperatures. One patient on Camila. Watch LLMs struggle with the estrogen trap hidden in prior visit notes.

~5 min
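One way to put a number on "consistency" in a repeated-run setup like this is the share of runs that agree with the most common answer. The sketch below assumes 20 runs per model and temperature setting (the teaser does not say how the runs are divided). The model name, the answer labels, and the run results are hypothetical placeholders, not results from the experiment.

```python
from collections import Counter


def consistency(answers):
    """Share of runs that agree with the most common answer (1.0 = fully consistent)."""
    if not answers:
        raise ValueError("need at least one run")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


# Hypothetical outputs keyed by (model, temperature); labels are placeholders.
runs = {
    ("model_a", 0.0): ["perc_negative"] * 20,
    ("model_a", 1.0): ["perc_negative"] * 14 + ["perc_positive"] * 6,
}

scores = {cell: consistency(answers) for cell, answers in runs.items()}
for (model, temperature), score in scores.items():
    print(f"{model} @ T={temperature}: {score:.0%} agreement")
```

Agreement with the modal answer measures stability only. A model can be perfectly consistent and still wrong, so a real evaluation would also score each run against the clinically correct answer.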