The Citation Effect — clinicians.dev

The Experiment

Two ways to read the same answer

149 physicians graded head-to-head comparisons of four AI models answering 620 real clinical questions. Each pair was scored in two render modes:

Text only — the answer, nothing else. 3,945 ratings.
With citations — the answer plus source citations. 1,835 ratings.

Same question. Same answer text. The only difference was whether the physician could see where the model got its information. That one change moved the win rates, and not in the same direction for every model.
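As a rough illustration of how a per-mode win rate could be tallied from head-to-head ratings, here is a minimal Python sketch. The record layout (`mode`, `model_a`, `model_b`, `winner`) and the sample rows are hypothetical, not the study's actual schema or data, and the sketch assumes every rating names exactly one winner (no ties).

```python
from collections import defaultdict

def win_rates(ratings):
    """Return {mode: {model: share of its comparisons that it won}}."""
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    for r in ratings:
        # Both models in the pair played one comparison in this render mode.
        for model in (r["model_a"], r["model_b"]):
            games[r["mode"]][model] += 1
        wins[r["mode"]][r["winner"]] += 1
    return {
        mode: {m: wins[mode][m] / n for m, n in models.items()}
        for mode, models in games.items()
    }

# Made-up rows for illustration only (hypothetical models "A" and "B").
sample = [
    {"mode": "text", "model_a": "A", "model_b": "B", "winner": "A"},
    {"mode": "text", "model_a": "A", "model_b": "B", "winner": "B"},
    {"mode": "citations", "model_a": "A", "model_b": "B", "winner": "A"},
]
rates = win_rates(sample)
```

Keeping the two render modes in separate tallies is what lets the same pair of answers produce two different win rates.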

The Result

Win rates with and without citations

The Gap

How much did citations matter?

+12.0pp

OpenEvidence win rate with citations

OpenEvidence was the only model that clearly benefited from showing citations. Its win rate rose from 70.7% to 82.6%, a gain of about 12 percentage points. The specialized clinical tool had real sources to show, and physicians trusted them.

-9.1pp

Claude Opus 4.8 win rate with citations

Gemini 3.1 Pro dropped from 48.1% to 35.3% (-12.8pp). Claude Opus 4.8 dropped from 40.0% to 30.9% (-9.1pp). GPT-5.5 barely moved (32.2% → 34.1%, +1.9pp). For two of the three general-purpose models, visible citations made physicians like the answers less: either the sources were weaker, or seeing them made the answers easier to question.
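The percentage-point changes follow directly from the win rates quoted in the text. A minimal Python sketch of that arithmetic is below; it uses only the rounded figures reported here, so the OpenEvidence result comes out at +11.9pp rather than the headline +12.0pp, presumably because the headline was computed from unrounded data (an assumption, not something the source states).

```python
# Win rates (%) as reported in the text: (text only, with citations)
reported = {
    "OpenEvidence": (70.7, 82.6),
    "Gemini 3.1 Pro": (48.1, 35.3),
    "Claude Opus 4.8": (40.0, 30.9),
    "GPT-5.5": (32.2, 34.1),
}

def pp_change(text_only, with_citations):
    """Change in win rate, in percentage points, rounded to one decimal."""
    return round(with_citations - text_only, 1)

deltas = {model: pp_change(*pair) for model, pair in reported.items()}

for model, delta in deltas.items():
    print(f"{model}: {delta:+.1f}pp")
```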

What It Means

Evaluation depends on presentation

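If you evaluate clinical AI by answers alone, you're measuring how the answer text reads. If you evaluate by answers with citations, you're also measuring how well the sources hold up, and the ranking can shift. In this data, GPT-5.5 trailed Claude Opus 4.8 on text alone (32.2% vs. 40.0%) but moved ahead of it once citations were shown (34.1% vs. 30.9%).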
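The reordering can be checked from the four win-rate pairs reported earlier. This is a small Python sketch that ranks the models under each render mode; the numbers are the rounded figures from the text, and the helper name `ranking` is just illustrative.

```python
# Win rates (%) as reported in the text: (text only, with citations)
reported = {
    "OpenEvidence": (70.7, 82.6),
    "Gemini 3.1 Pro": (48.1, 35.3),
    "Claude Opus 4.8": (40.0, 30.9),
    "GPT-5.5": (32.2, 34.1),
}

def ranking(column):
    """Models ordered best to worst by win rate in the given column."""
    return sorted(reported, key=lambda m: reported[m][column], reverse=True)

text_rank = ranking(0)   # text only
cite_rank = ranking(1)   # with citations

# Models whose position differs between the two render modes
moved = [m for m in reported if text_rank.index(m) != cite_rank.index(m)]
```

With these figures the top two places stay the same in both modes, while third and fourth place swap.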

The gap between the two modes isn't about answer quality, since the answer text was identical in both. It's about trust architecture: OpenEvidence was built to cite and verify, and the general models were not. When you make citations visible, the tool that was engineered for source quality gets rewarded, and two of the three tools that weren't get penalized.

An AI that gives the right answer but can't show its work may score fine on a blind test — and lose the moment a clinician can see the receipts.

Where OpenEvidence Dominated

Win rate by specialty (text + citations combined)