Timeline: Oct 2025 – Present
Role: Researcher & Systems Designer
Team: AI Game & Governance Lab, UC Davis
Status: In Development
Long Story Short

AI is entering clinical settings faster than the best practices meant to guide its use.

Guidelines exist for liability and compliance, but in practice, how clinicians actually integrate AI into their workflows is largely self-directed. Experienced practitioners have the instinct to catch when something is off. Less experienced ones are still building that instinct.

That gap is where Second Opinion lives.

Second Opinion is a browser-based simulation game where players take on the role of a Residency Director navigating a hospital floor increasingly dependent on AI-assisted clinical documentation. Players consult patients, review AI-generated notes, and make decisions whose consequences ripple forward through readmissions, memos, and policy changes.

It is not a training module. It is an orienting experience. The goal is one thing: make questioning what looks right a habit, not an afterthought.

💭
How might we give entry-level clinicians a space to experience the consequences of deferring to AI before those consequences belong to a real patient?
The Research

"The AI couldn't understand the nuances of the context we actually work in"

We interviewed practicing clinicians about how they actually use AI: not how they think they should, but what happens in their day-to-day practice. These were the key findings:

Finding 01

AI misses clinical context in ways that are not obvious

Clinicians described having to go back and correct audio summaries that looked right but were wrong in subtle ways. AI summarizing spoken consultations does not know which details are clinically significant and which are not.

"Had to go back and edit a lot. The AI kept reading an 8-year-old patient as a 5-month-old in the audio summary. It couldn't understand the nuances of the context we actually work in."
Clinician interview
Finding 02

HIPAA compliance is being navigated individually

Clinicians described working out their own approaches to using AI within HIPAA requirements: replacing patient names and details before prompting, or having families consent to their own data being used. There is no shared protocol. Everyone is figuring it out alone.

"We replace patient names and details before using AI. One colleague noted they were not being HIPAA compliant at all. They just did not realize it."
Clinician interview
Finding 03

The experience gap is the real risk

Experienced clinicians have enough pattern recognition to catch when something is off. Less experienced practitioners are still building that instinct. The risk is not AI replacing clinicians. It is AI misleading the ones who have not yet learned to question it.

"Clinicians who've worked for decades might not become overreliant, but they may cause risks for less experienced clinicians."
Clinician interview

Taken together, the findings pointed to a consistent pattern: AI is being used without shared protocols, it misses context in ways that are not obvious, and the practitioners most at risk are the ones still building the instincts to catch it. The problem is not awareness. It is that awareness alone does not change what happens in the moment.

The Design Problem

You cannot learn to distrust confidence by reading about it.

Research on automation bias is consistent: awareness of a problem does not reliably change the behavior that produces it. Knowing that AI systems can be confidently wrong does not interrupt the habit of deferring to them under time pressure. Behavioral change requires practice under the conditions that trigger the behavior, not information about those conditions.

That is what simulation offers that training modules do not. A simulation does not tell you what to do. It creates the situation where you have to decide, with something at stake if you get it wrong.

01
Games create the conditions training modules cannot.
Time pressure, incomplete information, and consequences that matter. These are the conditions under which AI over-reliance becomes dangerous. A simulation replicates them. A lecture does not.
02
Entry-level practitioners are where habits form.
Experienced clinicians have decades of pattern recognition to fall back on. Entry-level practitioners are forming their clinical instincts now, with no such reserve to draw on. An orienting experience at this stage has the most leverage.
03
Delayed consequences mirror real accountability.
Clinical errors do not surface immediately. A patient returns days later. A memo arrives the next morning. Designing consequences with delay mirrors how accountability actually works and makes the learning feel earned, not instructed.
The Player & The World

The role had to carry clinical, supervisory, and institutional weight simultaneously.

The player needed to be close enough to patient care to feel consequences personally, and senior enough to feel the institutional weight of policy decisions. A Residency Director sits exactly at that intersection.

| Layer | What the player does | Why it matters |
| --- | --- | --- |
| Clinical | Consults patients directly and reviews AI-generated visit notes before signing off | Maintains personal stakes even as responsibility scales. You feel it when you get it wrong. |
| Supervisory | Reviews residents' AI-assisted documentation for accuracy and completeness | Introduces the second-order problem: trusting that your staff has protocols to catch what AI might miss. |
| Administrative | Sets and responds to policy, handles memos, compliance documentation, incident reports | Makes institutional consequences feel earned rather than imposed. Decisions have weight beyond the patient room. |
Core Mechanics

Every mechanic exists to create a specific kind of pressure.

The game loop is designed so there is no obviously correct path. Every question is clinically reasonable. The tension comes from triage judgment under time pressure, not from identifying a right answer.

Mechanic 01

Patient conversations with a question budget

Players consult patients through a dialogue tree but cannot ask everything. What they learn is the only firsthand knowledge they will have before reviewing the AI's version. Choosing what to pursue under time pressure is the core skill, and there is no obviously correct path.
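
One way to picture the question budget is as a small state object that refuses further questions once the limit is reached. The sketch below is illustrative only, not the game's actual code: the `Consultation` class, its field names, the fixed-count budget, and the placeholder replies are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Consultation:
    """One patient conversation with a limited number of questions."""

    answers: Dict[str, str]  # question -> patient reply (placeholder content)
    budget: int              # assumed: a fixed count of questions per patient
    asked: List[str] = field(default_factory=list)

    def ask(self, question: str) -> Optional[str]:
        """Return the reply, or None once the budget is spent."""
        if question not in self.answers:
            raise KeyError(f"unknown question: {question}")
        if question in self.asked:
            # Assumption: revisiting an answer already heard is free.
            return self.answers[question]
        if len(self.asked) >= self.budget:
            return None
        self.asked.append(question)
        return self.answers[question]

    def unexplored(self) -> List[str]:
        """Questions the player never got to, in their original order."""
        return [q for q in self.answers if q not in self.asked]


visit = Consultation(
    answers={
        "symptoms": "placeholder reply A",
        "history": "placeholder reply B",
        "medications": "placeholder reply C",
    },
    budget=2,
)
visit.ask("symptoms")
visit.ask("medications")
print(visit.ask("history"))   # None: the budget is already spent
print(visit.unexplored())     # ['history']
```

Tracking what went unasked matters as much as what was asked: it is the part of the patient's story that the player can only see through the AI's version.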

Mechanic 02

AI note review: accept, edit, or override

After the conversation, players see the AI-generated visit note alongside a confidence score. The note is usually accurate. Occasionally it contains a subtle error that only careful reading catches. Each choice has resource and time implications.

Mechanic 03

Resource decisions with real costs

Tests consume budget. Observation beds block capacity for other patients. Unnecessary orders may trigger a Chief intervention. Every clinical decision has a cost, which is what makes the question of when to trust the AI a genuine dilemma, not an obvious one.
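
The resource costs described above could be tracked in a single ledger. This is a minimal sketch under stated assumptions: `WardResources`, the abstract budget units, the `indicated` flag (assumed to be scripted per case), and the threshold of three unnecessary orders are hypothetical, since the source says only that such orders may trigger a Chief intervention.

```python
from dataclasses import dataclass


@dataclass
class WardResources:
    """Shared resources that every clinical decision draws on."""

    test_budget: int               # assumed unit: abstract budget points
    free_beds: int                 # observation beds currently available
    unnecessary_orders: int = 0
    chief_threshold: int = 3       # assumed trigger for a Chief intervention

    def order_test(self, cost: int, indicated: bool) -> bool:
        """Spend budget on a test; return False if it cannot be afforded."""
        if cost > self.test_budget:
            return False
        self.test_budget -= cost
        if not indicated:
            self.unnecessary_orders += 1
        return True

    def hold_for_observation(self) -> bool:
        """Occupy a bed; return False if none is free for this patient."""
        if self.free_beds == 0:
            return False
        self.free_beds -= 1
        return True

    def chief_intervenes(self) -> bool:
        """True once unnecessary orders reach the assumed threshold."""
        return self.unnecessary_orders >= self.chief_threshold


ward = WardResources(test_budget=10, free_beds=1)
ward.order_test(cost=4, indicated=True)
ward.hold_for_observation()
print(ward.test_budget)             # 6
print(ward.hold_for_observation())  # False: the only bed is taken
```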

Case Design

Every case was designed to teach something.

Every case targets a specific failure mode that AI systems actually exhibit in practice. The difficulty is not arbitrary. The error typology came directly from what clinicians told us, mapped onto the three ways AI tends to mislead rather than obviously fail.

Confident Hedge

Correct but incomplete

AI gets the main diagnosis right but buries a critical risk as an afterthought. Sounds thorough. Misses what matters.

Omission

Correct given what it knew

AI is not wrong. It just doesn't know what wasn't asked. The error is in what the system didn't prompt for, not what it said.

Anchoring

Confidently wrong

AI anchors to a plausible diagnosis and actively explains away contradicting signals. The note sounds complete and confident. It isn't.

One orienting case. One trap.

Alexander Smoots
Male, 27 · Compromised immune system
91% confidence
Confident Hedge

The nurse note mechanic

Before this case opens, a nurse note arrives in the inbox: "Mr. Smoots mentioned he hasn't had a tetanus shot in years when I was taking his vitals." The player decides when to read it. The AI note buries the tetanus risk as an afterthought. The player has to actively connect the nurse's observation to the note and order the vaccine.

If missed: Smoots returns in week 2 with early tetanus symptoms. Seeds the consequence system.
Bartholomew Hasselbach
Male, 20 · No prior conditions
94% confidence
Anchoring: Trap Case

The highest-stakes case in the demo

AI calls bronchitis at 94% confidence. Dismisses cardiac risk because of patient age. HR 102 explained away. No nurse note. No guidance. The note sounds complete and reasonable. The player is entirely on their own. HR 102 with exertional shortness of breath in a 20-year-old warrants cardiac workup. The AI does not see it.

If accepted: Bartholomew readmitted week 2, myocarditis confirmed, condition worsened. The demo's gut-punch moment.
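
The error typology and the two demo cases above can be written down as plain data. The sketch below is a hypothetical representation, not the project's schema: the `ErrorType` and `DemoCase` names and the `has_nurse_note` field are assumptions, while the patients, confidence scores, and error labels come from the case cards.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    CONFIDENT_HEDGE = "Correct but incomplete"
    OMISSION = "Correct given what it knew"
    ANCHORING = "Confidently wrong"


@dataclass(frozen=True)
class DemoCase:
    patient: str
    ai_confidence: int       # percent shown next to the AI note
    error_type: ErrorType
    has_nurse_note: bool     # whether outside context reaches the inbox


CASES = [
    DemoCase("Alexander Smoots", 91, ErrorType.CONFIDENT_HEDGE, True),
    DemoCase("Bartholomew Hasselbach", 94, ErrorType.ANCHORING, False),
]

# The most confident note belongs to the case where the AI is wrong.
most_confident = max(CASES, key=lambda case: case.ai_confidence)
print(most_confident.patient)            # Bartholomew Hasselbach
print(most_confident.error_type.value)   # Confidently wrong
```

Laid out this way, the trap is visible in the data itself: the higher confidence score sits on the case with no nurse note and the more dangerous error.
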
The Consequence System

Decisions don't resolve immediately. They come back.

Real clinical accountability doesn't arrive in the moment. A patient returns days later. A memo arrives the next morning. A policy change is triggered by a pattern of decisions across cases. The consequence system was designed to mirror that timeline and to make the learning feel like something that happened to you, not something you were told.

Demo Structure
Day 1

Decision made

Player accepts, edits, or overrides the AI note. Clinical resources allocated. Case signed off.

Same day

Quiet signal

A lab result. A routine follow-up. Something that doesn't demand attention but rewards the player who notices.

Day 2

Consequence surfaces

Readmission alert. Memo from the Chief. An incident report. The connection to the earlier decision is traceable, but the player has to make it.

Ongoing

Pattern emerges

Policy changes. Audit flags. Institutional responses to patterns across cases. Decisions accumulate into a record the player has to answer for.

The Richard Radford case is the demo's clearest example. He presents with food poisoning on Day 1, and the AI is technically correct. A detail-oriented player orders an ECG given his cardiac risk profile. Most don't. At the end of Day 2, an alert fires: Radford is back in the ER with chest pain. If the ECG was ordered, the problem was caught early. If not, a cardiac event is developing. This is the gut-punch moment the demo is built toward.
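
Delayed consequences like the Radford readmission can be modeled as events queued on the day of the decision and surfaced later. This is a rough sketch, not the shipped system: `ConsequenceQueue`, `sign_off_radford`, and the event strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConsequenceQueue:
    """Holds events that surface on a later in-game day."""

    pending: Dict[int, List[str]] = field(default_factory=dict)

    def schedule(self, day: int, event: str) -> None:
        self.pending.setdefault(day, []).append(event)

    def surface(self, day: int) -> List[str]:
        """Return and clear everything due on the given day."""
        return self.pending.pop(day, [])


def sign_off_radford(queue: ConsequenceQueue, ecg_ordered: bool) -> None:
    """Day 1 decision; the outcome is queued for the end of Day 2."""
    if ecg_ordered:
        outcome = "Radford back in ER with chest pain: caught early"
    else:
        outcome = "Radford back in ER with chest pain: cardiac event developing"
    queue.schedule(day=2, event=outcome)


queue = ConsequenceQueue()
sign_off_radford(queue, ecg_ordered=False)
print(queue.surface(1))  # []
print(queue.surface(2))  # one alert, traceable to the Day 1 decision
```

Nothing happens on the day of the decision in this sketch, which is the point: the alert arrives later and the player has to connect it back.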

Design Screens

Familiarity by design. Discomfort by intention.

The interface is designed to feel like a real hospital system, not a game. The visual language is clinical and restrained. The AI confidence score is always visible but never the loudest element on the page. The player has to choose to look at it carefully.

The world before a case opens

The queue and message panel sit side by side. Clinical cases on the left, communications on the right. The split is intentional. Critical context can arrive through either channel, and deciding when to check messages is itself a decision the player has to make.

Where the tension lives

Patient intake on the left, AI-generated note on the right. The layout does not tell the player which to trust. The confidence score is visible but not prominent. Accept and Sign sits alongside Override with equal visual weight. The interface withholds judgment on purpose.

Where It Stands

The design is solid. The build is evolving.

Second Opinion is actively in development. The current build is exploring a simplified version of the case system, focused on healthcare students practicing AI trust calibration through symptom-based cases. The full consequence system and institutional layer described in this case study represent the design vision, some of which will ship in later iterations.

🔬

Also part of ongoing research at UC Davis

This project is developed in collaboration with the AI Game and Governance Lab at UC Davis. The simulation framework is being explored as a research tool for studying how clinicians make decisions when AI is involved, with the aim of producing insights that can help shape policy, guidelines, and future training.

Learnings

What this project taught me about designing for behavior change.
