A demo that turns a spoken sentence into a tidy vitals note takes an afternoon. An AI feature a nurse trusts at 2 a.m., on her fourth admission, over a flaky 4G connection, with a resident who switches between Thai and English mid-sentence — that takes the other eleven months.
The distance between those two things is almost never a better model. It's everything around the model. Here's what I keep relearning while building EasyChart, the AI charting layer inside CloudNurse, our EHR for nursing homes and rehab centers across Southeast Asia.
01 Start with the workflow, not the model
Most clinical "AI" tasks aren't open-ended reasoning problems — they're workflows with a couple of decision points. Charting a Barthel Index isn't "be creative," it's "map this conversation onto ten well-defined fields." The moment you frame it that way, you can constrain the model, validate its output, and actually reason about failure. Reach for an autonomous agent only when you genuinely can't predefine the path — which, inside an EHR, is rarer than the hype suggests.
02 The model proposes, the clinician disposes
Nothing the AI produces should land in the chart on its own. EasyChart writes to a draft, never the record. A human reviews, edits, and commits. This isn't a UX nicety — it's the entire safety model. It keeps a person accountable for every clinical claim, it gives you a clean correction signal to learn from, and it turns a hallucination into an annoyance to fix rather than an incident to report.
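To make the draft-then-commit rule concrete, here is a minimal sketch. It is not EasyChart's actual code: `Draft`, `DraftState`, the field names, and the list standing in for the chart are all hypothetical. The point is only that the model's output has no path into the record except through a named reviewer, and that committing yields the correction signal as a by-product.

```python
from dataclasses import dataclass, field
from enum import Enum


class DraftState(Enum):
    PENDING_REVIEW = "pending_review"
    COMMITTED = "committed"


@dataclass
class Draft:
    """An AI proposal awaiting human review (hypothetical structure)."""
    ai_fields: dict
    state: DraftState = DraftState.PENDING_REVIEW
    edited_fields: dict = field(default_factory=dict)

    def edit(self, name, value):
        # Clinician overrides one proposed field before committing.
        if self.state is not DraftState.PENDING_REVIEW:
            raise ValueError("only a pending draft can be edited")
        self.edited_fields[name] = value

    def commit(self, reviewer_id, chart):
        # The only way into the chart: a pending draft plus a named reviewer.
        if self.state is not DraftState.PENDING_REVIEW:
            raise ValueError("draft was already committed")
        if not reviewer_id:
            raise PermissionError("a named clinician must commit the draft")
        final = {**self.ai_fields, **self.edited_fields}
        chart.append({"fields": final, "committed_by": reviewer_id})
        self.state = DraftState.COMMITTED
        # Correction signal: (proposed, final) for every field the clinician changed.
        return {
            name: (self.ai_fields.get(name), value)
            for name, value in self.edited_fields.items()
            if self.ai_fields.get(name) != value
        }
```

A nurse who fixes a misheard pulse before committing would produce a correction like `{"pulse_bpm": (88, 80)}`, and the chart only ever sees the reviewed value.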
03 Structure is the product
The valuable output of clinical AI isn't prose — it's discrete, validated fields. So don't let the model free-write into the record. Constrain generation to your schema: enums for assessment scores, ranges for vitals, required units. When the model is unsure, the field stays empty and flags for review instead of inventing a plausible number. Empty-and-honest beats full-and-wrong every single time.
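A rough sketch of what "empty-and-honest" can look like in code. The field names, the plausibility ranges, and the `validate` helper are illustrative assumptions, not clinical reference values or our production schema. Anything that fails its enum, range, or unit check comes back as `None` and is flagged for review instead of being guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSpec:
    unit: str
    low: float
    high: float


# Assumed plausibility bounds, for illustration only.
RANGE_FIELDS = {
    "temp": RangeSpec("C", 30.0, 43.0),
    "pulse": RangeSpec("bpm", 20, 250),
}
# One Barthel item as an enum of allowed scores (illustrative subset).
ENUM_FIELDS = {
    "barthel_feeding": frozenset({0, 5, 10}),
}


def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def validate(proposed):
    """Return (fields, needs_review); invalid or missing fields stay None."""
    fields, needs_review = {}, []
    for name, spec in RANGE_FIELDS.items():
        raw = proposed.get(name)
        if (isinstance(raw, dict)
                and raw.get("unit") == spec.unit
                and _is_number(raw.get("value"))
                and spec.low <= raw["value"] <= spec.high):
            fields[name] = raw["value"]
        else:
            fields[name] = None
            needs_review.append(name)
    for name, allowed in ENUM_FIELDS.items():
        raw = proposed.get(name)
        if type(raw) is int and raw in allowed:
            fields[name] = raw
        else:
            fields[name] = None
            needs_review.append(name)
    return fields, needs_review
```

With this shape, a pulse that arrives without a unit or a feeding score of 7 never reaches the draft as a number; the nurse sees an empty, flagged field.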
04 Design for the unhappy path first
The demo always uses clean audio and textbook English. Production is a TV in the background, a phone held too far away, and a resident code-switching mid-sentence. Build for that reality: graceful fallback to manual entry, partial capture that keeps whatever it got, and one hard rule — the AI can degrade, but it must never block care. If the model is down, the nurse charts the way she always has. The feature is additive, or it's gone.
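One way to encode "degrade, never block" is a hard time budget around the AI call. This is a sketch under assumptions: `draft_or_manual` and the `extract` callable are hypothetical names, the three-second default is arbitrary, and partial capture is left out for brevity. Any failure or timeout simply hands the nurse the manual form.

```python
import concurrent.futures


def draft_or_manual(extract, audio, timeout_s=3.0):
    """Try the AI path within a time budget; otherwise fall back to manual entry."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(extract, audio)
    try:
        fields = future.result(timeout=timeout_s)
        return {"mode": "ai_draft", "fields": fields}
    except Exception:
        # Timeout, model outage, bad audio: all end at the same place,
        # an empty manual form. Deliberately broad, since nothing here
        # is allowed to block charting.
        return {"mode": "manual", "fields": {}}
    finally:
        # Do not wait for a stuck call; the worker thread finishes
        # (or fails) in the background and its result is ignored.
        pool.shutdown(wait=False)
```

The design choice worth noting is that the caller never sees an exception from the AI path, only a mode it can render.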
05 Treat PHI like plutonium
You're handling some of the most sensitive data there is, under Thailand's PDPA and the quiet trust of families. That shapes architecture, not just policy. Hard tenant isolation, so one facility's data can never bleed into another's. Redaction before anything crosses your boundary. An audit trail that logs every AI suggestion, who reviewed it, and what they changed — both for compliance and because that record turns out to be gold for evals. And decide retention on purpose; "keep everything forever" is a liability, not a strategy.
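For the audit trail, a minimal sketch of one log line might look like the following. The `audit_entry` function and its key names are assumptions for illustration. It records what the AI suggested, who reviewed it, and a per-field diff of what changed, tagged with the tenant so entries can be stored and queried per facility.

```python
import json
from datetime import datetime, timezone


def audit_entry(tenant_id, reviewer_id, suggested, committed, now=None):
    """Build one JSON audit line for a reviewed AI suggestion."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    # Per-field diff between the AI proposal and what the clinician committed.
    changed = {
        name: {"ai": suggested.get(name), "final": committed.get(name)}
        for name in sorted(set(suggested) | set(committed))
        if suggested.get(name) != committed.get(name)
    }
    return json.dumps(
        {
            "tenant_id": tenant_id,
            "reviewer_id": reviewer_id,
            "at": timestamp,
            "suggested": suggested,
            "changed": changed,
        },
        sort_keys=True,
    )
```

Keep in mind that a log like this contains clinical values, so it needs the same isolation and retention decisions as the chart itself.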
06 Evals before features
Before I add a capability, I want a golden set: a few hundred real, de-identified encounters with known-correct fields. With that in hand I can measure what actually matters (field-level accuracy, not vibes) and put a regression gate in CI so a prompt tweak that quietly tanks Barthel scoring never reaches a resident. In healthcare, "it seemed better in testing" is not a sentence you want to defend later. It's the same spirit as building effective agents: measure the system, not the magic.
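Here is roughly what field-level accuracy and a regression gate can look like. Treat it as a sketch: `field_accuracy`, `regression_gate`, and the one-point tolerance are assumptions, and a real golden set would be loaded from de-identified fixtures rather than inline dicts.

```python
def field_accuracy(golden, predicted):
    """Per-field exact-match accuracy over paired encounters."""
    if len(golden) != len(predicted):
        raise ValueError("golden and predicted must be the same length")
    correct, total = {}, {}
    for gold, pred in zip(golden, predicted):
        for name, expected in gold.items():
            total[name] = total.get(name, 0) + 1
            if pred.get(name) == expected:
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in total.items()}


def regression_gate(baseline, candidate, tolerance=0.01):
    """Return the fields whose accuracy dropped more than `tolerance` below baseline."""
    return sorted(
        name
        for name, base in baseline.items()
        if candidate.get(name, 0.0) < base - tolerance
    )
```

In CI, the job would fail whenever `regression_gate` returns a non-empty list, which names exactly the fields the prompt change made worse.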
07 Latency and cost are clinical UX
A nurse will not wait eight seconds at a spinner; she'll go back to typing and never open the feature again. Speed is a feature. Stream output so something appears immediately. Use the smallest model that passes your evals for each step — voice cleanup, field extraction, and summarization rarely need the same horsepower. Cache aggressively. The cheapest, fastest path that clears your accuracy bar is the right one; spend the big model's latency only where it earns it.
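"Smallest model that passes your evals for each step" can be a literal lookup. The sketch below assumes you already have per-step accuracy from your golden set; `Candidate`, the model names, and the cost numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost_per_call: float
    accuracy: dict  # step name -> accuracy measured on the golden set


def cheapest_passing(candidates, step, bar):
    """Pick the lowest-cost candidate that clears the accuracy bar for a step."""
    passing = [c for c in candidates if c.accuracy.get(step, 0.0) >= bar]
    if not passing:
        return None  # nothing clears the bar: do not ship this step
    return min(passing, key=lambda c: c.cost_per_call)
```

For example, with a hypothetical `small` model at 0.97 on voice cleanup and 0.90 on field extraction, and a `large` one at 0.98 on both, a 0.95 bar routes cleanup to `small` and extraction to `large`. The same pattern works with latency in place of cost.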
08 It's a systems problem, not a model problem
Put it together and the lesson is boring and freeing: the hard parts of clinical AI have almost nothing to do with the model. Boundaries, fallbacks, review states, audit, evals, latency. Get those right and a mid-sized model feels like magic. Get them wrong and the best model on the leaderboard still produces an incident. Build the system that has the nurse's back — and the AI gets to be the thing in her corner instead of one more thing to babysit.