PHI and AI: The Better Question Is Not Cloud or Local
- Vincent Saturnino
- 5 days ago
- 2 min read
A lot of healthcare AI conversations get framed as a simple choice: should patient data go to a cloud model, or should everything stay on-premises?
That is the wrong starting point.
The better question is: what task are we asking the model to do, what data does it need, and what controls exist around that data?
A model summarizing a full patient chart has a very different risk profile than a model extracting a medication name from a single note. A health system with an existing cloud compliance program is in a different position than a clinic that wants everything inside its own environment.
There are really three practical paths.
The first is a frontier model accessed through a compliant API or managed cloud service under a signed Business Associate Agreement (BAA). This is often the best option for high-value, lower-volume work where output quality matters. Prior authorization, claims appeals, clinical summaries, and complex chart review are good examples.
But the BAA detail matters. A vendor may have a HIPAA-eligible product, but that does not mean every product surface is covered. Consumer chat tools, playgrounds, beta features, and default developer tools may not be included. The compliant path is usually a specific enterprise API, a specific cloud configuration, or a hyperscaler service where HIPAA is explicitly enabled.
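One way to make the "which surface is covered" question enforceable is an explicit allowlist checked before any request leaves the network. The following is a minimal sketch, not a vendor integration: the hostname, the path prefix, and the `is_baa_covered` helper are all hypothetical placeholders, and the real list would come from whatever the signed agreement actually names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of (host, path prefix) pairs that a compliance team
# has confirmed are covered by a signed BAA. These values are placeholders.
BAA_COVERED_SURFACES = {
    ("api.vendor.example", "/v1/enterprise/"),
}


def is_baa_covered(url: str) -> bool:
    """Return True only if the URL matches an explicitly covered surface."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False  # never send PHI over an unencrypted connection
    host = (parsed.hostname or "").lower()
    return any(
        host == covered_host and parsed.path.startswith(prefix)
        for covered_host, prefix in BAA_COVERED_SURFACES
    )
```

The check defaults to "not covered": a playground URL on the same vendor domain, or a consumer chat subdomain, fails unless someone has deliberately added it.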
The second path is running open-weight models locally on compact AI hardware. This can be attractive because patient data stays inside the organization’s walls. It can work well for prototyping, internal workflows, and moderate-scale production. But teams need to be honest about the hardware. Headline compute numbers do not always translate into fast generation speed, because token generation is often limited by how quickly model weights can be read from memory rather than by raw compute. Memory bandwidth, model size, concurrency, and operational support matter more than the marketing spec sheet.
The third path is using smaller distilled or quantized models locally. This is often the most practical local option. A smaller model can be fast, private, and good enough for focused work like document classification, de-identification, structured extraction, message triage, and routing. It may not reason as well as the largest frontier models, but it may be exactly right for repetitive healthcare operations tasks.
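The hardware point and the case for smaller quantized models can both be seen with a back-of-envelope estimate. This sketch assumes a common rule of thumb: for a single request, generating each token requires reading roughly all of the model's weights from memory, so memory bandwidth divided by weight size gives a rough upper bound on tokens per second. It ignores the KV cache, batching, and other overhead, and the bandwidth and model sizes in the usage note are made-up illustrative numbers, not measurements of any real device.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def max_decode_tokens_per_sec(
    bandwidth_gb_s: float, params_billion: float, bits_per_weight: float
) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes every generated token reads all weights from memory once,
    so throughput is capped at bandwidth / weight size.
    """
    return bandwidth_gb_s * 1e9 / weight_bytes(params_billion, bits_per_weight)
```

Under these assumptions, a hypothetical device with 280 GB/s of memory bandwidth tops out near 2 tokens per second on a 70B-parameter model at 16 bits per weight, but near 70 tokens per second on an 8B-parameter model quantized to 4 bits. Real throughput will be lower, but the ratio explains why the smaller model is often the practical local choice.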
The mistake is trying to pick one architecture for everything.
A stronger approach is to segment by task.
High-complexity, high-impact tasks can go to a BAA-covered frontier model. High-volume, narrow tasks can stay local. Some workflows may use a hybrid cascade, where a local model redacts or structures information before a more capable model handles the reasoning step. Others may use an orchestration layer that routes requests based on sensitivity, task type, and audit requirements.
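Segmenting by task can be sketched as a small routing function. Everything here is an illustrative assumption: the task names, the `Request` fields, the three routes, and the choice to send unknown tasks to the local path. The regex-based `redact` is only a stand-in for the local redaction model in the cascade; real de-identification needs far more than two patterns.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    REDACT_THEN_FRONTIER = "redact_then_frontier"
    FRONTIER_BAA = "frontier_baa"


# Hypothetical task groupings, loosely following the examples in this post.
NARROW_TASKS = {"classification", "deidentification", "extraction", "triage"}
COMPLEX_TASKS = {"prior_authorization", "claims_appeal", "clinical_summary", "chart_review"}


@dataclass(frozen=True)
class Request:
    task: str
    contains_phi: bool
    needs_identifiers: bool = False  # does the reasoning step need patient identifiers?


def choose_route(req: Request) -> Route:
    """Pick a route from task type and data sensitivity."""
    if req.task in NARROW_TASKS:
        return Route.LOCAL
    if req.task in COMPLEX_TASKS:
        if req.contains_phi and not req.needs_identifiers:
            return Route.REDACT_THEN_FRONTIER  # hybrid cascade
        return Route.FRONTIER_BAA
    return Route.LOCAL  # unknown task: fail closed and keep data inside


# Toy stand-in for a local redaction model (illustrative patterns only).
_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]


def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for pattern, token in _PATTERNS:
        text = pattern.sub(token, text)
    return text
```

For example, `choose_route(Request("clinical_summary", contains_phi=True))` returns `Route.REDACT_THEN_FRONTIER`, and `redact("MRN: 12345, SSN 123-45-6789")` returns `"[MRN], SSN [SSN]"`. Keeping the decision in one function also gives the audit layer a single place to record why a request went where it did.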
The real control point is not just where the model runs. It is the governance layer around the model.
Can staff accidentally paste PHI into a non-compliant tool? Can requests be routed based on data sensitivity? Can every inference be audited? Can the organization prove which model saw what data, under what agreement, and for what purpose?
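The last question, proving which model saw what data under what agreement and for what purpose, maps naturally onto a per-inference audit record. This is a minimal sketch with hypothetical field names. It assumes the log should store a hash of the prompt rather than the prompt itself, so the audit trail does not become another copy of the PHI.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(*, user_id, model_id, agreement, purpose, prompt, now=None):
    """Build one audit entry for a single inference call.

    Only a SHA-256 digest of the prompt is kept, not the prompt text.
    """
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "user_id": user_id,
        "model_id": model_id,
        "agreement": agreement,  # e.g. an internal reference to the signed BAA
        "purpose": purpose,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


def to_log_line(record: dict) -> str:
    """Serialize the record as one JSON line with a stable key order."""
    return json.dumps(record, sort_keys=True)
```

A plain hash of a short or predictable input can be guessed by brute force, so a production design might prefer a keyed hash (HMAC) or an encrypted reference to the original request instead.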
A BAA is necessary. An air gap may be necessary. But neither one is sufficient by itself.
The practical healthcare AI stack will probably not be all cloud or all local. It will be a controlled routing system where different model types are used for different jobs, with clear audit trails and technical guardrails that prevent PHI from reaching the wrong place.
That is the architecture worth building.


