A private LLM is a large language model that runs entirely inside an organization's own infrastructure, its own cloud account, or a fully isolated environment it controls, rather than through a shared third-party API. Prompts, outputs, and any data used for fine-tuning stay inside a boundary the organization defines and can audit. In normal operation, none of that data is sent to an external vendor's servers.
You need a private LLM on your own infrastructure when at least one of three conditions holds. The prompts or documents you would send to a model are regulated or commercially sensitive. Your compliance function needs a full, inspectable record of where data goes and who can access it. Or your contracts, sector rules, or client agreements prohibit sending certain categories of data to any third party, however reputable.
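The rule above is a plain disjunction: any one of the three conditions is enough on its own. A minimal sketch, assuming each use case can be described by three yes/no flags. The `Workload` class and `needs_private_llm` function are hypothetical names for illustration, not part of any product or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical flags describing one use case; names are illustrative only.
    handles_sensitive_data: bool     # prompts/documents are regulated or commercially sensitive
    needs_auditable_data_flow: bool  # compliance needs an inspectable record of where data goes
    third_party_prohibited: bool     # contracts, sector rules, or clients forbid third-party sharing


def needs_private_llm(w: Workload) -> bool:
    """Return True if at least one of the three conditions holds."""
    return any((
        w.handles_sensitive_data,
        w.needs_auditable_data_flow,
        w.third_party_prohibited,
    ))


if __name__ == "__main__":
    prototype = Workload(False, False, False)
    nda_review = Workload(True, False, True)
    print(needs_private_llm(prototype))   # False
    print(needs_private_llm(nda_review))  # True
```

In practice the hard part is deciding each flag honestly, not evaluating the rule.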
What actually counts as 'private'
'Private' is a spectrum, not a single setup. At one end, a model runs on hardware inside your own building, with no external network path at all. In the middle, a model runs in a cloud account you control, inside a network you configure, where the AI vendor never receives your prompts. At the other end, the 'private' label is attached to a shared API backed by contractual promises about data handling, which is not the same guarantee as infrastructure you actually control.
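One practical consequence of the middle and inner tiers is that the serving endpoint your applications call should resolve to something inside your own network. A minimal configuration check, as a sketch under stated assumptions: the internal domain suffixes below are hypothetical placeholders, hostnames are checked against an allowlist rather than resolved through DNS, and a check like this only catches misconfiguration. It does not replace network-level egress controls.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical internal domain suffixes; replace with your own network's names.
INTERNAL_SUFFIXES = (".internal.example", ".corp.example")


def endpoint_inside_boundary(url: str, internal_suffixes=INTERNAL_SUFFIXES) -> bool:
    """Rough check that a model endpoint URL points inside a boundary you control.

    IP literals must be private or loopback addresses; hostnames must end in an
    allowlisted internal suffix. No DNS resolution is performed here.
    """
    host = urlparse(url).hostname  # lowercased; None if the URL has no host part
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to the internal hostname allowlist.
        return host.endswith(internal_suffixes)
    return addr.is_private or addr.is_loopback
```

Such a check could run once at startup, refusing to send any prompt if the configured endpoint fails it; a shared third-party API would fail it by design.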
When a shared API is good enough
A shared, third-party API is a reasonable choice for work on public information, internal tools with no regulated inputs, and early-stage prototypes where speed matters more than infrastructure control. Most consumer-facing chat assistants and general writing tools fall into this category. The moment a prompt could contain a patient record, a contract under NDA, or personal customer data, that calculation changes.
What running your own LLM actually requires
Running a model on your own infrastructure means owning the deployment: the hardware or private cloud environment, the model weights, the serving stack, and the monitoring around it. It also means someone in the organization is responsible for keeping the model current, since there is no vendor silently upgrading it behind an API. This is real operational work, which is why it is worth doing only when the sensitivity of the data justifies it.
Organizational memory changes the calculus further
A private LLM connected to an organization's own documents, policies, and past decisions becomes more useful, and more sensitive, than a generic model. That connection is also exactly what regulated organizations want: a model that knows their business without that knowledge ever leaving their control. Building that safely is an infrastructure problem as much as a model problem.
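Connecting a model to internal documents usually means retrieving relevant passages and placing them in the prompt. The sketch below is a toy version of that step, using simple keyword overlap as a stand-in for the embedding search a production system would more likely use. It is not any particular product's method; `retrieve` and `build_prompt` are hypothetical names, and the call to a privately served model is deliberately omitted. The point it illustrates is that documents, retrieval, and prompt assembly can all stay inside your own process and network.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric word tokens; deliberately simplistic.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of up to k documents sharing the most query terms (in-process only)."""
    q = Counter(tokenize(query))
    scores = {}
    for doc_id, text in documents.items():
        d = Counter(tokenize(text))
        # Count overlapping terms; missing terms in a Counter count as 0.
        scores[doc_id] = sum(min(q[t], d[t]) for t in q)
    # Highest score first; ties broken by id so results are deterministic.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return [doc_id for doc_id in ranked[:k] if scores[doc_id] > 0]


def build_prompt(query: str, documents: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt for a model served inside the same boundary (hypothetical)."""
    context = "\n".join(f"[{i}] {documents[i]}" for i in retrieve(query, documents, k))
    return f"Answer using only this internal context:\n{context}\n\nQuestion: {query}"
```

For example, with three short internal documents, a question about how long client contracts are kept would pull in the retention memo and the data-sharing policy, and leave out an unrelated FAQ.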
