11 May 2026

How to Choose an Enterprise AI Vendor When Sensitive Data Is in Scope

A procurement framework for choosing an enterprise AI vendor when sensitive data is in scope — architecture over certifications, topology over contracts.


If sensitive data is in scope, the only defensible vendor choice is one whose architecture makes data exfiltration physically impossible — not one whose policies promise it won't happen. Everything else in the procurement file is decoration.

Read the average enterprise AI MSA and the entire security model collapses to a single primitive: trust the contract. For commodity workloads, fine. For regulated data, classified material, M&A files, source code, or anything competitively fatal if it leaks, contractual trust is the wrong primitive. The vendor's topology either keeps your data inside your perimeter or it doesn't, and no governance dashboard changes that fact. What follows scores vendors on architecture first and paperwork second.

Certifications are a gate, not a score — and procurement keeps scoring them

SOC 2 Type II, ISO 27001, and CASA are procurement table stakes precisely because they're easy to demand and easy to supply. Ninety-two percent of IT leaders make SOC 2 Type II or ISO 27001 mandatory when selecting AI platforms [1]. The bar is high enough to filter out hobbyists and low enough that every serious vendor clears it. The badge tells you the vendor has documented controls. It does not tell you whether your prompts, embeddings, and source documents physically remain under your control during inference.

Certifications attest to process maturity — change management, access reviews, incident response runbooks. They are silent on the question that actually matters when sensitive data hits a model: where does the payload go, who sees it in transit, and what happens to it after the response returns. Sixty-two percent of enterprises have already experienced AI-related data exposure incidents, often tied to inadequate input controls or unclear data-handling policies [1]. Those enterprises were, overwhelmingly, buying from certified vendors. The certificate did not prevent the exposure because the certificate was never designed to.

Weight attestations as a gate. They eliminate negligence. They do not establish architectural fitness for regulated workloads, and ranking vendors on them is how organisations end up shipping PHI through a multi-tenant inference endpoint with a green checkmark next to it.

Map data tiers to deployment topologies — feature checklists are the wrong instrument

PHI, attorney-client material, M&A working papers, export-controlled IP, and proprietary source code each impose a different minimum-viable deployment model. Treating them as a single "sensitive" bucket produces feature matrices that compare a multi-tenant SaaS with a no-training guarantee against an on-premise stack as if they were substitutable. They are not. Sensitive data is already involved in 39.7% of enterprise AI use [2] — this mapping describes the default state of most deployments, not an edge case, and most of those deployments are mismatched.

The workable tiering:

  1. Tier one — public marketing copy, generic code scaffolding — runs anywhere.
  2. Tier two — internal operations, non-regulated business content — tolerates single-tenant SaaS with strong contractual carve-outs.
  3. Tier three — regulated personal data, privileged legal material, customer financial records — requires a deployment where the vendor cannot read the payload even if compelled.
  4. Tier four — classified, export-controlled, or competitively existential data — requires air-gapped or VPC-isolated inference on infrastructure the customer controls, running open-weight models the customer can inspect.

Permission-aware retrieval inside a vendor cloud is the wrong topology for tier four, regardless of how it's marketed [3].
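To make the gate concrete, here is a minimal sketch of the tiering above expressed as a procurement check. The tier and topology names, and the assumption that topologies can be ranked on a single isolation axis, are illustrative simplifications, not any vendor's actual taxonomy.

```python
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 1        # marketing copy, generic code scaffolding
    INTERNAL = 2      # non-regulated business content
    REGULATED = 3     # PHI, privileged legal material, customer financials
    EXISTENTIAL = 4   # classified, export-controlled, competitively existential

class Topology(IntEnum):
    MULTI_TENANT_SAAS = 1
    SINGLE_TENANT_SAAS = 2
    VENDOR_BLIND = 3          # vendor cannot read the payload, even if compelled
    CUSTOMER_CONTROLLED = 4   # air-gapped or isolated VPC on customer hardware,
                              # open-weight models the customer can inspect

# Minimum-viable topology per tier, per the tiering above.
MINIMUM_TOPOLOGY = {
    Tier.PUBLIC: Topology.MULTI_TENANT_SAAS,
    Tier.INTERNAL: Topology.SINGLE_TENANT_SAAS,
    Tier.REGULATED: Topology.VENDOR_BLIND,
    Tier.EXISTENTIAL: Topology.CUSTOMER_CONTROLLED,
}

def clears_gate(offered: Topology, data_tier: Tier) -> bool:
    """A vendor passes the gate only if its topology meets the tier's floor."""
    return offered >= MINIMUM_TOPOLOGY[data_tier]

# A shared inference cluster has one honest answer for tier-three data:
assert not clears_gate(Topology.MULTI_TENANT_SAAS, Tier.REGULATED)
```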

Stop asking "does the vendor support our compliance regime?" Start asking "which tier of data can this topology legitimately hold?" A vendor that runs every customer through a shared inference cluster has one honest answer, regardless of how its governance UI is configured.

Six clauses decide whether your data is safe. Most MSAs fail at least three.

Enforceable contract language is the right scoring instrument. The non-negotiable list: a training-data exclusion, independent audit rights to verify it, a sub-processor change veto (not just notification), explicit prompt and response retention windows with deletion verification, model-routing disclosure so you know which model and which region processed which request, and jurisdictional commitments that survive corporate restructuring. A vendor that commits to all six in writing, with remedies, is a serious counterparty. A vendor that commits to two and offers a dashboard for the rest is selling marketing.
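A minimal sketch of turning that list into a scoring sheet follows. The clause identifiers are hypothetical shorthand, and a clause only counts if it appears in the signed MSA or DPA with a remedy attached, not on a roadmap or a dashboard.

```python
# Hypothetical scoring sheet for the six non-negotiable clauses.
MSA_CLAUSES = [
    "training_data_exclusion",
    "independent_audit_rights",
    "subprocessor_change_veto",          # veto, not mere notification
    "retention_window_with_deletion_verification",
    "model_routing_disclosure",          # which model, which region, per request
    "jurisdiction_survives_restructuring",
]

def score_vendor(committed_in_writing: set[str]) -> str:
    """Count only clauses committed in the signed contract with remedies."""
    hits = sum(1 for clause in MSA_CLAUSES if clause in committed_in_writing)
    if hits == len(MSA_CLAUSES):
        return "serious counterparty"
    if hits <= 2:
        return "selling marketing"
    return f"negotiate: {len(MSA_CLAUSES) - hits} clauses missing"

print(score_vendor({"training_data_exclusion",
                    "retention_window_with_deletion_verification"}))
```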

Audit rights are where most negotiations quietly collapse. Vendors will agree they don't train on customer data; far fewer will let a third-party auditor verify it, and almost none will accept liability tied to verification failing. Sub-processor veto is the second pressure point. Standard MSAs grant the vendor unilateral right to swap sub-processors with thirty days' notice — meaning the model provider, the vector database host, and the observability stack you diligenced at signing can all be different companies by your second renewal, and your only recourse is to terminate.

None of this is exotic legal work. It's the baseline a regulated buyer already applies to any data processor. The category moved faster than procurement maturity, and buyers accepted SaaS-era boilerplate for a workload class that demands more. Reverse that, and the shortlist contracts dramatically — which is the point.

Agentic AI breaks the identity model your vendor is selling you

SSO, RBAC, and permission-aware retrieval were designed for a world where a human asks a question and the system returns a passage. Agentic AI broke that model the moment agents started holding API keys, OAuth scopes, and service-account credentials to act on internal systems. The blast radius is no longer what a user can see; it's what an agent can do with the credentials it has been issued, often unattended, often in a loop.

An agent that can read from a CRM, write to a ticketing system, call a payments API, and post to a shared channel is a privileged service account with a language model attached [4]. Most AI vendors have no coherent answer for how those identities are scoped, rotated, logged, and revoked. "The user is authenticated" is not an answer when the user delegated three tool calls to an agent that then chained into a fourth system the user never knew existed. The identity layer is where AI velocity becomes uncontrollable [5].

Require, in evaluation: per-agent identity issuance, per-tool scope binding, full call-chain audit logs that survive vendor-side log rotation, and a revocation path that kills an agent's credentials without taking down the human users it acts on behalf of. Vendors that cannot demonstrate this on a whiteboard in fifteen minutes do not have an agentic security model. They have an agentic feature.
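As a sketch of what that looks like in practice, the following models per-agent identity issuance, per-tool scope binding, and a revocation path that leaves the delegating human untouched. The class and field names are illustrative assumptions, not any vendor's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
import uuid

@dataclass
class AgentIdentity:
    """An agent holds its own credential, separate from the human it acts for."""
    agent_id: str = field(default_factory=lambda: f"agent-{uuid.uuid4()}")
    acts_for: str = ""                                    # delegating human principal
    tool_scopes: dict[str, set[str]] = field(default_factory=dict)
    expires_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc) + timedelta(hours=1))
    revoked: bool = False

    def may_call(self, tool: str, action: str) -> bool:
        # Per-tool scope binding: a CRM read scope never implies a payments write.
        if self.revoked or datetime.now(timezone.utc) > self.expires_at:
            return False
        return action in self.tool_scopes.get(tool, set())

    def revoke(self) -> None:
        # Kills the agent's credential without touching the human's session.
        self.revoked = True

agent = AgentIdentity(acts_for="jdoe",
                      tool_scopes={"crm": {"read"}, "ticketing": {"write"}})
assert agent.may_call("crm", "read")
assert not agent.may_call("payments", "charge")   # fourth system: denied by default
agent.revoke()
assert not agent.may_call("ticketing", "write")
```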

Run POCs that produce packet captures, not vendor assertions

A serious POC instruments the full sensitive-data lifecycle and proves the contract with packet captures and log inspection. The lifecycle runs roughly eight stages — discovery, classification, indexing, retrieval, prompt construction, inference, response logging, downstream analytics — and at every stage there is a boundary the contract claims the data will not cross. Confirm each boundary empirically.

The tests that belong in every evaluation plan: seed the corpus with canary documents containing unique, searchable tokens and watch outbound traffic for any appearance of those tokens; run permission-trimmed retrieval against a user who should not have access to a sensitive folder and confirm the model cannot be coaxed into citing it via prompt injection; submit prompts containing synthetic PHI and inspect the vendor's retention logs to verify the deletion window matches the DPA; force model routing across regions and confirm the audit log records which model in which jurisdiction served each request. Organisations implementing AI with proper security frameworks before deployment experience 73% fewer data breaches than those bolting protection on afterwards [1].
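A minimal sketch of the canary test, assuming you can export the vendor's egress logs or reduce a packet capture to searchable text; the file name below is a placeholder.

```python
import re
import secrets

def make_canary(label: str) -> str:
    """Generate a unique, searchable token to seed into a test document."""
    return f"CANARY-{label}-{secrets.token_hex(8)}"

canaries = {make_canary(label) for label in ("phi", "mna", "source")}

def scan_for_canaries(capture_text: str, tokens: set[str]) -> set[str]:
    """Return every canary token found in an outbound capture or log export.
    Any hit outside the approved boundary is a finding, whatever the DPA says."""
    pattern = re.compile("|".join(re.escape(t) for t in tokens))
    return set(pattern.findall(capture_text))

# e.g. leaked = scan_for_canaries(open("egress_capture.txt").read(), canaries)
```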

If the vendor cannot produce the logs that let you run these tests, that is the finding. A vendor whose architecture does not generate inspectable evidence is asking you to substitute their word for verification, which returns you to the trust-the-contract primitive the exercise was meant to escape.

The vendor you signed with is not the vendor you'll have at renewal

Acquisitions consolidate the category every quarter. Sub-processor swaps happen on thirty-day notice. Model providers change underneath the platform you bought. Jurisdictional posture shifts when a US parent acquires a European subsidiary or vice versa. Any one of these events silently invalidates the assurances in your DPA, and the standard remedy — terminate and migrate — is operationally unavailable once the platform is embedded in workflows that thousands of employees depend on.

This is the structural argument for architectures that don't depend on vendor behaviour staying constant. If your inference runs on your own GPUs against open-weight models you have copies of, an acquisition of your vendor changes your support contract and nothing else. If your retrieval index lives on your storage, a sub-processor swap is a vendor problem, not your problem. If your audit trail is generated by infrastructure you operate, a change in the vendor's logging policy cannot retroactively erase evidence you already hold. Architectures that survive a five-year horizon are the ones whose security properties are determined by topology, not by the continued goodwill and corporate stability of a third party.

On-premise isn't the expensive option anymore — it's the cheap one

The assumption that on-premise AI is the expensive choice is a holdover from an era when running models locally meant building from scratch. Price the real cost of the SaaS alternative when sensitive data is in scope: single-tenant premiums, legal review cycles per renewal, ongoing vendor security monitoring, breach exposure that scales with corpus size, DLP overlays bolted on to compensate for architectural openness, and the operational tax of governance dashboards that exist because the underlying topology cannot be trusted. Seventy-two percent of enterprises are now running generative AI in production or core operations [1] — these costs are recurring line items, not pilot-budget rounding errors.
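One way to keep those line items honest is to lay the comparison out explicitly. The sketch below is illustrative only; every figure is a placeholder to be replaced with your own quotes and internal estimates.

```python
# Illustrative five-year TCO structure; all figures are placeholders, not data.
YEARS = 5

saas_annual = {
    "single_tenant_premium": 250_000,
    "legal_review_per_renewal": 40_000,
    "vendor_security_monitoring": 60_000,
    "dlp_overlay_licensing": 80_000,
    "governance_and_breach_exposure_reserve": 120_000,
}

on_prem_annual = {
    "gpu_hardware_amortised": 150_000,
    "platform_engineering": 180_000,
    "power_and_hosting": 40_000,
}

print(f"SaaS, 5 years:    {sum(saas_annual.values()) * YEARS:>12,}")
print(f"On-prem, 5 years: {sum(on_prem_annual.values()) * YEARS:>12,}")
```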

Running open-weight models on your own GPUs with citation-grade retrieval has become unremarkable engineering. The hardware amortises. The models are downloadable. The retrieval stack is well-understood. What you give up is the convenience of a vendor-hosted endpoint. What you get back is a deployment whose security properties are determined by where the wires go, not by what the contract says — and which strips most of the line items above out of the five-year TCO entirely. For organisations whose sensitive data is actually sensitive, this is the economically rational choice.
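As a sketch of how unremarkable that engineering has become, the following assumes an OpenAI-compatible inference server such as vLLM running on hardware you control, with retrieval handled upstream; the endpoint URL and model name are placeholders.

```python
import requests

# Assumes an OpenAI-compatible inference server (e.g. vLLM) on local hardware,
# serving an open-weight model pulled to local disk. Nothing leaves the perimeter.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"

def ask(question: str, context_passages: list[str]) -> str:
    """Send already-retrieved passages plus the question to the local model."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context_passages))
    resp = requests.post(LOCAL_ENDPOINT, json={
        "model": "local-open-weight-model",   # placeholder model name
        "messages": [
            {"role": "system",
             "content": "Answer only from the numbered passages and cite them."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```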

The vendors who win this decade won't be the ones with the longest compliance page. They'll be the ones who never needed your data to leave the building in the first place.


Book a Wavenetic architecture review for your tier-three and tier-four workloads: https://wavenetic.com

Sources

  1. 16 Best Secure AI Tools for Enterprise (2026) — Coworker
  2. AI Data Security Risks: 39.7% of AI Use Involves Sensitive Data — Cyberhaven
  3. AI Security: Protecting Enterprise Data with Glean
  4. Secure Enterprise AI: Unified Secrets & Non-Human Identity Management — Akeyless
  5. Top 6 AI Security Solutions to Protect Business Data — Forcepoint