Open-source AI in regulated EU sectors: four diligence vectors

Open-source AI is the only defensible foundation for regulated European deployments, but 'open' is not a license question. It is a four-vector procurement decision: weights access, training-data provenance, hosting topology, and downstream provider liability after fine-tuning. Most top-ranking articles on this topic stop at the AI Act's open-source exemption and call the question settled. It is not.

Once a hospital, bank, or ministry materially modifies Apertus, Mistral, or Llama by fine-tuning it on internal data, it takes on GPAI provider obligations for the modified model and the carve-out no longer protects it. This post gives compliance officers and CIOs the diligence pass that decision actually requires: vertical by vertical, model by model, with the documentation burden priced in before the procurement memo gets signed.

The open-source exemption does not travel into deployment

The AI Act entered into force on 1 August 2024 with a phased rollout through 2 August 2027, and its GPAI obligations became effective in August 2025 ^[3]^[6]. The open-source carve-out everyone cites in procurement decks is conditional: open-source GPAI providers remain subject to some obligations, and open-source GPAI models with systemic risks are not exempt from any of them ^[3].

The exemption attaches to the act of releasing the model under a free and open-source license. It does not travel with the weights into a regulated deployment. A hospital that downloads Apertus, a bank that pulls Mistral, a ministry that runs Llama — none of them are 'using open-source AI' in the regulatory sense once those weights enter a high-risk Annex III workflow. They are deploying an AI system, and the AI system has its own obligations regardless of how the underlying weights were licensed.

The provider definition reaches any natural or legal person, public authority or body that develops or has an AI system or GPAI model developed and places it on the market under its own name — including third-country providers whose output is used in the EU ^[3]. Open weights do not change that calculus. They change the supply chain you are sitting on top of.

Four vectors that actually decide deployability

License is the least interesting axis. The four vectors that determine whether an open model is deployable in a regulated stack are weights access, training-data provenance, hosting topology, and post-fine-tune liability. Score the same candidates — Apertus, Mistral, Llama, Qwen, Kimi K2 — against those vectors and the shortlists for healthcare, finance, legal and public-sector procurement diverge sharply.

Weights access is binary: either you can load the file into your own inference runtime, or you are calling someone else's API. Training-data provenance is graded. Switzerland's Apertus release published weights, training data, and intermediate checkpoints, with training conducted under Swiss data-protection rules, Swiss copyright law, and EU AI Act transparency requirements, on a dataset filtered to remove personal data and honour website opt-outs ^[1]. Mistral and Llama publish weights but not the underlying corpora. Qwen and Kimi K2 publish weights from non-EU jurisdictions with limited public provenance documentation. These are not equivalent procurement objects.

Hosting topology is the vector most often muddled. An 'open' model running on a hyperscaler inference endpoint is, from a DORA or NIS2 perspective, a third-party ICT dependency, not a sovereign asset. Air-gapped deployment on a sealed WaveNode appliance is a different compliance posture from the same weights served from a US-headquartered cloud. Post-fine-tune liability is the vector almost nobody prices in: the customer becomes a downstream GPAI provider the moment it materially modifies the model, and the documentation obligations follow.
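The four vectors can be captured as a small scorecard. The sketch below is illustrative only: the class names, the three provenance grades, and the flag wording are assumptions made for this example, not a regulatory taxonomy or a WaveNode interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Provenance(Enum):
    # Graded vector; the three levels are an assumption for this sketch.
    OPAQUE = 0    # weights published, corpus undisclosed
    PARTIAL = 1   # limited public documentation
    FULL = 2      # training data and checkpoints published


class Hosting(Enum):
    THIRD_PARTY_API = "third_party_api"
    SOVEREIGN_CLOUD = "sovereign_cloud"
    ON_PREMISE = "on_premise"
    AIR_GAPPED = "air_gapped"


@dataclass(frozen=True)
class Candidate:
    name: str
    weights_loadable: bool   # vector 1: binary
    provenance: Provenance   # vector 2: graded
    hosting: Hosting         # vector 3: where inference runs
    fine_tuned: bool         # vector 4: materially modified by the customer


def diligence_flags(c: Candidate) -> List[str]:
    """Return the open diligence items for one candidate configuration."""
    flags = []
    if not c.weights_loadable:
        flags.append("no weights access: API-only dependency")
    if c.provenance is not Provenance.FULL:
        flags.append("training-data provenance not fully documented")
    if c.hosting is Hosting.THIRD_PARTY_API:
        flags.append("third-party ICT dependency (DORA/NIS2 review)")
    if c.fine_tuned:
        flags.append("downstream provider documentation required")
    return flags


if __name__ == "__main__":
    shortlist = [
        Candidate("Apertus", True, Provenance.FULL, Hosting.AIR_GAPPED, False),
        Candidate("Mistral", True, Provenance.OPAQUE, Hosting.THIRD_PARTY_API, True),
    ]
    for cand in shortlist:
        print(cand.name, diligence_flags(cand))
```

The same model can appear several times in a shortlist with different hosting or fine-tuning settings, which is the point: the flags belong to the configuration, not to the license.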

Healthcare: EHDS and MDR override the AI Act default

Hospitals and medtech vendors do not get to optimise against the AI Act in isolation. The European Health Data Space and the Medical Device Regulation sit on top of it, and they dictate that only models with auditable training-data summaries and on-premise inference qualify for clinical workflows. The Commission's SHAIPED project, launched in March 2025, pilots AI models on HealthData@EU infrastructure under EHDS precisely because the data plane for clinical AI cannot rely on generic hyperscaler endpoints ^[8].

Apply the four-vector test to a clinical decision support shortlist and most of the field falls away. Llama and Mistral can be deployed air-gapped, but their training-data provenance is opaque — a problem when MDR clinical evaluation expects you to describe the inputs to any device that informs diagnosis or treatment. Qwen and Kimi K2 carry jurisdictional baggage hospital legal teams will not absorb. Apertus-class fully-open models, or carefully fine-tuned Mistral or Llama forks running on a WaveNode appliance inside the hospital perimeter, are the only configurations that survive the diligence pass.

The Commission's July 2025 template for GPAI training-data summaries asks for an overview of data sources including large datasets and top domain names, plus data-processing information to help rights-holders exercise EU-law rights ^[6]. For a hospital fine-tuning on pathology reports or radiology notes, that template is now the customer's problem, not the model vendor's.

Finance: DORA quietly rules out most hosted open-model APIs

DORA treats every external inference endpoint as an ICT third-party dependency requiring exit plans, concentration-risk analysis, and contractual audit rights. The consequence is counterintuitive: a Tier-1 European bank running Mistral via a hyperscaler inference API is in a worse regulatory posture than the same bank self-hosting the same weights on EU sovereign infrastructure. Open weights, hosted by a third party, are still a third-party dependency. Open weights, hosted in your own perimeter, are not.

This is where the 'open-source AI' conversation collapses into the hosting-topology vector. The bank's DORA register does not care whether Mistral is open-weight; it cares whether the inference provider is in scope as a critical ICT service provider, whether exit is feasible inside the regulatory timeline, and whether the audit clauses exist. Self-hosted open weights on sovereign infrastructure answer those questions in one move. Hosted APIs — open-weight or not — re-open them.

MiFID II record-keeping adds a second constraint: every model output that informs a regulated activity must be reconstructable. That forces the architecture toward retrieval-augmented generation with citation tracking and full audit logging — the pattern WaveOps Enterprise implements on top of self-hosted open weights.
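A reconstructable output is, at minimum, a log record that binds the prompt, the retrieved sources, the model version, and the answer together. The following is a minimal sketch under stated assumptions: the field names and the digest chaining are hypothetical choices for illustration, not the WaveOps Enterprise schema and not a format prescribed by MiFID II.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def audit_record(prompt, retrieved_ids, output, model_id, weights_sha256, prev_digest=""):
    """Build one log entry; prev_digest links it to the previous entry."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "weights_sha256": weights_sha256,
        "prompt": prompt,
        "retrieved_ids": list(retrieved_ids),
        "output": output,
        "prev_digest": prev_digest,
    }
    record["digest"] = _digest(record)
    return record


def verify(record: dict) -> bool:
    """Recompute the digest over everything except the stored digest."""
    body = {k: v for k, v in record.items() if k != "digest"}
    return _digest(body) == record["digest"]


if __name__ == "__main__":
    first = audit_record("Summarise clause 4", ["doc-17#p3"], "Clause 4 states ...",
                         "base-model-v1", "0" * 64)
    second = audit_record("Follow-up", ["doc-17#p4"], "It also ...",
                          "base-model-v1", "0" * 64, prev_digest=first["digest"])
    print(verify(first), verify(second))
```

Chaining each entry to the previous digest makes silent edits detectable on review; whether that is sufficient for a given supervisor is a question for the compliance function, not something this sketch settles.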

Public sector and legal: Annex III demands a citation trail

Administration-of-justice and public-administration deployments are Annex III high-risk by default under the AI Act, alongside critical infrastructure and law enforcement ^[3]. Open weights are necessary here but far from sufficient. The obligations that follow include transparency to affected individuals, content labelling for generative outputs in machine-readable form, and documentation that survives a national authority audit ^[3].

The engineering implication is concrete: outputs must carry document-level citation trails tied back to source paragraphs, page numbers, and document revisions. Closed APIs cannot deliver this without exposing the regulated content to the API provider — which a ministry or court will not authorise. The architecture that survives is RAG plus citation tracking on top of a self-hosted open model. WaveOps implements exactly this: every answer references the exact source documents the model retrieved, with revision metadata captured for the audit trail. NEXUS, in production at ELES, runs the same pattern on critical-infrastructure data.
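To make the citation-trail requirement concrete, here is a minimal sketch of passages that carry document, revision, page, and paragraph metadata through retrieval. All names are hypothetical and the keyword-overlap ranking is a deliberately toy stand-in for a real retriever; it is not how WaveOps or NEXUS is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Passage:
    document_id: str
    revision: str
    page: int
    paragraph: int
    text: str


@dataclass(frozen=True)
class Citation:
    document_id: str
    revision: str
    page: int
    paragraph: int


def retrieve(query: str, corpus: List[Passage], k: int = 2) -> List[Passage]:
    """Rank passages by shared lowercase words with the query (toy scoring)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked[:k] if score > 0]


def cite(passages: List[Passage]) -> List[Citation]:
    """Strip the text and keep only what an auditor needs to find the source."""
    return [Citation(p.document_id, p.revision, p.page, p.paragraph) for p in passages]


if __name__ == "__main__":
    corpus = [
        Passage("decree-12", "rev3", 4, 2,
                "permit applications must be filed within thirty days"),
        Passage("decree-7", "rev1", 1, 1, "fees are payable annually"),
    ]
    hits = retrieve("when must permit applications be filed", corpus)
    print(cite(hits))
```

The design choice worth noting is that the revision travels with the passage from ingestion onward, so the citation still points at the right text after the source document is amended.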

eIDAS signature requirements, national language coverage, and judicial sector data-localisation rules layer on further constraints that most non-European open models do not address out of the box. The shortlist for public-sector deployment is consequently narrower than the open-weight market suggests: an EU-trained or EU-fine-tuned open model, on EU sovereign hardware, with a citation-tracking application layer wrapped around it.

Fine-tuning flips you from deployer to provider

The single most expensive mistake in regulated open-source AI adoption is assuming the enterprise remains a deployer after fine-tuning. Under the AI Act's provider definition, anyone who places an AI system or GPAI model on the market under their own name is a provider — and substantive modification of an existing model triggers that status ^[3]. The carve-out that protected the original open-source release does not protect the fine-tuned derivative.

Once the enterprise is a downstream provider, the July 2025 training-data template applies to its model, not just to the upstream one ^[6]. That means a public summary covering source overviews, large datasets, top domain names, and data-processing information — for the internal corpus the bank or hospital used to fine-tune. Most regulated organisations have never produced a document like this for an internal asset, and the legal review cycle to do so is measured in months.
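A first draft of that summary can be assembled mechanically from a corpus inventory. The sketch below assumes the fine-tuning corpus is described by a list of dataset names and source URLs; the dictionary keys are illustrative labels for the elements named above, not the field names of the Commission's template.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=3):
    """Count source hostnames and return the n most frequent as (host, count)."""
    hosts = [urlparse(u).hostname for u in urls]
    counts = Counter(h for h in hosts if h)
    return counts.most_common(n)


def draft_summary(model_name, datasets, urls, processing_notes):
    """Assemble a draft for legal review; keys are assumptions, not the official form."""
    return {
        "model": model_name,
        "large_datasets": sorted(datasets),
        "top_domain_names": top_domains(urls),
        "data_processing": processing_notes,
    }


if __name__ == "__main__":
    summary = draft_summary(
        "internal-finetune-v1",
        ["pathology-reports-2023", "radiology-notes-2022"],
        ["https://a.example/x", "https://a.example/y", "https://b.example/z"],
        "Personal identifiers removed before training (illustrative note).",
    )
    print(summary)
```

For a purely internal corpus the domain list may be empty, which is itself worth stating in the draft rather than leaving the section blank.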

Price the downstream provider documentation burden into TCO before model selection, not after. For most use cases, RAG over a base open model — without fine-tuning the weights — is the better posture precisely because it avoids the provider flip. The enterprise stays a deployer, the upstream provider keeps the GPAI obligations, and the citation trail handles the audit requirement. This is why WaveOps defaults to retrieval-grounded answers rather than mandatory fine-tuning.

The four-step diligence pass to run before procurement

Run every candidate (Apertus, Mistral, Llama, Teuken, Qwen, Kimi K2) through the same sequence.

Step one: classify the use case against Annex III and the relevant sectoral regime (MDR, DORA, eIDAS, judicial). The classification determines whether weights access alone is enough or whether citation trails and on-premise inference are mandatory.

Step two: verify training-data provenance against the Commission's July 2025 template ^[6]. A model whose upstream provider cannot or will not publish a compliant summary is a model whose risk you inherit on first deployment.

Step three: fix hosting topology before negotiating commercials. Sovereign cloud, on-premise GPU, and air-gapped WaveNode appliance are different compliance objects with different DORA, NIS2 and EHDS implications. Treat the choice as a first-order architectural decision, not a deployment detail.

Step four: pre-commit to provider artifacts if fine-tuning is on the roadmap. Draft the training-data summary, the copyright policy, the evaluation harness, and the monitoring plan before the first fine-tuning run, not after the first audit letter.

Buyers who run this four-step pass converge on a narrow architectural pattern: open weights, sovereign hosting, retrieval-grounded answers, citation trails, and an air-gap switch the compliance officer can flip without filing a change request with a US cloud provider.
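The four steps above can be sketched as a checklist that returns whatever is still open before the procurement memo is signed. The function and parameter names are hypothetical; the required artifacts are the four named in step four.

```python
from typing import Iterable, List, Optional

# The four provider artifacts listed in step four.
REQUIRED_ARTIFACTS = {
    "training_data_summary",
    "copyright_policy",
    "evaluation_harness",
    "monitoring_plan",
}


def diligence_pass(
    use_case_classified: bool,
    upstream_summary_published: bool,
    hosting: Optional[str],
    fine_tuning_planned: bool,
    artifacts_drafted: Iterable[str] = (),
) -> List[str]:
    """Return the open items, in step order; an empty list means the pass is clean."""
    open_items = []
    if not use_case_classified:
        open_items.append("step 1: classify use case against Annex III and sectoral regime")
    if not upstream_summary_published:
        open_items.append("step 2: obtain upstream training-data summary")
    if hosting is None:
        open_items.append("step 3: decide hosting topology")
    if fine_tuning_planned:
        missing = REQUIRED_ARTIFACTS - set(artifacts_drafted)
        if missing:
            open_items.append("step 4: draft " + ", ".join(sorted(missing)))
    return open_items


if __name__ == "__main__":
    print(diligence_pass(True, True, "on_premise", True,
                         {"training_data_summary", "copyright_policy"}))
```

Step four only fires when fine-tuning is planned, which mirrors the argument for retrieval over a base model: no fine-tuning, no provider artifacts to draft.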

The European regulated market will not be won by whoever has the most permissive open license. It will be won by whoever can hand a compliance officer the training-data summary, the citation trail, and the air-gap switch in the same afternoon.

Book a WaveNode diligence session: bring your shortlist, leave with the four-vector scorecard — https://wavenetic.com/enterprise-ai-on-premise