8 May 2026

The Enterprise AI Software Factory: Eight Control Points Or It Doesn't Survive Its First Audit



An enterprise AI software factory is not a platform you buy or a velocity metric you chase. It is a governance-first operating model whose value is measured in auditable control points per merged change, not pull requests per hour. Every other definition currently circulating in the market is downstream of a vendor's revenue model, and all of them quietly skip the question that decides whether the factory survives its first audit: which specific human approvals, artifacts, and provenance records gate each stage from intent to production.

The rest of this post is the control-point map. Eight lifecycle stages, the artifact each must produce, the role that signs off, and a measurement discipline that holds up under regulator scrutiny. Use it as the rubric for any factory build-or-buy decision. In regulated industries, the ungoverned factory is just a faster way to manufacture liability.

The category is being defined backwards on purpose

Two distinct things are being sold under the same banner, and the conflation is deliberate. One is an AI factory in the infrastructure sense — GPU orchestration, model serving, accelerated compute. NVIDIA AI Enterprise is the canonical example, packaging microservices, frameworks, and libraries with GPU orchestration into a commercial stack [2], and Supermicro extends the same logic into turnkey rack-scale hardware [3]. The other is an AI software factory — an operating model for the SDLC in which agents capture intent, generate specs, write code, run tests, and produce deployment artifacts under human control. These are not the same product, and they do not solve the same problem.

Buyers who confuse them end up purchasing compute when what they need is a governed SDLC. The infrastructure layer is necessary — agents have to run somewhere — but no amount of GPU availability fixes a delivery process that cannot tell you who approved which prompt, against which spec, with which model version, on which retrieved context. The factory question is not tokens per second. It is whether the chain of artifacts behind every merged commit will hold up when an auditor asks to replay it.

Eight control points, or it is not a factory

A real enterprise AI software factory has eight non-negotiable gates. One: intent capture — a structured record of what was asked, by whom, against which business objective. Two: spec approval — a human-reviewed specification the agent is bound to implement. Three: an architecture decision record that pins the long-lived choices the agent is not permitted to revisit. Four: a context access policy that defines which repositories, documents, and data sources the agent may retrieve from for this task. Five: agent permissioning — which tools, which write scopes, which execution sandboxes, time-bounded. Six: automated verification thresholds covering tests, SAST, SCA, and characterization checks that must pass before a human ever sees the diff. Seven: human merge approval, with the reviewer's identity bound to the artifact bundle. Eight: deployment provenance — a signed record of model identity, prompt log, retrieved context, test results, and approver, attached to the deployed artifact.
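To make the gate list concrete, here is a minimal, purely illustrative sketch of the artifact bundle a single merged change might carry. The field names are assumptions for this post, not any vendor's schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class MergeArtifactBundle:
    """Illustrative record of the eight control-point artifacts for one change."""
    intent_record: str         # 1. who asked for what, against which objective
    approved_spec_hash: str    # 2. hash of the human-reviewed specification
    adr_ids: tuple             # 3. pinned architecture decision records
    context_policy_id: str     # 4. which sources the agent could retrieve from
    agent_permit_id: str       # 5. time-bounded tool and write-scope grant
    verification_report: str   # 6. tests, SAST, SCA, characterization results
    merge_approver: str        # 7. identity of the human reviewer
    provenance_signature: str  # 8. signed deployment provenance record

    def is_complete(self) -> bool:
        # A control point blocks when any artifact is missing or empty.
        return all(bool(v) for v in asdict(self).values())
```

The point of typing the bundle is that completeness becomes a mechanical check rather than a review-time judgment call.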

Any stack missing one of these is a coding assistant in factory clothing. The distinction matters because agentic systems are categorically different from autocomplete: an agent can implement a feature, write the tests, run them, fix the failures, and document the result, shifting the human role from writing to steering [1]. Steering without control points is hoping. The eight gates are what convert agent autonomy from a productivity story into a defensible operating model.

Notice what is not on the list: dashboards, leaderboards, velocity charts. Those are observability, not control. A control point is a place where the factory stops if the artifact is missing or the signature is wrong. If your stack cannot block a merge for lack of an approved spec or a pinned ADR, you do not have eight gates. You have eight suggestions.
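The stop-the-line behaviour can be sketched as a gate that refuses the merge outright when an artifact is absent or the bundle is unsigned. Names and structure here are hypothetical, not a real CI plugin:

```python
# The eight artifacts a bundle must carry before a merge is allowed.
REQUIRED_ARTIFACTS = [
    "intent_record", "approved_spec", "pinned_adrs", "context_policy",
    "agent_permit", "verification_report", "merge_approval", "provenance_record",
]

class GateFailure(Exception):
    """Raised when a control point blocks the merge."""

def enforce_merge_gate(bundle: dict) -> None:
    """Refuse the merge unless every artifact is present and the bundle is signed."""
    missing = [k for k in REQUIRED_ARTIFACTS if not bundle.get(k)]
    if missing:
        raise GateFailure(f"merge blocked, missing artifacts: {missing}")
    if not bundle.get("signature"):
        raise GateFailure("merge blocked, artifact bundle is unsigned")
```

A gate that raises is a control point; a gate that logs a warning and proceeds is one of the eight suggestions.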

Speed multiples without baselines are procurement theatre

The vendor pitch is dominated by headline multiples. Factory.ai reports 7x faster feature delivery, 96.1% migration time reduction, and 95.8% time saved on on-call resolution [5]. itestra cites a 100,000-line C compiler developed autonomously with detailed tests in two weeks, and security and dependency updates auto-generated with an 86.5% acceptance rate [1]. NVIDIA claims up to 10x GPU availability, 5x utilization, and 20x workload throughput [2]. These numbers are not falsifiable as ROI claims without baselines: cycle time before, rework rate before, defect escape rate before, change failure rate before.

The honest measurement frame is cost per audited merge. That figure includes agent inference cost, human review minutes, rework loop count, the percentage of merges that produced a post-deploy incident, and the audit effort required to assemble evidence after the fact. A factory that ships 7x more code while doubling the change failure rate and tripling audit-prep burden is not faster — it is more expensive in the dimensions that matter to a CFO and a CISO. The ROI conversation begins by reading existing DORA metrics off the current SDLC and committing, in writing, to post-factory targets for each.
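One way to operationalize cost per audited merge is a simple roll-up of the components listed above. This is a sketch, not a standard formula; the blended rates are placeholder assumptions you would replace with your own finance figures:

```python
def cost_per_audited_merge(
    inference_cost: float,       # agent API/compute spend over the period
    review_minutes: float,       # total human review time across all merges
    rework_loops: int,           # agent re-runs triggered by rejected diffs
    cost_per_loop: float,        # marginal cost of one rework loop
    audit_prep_hours: float,     # effort to assemble evidence after the fact
    merges: int,                 # audited merges shipped in the period
    incident_rate: float = 0.0,  # fraction of merges causing a post-deploy incident
    incident_cost: float = 0.0,  # average cost of one such incident
    reviewer_rate_per_min: float = 2.0,   # assumed blended reviewer rate
    audit_rate_per_hour: float = 150.0,   # assumed blended GRC rate
) -> float:
    """Illustrative roll-up of the cost factors into one per-merge figure."""
    total = (inference_cost
             + review_minutes * reviewer_rate_per_min
             + rework_loops * cost_per_loop
             + audit_prep_hours * audit_rate_per_hour
             + merges * incident_rate * incident_cost)
    return total / merges
```

The useful property is directional: a factory that ships more merges while inflating rework loops, incidents, or audit prep shows up as more expensive, not faster.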

Brownfield is where most factories quietly fail

Greenfield demos make agents look brilliant. Brownfield is where they manufacture confident drift. Enterprises spend roughly 40% of their budgets on legacy maintenance [4], which means the realistic deployment surface for any factory is undocumented monoliths, mainframe-adjacent services, and business logic encoded in twenty years of patches no living engineer fully understands. Turn an autonomous agent loose on that codebase with a refactoring prompt and you do not get modernization. You get plausible-looking changes that quietly break behaviour the original code preserved for reasons no one can reconstruct.

The discipline that separates working brownfield factories from demos is mandatory characterization tests and pinned architecture decisions before any agent gets write access. Characterization tests capture what the system actually does today — not what the spec says it should do — and become the contract the agent cannot violate. ADRs lock the structural choices the agent is forbidden to relitigate. Undocumented business logic, long-lived architecture decisions, jurisdiction-specific compliance obligations, and a wider security attack surface when agents have execution rights across the codebase are precisely the risks broad agent autonomy amplifies [1]. The control points exist to close those exact gaps.
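A characterization test in this sense pins observed behaviour, not specified behaviour. A toy sketch, where `legacy_fee` is an invented stand-in for buried business logic in the monolith:

```python
# Suppose legacy_fee() is an undocumented pricing routine nobody dares touch.
# Its observed behaviour today, not any spec, is the contract the agent keeps.
def legacy_fee(amount: float, region: str) -> float:
    surcharge = 1.19 if region == "DE" else 1.0  # stand-in for buried logic
    return round(amount * surcharge, 2)

# Characterization tests: record current outputs BEFORE granting write access.
# If an agent's refactor changes any of these, the gate blocks the merge.
def test_characterize_legacy_fee():
    assert legacy_fee(100.0, "DE") == 119.0
    assert legacy_fee(100.0, "US") == 100.0
    assert legacy_fee(0.0, "DE") == 0.0
```

The tests assert what the code does, including any oddities; deciding whether an oddity is a bug is a separate, human decision made at the spec-approval gate.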

Compliance is an artifact problem, not a policy problem

GDPR, DORA, GxP, and Solvency II do not care about your governance slide. They care whether you can produce, on demand, the spec, the prompt log, the context-access record, the model identity, the human approval, and the data-residency proof for any given line of merged code. Compliance in an AI-driven SDLC reduces to whether the eight control points emit signed, timestamped, retrievable artifacts [1].

This is why the policy-document approach to AI governance fails. A policy that says "agents must be supervised" produces nothing an auditor can examine. A control point that refuses to merge unless a named reviewer has signed an artifact bundle containing the spec hash, the prompt transcript, the retrieved context manifest, the model version, and the test results produces an evidence trail by construction. Compliance stops being a quarterly scramble and becomes a property of the pipeline. The factory either emits the artifacts or it does not run.
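A minimal sketch of producing such an evidence trail by construction, using a stdlib HMAC purely for illustration. A production pipeline would use asymmetric signing bound to the reviewer's identity (for example via a signing service such as Sigstore) rather than a shared key:

```python
import hashlib
import hmac
import json
import time

def sign_bundle(bundle: dict, key: bytes) -> dict:
    """Attach a timestamp and an HMAC over the canonical bundle JSON."""
    record = dict(bundle, signed_at=int(time.time()))
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record

def verify_bundle(record: dict, key: bytes) -> bool:
    """Recompute the HMAC over everything except the signature and compare."""
    sig = record.get("signature")
    body = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return sig is not None and hmac.compare_digest(sig, expected)
```

Any tampering with a field after signing, such as swapping the reviewer's name, fails verification, which is exactly the property an auditor replaying a merge needs.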

The factory belongs on your infrastructure, not someone else's

An audit chain that depends on third-party cloud APIs is an audit chain you do not control. Every prompt sent to an external model is a context disclosure you cannot fully attest to. Every model version change is a provenance discontinuity you did not authorize. Every API outage is a delivery-pipeline outage you cannot resolve. For a factory whose entire value proposition is replayable, attributable, defensible change, that dependency is structurally incompatible with the goal.

The factory runs where your code, your context, and your approval records already live. Open-weight models whose identity and weights you pin and archive. Local GPU inference so retrieved context never crosses the perimeter. A runtime that operates air-gapped when the regulator or the security model demands it. Wavenetic builds exactly this stack — WaveNode hardware, runtime, open-weight models, RAG applications, and EU-based support delivered as a single on-premise system with citation tracking and audit trails built into the document layer. The factory pattern and the on-premise pattern are the same argument: provenance is only real if it never leaves your control.

Build-vs-buy is really primitives-vs-stack

Platform teams that already operate strong SDLC primitives — a mature CI system, signed artifact storage, identity-bound code review, policy-as-code, secrets management, an internal developer platform — should assemble the factory from open components. They have the substrate. What they need is the agent layer, the control-point definitions, and the artifact schemas. For these teams, a packaged factory is overhead they will route around within a quarter.

Everyone else should license a pre-integrated stack. Building eight control points from scratch while also operating a regulated business is not a reasonable use of a platform team's next two years. The selection criterion is not which vendor has the most polished dashboard or the largest reference logo wall. It is whether the control points are inspectable: can you read the artifact schema, can you export the audit bundle in a format your GRC team accepts, can you swap the model without rewriting the pipeline, can you run the entire stack inside your perimeter. Factory.ai's enterprise positioning emphasizes SSO/SAML, dedicated compute, and compliance features [5]; Opsera frames Forge around intent-aware, spec-based development with guardrails [4]. Ask either vendor the same question — show me the artifact your factory produces for a single merged change, and show me the role that signed each field.

The enterprises that win the next five years will not be the ones whose agents ship fastest. They will be the ones whose agents can be replayed, attributed, and defended on demand. Speed is a side effect of a well-governed factory. It is never the specification.


Book a WaveNode walkthrough: see the eight control points running on-premise, with citation tracking and audit bundles your GRC team can export. https://wavenetic.com

Sources

  1. How itestra Deploys Agentic AI in Enterprise Software
  2. NVIDIA AI Enterprise | Cloud-native Software Platform
  3. Build AI Factories with Supermicro and NVIDIA
  4. Opsera Launches Forge: The World's First Intent-Aware AI Software Factory
  5. Factory Enterprise | Agent-Native Development