Founder & CEO, Wavenetic
Denis founded Wavenetic to build AI products that actually run inside enterprise perimeters: banks, TSOs, regulated industrial operators, defence. Engineer first, CEO second. He writes about on-premise AI, multi-agent orchestration, OCPP systems, and what the EU regulatory framework actually means for AI procurement.
Kimi K2 is not a vLLM problem. It's a sovereignty, MoE-reliability, and license-review problem — and here's the framework no vendor guide gives you.
August 2, 2026 is when EU regulators gain enforcement powers over GPAI — and when downstream banks, insurers, and hospitals get the first knock, not the model vendors.
Kimi K2's 256K context and 200-step tool stamina reshape enterprise RAG — but only if you treat them as a retrieval control plane, not prompt-stuffing.
The on-premise AI vs cloud AI question isn't philosophical. Seven workload tests produce a deterministic placement verdict — no hybrid handwave required.
Workload-classification rubric, GPU break-even math, and the CISO provenance checklist for choosing between self-hosted Kimi K2 and GPT-5 in EU regulated industries.
Why the AI Act's open-source carve-out collapses the moment a regulated buyer fine-tunes — and the four-vector diligence pass that replaces it.
Question lists let vendors win on prose. Here are the weighted scorecard, evidence rules, and POC protocol your 2026 enterprise AI RFP actually needs.
AI-native transformation is an engineering program with five hard layers, not a culture exercise. Here is the architecture CIOs need to escape pilot purgatory.
Local GPU inference economics are decided by sustained utilization and workload shape, not GPU sticker price. Here are the threshold, the VRAM math, and the routing rule.
The Foundry-versus-on-prem debate is the wrong question. The right one is which AI workloads belong in cloud, Foundry Local, or sovereign on-prem — and how to architect the seam.
Gemma 4's April launch was a spec sheet. The May multi-token prediction update is what made on-prem inference production-viable for EU CTOs in 2026.
Enterprise local LLM inference is a concurrency and SLO engineering problem, not a GPU shopping problem. Here's the workload-sizing sequence that drives every downstream decision.