Kimi K2 is not a vLLM problem. It's a sovereignty, MoE-reliability, and license-review problem, and here's the framework no vendor guide gives you.
Kimi K2's 256K context and 200-step tool stamina reshape enterprise RAG, but only if you treat them as a retrieval control plane, not as room for prompt-stuffing.
Workload-classification rubric, GPU break-even math, and the CISO provenance checklist for choosing between self-hosted Kimi K2 and GPT-5 in EU regulated industries.