FinOps

Hybrid AI Architecture & Self-Hosted vLLM | SLM Playbook

In the early phase of the AI wave (2023-2024), the default architecture for most startups and enterprises was API-Centric: routing every single request to OpenAI’s GPT-4 or Anthropic’s Claude. While highly convenient for proof-of-concept (PoC) phases, this model rapidly falls apart under production loads when it runs into two massive walls: data privacy regulations and astronomical operational costs. By 2026, the rise of Small Language Models (SLMs) ranging from 2B to 14B parameters has dramatically shifted the landscape. Models such as Microsoft’s Phi-4 (14B), Qwen 2.5/3.5 Coder (7B/14B), and Llama 3 8B, when properly fine-tuned, achieve performance close to, or even exceeding, commercial frontier models on domain-specific, narrow tasks. ...

Tech Radar 10/07: Cloud-Native AI Architecture — Envoy Gateway, K8s Inference Extension & Dapr Agents

Answer-first: In 2026, Platform Engineering for AI is no longer about picking the right LLM framework. The real questions are: Who controls token cost? Who routes traffic intelligently to the right GPU pod? Where does agent state go after a crash? Three CNCF projects — Envoy AI Gateway, the K8s Gateway API Inference Extension, and Dapr Agents — are converging to answer those questions at the infrastructure layer, so application code doesn’t have to. ...

Tech Radar 06/07: Edge AI, Liquid Neural Networks & WasmEdge on K3s

Answer-first: AI doesn’t have to run on massive GPU clusters in the cloud. The combination of ultra-lightweight Liquid Neural Networks (LNNs) and the WebAssembly runtime WasmEdge on K3s delivers a cutting-edge Edge AI architecture, one that directly addresses the two biggest enterprise challenges: cloud costs (FinOps) and data privacy. Liquid Neural Networks (LNN): AI Without a GPU. Answer-first: Unlike heavy Transformers, LNNs process information using continuous-time dynamical equations. The Closed-form Continuous-time (CfC) variant eliminates the costly ODE solver entirely, enabling inference to run directly on the CPU of an Edge node like a Raspberry Pi. ...

Part 0: Executive Summary — How Amazon Prime Video Saved 90% on Infrastructure

In the tech industry, Serverless architecture and Microservices are often hailed as the ultimate solutions for infinite scalability. However, this infinite scalability comes with massive hidden FinOps risks when traffic crosses a critical tipping point. This article synthesizes a real-world report from the engineering team at Amazon Prime Video, along with restructuring stories from Segment, Pinterest, and 37signals, to demonstrate the cost-optimizing power of the Monolithic Architecture. ...

Part 2: FinOps Cost Reality - The Hidden Tax of Microservices

One of the most appealing promises of Microservices is lean Auto-scaling capability: “Only spin up servers for the service under load.” Theoretically, this saves cloud costs. However, when contrasted with the reality of cloud cost management (FinOps), companies discover the exact opposite: Microservices architectures are often many times more expensive than Monoliths. This discrepancy doesn’t stem from actual Compute capacity, but from the “Distributed Tax”: hidden costs incurred merely to maintain communication and monitoring between isolated components. ...

Tech Radar, April 17, 2026: GitLab Pushes Agentic DevSecOps Toward Operability, Cost Control, and Stronger Reasoning

The selected items for pipeline run 31 all point to the same strategic arc inside GitLab: the company is trying to turn AI-assisted software development from an experimental productivity layer into a governed, operationally credible platform capability. After fetching and reading the full source content directly from the original URLs, three themes stand out. First, GitLab is extending AI beyond code generation into delivery bottlenecks that developers and platform teams actually live with every day. Second, it is wrapping that expansion in explicit cost controls, which is critical if AI is to move from pilot usage to enterprise rollout. Third, it is strengthening the model layer underneath the platform so agents can handle more complex, multi-step workflows with less supervision. ...