Case study · Applied AI · Agents & evals

Running an AI agent system like a production service

A local LLM agent wired to a real knowledge base, plus the eval suite, logging, and failure handling it took to make it dependable.

This case study is being written alongside the build. It covers the architecture, the eval suite that catches regressions before they matter, and an honest log of what broke in the first weeks of operation.
