Latitude is an observability and quality platform for AI agents. It's built for developers who already have AI in production and need a structured way to find, track, and fix failure modes before users hit them.
Most observability tools give you logs and traces. That's useful, but it doesn't answer the question that actually matters: what's going to break next, and why? Latitude is built around that question.
Annotation queues surface the most suspicious production traces for human review, prioritized by anomaly signals. You're not reviewing everything — just what's worth your attention.
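As a rough illustration of the prioritization idea (not Latitude's actual scoring or API; the `Trace` fields and `review_queue` helper are made up for this sketch), the queue concept boils down to ranking traces by an anomaly score and only surfacing the top slice:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    trace_id: str
    anomaly_score: float  # e.g. unusual latency, error rate, or output drift
    summary: str

def review_queue(traces: list[Trace], budget: int = 20) -> list[Trace]:
    """Return only the most suspicious traces, capped at a fixed review budget."""
    return sorted(traces, key=lambda t: t.anomaly_score, reverse=True)[:budget]

traces = [
    Trace("t1", 0.12, "normal checkout flow"),
    Trace("t2", 0.91, "tool-call loop, 14 retries"),
    Trace("t3", 0.67, "truncated JSON in final answer"),
]
for t in review_queue(traces, budget=2):
    print(t.trace_id, t.anomaly_score, t.summary)
```

The review budget is the point: human attention is the scarce resource, so only the highest-signal traces reach it.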
Once you find a failure mode, you promote it into an issue. Issues have states, so you can track a problem from first sighting through annotation, eval creation, fix, and verification. The full lifecycle in one place, not scattered across Slack threads and spreadsheets.
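That lifecycle reads naturally as a small state machine. The state names and transition rules below are illustrative, not Latitude's exact schema:

```python
from enum import Enum, auto

class IssueState(Enum):
    REPORTED = auto()      # first sighting in a production trace
    ANNOTATING = auto()    # traces being reviewed and labeled
    EVAL_CREATED = auto()  # an evaluation now guards this failure mode
    FIXED = auto()         # a fix has shipped
    VERIFIED = auto()      # the eval confirms the fix holds

# Each state can only advance to the next stage of the lifecycle.
TRANSITIONS = {
    IssueState.REPORTED: {IssueState.ANNOTATING},
    IssueState.ANNOTATING: {IssueState.EVAL_CREATED},
    IssueState.EVAL_CREATED: {IssueState.FIXED},
    IssueState.FIXED: {IssueState.VERIFIED},
    IssueState.VERIFIED: set(),
}

def advance(current: IssueState, target: IssueState) -> IssueState:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move issue from {current.name} to {target.name}")
    return target
```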
From those issues, Latitude uses GEPA (Generative Evaluation from Production Annotations) to automatically generate evaluations. No writing evals from scratch. As your team annotates more traces, the evals refine themselves over time.
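The generation pipeline itself is Latitude's, but the shape of the input and output is easy to picture. A hedged sketch, with invented field names: each human annotation (a trace plus a pass/fail verdict and a note) becomes a labeled case in a growing eval dataset, and more annotations mean more cases for the generated evals to be checked against.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    trace_id: str
    input: str
    output: str
    passed: bool   # the human verdict
    note: str      # why it passed or failed

def to_eval_cases(annotations: list[Annotation]) -> list[dict]:
    """Turn reviewed traces into labeled eval cases; failures become regression cases."""
    return [
        {
            "input": a.input,
            "expected_behavior": a.note,
            "label": "pass" if a.passed else "fail",
        }
        for a in annotations
    ]
```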
Those evals run in CI or on a schedule against your curated dataset. Latitude supports rule-based evals (assertions, regex, schema validation) and LLM-as-judge evals. It also measures eval quality over time using MCC (Matthews correlation coefficient), so you know whether your evals are actually catching what they should.
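Here's what the two halves can look like, as a generic sketch rather than Latitude's API: a rule-based eval built from a regex assertion plus a JSON check, and an MCC computation comparing the eval's verdicts against human labels. MCC stays near 0 for an eval that effectively guesses and approaches 1 only when its verdicts track the human labels.

```python
import json
import math
import re

def rule_based_eval(output: str) -> bool:
    """Pass only if the output is valid JSON with an 'answer' field and no apology boilerplate."""
    if re.search(r"\bI'm sorry\b", output):
        return False
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return False
    return "answer" in payload

def mcc(eval_verdicts: list[bool], human_labels: list[bool]) -> float:
    """Matthews correlation coefficient of eval verdicts vs. human annotations."""
    tp = sum(e and h for e, h in zip(eval_verdicts, human_labels))
    tn = sum((not e) and (not h) for e, h in zip(eval_verdicts, human_labels))
    fp = sum(e and (not h) for e, h in zip(eval_verdicts, human_labels))
    fn = sum((not e) and h for e, h in zip(eval_verdicts, human_labels))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Tracking MCC over time is what tells you an eval is drifting: if its verdicts stop agreeing with fresh human annotations, the eval needs to be regenerated or retired.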
The result is a closed loop: production trace → human annotation → auto-generated eval → quality measurement. Regression testing tells you what broke in the past. Latitude tells you what will break next.
Works with any framework. Integrates with LangChain, CrewAI, OpenAI Agents, LiteLLM, LlamaIndex, and more.