Open-Source LLM Observability & Evaluation | Generated by AI

2026.06

Question: What is Langfuse?

Answer:

Langfuse is an open-source LLM observability and evaluation platform. Think of it as the “Datadog for LLM apps”: it gives you tracing, evals, prompt management, and dataset tooling in one place.

Core primitives

Trace
 └── Span (generation, retrieval, tool call, etc.)
      ├── input / output
      ├── latency
      ├── token usage + cost
      └── metadata / tags
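
To make the hierarchy above concrete, here is a minimal sketch of the same shape as plain Python dataclasses. The class and field names (Span, Trace, latency_ms, cost_usd, and so on) are assumptions chosen for illustration and are not the Langfuse schema or SDK types; the numbers are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """One step inside a trace (hypothetical model, not the Langfuse schema)."""
    name: str
    kind: str  # e.g. "generation", "retrieval", "tool"
    input: Any = None
    output: Any = None
    latency_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    metadata: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)


@dataclass
class Trace:
    """A single request, made up of one or more spans."""
    name: str
    user_id: Optional[str] = None
    spans: list = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.spans)

    def total_latency_ms(self) -> float:
        # Assumes the spans ran one after another, not in parallel.
        return sum(s.latency_ms for s in self.spans)


# Illustrative values only.
trace = Trace(name="rag-pipeline", user_id="user-123")
trace.spans.append(Span(name="retrieval", kind="retrieval", latency_ms=40.0))
trace.spans.append(
    Span(name="llm-call", kind="generation", latency_ms=900.0,
         input_tokens=512, output_tokens=128, cost_usd=0.004)
)
print(trace.total_tokens(), trace.total_latency_ms(), trace.total_cost())
```

Rolling token counts and cost up from spans to the trace is what makes per-request and per-user totals cheap to compute later.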

What it actually does

Feature              What it gives you
Tracing              Full visibility into multi-step agent/chain calls
Prompt management    Version-controlled prompts, A/B test variants
Evals                LLM-as-judge, human annotation, custom scoring
Datasets             Curate input/output pairs for regression testing
Cost tracking        Per-model, per-user, per-feature spend
Analytics            Latency p50/p95, error rates, token distribution
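
The cost-tracking and analytics rows boil down to simple aggregations over logged calls. The sketch below is not how Langfuse computes them; it only illustrates the idea using the standard library. The record fields, the helper names percentile and spend_by, and all the numbers are made up for the example, and the percentile uses the nearest-rank method.

```python
import math
from collections import defaultdict

# Made-up call records; field names are assumptions for this example.
records = [
    {"model": "model-a", "user": "u1", "latency_ms": 100, "cost_usd": 0.010},
    {"model": "model-a", "user": "u2", "latency_ms": 200, "cost_usd": 0.020},
    {"model": "model-b", "user": "u1", "latency_ms": 300, "cost_usd": 0.005},
    {"model": "model-b", "user": "u2", "latency_ms": 400, "cost_usd": 0.005},
    {"model": "model-a", "user": "u1", "latency_ms": 1000, "cost_usd": 0.030},
]


def percentile(values, pct):
    """Nearest-rank percentile; assumes values is non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def spend_by(rows, key):
    """Sum cost_usd grouped by the given field (e.g. "model" or "user")."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


latencies = [r["latency_ms"] for r in records]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
by_model = spend_by(records, "model")
by_user = spend_by(records, "user")
print(p50, p95, by_model, by_user)
```

With only five samples the p95 is simply the slowest call, which is why tail percentiles need a reasonable volume of traces before they mean much.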

Quick integration (Python)

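from langfuse import Langfuse
# v2-style import path; in SDK v3+ observe is imported from the top-level langfuse package
from langfuse.decorators import observe, langfuse_context

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY from env

def call_llm(prompt: str) -> str:
    # placeholder so the example runs; replace with your real model call
    return f"echo: {prompt}"

@observe()
def my_pipeline(user_query: str) -> str:
    # the outermost @observe() function becomes the trace; nested decorated calls become spans
    response = call_llm(user_query)

    # attach a score inline
    langfuse_context.score_current_observation(
        name="quality",
        value=0.9,
    )
    return response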

Or use the lower-level SDK:

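# v2-style low-level API; reuses the langfuse client created above
trace = langfuse.trace(name="rag-pipeline", user_id="user-123")

# wrap the retrieval step; end() records when the span finished
span = trace.span(name="retrieval")
span.end()

# log the model call with token usage so cost can be derived
generation = trace.generation(
    name="llm-call",
    model="claude-sonnet-4-6",
    input=[{"role": "user", "content": "..."}],
    output="...",
    usage={"input": 512, "output": 128}
)

langfuse.flush()  # send buffered events before a short-lived script exits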

Self-hostable

# docker-compose in ~5 minutes
git clone https://github.com/langfuse/langfuse
cd langfuse
docker compose up

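Postgres backend, Next.js frontend. Recent releases also run ClickHouse, Redis, and an S3-compatible blob store, which the compose file starts for you. You can run it on your Ubuntu workstation at <IP_ADDRESS>.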

Why it matters for your stack

Given you’re running Claude, DeepSeek, and MiMo across CLI agents and banking AI systems, Langfuse lets you trace every model call in one place and compare cost, latency, and eval scores per model.

It integrates natively with LangChain, LlamaIndex, the OpenAI SDK, and the raw Anthropic SDK. For your custom CLI agents (ww, iclaw, zz), the decorator pattern is the fastest path to instrumentation.
