NEWSDK v0.6.1 with Ollama (local LLMs) and LangGraph tracing· npm install @spanlens/sdk

One line.
Every LLM call, observed.

Spanlens is an open-source (MIT) LLM observability platform that logs every OpenAI, Anthropic, and Gemini request in one line of code. Track cost, latency, and tokens; trace multi-step agent workflows; catch anomalies and PII; and recommend cheaper models with dollar-figure savings.

$ npx @spanlens/cli init

$ pip install spanlens

TypeScript · Python · Next.js, Node, Edge · MIT · self-hostable

Try live demo →How the CLI rewrites your code →

spanlens.io / requests

LIVE

All modelsLast 24hStatus: all+ filter12,481 events / 1h

ModelEndpointLatencyTokensCostStatusAge

claude-sonnet-4.5/chat1240ms2,104$0.03122002s

gpt-4o-mini/extract410ms612$0.00092005s

gpt-4o/summarize3440ms3,218$0.04822007s

gemini-2.0-flash/rerank180ms240$0.00012009s

claude-haiku-4.5/chat680ms984$0.001820011s

gpt-4o/classify2120ms1,840$0.027642914s

What you get

The lens. Not the weight.

Spanlens sits in front of your provider. No agents to run. No SDK to rewrite. One baseURLand you're done.

01$0.0021

Request log

Every call with model, tokens, cost, latency, and full body. Filter, group, export.

02−38%

Cost tracking

Per-request breakdown, daily rollups, budget alerts before you blow the month.

0312 spans

Agent tracing

Multi-step workflows as waterfall span trees. Find the one step that took 18s.

043.1σ

Anomaly detection

3σ deviations in latency or cost vs. your 7-day baseline, flagged on arrival.

05SSN · email

PII + injection scan

Regex detection on request bodies at log time. API keys auto-masked before storage; PII patterns flagged for review.

06−$412/mo

Model recommender

"Your gpt-4o calls look like classification, try gpt-4o-mini." With numbers.

070.82 avg

Evals

LLM-as-judge scores every response 0 to 1. Know if v8 is actually better than v7, not just cheaper.

08v7 vs v8

Experiments & datasets

Replay a fixed dataset across prompt versions and models. Quality, cost, and latency side by side.

091,204 users

User analytics

Per end-user and per-session cost, volume, and errors. Find the customer burning your budget.

Cost visibility

See the bill before it arrives.

Per-team, per-model, per-route cost. Daily rollups. Budget alerts by Slack or webhook. One place to answer “why did our OpenAI bill jump?”

gpt-4o$421.80$182.40−57%

claude-sonnet-4$189.40$192.20+1.5%

gemini-2.0-flash$21.40$24.10+13%

This month · projected

$2,481−$1,218

vs. last month. 3 model-swap suggestions pending.

APR 01APR 10APR 23 ← today

Agent tracing

Find the one span
that cost you 18 seconds.

Multi-step agents as waterfall trees. Critical path, cost attribution, and latency outliers, highlighted automatically.

● critical path · 78% of wall-clock in 1 span

● cost attribution · per LLM, per tool

● retry & error spans as first-class

trace_8812· support agent · 8.24s ·critical: summarize_tickets v7

agent.run

8.24s

└ classify_intent

520ms

└ kb_search

680ms

└ summarize_tickets · v7

5.8s · critical

└ llm.sonnet-4.5

5.4s

└ format_reply

480ms

The improvement loop

Don't just watch.
Improve.

Cost and latency tell you what happened. Spanlens tells you whether it got better. Capture real traffic into datasets, score it with an LLM judge or your own team, then run the next prompt version against it before you ship.

● Evals · LLM-as-judge scores, 0 to 1, per version

● Experiments · replay a dataset across versions and models

● Annotation · human review to build golden sets

● Playground · iterate on real inputs, compare side by side

experiment_241· support-reply · 320 cases ·winner: v8

VersionQualityCost / 1kp50

gpt-4o · v70.71$4.821240ms

gpt-4o-mini · v80.82$0.31410ms

gpt-4o-mini · v60.64$0.30430ms

v8 · +0.11 quality · −94% cost · same dataset

The product

One platform. One source of truth.

Every screen reads the same span store. Move from a cost chart to the exact failing request in two clicks.

Requests12,481 / 1h

Full body, headers, cost. Filter, group, replay.

Traces842 / day

Waterfall with critical path & retry spans.

Prompts24 · v7

Versioned library, diff, A/B, gradual rollout.

Anomalies3 open · high

7-day rolling baseline, z-score triggers.

Security48 masked

PII · secrets · injection · jailbreak detectors.

Savings$7.2k / mo

Swap, cache, trim. Ranked by evidence.

Users1,204

Per end-user and session cost, volume, and error rates.

Playground4 models

Test a prompt across models and versions on real inputs.

Evals0.82 avg

LLM-as-judge scoring per prompt version. Quality, quantified.

Experimentsv7 vs v8

Run a dataset through versions and models. Compare on evidence.

Datasets320 cases

Golden test cases built from real production traffic.

Annotation58 reviewed

Human-in-the-loop review to build labels and golden sets.

Works with

OpenAIsdk · azure

Anthropicsdk · bedrock

Googlegemini · vertex

Mistralsdk · api

TypeScript SDK@spanlens/sdk

Python SDKpip · 3.9+

LangChainjs · py

LlamaIndexjs · py

Vercel AI SDKjs

Self-hostable

Your data, your VPC.

Run Spanlens in your cluster with Docker Compose or a single binary. Prompts and completions never leave your network.

Self-host docs →docker-compose.ymlSingle binary

# one-liner · docker

docker run -d --name spanlens \

-p 3001:3001 \

-e SUPABASE_URL="https://..." \

-e ENCRYPTION_KEY="$(openssl rand -base64 32)" \

ghcr.io/spanlens/spanlens-server:latest

# → curl http://localhost:3001/health

Built for teams

Ship together. Stay audited.

Projects isolate workloads, roles and invitations manage the whole team, and an audit log records every change. Wire Spanlens into your stack with webhooks and alerts.

ProjectsRoles & invitationsAudit logWebhooksAlertsSaved filters

Pricing

Simple. Flat monthly.

Free while you're small. Flat monthly fee, not per seat. Self-host is free forever.

Free

$0/mo

·50K req / mo

·14 day retention

·1 seat

·All core features

·Community support

Start free →

Reasonable questions.

How does instrumentation work?

Swap the provider SDK for our drop-in. Same surface, same types. We record the full request and response on the wire, with no extra round-trip and no sampling by default.

What about latency overhead?

p99 overhead is under 3ms. Ingestion happens async in a worker. If we ever fail, your request completes anyway. Spanlens never sits on the critical path.

How do you handle PII?

PII detectors (SSN, credit card, email, IBAN, passport, etc.) run at log time and flag matches for review in the Security dashboard, without blocking the request. API keys that slip into prompts are auto-masked before the row lands on disk. For workloads where prompt bodies must not be stored at all, opt out per-call with X-Spanlens-Log-Body: meta.

Do you support OpenTelemetry?

Yes. OTLP/HTTP ingest and export. Your existing OTel tracing flows into the same span store; LLM spans get LLM-specific attributes on top.

What's the data retention?

Free is 14 days. Pro is 90 days, Team is 365 days. Enterprise & self-hosted are configurable, including unlimited.

Can I export my data?

Anytime. JSON, CSV, Parquet. Or pipe the raw stream to S3, BigQuery, or your warehouse via our sink connectors.

Can Spanlens tell me if a prompt actually got better?

Yes. Evals scores responses with an LLM-as-judge on a 0 to 1 scale, per prompt version. Pair it with Experiments to replay a dataset across versions and models, so you compare quality, cost, and latency on the same inputs before you roll out.

Does Spanlens work for a whole team?

Projects isolate workloads, roles and invitations manage access, and audit logs record every change. Team and Enterprise add Slack, webhooks, unlimited alerts, and SSO.

See what your app is saying.

30-second setup. Your first 50,000 requests are on us. Cancel anytime. There's nothing to cancel.

Start free →Read the docs

Install

It's genuinely one line.

app/api/chat/route.ts

- import OpenAI from 'openai'
- const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
+ import { createOpenAI } from '@spanlens/sdk/openai'
+ const openai = createOpenAI()

  const res = await openai.chat.completions.create({ ... })

app/main.py

- from openai import OpenAI
- client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
+ from spanlens.integrations.openai import create_openai
+ client = create_openai()

  res = client.chat.completions.create(...)

One line.Every LLM call, observed.