Why Spanlens

There are good LLM observability tools out there. We've used them, learned from them, and built Spanlens to fix six specific things that bothered us. If none of these matter to you, one of the other tools will be fine.

The six things only Spanlens does end-to-end

  1. One-line baseURL proxy install. No SDK wrapping, no OTel exporter, no framework adapter. Change the base URL in your existing OpenAI / Anthropic / Gemini client and you're instrumented. Helicone takes the same approach but only for OpenAI; Spanlens covers all three providers plus Azure OpenAI with the same setup.
  2. Cache-token billing that's actually right. Anthropic cache_read_input_tokens bills at 0.1× input; OpenAI cached_tokens bills at 0.5×. Most tools roll cache reads into prompt tokens — over-reporting cache-heavy workloads by 2–10×. We split them and price each tier separately.
  3. Critical Path on agent traces. When your agent fires five parallel tool calls, the answer to “why is this slow?” is the longest dependency chain, not the slowest single span. Spanlens computes it automatically and highlights it in the waterfall. LangSmith / Langfuse show the tree; you eyeball the critical path yourself.
  4. Prompt A/B with Welch's t-test built in. Ship v1 and v2 to a traffic split, watch them battle on cost / latency / eval score, then get a statistical significance verdict instead of staring at means. No copy-pasting into a notebook.
  5. Model-swap recommender with dollar figures. “Switch this prompt from gpt-4o to gpt-4o-mini: expected monthly saving $412.10, no quality regression.” The recommendation comes with the eval evidence and a one-click experiment to verify before you ship.
  6. Silent-loss-proof ingest. When the analytics DB hiccups, we don't drop your logs: they land in a fallback queue and replay automatically when the DB is back. Replay is cron-driven and observable from /health/deep. This matters when you bill customers based on these numbers.
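The split billing in item 2 comes down to simple arithmetic. Here is a minimal sketch, assuming input-side cost only (no output tokens, no cache writes) and a hypothetical price of $3.00 per million input tokens. The `Usage` class and both function names are illustrative and not part of any Spanlens API; only the 0.1× and 0.5× multipliers come from the text above.

```python
from dataclasses import dataclass

# Cache-read multipliers quoted in item 2.
CACHE_READ_MULTIPLIER = {"anthropic": 0.1, "openai": 0.5}


@dataclass
class Usage:
    provider: str        # "anthropic" or "openai"
    uncached_input: int  # input tokens billed at the full input rate
    cached_input: int    # input tokens served from the prompt cache


def split_cost(u: Usage, input_price_per_mtok: float) -> float:
    """Price cached and uncached input tokens at their own rates."""
    rate = input_price_per_mtok / 1_000_000
    cached_rate = rate * CACHE_READ_MULTIPLIER[u.provider]
    return u.uncached_input * rate + u.cached_input * cached_rate


def naive_cost(u: Usage, input_price_per_mtok: float) -> float:
    """Roll cache reads into prompt tokens and bill everything at full rate."""
    rate = input_price_per_mtok / 1_000_000
    return (u.uncached_input + u.cached_input) * rate


# Hypothetical cache-heavy request: 98% of the prompt is a cache hit.
usage = Usage(provider="anthropic", uncached_input=2_000, cached_input=98_000)
price = 3.00  # hypothetical USD per million input tokens
actual = split_cost(usage, price)
naive = naive_cost(usage, price)
print(f"split: ${actual:.4f}  naive: ${naive:.4f}  over-report: {naive / actual:.1f}x")
```

With these made-up numbers the naive figure is roughly 8.5× the split figure. A fully cached prompt hits the ceiling of 10× at the 0.1 multiplier and 2× at the 0.5 multiplier, which is where the 2–10× range comes from.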
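To make item 3 concrete, here is one way a critical path could be computed: a longest-path search over an acyclic dependency graph. This is a sketch of the general technique, not Spanlens's actual implementation. The `Span` shape and its explicit `depends_on` field are assumptions made for illustration; real traces usually encode ordering through parent IDs and timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    depends_on: list[str] = field(default_factory=list)  # spans that must finish first


def critical_path(spans: list[Span]) -> tuple[float, list[str]]:
    """Return (total_ms, span names) for the longest dependency chain.

    Assumes the dependency graph is acyclic.
    """
    by_name = {s.name: s for s in spans}
    best: dict[str, tuple[float, list[str]]] = {}  # memo: longest chain ending at a span

    def longest_ending_at(name: str) -> tuple[float, list[str]]:
        if name not in best:
            span = by_name[name]
            # Longest chain among prerequisites; (0, []) if the span has none.
            prev_ms, prev_path = max(
                (longest_ending_at(d) for d in span.depends_on),
                key=lambda r: r[0],
                default=(0.0, []),
            )
            best[name] = (prev_ms + span.duration_ms, prev_path + [name])
        return best[name]

    return max((longest_ending_at(s.name) for s in spans), key=lambda r: r[0])


# Hypothetical agent trace: "sql" is the slowest single span (700 ms),
# but the longest chain runs through "search" and "rerank".
spans = [
    Span("plan", 100),
    Span("search", 400, ["plan"]),
    Span("rerank", 450, ["search"]),
    Span("sql", 700, ["plan"]),
    Span("answer", 200, ["rerank", "sql"]),
]
print(critical_path(spans))  # (1150.0, ['plan', 'search', 'rerank', 'answer'])
```

In this example, speeding up `sql` would not shorten the trace at all, because the 1150 ms chain through `search` and `rerank` still gates `answer`.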
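For readers who want to see what the verdict in item 4 rests on, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name and sample data are illustrative. Turning the result into a p-value needs a Student's t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Made-up latency samples (seconds) for two prompt versions.
v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
v2 = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(v1, v2)
print(f"t = {t:.3f}, df = {df:.2f}")  # t = -1.897, df = 5.88
```

Unlike Student's t-test, Welch's version does not assume the two variants have equal variance, which suits prompt experiments where one version may be much noisier than the other. If SciPy is available, `scipy.stats.ttest_ind(v1, v2, equal_var=False)` runs the same test and also returns the p-value.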
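The fallback-and-replay pattern in item 6 can be sketched in a few lines. This is an illustration of the general idea under simplifying assumptions, not Spanlens's code: the queue here is an in-memory `deque` (a real one would need to be durable), and `BufferedIngest`, `FlakyDB`, and the use of `ConnectionError` as the failure signal are all hypothetical.

```python
from collections import deque
from typing import Callable


class BufferedIngest:
    """Write events to the DB; park them in a fallback queue on failure."""

    def __init__(self, write_to_db: Callable[[dict], None]) -> None:
        self.write_to_db = write_to_db
        self.fallback: deque = deque()

    def log(self, event: dict) -> None:
        try:
            self.write_to_db(event)
        except ConnectionError:
            self.fallback.append(event)  # keep the event instead of dropping it

    def replay(self) -> int:
        """Run on a schedule; returns how many queued events were written."""
        replayed = 0
        while self.fallback:
            try:
                self.write_to_db(self.fallback[0])
            except ConnectionError:
                break  # DB still down; leave the rest queued, in order
            self.fallback.popleft()  # remove only after a successful write
            replayed += 1
        return replayed


class FlakyDB:
    """Stand-in for the analytics DB; raises while `up` is False."""

    def __init__(self) -> None:
        self.up = True
        self.rows: list = []

    def write(self, event: dict) -> None:
        if not self.up:
            raise ConnectionError("analytics DB unavailable")
        self.rows.append(event)


db = FlakyDB()
ingest = BufferedIngest(db.write)
db.up = False
ingest.log({"id": 1})
ingest.log({"id": 2})
queued_during_outage = len(ingest.fallback)
db.up = True
replayed = ingest.replay()
print(queued_during_outage, replayed, db.rows)
```

An event leaves the queue only after its write succeeds, so a replay that is interrupted partway through loses nothing and preserves order.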

How we line up against the others

Snapshot as of May 2026. Read this as “default behavior, no plugins, no enterprise tier”.

Spanlens | Helicone | Langfuse | LangSmith | Phoenix
One-line baseURL install: OpenAI only | SDK wrap | SDK wrap | OTel
Self-hostable (MIT): ~ (EE folder)
Cache-token split billing: partial | partial | partial
Critical Path on traces
Prompt A/B with t-test: manual | manual | manual
Model-swap $ recommendation
Judge-vs-human correlation: ~ | ~
OpenTelemetry ingest
Radar chart (Spanlens, Helicone, Langfuse, LangSmith): coverage score across six differentiator axes. The bigger the polygon, the broader the out-of-box capability. Read it alongside the detailed checkmark table.

When Spanlens is the wrong choice

  • You're LangChain-native and want every chain decoration first-party. We support LangChain, LangGraph, LCEL, Vercel AI SDK, and LlamaIndex via a single callback handler — but LangSmith is built by the LangChain team and goes deeper into their internals.
  • You need on-prem SOC 2 with FedRAMP yesterday. Our self-host is solid and the cloud is SOC 2 Type II, but ISO 27001 is only a roadmap target for 2026 Q3. For air-gapped FedRAMP today, talk to enterprise vendors.
  • You want a vector DB or RAG framework, not observability. Spanlens observes RAG pipelines (every retrieve / rerank / generate span shows up) but we don't ship a vector store. That's Pinecone / Weaviate / Qdrant territory.

Migration paths

Spanlens is a drop-in replacement at the baseURL level, so you can run it side-by-side with whatever you have today and turn the other off only when you're happy. There's no “rip out and replace” commitment. See the Quick start for both fresh-install and CLI-migration paths.
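As a rough illustration of what a baseURL-level switch looks like, the sketch below builds constructor arguments for the official OpenAI Python SDK, which accepts `api_key` and `base_url` (the JavaScript SDK calls the same option `baseURL`). The proxy URL is a made-up placeholder and `client_kwargs` is a hypothetical helper. The real URL, and any Spanlens-specific credentials, come from the Quick start and are not shown here.

```python
# Hypothetical placeholder; the real proxy URL comes from the Quick start.
SPANLENS_BASE_URL = "https://spanlens-proxy.example.com/v1"


def client_kwargs(api_key: str, use_proxy: bool) -> dict:
    """Build constructor kwargs for an OpenAI-style client."""
    kwargs = {"api_key": api_key}
    if use_proxy:
        kwargs["base_url"] = SPANLENS_BASE_URL  # the one-line change
    return kwargs


# With the official SDK installed, this would be used as:
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs(key, use_proxy=True))
direct = client_kwargs("sk-test", use_proxy=False)
proxied = client_kwargs("sk-test", use_proxy=True)
print(direct)
print(proxied)
```

Because the only difference is one keyword argument, a flag like `use_proxy` can route a fraction of traffic through the proxy while the existing tooling stays in place, and flipping it back is a full rollback.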


Spotted a fact about a competitor that's out of date? Email support@spanlens.io — we'll fix it within a day.