Spanlens SDK

Thin wrappers around the official OpenAI / Anthropic / Gemini SDKs that route traffic through Spanlens and add agent tracing primitives. Zero lock-in: response types and method signatures match the upstream SDKs 1:1. Available for TypeScript and Python.

Tip: use streaming for long responses

For requests with large max_tokens, slower models, or big JSON outputs, enable streaming: the first byte arrives in ~200ms and total duration is unbounded. If you still want a single object back, accumulate chunks server-side and return the merged result from your route handler. See the streaming example below.
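The accumulate-and-merge approach described in this tip can be sketched roughly as follows. The chunk shape is modelled on OpenAI's streaming chat deltas; `collectText` and `fakeStream` are illustrative names and not part of the Spanlens SDK.

```typescript
// Minimal chunk shape, modelled on OpenAI streaming chat deltas.
interface ChatChunk {
  choices: Array<{ delta?: { content?: string | null } }>
}

// Drain any async iterable of chunks and return the concatenated text.
async function collectText(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? ''
  }
  return text
}

// Stand-in for a real provider stream (hypothetical, for illustration only).
async function* fakeStream(): AsyncGenerator<ChatChunk> {
  for (const piece of ['Hel', 'lo, ', 'world']) {
    yield { choices: [{ delta: { content: piece } }] }
  }
  yield { choices: [] } // e.g. a trailing usage-only chunk with no choices
}
```

In a route handler you would await `collectText(stream)` and return the merged string as a single response body.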

Install

npm install @spanlens/sdk
# or
pnpm add @spanlens/sdk

Provider SDKs are installed on demand. For TypeScript, install openai, @anthropic-ai/sdk, or @google/generative-ai alongside Spanlens. For Python, install the matching provider extras.

createOpenAI(): proxy mode

Constructs the official provider client with base_url pointed at the Spanlens proxy and api_key set to your Spanlens key. Your real OpenAI key never leaves the Spanlens server.

import { createOpenAI } from '@spanlens/sdk/openai'

const openai = createOpenAI({
  apiKey: process.env.SPANLENS_API_KEY,   // optional, defaults to env
  project: 'my-app',                      // optional, project scope
})

const res = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hi' }],
})

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey / api_key | string | SPANLENS_API_KEY env var | Your Spanlens API key (not your OpenAI key) |
| baseURL / base_url | string | Spanlens cloud proxy | Override for self-hosting |

createAnthropic()

import { createAnthropic } from '@spanlens/sdk/anthropic'

const anthropic = createAnthropic()

const msg = await anthropic.messages.create({
  model: 'claude-haiku-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hi' }],
})

createGemini()

Gemini doesn’t expose a per-instance base_url the way OpenAI/Anthropic do. On TypeScript we wrap GoogleGenerativeAI with a proxy. On Python the helper returns a pre-configured httpx.Client for raw REST calls; for the official Python SDK use configure_gemini() instead.

import { createGemini } from '@spanlens/sdk/gemini'

const genAI = createGemini()
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' })

const result = await model.generateContent('Hi')

withPromptVersion(): tag a request with a prompt version

Link a logged request to a specific Prompts version so it appears in the A/B comparison table. Pass the helper as the second argument (TS) or unpack into kwargs (Python):

import { createOpenAI, withPromptVersion } from '@spanlens/sdk/openai'

const openai = createOpenAI()

const res = await openai.chat.completions.create(
  {
    model: 'gpt-4o-mini',
    messages: [{ role: 'system', content: systemPromptV3 }, { role: 'user', content: userMsg }],
  },
  withPromptVersion('chatbot-system@3'),
)

Accepted formats:

  • <name>@<version>, e.g. chatbot-system@3
  • <name>@latest, auto-resolves server-side on every call
  • Raw prompt_versions.id UUID

The same helper exists on the Anthropic integration. For Gemini and any non-SDK transport, set the header directly: x-spanlens-prompt-version: <id>.
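If you want to validate a reference before sending it, the three accepted formats can be told apart client-side. The sketch below is a hypothetical helper, not an SDK export, and it assumes that pinned versions are plain positive integers (as in chatbot-system@3).

```typescript
type PromptVersionRef =
  | { kind: 'uuid'; id: string }
  | { kind: 'latest'; name: string }
  | { kind: 'pinned'; name: string; version: number }

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Classify a reference as a raw UUID, <name>@latest, or <name>@<version>.
function parsePromptVersion(ref: string): PromptVersionRef {
  if (UUID_RE.test(ref)) return { kind: 'uuid', id: ref }
  const at = ref.lastIndexOf('@')
  if (at <= 0) throw new Error(`Unrecognised prompt version: ${ref}`)
  const name = ref.slice(0, at)
  const version = ref.slice(at + 1)
  if (version === 'latest') return { kind: 'latest', name }
  // Assumption: versions are non-negative integers.
  if (/^\d+$/.test(version)) return { kind: 'pinned', name, version: Number(version) }
  throw new Error(`Unrecognised prompt version: ${ref}`)
}
```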

withUser() / withSession(): end-user tracking (v0.2.7+)

Tag a call with an end-user ID and session ID. The values are stored in requests.user_id / requests.session_id and can be filtered on the /requests page via ?userId= / ?sessionId=.

The same end-user ID is what per-end-user rate limits bucket on. Set a per-end-user cap from the Projects page and send withUser(id) on each call for it to apply.

import {
  createOpenAI,
  withUser,
  withSession,
  withPromptVersion,
} from '@spanlens/sdk/openai'

const openai = createOpenAI()

const res = await openai.chat.completions.create(
  { model: 'gpt-4o-mini', messages: [...] },
  {
    headers: {
      ...withUser(currentUser.id).headers,
      ...withSession(sessionId).headers,
      ...withPromptVersion('chatbot@3').headers,
    },
  },
)

Each helper returns { headers: { ... } }, so multiple helpers can be spread together. The Anthropic integration exports the same helpers.

All three headers are stripped by the STRIP_PREFIXES (x-spanlens-*) policy before forwarding to upstream providers (OpenAI/Anthropic/Gemini); they are used only as Spanlens internal metadata.
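To illustrate what a prefix-based strip policy does, here is a minimal sketch of splitting incoming headers into Spanlens metadata and headers that are safe to forward. It is not the proxy's actual code: `splitHeaders` is a hypothetical name, and the prefix list is assumed to contain only the x-spanlens- family named above.

```typescript
// Assumed prefix list; the text only names the x-spanlens-* family.
const STRIP_PREFIXES = ['x-spanlens-']

// Separate internal metadata headers from headers forwarded upstream.
function splitHeaders(incoming: Record<string, string>) {
  const metadata: Record<string, string> = {}
  const upstream: Record<string, string> = {}
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase() // header names are case-insensitive
    const isInternal = STRIP_PREFIXES.some((prefix) => lower.startsWith(prefix))
    if (isInternal) metadata[lower] = value
    else upstream[lower] = value
  }
  return { metadata, upstream }
}
```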

withLogBody(): control body retention (v0.3.x+)

Opt out of storing request/response bodies in your dashboard while keeping token counts, cost, latency, and identifiers. Use when prompts may contain end-user PII you don't want sent to Spanlens.

| Mode | request_body / response_body | tokens / cost / latency / model | user_id / session_id |
| --- | --- | --- | --- |
| 'full' (default) | Stored, with API-key pattern masking | Stored | Stored |
| 'meta' | Empty | Stored | Stored |
| 'none' | Empty | Stored | null |

Even in 'full' mode, the server auto-masks API key patterns (sk-*, sk-proj-*, sk-ant-*, AIza*, sl_live_*) in stored bodies. See Security for the masking policy.

import { createOpenAI, withLogBody, withUser } from '@spanlens/sdk/openai'

const openai = createOpenAI()

// Single-call opt-out
const res = await openai.chat.completions.create(
  {
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: somePromptThatMayContainPII }],
  },
  withLogBody('meta'),
)

// Combine with other helpers
const res2 = await openai.chat.completions.create(
  { model: 'gpt-4o-mini', messages: [...] },
  {
    headers: {
      ...withLogBody('meta').headers,
      ...withUser(currentUser.id).headers,
    },
  },
)

Raw curl:

curl https://server.spanlens.io/proxy/openai/v1/chat/completions \
  -H "Authorization: Bearer $SPANLENS_API_KEY" \
  -H "x-spanlens-log-body: meta" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

Note: withUser / withSession become no-ops when logBody: 'none' is set, because the server drops those columns alongside the bodies.
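The retention rules in this section can be summarised as a small pure function. This is a sketch of the documented behaviour, not the server's implementation: the `RequestLog` field names are hypothetical, and the masking regex is an assumption derived from the key prefixes listed earlier.

```typescript
type LogBodyMode = 'full' | 'meta' | 'none'

// Hypothetical record shape for illustration.
interface RequestLog {
  requestBody: string
  responseBody: string
  totalTokens: number
  userId: string | null
  sessionId: string | null
}

// Assumed regex for the listed key prefixes; the real patterns may differ.
const KEY_PATTERN = /\b(sk-(?:proj-|ant-)?|AIza|sl_live_)[A-Za-z0-9_-]+/g

// Keep the recognisable prefix and replace the secret part.
function maskApiKeys(body: string): string {
  return body.replace(KEY_PATTERN, (_match: string, prefix: string) => `${prefix}***`)
}

function applyLogBodyMode(log: RequestLog, mode: LogBodyMode): RequestLog {
  if (mode === 'full') {
    return {
      ...log,
      requestBody: maskApiKeys(log.requestBody),
      responseBody: maskApiKeys(log.responseBody),
    }
  }
  // 'meta' and 'none' both empty the bodies; 'none' also nulls the identifiers.
  const withoutBodies = { ...log, requestBody: '', responseBody: '' }
  return mode === 'none' ? { ...withoutBodies, userId: null, sessionId: null } : withoutBodies
}
```

Token counts survive in every mode, which is why cost and latency reporting keeps working after opting out of body storage.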

sampleRate: trace sampling (v0.3.x+)

Cap the volume of trace + span ingestion without changing your application code. The decision is made per-trace at startTrace() / start_trace() time and is sticky for every span beneath that trace, so each surviving trace stays internally coherent in the dashboard (no half-sampled trees).

import { SpanlensClient } from '@spanlens/sdk'

const client = new SpanlensClient({
  apiKey: process.env.SPANLENS_API_KEY!,
  sampleRate: 0.1,   // keep 10% of successful traces; 100% of error traces
})

Tail-based error bypass

Sampled-out traces buffer their span POSTs and PATCHes in memory. When the trace ends:

  • status = "error" → the buffer is replayed against the real transport (preserving FIFO order) and then the trace-end PATCH is sent. The trace appears in the dashboard identically to a sampled-in error trace.
  • status = "completed" → the buffer is dropped silently. Zero network traffic for that trace's ingest layer.

This means you can run aggressive sampling (e.g. 0.01 = 1%) and still get every failure for debugging. The buffer is capped at 1,000 ops per trace to bound memory for long-running agents.
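The buffering behaviour described above can be sketched as a small class. This is an illustration under stated assumptions rather than the SDK's code: `SampledTrace`, `Op`, and `Send` are hypothetical names, the random source is injectable purely for testing, and operations beyond the 1,000-op cap are assumed to be dropped.

```typescript
type Op = { method: 'POST' | 'PATCH'; path: string }
type Send = (op: Op) => void

const MAX_BUFFERED_OPS = 1000 // cap stated in the text

class SampledTrace {
  private buffer: Op[] = []
  private readonly sampledIn: boolean
  private readonly send: Send

  constructor(send: Send, sampleRate: number, random: () => number = Math.random) {
    this.send = send
    // Decided once per trace and reused for every span beneath it.
    this.sampledIn = random() < sampleRate
  }

  record(op: Op): void {
    if (this.sampledIn) {
      this.send(op)
    } else if (this.buffer.length < MAX_BUFFERED_OPS) {
      this.buffer.push(op) // assumption: ops past the cap are dropped
    }
  }

  end(status: 'completed' | 'error'): void {
    const traceEnd: Op = { method: 'PATCH', path: '/ingest/traces' }
    if (this.sampledIn) {
      this.send(traceEnd)
    } else if (status === 'error') {
      for (const op of this.buffer) this.send(op) // replay in FIFO order
      this.send(traceEnd)
    }
    this.buffer = [] // sampled-out, completed traces send nothing
  }
}
```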

What it does and doesn't affect

| Subsystem | Affected by sampleRate? |
| --- | --- |
| Trace + span ingestion (/ingest/traces, /ingest/spans) | Yes. This is the OTLP-equivalent agent-tracing layer. |
| Proxy request logs (/proxy/* → ClickHouse requests) | No. Every LLM call is still recorded for cost / quota / anomaly tracking. Billing does not depend on the SDK's sampling decision. |

Validation throws at client construction for values outside [0.0, 1.0]. Failing fast is preferable to silently dropping 100% of traces because a string was passed by accident.
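A fail-fast check of this kind might look like the sketch below. `validateSampleRate` is a hypothetical name, and the specific error types are an assumption; the text only states that construction throws.

```typescript
// Reject anything that is not a finite number within [0.0, 1.0].
function validateSampleRate(value: unknown): number {
  if (typeof value !== 'number' || Number.isNaN(value)) {
    throw new TypeError(`sampleRate must be a number, got ${typeof value}`)
  }
  if (value < 0 || value > 1) {
    throw new RangeError(`sampleRate must be within [0.0, 1.0], got ${value}`)
  }
  return value
}
```

Without such a check, a non-numeric value compared against a random draw would evaluate to false every time, so no trace would ever be sampled in.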

observe(): agent tracing

Wrap any function to turn it into a span in an agent trace. The callback’s return value is automatically captured as the span’s output, with no extra code needed. Pass input in the span options to record the inputs too.

import { SpanlensClient, observe } from '@spanlens/sdk'

const client = new SpanlensClient()
const trace = client.startTrace('answer-question')

const docs = await observe(
  trace,
  { name: 'retrieve', spanType: 'retrieval', input: { query } },
  async () => vectorDb.search(query),   // return value → auto-saved as output
)

const response = await observe(
  trace,
  { name: 'generate', spanType: 'llm' },
  async () => openai.chat.completions.create({ /* ... */ }),
)

await trace.end()

Each observe() call creates a row in the spans table with timing, input/output, and a link to the parent trace. Inspect traces in /traces.
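Conceptually, a wrapper like this times the callback, captures its return value, and records one row. The sketch below shows that general pattern with an in-memory array standing in for the spans table. It is not the SDK's implementation: `observeSketch` and `SpanRecord` are hypothetical, and marking the span as an error before rethrowing is an assumption.

```typescript
interface SpanRecord {
  name: string
  input?: unknown
  output?: unknown
  status: 'completed' | 'error'
  durationMs: number
}

// Run `fn`, record timing and output in `sink`, and return fn's result.
async function observeSketch<T>(
  sink: SpanRecord[],
  opts: { name: string; input?: unknown },
  fn: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now()
  try {
    const output = await fn()
    sink.push({ ...opts, output, status: 'completed', durationMs: Date.now() - startedAt })
    return output
  } catch (err) {
    // Assumption: failures are recorded, then surfaced to the caller unchanged.
    sink.push({ ...opts, status: 'error', durationMs: Date.now() - startedAt })
    throw err
  }
}
```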

Streaming inside observe()

With stream: true you control the chunk loop, so pass the final token counts to span.end() once the stream is exhausted. The accumulated text you return is auto-captured as output.

Proxy users: output is automatic

If you route through the Spanlens proxy via createOpenAI(), createAnthropic(), or createGemini(), the proxy captures the completed response server-side and writes it to your span automatically, with no extra code needed. The "return accumulated" pattern below is the fallback for direct (non-proxy) calls.

const text = await observe(
  trace,
  {
    name: 'gpt-4o-mini · analysis',
    spanType: 'llm',
    input: messages,           // captured at span creation
  },
  async (span) => {
    const stream = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
      stream: true,
      stream_options: { include_usage: true },
    }, { headers: span.traceHeaders() })

    let accumulated = ''
    let usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number } | null = null

    for await (const chunk of stream) {
      accumulated += chunk.choices[0]?.delta?.content ?? ''
      if (chunk.usage) usage = chunk.usage
    }

    // Pass token counts manually — the SDK can't read streaming chunks
    if (usage) {
      await span.end({
        status: 'completed',
        promptTokens: usage.prompt_tokens,
        completionTokens: usage.completion_tokens,
        totalTokens: usage.total_tokens,
      })
    }

    return accumulated   // ← auto-saved as output; no need to pass output: here
  },
)

observeOpenAI(): span + auto-parsed usage

Shorthand that wraps a single LLM call as a span, injects the trace headers so the proxy log can be linked to the span, and auto-parses usage from the response. Pass promptVersion in one shot:

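import { observeOpenAI } from '@spanlens/sdk'
import { withPromptVersion } from '@spanlens/sdk/openai'

// String form: just give it a span name
const res = await observeOpenAI(trace, 'greeting', (headers) =>
  openai.chat.completions.create(
    { model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Hi' }] },
    // Merge the trace headers with the helper's headers. Spreading the helper
    // object itself would overwrite `headers` and drop the trace link.
    { headers: { ...headers, ...withPromptVersion('greeter@latest').headers } },
  ),
)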

// Options object — pass logBody to opt out of body storage per call
const res2 = await observeOpenAI(
  trace,
  { name: 'pii-heavy-call', logBody: 'meta', promptVersion: 'greeter@latest' },
  (headers) => openai.chat.completions.create({ ... }, { headers }),
)

Same pattern works with observeAnthropic() / observe_anthropic() and observeGemini() / observe_gemini(). The logBody option on the options form maps 1:1 to the withLogBody() helper.

observeOllama(): self-hosted LLMs (v0.5.0+ / 0.4.0+)

Ollama runs on your own machine (or in your VPC) and exposes an OpenAI-compatible API at http://localhost:11434/v1. Because the Spanlens proxy is hosted, it can’t reach your local Ollama, but the SDK can. Wrap your Ollama call with observeOllama() and only the trace metadata (model, tokens, latency) flows to Spanlens. Your prompts and responses never leave your machine via Spanlens.

The dashboard tags the trace provider: ollama and leaves the cost column as “Self-hosted” (no per-token billing to compute). Usage tokens still come through because Ollama’s response includes the same usage field OpenAI does.

import OpenAI from 'openai'
import { SpanlensClient, observeOllama } from '@spanlens/sdk'

const client = new SpanlensClient({ apiKey: process.env.SPANLENS_API_KEY! })

// Point the OpenAI SDK at your local Ollama — apiKey is required by the
// SDK but ignored by Ollama itself when running locally.
const ollama = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
})

const trace = client.startTrace({ name: 'local_chat' })
const res = await observeOllama(trace, 'chat', (headers) =>
  ollama.chat.completions.create(
    {
      model: 'llama3.2',
      messages: [{ role: 'user', content: 'Hello' }],
    },
    { headers },
  ),
)
await trace.end()

Other OpenAI-compatible runtimes (vLLM, LM Studio, Together, Groq, …)

For any other OpenAI-compatible endpoint, use observeOpenAI with the provider override so the dashboard labels it correctly:

await observeOpenAI(
  trace,
  { name: 'inference', provider: 'vllm' }, // or 'lm-studio', 'together', 'groq', …
  (headers) => vllmClient.chat.completions.create({ ... }, { headers }),
)

Framework integrations

If you use LangChain, LangGraph (v0.6.0+ / 0.5.0+), Vercel AI SDK, or LlamaIndex, plug in the matching integration instead of wiring callbacks manually. Each one records spans automatically (tokens, latency, model name, and the full chain/tool/retriever topology) without importing from the framework itself (duck-typed, version-agnostic).

LangChain & LangGraph (v0.6.0+ / 0.5.0+)

One callback handler works for plain LangChain chains, LCEL pipelines, and LangGraph compiled graphs. Spanlens captures LLM, chain, tool, and retriever spans automatically, and uses LangChain’s built-in runId / parentRunId tracking to assemble the right span tree without any manual parent wiring.

import { SpanlensClient } from '@spanlens/sdk'
import { createSpanlensCallbackHandler } from '@spanlens/sdk/langchain'

const client = new SpanlensClient({ apiKey: process.env.SPANLENS_API_KEY! })
const handler = createSpanlensCallbackHandler({ client })

// Plain LangChain — chain, LLM, agent, retriever — all work the same way:
await chain.invoke({ input: '...' }, { callbacks: [handler] })

// LangGraph — pass the same handler to the compiled graph:
const graph = workflow.compile()
const result = await graph.invoke(
  { input: 'plan a trip to Tokyo' },
  { callbacks: [handler] },
)

Resulting span tree

A 2-node LangGraph with one tool call produces this trace on the dashboard:

trace: langchain_run
└─ chain.LangGraph                       (the graph itself)
   ├─ chain.plan                         (node 1)
   │  └─ llm.ChatOpenAI                  (model + tokens captured)
   └─ chain.execute                      (node 2)
      ├─ tool.search                     (input + output captured)
      └─ llm.ChatOpenAI
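A tree like the one above can be assembled from flat callback events using only runId and parentRunId. The following is a minimal sketch of that technique, not the handler's actual code; `RunEvent`, `SpanNode`, and `buildSpanTree` are hypothetical names.

```typescript
interface RunEvent { runId: string; parentRunId?: string; name: string }
interface SpanNode { name: string; children: SpanNode[] }

// Link each run to its parent; runs without a known parent become roots.
function buildSpanTree(events: RunEvent[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>()
  for (const event of events) {
    nodes.set(event.runId, { name: event.name, children: [] })
  }

  const roots: SpanNode[] = []
  for (const event of events) {
    const node = nodes.get(event.runId)!
    const parent = event.parentRunId ? nodes.get(event.parentRunId) : undefined
    if (parent) parent.children.push(node)
    else roots.push(node)
  }
  return roots
}
```

Because every event carries its own IDs, no manual parent wiring is needed and events from concurrent runs do not interfere with each other.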

Options

All defaults preserve LangGraph’s full structure. Turn things off to quiet down the dashboard:

| Option | Default | What it controls |
| --- | --- | --- |
| captureChains / capture_chains | true | Chain spans (LangGraph nodes, LCEL steps, plain chains). Off = LLM/tool/retriever spans become direct children of the trace. |
| captureTools / capture_tools | true | Tool call spans (each tool execution). |
| captureRetrieval / capture_retrieval | true | Retriever spans (with documents summarised as output). |
| maxInputBytes / max_input_bytes | 16_384 | JSON byte cap on span.input. Larger payloads become { __truncated: true, preview, originalBytes }. |
| maxOutputBytes / max_output_bytes | 16_384 | Same as above, for span.output. |
| trace | none | Optional pre-existing trace to attach to. When given, the handler does not close the trace (caller owns the lifecycle). |
| traceName / trace_name | "langchain_run" | Name for auto-created traces (one per top-level run). |
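The byte cap on span.input and span.output can be pictured as follows. This is a sketch only: `capPayload` is a hypothetical helper, and the preview length of 256 characters is an assumption, since the text does not say how long the preview is.

```typescript
const PREVIEW_CHARS = 256 // assumption: the real preview length is not stated

type Truncated = { __truncated: true; preview: string; originalBytes: number }

// Return the value unchanged if its JSON fits, else a truncation marker.
function capPayload(value: unknown, maxBytes = 16_384): unknown {
  const json = JSON.stringify(value)
  const originalBytes = new TextEncoder().encode(json).length // UTF-8 size
  if (originalBytes <= maxBytes) return value
  const marker: Truncated = {
    __truncated: true,
    preview: json.slice(0, PREVIEW_CHARS),
    originalBytes,
  }
  return marker
}
```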

Notes

  • Single handler, multiple runs: concurrent invocations are tracked by LangChain’s per-run UUID, so one handler instance is safe to share across parallel graph executions.
  • Duck-typed: Spanlens does not import from @langchain/core or langchain-core. Major-version bumps in LangChain itself don’t require an SDK update.
  • Python optional dep: the handler falls back to a plain class when langchain-core isn’t installed, so unit tests for your own code can drive it without LangChain being in the test environment.

Vercel AI SDK

Pass tracker.onStepFinish and tracker.onFinish to generateText / streamText. Works with AI SDK 4.x and 5.x.

import { createSpanlensTracker } from '@spanlens/sdk/vercel-ai'
import { SpanlensClient } from '@spanlens/sdk'
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const client = new SpanlensClient({ apiKey: process.env.SPANLENS_API_KEY! })
const tracker = createSpanlensTracker({ client, modelName: 'gpt-4o' })

const result = await generateText({
  model: openai('gpt-4o'),
  messages: [{ role: 'user', content: 'Hello' }],
  onStepFinish: tracker.onStepFinish,  // records intermediate tool steps
  onFinish: tracker.onFinish,          // closes span with final token counts
})

LlamaIndex TS

Hook into Settings.callbackManager before running queries. Call the returned unregister() function to detach when done.

import { registerSpanlensCallbacks } from '@spanlens/sdk/llamaindex'
import { SpanlensClient } from '@spanlens/sdk'
import { Settings } from 'llamaindex'

const client = new SpanlensClient({ apiKey: process.env.SPANLENS_API_KEY! })
const unregister = registerSpanlensCallbacks(Settings, { client })

// ... run your LlamaIndex queries ...
await queryEngine.query({ query: 'What is RAG?' })

unregister()  // remove callbacks when done (e.g. on process exit)

Low-level: trace + span handles

For complex flows (parallel spans, manual timing) use the handle-based API directly. Spans end automatically on context-exit in Python; in TypeScript call span.end() explicitly.

import { SpanlensClient } from '@spanlens/sdk'

const client = new SpanlensClient()
const trace = client.startTrace('multi-agent-workflow')

const spanA = trace.startSpan('agent-a')
const spanB = trace.startSpan('agent-b')

const [resA, resB] = await Promise.all([
  runAgentA().then((r) => { spanA.end({ output: r }); return r }),
  runAgentB().then((r) => { spanB.end({ output: r }); return r }),
])

await trace.end()

Graceful shutdown: client.flush()

Ingest calls run in the background. In short-lived processes (scripts, one-shot jobs, serverless cold starts), the process can exit before all POSTs complete. Call flush() before exit to drain them:

const client = new SpanlensClient({ apiKey: process.env.SPANLENS_API_KEY! })

// ... your agent logic ...

await client.flush()   // resolves when all in-flight ingest calls have settled
process.exit(0)

flush() uses Promise.allSettled internally, so it resolves even if some requests failed and a network error won't hang the process. Failed writes are silently dropped (or forwarded to your onError hook if set). Transient failures are retried up to 3 times with exponential back-off (200 ms → 400 ms → 800 ms) before giving up.
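The combination of background writes, bounded retries, and a flush that never rejects can be sketched like this. It illustrates the pattern described above and is not the SDK's internals: `withRetry` and `IngestQueue` are hypothetical names, and the sleep function is injectable only so the back-off schedule can be observed without waiting.

```typescript
type Sleep = (ms: number) => Promise<void>
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// One initial attempt plus up to three retries, waiting 200, 400, then 800 ms.
async function withRetry<T>(task: () => Promise<T>, sleep: Sleep = realSleep): Promise<T> {
  for (const delay of [200, 400, 800]) {
    try {
      return await task()
    } catch {
      await sleep(delay)
    }
  }
  return task() // final attempt; a failure here propagates to the caller
}

class IngestQueue {
  private inFlight = new Set<Promise<unknown>>()

  // Start the write in the background; failures are swallowed after retries.
  enqueue(task: () => Promise<unknown>, sleep?: Sleep): void {
    const pending = withRetry(task, sleep).catch(() => undefined)
    this.inFlight.add(pending)
    void pending.then(() => this.inFlight.delete(pending))
  }

  // Resolves once every in-flight write has settled, whether or not it succeeded.
  async flush(): Promise<void> {
    await Promise.allSettled(Array.from(this.inFlight))
  }
}
```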

Non-blocking by design

Both SDKs do the actual ingest HTTP calls in the background: the TypeScript SDK uses the runtime’s native promise queue, while Python uses a small daemon thread pool. Either way, your hot path (the LLM call itself) is never delayed by Spanlens, and a slow / down Spanlens server never crashes your app. Failures are swallowed by default; pass silent: false (TS) or silent=False (Python) plus an onError hook to surface them.

TypeScript & Python compatibility

  • TypeScript SDK: Node 18+, Deno, Bun, Vercel Edge / Cloudflare Workers
  • Python SDK: 3.9, 3.10, 3.11, 3.12, 3.13

Next: direct proxy for languages without an SDK, or self-hosting.