Direct proxy (any language)

If you're not using the TypeScript SDK, you can still use Spanlens by pointing any OpenAI / Anthropic / Gemini client at our proxy URL. Works with Python, Ruby, Go, Rust, Java, PHP, or raw HTTP.

Use streaming for long requests

The proxy runs on Vercel Pro with a 300-second hard ceiling, and Spanlens gracefully closes streams at 290 seconds to leave room for the log to flush. Long requests (large max_tokens, slow models, JSON mode with big outputs) should use stream: true; the first byte arrives in ~200 ms regardless of total duration. If a stream is cut off at the deadline, the row is logged with truncated: true (visible as a badge in /requests) so you can see when it happens and tune max_tokens accordingly. Non-streaming requests that exceed the upstream timeout return HTTP 504.

How it works

Spanlens exposes a 1:1 compatible proxy at:

https://server.spanlens.io/proxy/openai/v1
https://server.spanlens.io/proxy/anthropic
https://server.spanlens.io/proxy/gemini/v1beta
https://server.spanlens.io/proxy/azure
https://server.spanlens.io/proxy/mistral/v1
https://server.spanlens.io/proxy/openrouter/v1

Send requests exactly as you would to the real provider, with two changes:

  1. Base URL: point your SDK at the Spanlens proxy.
  2. API key: use your Spanlens API key (starts with sl_live_) instead of the provider's. The real provider key registered under your Spanlens key is decrypted server-side and forwarded; your client never sees it.

Authentication transports per SDK

Each provider's SDK puts the API key on the wire differently. Spanlens accepts whichever shape the SDK sends, so you don't need to override anything when using the upstream client. If you're writing a hand-rolled client (curl, raw fetch, a language without an official SDK), pick whichever transport is convenient.

SDK / client                     | How the key is sent              | Spanlens accepts?
---------------------------------|----------------------------------|------------------
OpenAI (any language)            | Authorization: Bearer sl_live_…  | Yes
Anthropic (any language)         | x-api-key: sl_live_…             | Yes
@google/generative-ai (current)  | x-goog-api-key: sl_live_…        | Yes
Azure OpenAI (any language)      | Authorization: Bearer sl_live_…  | Yes

Azure note: your Spanlens key still goes on Authorization: Bearer …. The real Azure api-key header is added by the proxy after looking up the encrypted key you registered in the dashboard.

The authApiKey middleware tries them in order and the first non-empty one wins. Implementation: apps/server/src/middleware/authApiKey.ts.
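
For a hand-rolled client, the transports listed above reduce to a small lookup. The following is a minimal sketch, not Spanlens code: the names AUTH_HEADER_BY_PROVIDER and build_auth_headers are hypothetical and chosen only for illustration.

```python
# Hypothetical helper, not part of any Spanlens SDK: maps each provider's
# documented transport to the header a hand-rolled client would send.
AUTH_HEADER_BY_PROVIDER = {
    "openai": ("Authorization", "Bearer {key}"),
    "anthropic": ("x-api-key", "{key}"),
    "gemini": ("x-goog-api-key", "{key}"),
    "azure": ("Authorization", "Bearer {key}"),
}


def build_auth_headers(provider: str, spanlens_key: str) -> dict:
    """Return the auth header for `provider`, carrying the Spanlens key."""
    try:
        name, template = AUTH_HEADER_BY_PROVIDER[provider]
    except KeyError:
        raise ValueError(f"no documented transport for provider {provider!r}") from None
    return {name: template.format(key=spanlens_key)}
```

Merge the returned dictionary into the request headers of whatever HTTP library you use; the official SDKs already do the equivalent for you.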

Python, OpenAI

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/openai/v1",
)

res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi"}],
)

Python, Anthropic

import os

from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/anthropic",
)

msg = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)

Python, Azure OpenAI

Azure OpenAI uses Microsoft's /openai/v1/* endpoint (GA Aug 2025), so it is drop-in OpenAI-compatible: same request and response shapes, same streaming format. Register your Azure resource URL + API key under the Spanlens key in the dashboard once; the proxy then injects them at call time. Your client just talks to /proxy/azure.

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/azure",
)

# 'model' is your Azure deployment name, not the underlying model id.
res = client.chat.completions.create(
    model="my-gpt4o-mini-deployment",
    messages=[{"role": "user", "content": "Hi"}],
)

Provider key registration step: dashboard → Projects & Keys → expand a Spanlens key → Add provider key → Azure OpenAI → paste https://<your-resource>.openai.azure.com + API key 1 from Azure portal. The proxy stores the URL on the key row and injects the right api-key header on every request.

Python, Mistral

Mistral's API is OpenAI-compatible end-to-end (request shape, SSE chunk format, usage field), so the same openai Python package works with the base URL pointed at /proxy/mistral/v1. Useful when EU data residency matters (Mistral hosts in France) or when you want to A/B against OpenAI without rewriting your client.

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/mistral/v1",
)

res = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hi"}],
)

Supported models include mistral-large-latest, mistral-medium-latest, mistral-small-latest, pixtral-large-latest / pixtral-12b (multimodal), codestral-latest, the ministral-* family, open-mistral-nemo, and mistral-embed for embeddings. Cost lands on every row.

Python, OpenRouter

OpenRouter is a meta-provider: one API key, one base URL, 100+ models from 30+ providers (OpenAI, Anthropic, Mistral, Meta, DeepSeek, Qwen, Cohere, Perplexity, ...). The wire protocol is OpenAI-compatible, so the same openai client works once the base URL is pointed at /proxy/openrouter/v1. Switch models with a single string change instead of swapping clients.

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/openrouter/v1",
)

# Model id carries a vendor prefix
res = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hi"}],
)

# Same client, different model — no code changes
res2 = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hi"}],
)

Cost preference: when OpenRouter reports usage.cost on the response (authoritative, includes their margin/discount), Spanlens logs that figure verbatim. When it is absent, the proxy strips the vendor prefix (anthropic/claude-3-5-sonnet → claude-3-5-sonnet) and looks the model up in the same model_prices table the other providers use. Unknown model + no usage.cost → cost_usd lands NULL (visible in /requests, so you know to check OpenRouter's own dashboard for that row).
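
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: resolve_cost_usd and EXAMPLE_PRICES are hypothetical names, and the price figures are placeholders standing in for the model_prices table.

```python
from typing import Optional

# Hypothetical per-million-token prices (input, output) in USD, standing in
# for the model_prices table. The numbers are illustrative only.
EXAMPLE_PRICES = {"claude-3-5-sonnet": (3.00, 15.00)}


def resolve_cost_usd(model: str, usage: dict) -> Optional[float]:
    """Prefer the reported usage.cost, else price by the un-prefixed model id."""
    reported = usage.get("cost")
    if reported is not None:
        return float(reported)  # used verbatim when the provider reports it

    bare_model = model.split("/", 1)[-1]  # "anthropic/x" -> "x"
    price = EXAMPLE_PRICES.get(bare_model)
    if price is None:
        return None  # unknown model and no usage.cost: cost stays NULL

    input_price, output_price = price
    return (
        usage.get("prompt_tokens", 0) * input_price
        + usage.get("completion_tokens", 0) * output_price
    ) / 1_000_000
```

A None result corresponds to the NULL cost_usd case, which is the signal to check OpenRouter's own dashboard.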

curl, raw HTTP

curl https://server.spanlens.io/proxy/openai/v1/chat/completions \
  -H "Authorization: Bearer $SPANLENS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi"}]
  }'

Ruby

require "openai"

client = OpenAI::Client.new(
  access_token: ENV["SPANLENS_API_KEY"],
  uri_base: "https://server.spanlens.io/proxy/openai",
)

res = client.chat(parameters: {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hi" }],
})

Go

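package main

import (
    "context"
    "log"
    "os"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig(os.Getenv("SPANLENS_API_KEY"))
    config.BaseURL = "https://server.spanlens.io/proxy/openai/v1"
    client := openai.NewClientWithConfig(config)
    res, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
        Model: "gpt-4o-mini",
        Messages: []openai.ChatCompletionMessage{
            {Role: openai.ChatMessageRoleUser, Content: "Hi"},
        },
    })
    if err != nil {
        log.Fatal(err) // surface proxy or provider errors instead of dropping them
    }
    log.Println(res.Choices[0].Message.Content)
}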

Streaming

Server-Sent Events streaming works transparently. Spanlens tees the stream: one copy flows to you in real time, while the other is parsed asynchronously to extract token usage. Latency overhead is negligible (10–50 ms).
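
If you consume the stream over raw HTTP rather than through an SDK, the client side looks roughly like the sketch below. It assumes the OpenAI chat-completions chunk format (data: lines carrying JSON with choices[].delta.content, terminated by data: [DONE]); the function name iter_content_deltas is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style chat-completion SSE lines."""
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of the response body as they arrive, for example from a line iterator of your HTTP library, and join or print the yielded pieces.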

Passing project / metadata

Add an X-Spanlens-Project header to tag requests with a project scope:

-H "X-Spanlens-Project: my-backend-service"

Add an X-Spanlens-Prompt-Version header to link the request to a specific prompt version so it appears in the A/B comparison table. Accepts name@version, name@latest, or a raw UUID:

-H "X-Spanlens-Prompt-Version: chatbot-system@3"
# or
-H "X-Spanlens-Prompt-Version: chatbot-system@latest"
# or
-H "X-Spanlens-Prompt-Version: ae1c3c1e-99eb-2b98-5f05-012345678901"

Invalid or unknown values silently resolve to null; the proxy never fails because a prompt tag is stale. The request just isn't linked to a version.
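
Because a malformed value is dropped silently, a client-side shape check can catch typos before they reach the proxy. The sketch below is an assumption-laden illustration: the exact name syntax and the integer version format are guesses, and looks_like_prompt_version is a hypothetical helper. It cannot tell whether a well-formed tag actually exists.

```python
import re

# Assumed shapes: the name syntax and integer versions are guesses for
# illustration; only "name@version", "name@latest", and a raw UUID come
# from the text above.
_NAME_AT_VERSION = re.compile(r"[^@\s]+@(?:latest|\d+)")
_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def looks_like_prompt_version(value: str) -> bool:
    """Client-side shape check for X-Spanlens-Prompt-Version."""
    return bool(_NAME_AT_VERSION.fullmatch(value) or _UUID.fullmatch(value))
```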

Add X-Spanlens-User and X-Spanlens-Session headers to tag the request with an end-user or session ID. The values are opaque strings of your choosing (Spanlens never interprets them):

-H "X-Spanlens-User: user_abc123"
-H "X-Spanlens-Session: sess_xyz789"

Tagged requests roll up at /users (per-end-user cost / token / latency analytics) and can be filtered at /requests via ?userId=… / ?sessionId=…. See Users docs for tagging strategy and SDK helpers.
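
The tagging headers in this section can be assembled in one place. A minimal sketch, assuming a hypothetical helper named spanlens_tag_headers (it is not an SDK function):

```python
from typing import Optional


def spanlens_tag_headers(
    project: Optional[str] = None,
    prompt_version: Optional[str] = None,
    user: Optional[str] = None,
    session: Optional[str] = None,
) -> dict:
    """Build the optional X-Spanlens-* tagging headers, omitting unset ones."""
    candidates = {
        "X-Spanlens-Project": project,
        "X-Spanlens-Prompt-Version": prompt_version,
        "X-Spanlens-User": user,
        "X-Spanlens-Session": session,
    }
    return {name: value for name, value in candidates.items() if value}
```

With the OpenAI Python client, the result can be passed per request through the extra_headers argument of client.chat.completions.create(...), or once for every request through default_headers on the client constructor.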

Controlling body retention, X-Spanlens-Log-Body

Spanlens stores the full request and response bodies by default (with API-key auto-masking; see below). For PII-sensitive workloads, opt out per call with the X-Spanlens-Log-Body header:

-H "X-Spanlens-Log-Body: full"   # default, store bodies (with key masking)
-H "X-Spanlens-Log-Body: meta"   # drop bodies; keep tokens/cost/latency/user/session
-H "X-Spanlens-Log-Body: none"   # same as meta + drop user_id/session_id

Unknown values fall back to full (the existing behavior) so a malformed header never silently turns logging off. SDK equivalent: withLogBody() / observeOpenAI({ logBody }).
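
Since an unrecognized value means bodies are stored, it can be worth rejecting typos locally before the request is sent. A minimal sketch, assuming the three values are matched exactly as written above; log_body_header is a hypothetical helper, not an SDK function.

```python
VALID_LOG_BODY_MODES = ("full", "meta", "none")


def log_body_header(mode: str) -> dict:
    """Return the X-Spanlens-Log-Body header, rejecting typos locally."""
    if mode not in VALID_LOG_BODY_MODES:
        # The server would treat this as "full", so fail before sending.
        raise ValueError(
            f"unknown log-body mode {mode!r}; expected one of {VALID_LOG_BODY_MODES}"
        )
    return {"X-Spanlens-Log-Body": mode}
```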

Server-side body sanitization

Even in full mode, the server scans request_body, response_body, and error_message for API key patterns before the row is written to ClickHouse. Anything matching one of the patterns below (≥12 characters after the prefix) is replaced with <prefix>***:

  • Spanlens: sl_live_*
  • Anthropic: sk-ant-*
  • OpenAI project keys: sk-proj-*
  • OpenAI legacy keys: sk-*
  • Google: AIza*

This is pattern-based, not ML-based: it catches keys that slip into prompts, tool output, or error strings, but it does not redact natural-language PII (names, emails, card numbers). For those, use X-Spanlens-Log-Body: meta. See Security docs for full details.
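
To make the pattern-based approach concrete, here is a rough sketch of prefix masking. It is not the server's actual regex: the character class and the word-boundary rule are assumptions, and mask_api_keys is a hypothetical name. Only the prefixes and the 12-character minimum come from the list above.

```python
import re

# Assumed character class and boundary rule; the prefixes and the
# 12-character minimum come from the list above. Longer prefixes are listed
# before "sk-" so they win the alternation.
_KEY_PATTERN = re.compile(
    r"\b(sl_live_|sk-ant-|sk-proj-|sk-|AIza)[A-Za-z0-9_-]{12,}"
)


def mask_api_keys(text: str) -> str:
    """Replace anything that looks like an API key with '<prefix>***'."""
    return _KEY_PATTERN.sub(lambda m: m.group(1) + "***", text)
```

As the text notes, a pattern like this leaves ordinary prose, including names and email addresses, untouched.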

About prompt-cache breakdown

When Anthropic returns cache_read_input_tokens / cache_creation_input_tokens or OpenAI returns prompt_tokens_details.cached_tokens, Spanlens parses them out of the response automatically and stores the breakdown in requests.cache_read_tokens / cache_write_tokens. No header from you is required. Cost is billed at each provider's reduced cache rate (≈ 0.1× input on Anthropic, ≈ 0.5× input on OpenAI). See cost tracking for the full formula.
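
As a rough illustration of what the reduced cache rate means for a bill, the sketch below prices the input side only. It is not the formula from the cost tracking page: it ignores cache writes and output tokens, the function name input_cost_usd is hypothetical, and the caller must pass the count of tokens that were not served from cache.

```python
def input_cost_usd(
    uncached_tokens: int,
    cache_read_tokens: int,
    input_price_per_mtok: float,
    cache_read_multiplier: float,
) -> float:
    """Price uncached input at the full rate and cache reads at a reduced rate."""
    full_rate = input_price_per_mtok / 1_000_000
    return (
        uncached_tokens * full_rate
        + cache_read_tokens * full_rate * cache_read_multiplier
    )
```

For example, at a hypothetical $3 per million input tokens with a 0.1× cache multiplier, 1,000 uncached tokens plus 9,000 cache-read tokens cost about $0.0057, compared with $0.03 if all 10,000 tokens were billed at the full rate.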

Rate limits and response headers

Spanlens applies a high per-organization per-minute ceiling on /proxy/* purely to stop a runaway loop, not to throttle normal production traffic. Going over it does not reject your request: the call passes through to your provider and the response carries X-Spanlens-RateLimit-Overage: true so you can spot the spike. Your plan's monthly request quota is the limit that actually gates usage.

Every /proxy/* response carries the standard rate limit headers so a client can read the current window without guessing.

  • X-RateLimit-Limit: requests allowed in the current window for your plan.
  • X-RateLimit-Remaining: requests left in the current window.
  • X-RateLimit-Reset: unix epoch second at which the window rolls over. Use this directly rather than parsing the server clock from Date, since clock skew costs you retries.
  • X-RateLimit-Window: the window length in seconds. Currently always 60s, exposed as a header so we can change it without breaking clients that read it.
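
A client can read these headers into a small structure and compute how long to wait for the next window. This is a sketch under assumptions: RateLimitState, read_rate_limit, and seconds_until_reset are hypothetical names, and because the text does not say whether the window header is sent as 60 or 60s, the parser tolerates a trailing s.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int
    remaining: int
    reset_epoch: int
    window_seconds: int
    overage: bool


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Parse the rate limit headers from a /proxy/* response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_epoch=int(h["x-ratelimit-reset"]),
        # Assumption: the value may arrive as "60" or "60s".
        window_seconds=int(h["x-ratelimit-window"].rstrip("s")),
        overage=h.get("x-spanlens-ratelimit-overage", "").lower() == "true",
    )


def seconds_until_reset(state: RateLimitState, now: Optional[float] = None) -> float:
    """Seconds until the window rolls over, never negative."""
    current = time.time() if now is None else now
    return max(0.0, float(state.reset_epoch) - current)
```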

Customer-configured rate limits

You can set your own limits on a Spanlens key, a project, or an individual end-user from the Projects page. Unlike the platform ceiling above, exceeding one of your own limits does return a 429 to the caller, because you configured it precisely to throttle that traffic. The error body identifies which limit fired:

{
  "error": {
    "code": "RATE_LIMIT",
    "message": "Customer-configured rate limit exceeded (end_user): 60 requests per 60s.",
    "details": {
      "source": "customer_limit",
      "scope": "end_user",
      "limit": 60,
      "window_seconds": 60,
      "end_user_id": "user_123"
    }
  }
}

The response also carries Retry-After (the window length in seconds) and X-Spanlens-RateLimit-Scope (api_key, project, or end_user). A customer_limit 429 never includes an upgrade link, which is how you tell it apart from a platform or plan limit. Per-end-user limits bucket on the X-Spanlens-User header, so send it (the SDK withUser() helper does this) for those limits to apply.
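
On the caller's side, telling a customer-configured 429 apart from other responses comes down to checking details.source in the error body. A minimal sketch, assuming the body shape shown above; customer_limit_retry_after is a hypothetical helper name.

```python
import json
from typing import Mapping, Optional


def customer_limit_retry_after(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[float]:
    """Return seconds to wait for a customer-configured 429, else None."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # body is not JSON
    error = payload.get("error") if isinstance(payload, dict) else None
    details = error.get("details") if isinstance(error, dict) else None
    if not isinstance(details, dict) or details.get("source") != "customer_limit":
        return None  # some other kind of 429
    h = {name.lower(): value for name, value in headers.items()}
    # Prefer Retry-After; fall back to the window length from the body.
    wait = h.get("retry-after", details.get("window_seconds"))
    return float(wait) if wait is not None else None
```

A None result means the response was not a customer-configured limit (or carried no wait hint), so the caller should handle it through its usual error path.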

Self-hosting

If you're running Spanlens on your own infra, replace the base URL:

https://your-spanlens-domain.com/proxy/openai/v1

See self-hosting for Docker deployment.
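
One way to switch between the hosted proxy and a self-hosted one is to keep the origin configurable and append the provider path. This is a sketch under assumptions: it presumes a self-hosted deployment exposes the same /proxy/* paths as the hosted service for every provider, and both proxy_base_url and the SPANLENS_ORIGIN environment variable are hypothetical names.

```python
import os
from typing import Optional

# Path suffixes taken from the proxy URLs listed under "How it works".
PROXY_PATHS = {
    "openai": "/proxy/openai/v1",
    "anthropic": "/proxy/anthropic",
    "gemini": "/proxy/gemini/v1beta",
    "azure": "/proxy/azure",
    "mistral": "/proxy/mistral/v1",
    "openrouter": "/proxy/openrouter/v1",
}

DEFAULT_ORIGIN = "https://server.spanlens.io"


def proxy_base_url(provider: str, origin: Optional[str] = None) -> str:
    """Build the proxy base URL for `provider` on a hosted or self-hosted origin."""
    # SPANLENS_ORIGIN is a hypothetical variable name chosen for this sketch.
    chosen = origin or os.environ.get("SPANLENS_ORIGIN", DEFAULT_ORIGIN)
    return chosen.rstrip("/") + PROXY_PATHS[provider]
```

Pass the result as base_url when constructing the provider client, as in the Python examples earlier on this page.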

