Direct proxy (any language)
If you're not using the TypeScript SDK, you can still use Spanlens by pointing any OpenAI / Anthropic / Gemini client at our proxy URL. Works with Python, Ruby, Go, Rust, Java, PHP, or raw HTTP.
Use streaming for long requests
The proxy runs on Vercel Pro with a 300-second hard ceiling, and Spanlens gracefully closes streams at 290 seconds to leave room for the log to flush. Long requests (large max_tokens, slow models, JSON mode with big outputs) should use stream: true; the first byte arrives in ~200 ms regardless of total duration. If a stream is cut off at the deadline, the row is logged with truncated: true (visible as a badge in /requests) so you can see when it happens and tune max_tokens accordingly. Non-streaming requests that exceed the upstream timeout return HTTP 504.
How it works
Spanlens exposes a 1:1 compatible proxy at:
https://server.spanlens.io/proxy/openai/v1
https://server.spanlens.io/proxy/anthropic
https://server.spanlens.io/proxy/gemini/v1beta
https://server.spanlens.io/proxy/azure
https://server.spanlens.io/proxy/mistral/v1
https://server.spanlens.io/proxy/openrouter/v1
Send requests exactly as you would to the real provider, with two changes:
- Base URL: point your SDK at the Spanlens proxy.
- API key: use your Spanlens API key (starts with sl_live_) instead of the provider's. The real provider key registered under your Spanlens key is decrypted server-side and forwarded; your client never sees it.
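For a language or runtime without an official SDK, the two changes above can be sketched with Python's standard library alone. This is a minimal illustration rather than an official client: the helper name build_chat_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

PROXY_BASE_URL = "https://server.spanlens.io/proxy/openai/v1"


def build_chat_request(spanlens_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request aimed at the proxy."""
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",  # change 1: proxy base URL
        data=body,
        headers={
            "Authorization": f"Bearer {spanlens_key}",  # change 2: Spanlens key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; everything else about the payload is what you would send to the provider directly.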
Authentication transports per SDK
Each provider's SDK puts the API key on the wire differently. Spanlens accepts whichever shape the SDK sends, so you don't need to override anything when using the upstream client. If you're writing a hand-rolled client (curl, raw fetch, a language without an official SDK), pick whichever transport is convenient.
| SDK / client | How the key is sent | Spanlens accepts? |
|---|---|---|
| OpenAI (any language) | Authorization: Bearer sl_live_… | ✓ |
| Anthropic (any language) | x-api-key: sl_live_… | ✓ |
| @google/generative-ai (current) | x-goog-api-key: sl_live_… | ✓ |
| Azure OpenAI (any language) | Authorization: Bearer sl_live_… | ✓ |
Azure note: your Spanlens key still goes on Authorization: Bearer …. The real Azure api-key header is added by the proxy after looking up the encrypted key you registered in the dashboard.
The authApiKey middleware tries them in order and the first non-empty one wins. Implementation: apps/server/src/middleware/authApiKey.ts.
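If you are writing a hand-rolled client, the table above boils down to one header per transport. The sketch below is illustrative only: the function name and the transport labels ("openai", "azure", "anthropic", "gemini") are names chosen for this example, not part of any Spanlens API.

```python
def auth_headers(transport: str, spanlens_key: str) -> dict[str, str]:
    """Return the header a hand-rolled client would send for one transport."""
    if transport in ("openai", "azure"):
        return {"Authorization": f"Bearer {spanlens_key}"}
    if transport == "anthropic":
        return {"x-api-key": spanlens_key}
    if transport == "gemini":
        return {"x-goog-api-key": spanlens_key}
    raise ValueError(f"unknown transport: {transport}")
```

Because the middleware takes the first non-empty header, sending exactly one of these keeps it unambiguous which key is used.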
Python, OpenAI
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/openai/v1",
)
res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi"}],
)
Python, Anthropic
import os
from anthropic import Anthropic
client = Anthropic(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/anthropic",
)
msg = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
Python, Azure OpenAI
Azure OpenAI uses Microsoft's /openai/v1/* endpoint (GA Aug 2025), so it is drop-in OpenAI-compatible: same request and response shapes, same streaming format. Register your Azure resource URL + API key under the Spanlens key in the dashboard once; the proxy then injects them at call time. Your client just talks to /proxy/azure.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/azure",
)
# 'model' is your Azure deployment name, not the underlying model id.
res = client.chat.completions.create(
    model="my-gpt4o-mini-deployment",
    messages=[{"role": "user", "content": "Hi"}],
)
Provider key registration step: dashboard → Projects & Keys → expand a Spanlens key → Add provider key → Azure OpenAI → paste https://<your-resource>.openai.azure.com + API key 1 from Azure portal. The proxy stores the URL on the key row and injects the right api-key header on every request.
Python, Mistral
Mistral's API is OpenAI-compatible end-to-end (request shape, SSE chunk format, usage field), so the same openai Python package works with the base URL pointed at /proxy/mistral/v1. Useful when EU data residency matters (Mistral hosts in France) or when you want to A/B against OpenAI without rewriting your client.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/mistral/v1",
)
res = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hi"}],
)
Supported models include mistral-large-latest, mistral-medium-latest, mistral-small-latest, pixtral-large-latest / pixtral-12b (multimodal), codestral-latest, the ministral-* family, open-mistral-nemo, and mistral-embed for embeddings. Cost lands on every row.
Python, OpenRouter
OpenRouter is a meta-provider: one API key, one base URL, 100+ models from 30+ providers (OpenAI, Anthropic, Mistral, Meta, DeepSeek, Qwen, Cohere, Perplexity, ...). The wire protocol is OpenAI-compatible, so the same openai client works once the base URL is pointed at /proxy/openrouter/v1. Switch models with a single string change instead of swapping clients.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SPANLENS_API_KEY"],
    base_url="https://server.spanlens.io/proxy/openrouter/v1",
)
# Model id carries a vendor prefix
res = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hi"}],
)
# Same client, different model: only the model string changes
res2 = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hi"}],
)
Cost preference: when OpenRouter reports usage.cost on the response (authoritative, includes their margin/discount), Spanlens logs that figure verbatim. When it is absent, the proxy strips the vendor prefix (anthropic/claude-3-5-sonnet → claude-3-5-sonnet) and looks the model up in the same model_prices table the other providers use. Unknown model + no usage.cost → cost_usd lands NULL (visible in /requests, so you know to check OpenRouter's own dashboard for that row).
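The preference order described above can be sketched as a small decision function. This mirrors the documented behavior only; it is not the proxy's code, and the function names, the returned labels, and the known_models set (standing in for the model_prices table) are all hypothetical.

```python
def strip_vendor_prefix(model: str) -> str:
    """Drop the leading 'vendor/' part of an OpenRouter model id, if any."""
    return model.split("/", 1)[-1]


def cost_source(usage: dict, model: str, known_models: set[str]) -> str:
    """Say where the logged cost would come from, in documented order."""
    if usage.get("cost") is not None:
        return "usage.cost"  # OpenRouter's own figure, logged verbatim
    if strip_vendor_prefix(model) in known_models:
        return "model_prices"  # priced from the shared table
    return "null"  # cost_usd is left NULL for this row
```

For example, a response with usage.cost set always resolves to "usage.cost", even when the model is also present in the price table.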
curl, raw HTTP
curl https://server.spanlens.io/proxy/openai/v1/chat/completions \
  -H "Authorization: Bearer $SPANLENS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi"}]
  }'
Ruby
require "openai"
client = OpenAI::Client.new(
  access_token: ENV["SPANLENS_API_KEY"],
  uri_base: "https://server.spanlens.io/proxy/openai",
)
res = client.chat(parameters: {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hi" }],
})
Go
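import (
	"context"
	"log"
	"os"
	openai "github.com/sashabaranov/go-openai"
)
config := openai.DefaultConfig(os.Getenv("SPANLENS_API_KEY"))
config.BaseURL = "https://server.spanlens.io/proxy/openai/v1"
client := openai.NewClientWithConfig(config)
// context.Background() stands in for whatever context your caller provides.
res, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
	Model: "gpt-4o-mini",
	Messages: []openai.ChatCompletionMessage{
		{Role: "user", Content: "Hi"},
	},
})
if err != nil {
	log.Fatal(err)
}
log.Println(res.Choices[0].Message.Content)
Streaming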
Server-Sent Events streaming works transparently. Spanlens tees the stream: one copy flows to you in real time, the other is parsed asynchronously to extract token usage. Latency overhead is negligible (10–50 ms).
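If you consume the stream without an SDK, the client side looks roughly like the sketch below. It assumes OpenAI-style chat completion chunks (data: lines carrying JSON, ending with a data: [DONE] sentinel); the function names are hypothetical, and other providers use different event shapes.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each data: line until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(chunks: Iterable[dict]) -> str:
    """Concatenate the delta content of OpenAI-style chat completion chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

In a real client, lines would be the decoded lines of the HTTP response body, read as they arrive.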
Passing project / metadata
Add an X-Spanlens-Project header to tag requests with a project scope:
-H "X-Spanlens-Project: my-backend-service"
Add an X-Spanlens-Prompt-Version header to link the request to a specific prompt version so it appears in the A/B comparison table. Accepts name@version, name@latest, or a raw UUID:
-H "X-Spanlens-Prompt-Version: chatbot-system@3"
# or
-H "X-Spanlens-Prompt-Version: chatbot-system@latest"
# or
-H "X-Spanlens-Prompt-Version: ae1c3c1e-99eb-2b98-5f05-012345678901"
Invalid or unknown values silently resolve to null; the proxy never fails because a prompt tag is stale. The request just isn't linked to a version.
Add X-Spanlens-User and X-Spanlens-Session headers to tag the request with an end-user or session ID. The values are opaque strings of your choosing (Spanlens never interprets them):
-H "X-Spanlens-User: user_abc123"
-H "X-Spanlens-Session: sess_xyz789"
Tagged requests roll up at /users (per-end-user cost / token / latency analytics) and can be filtered at /requests via ?userId=… / ?sessionId=…. See Users docs for tagging strategy and SDK helpers.
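When you set these tags from code rather than curl, a small helper keeps the header names in one place. The sketch below is an illustration: spanlens_headers is a hypothetical name, and it simply omits any tag you leave unset.

```python
from typing import Optional


def spanlens_headers(
    project: Optional[str] = None,
    prompt_version: Optional[str] = None,
    user: Optional[str] = None,
    session: Optional[str] = None,
) -> dict[str, str]:
    """Return only the X-Spanlens-* tagging headers that have a value."""
    candidates = {
        "X-Spanlens-Project": project,
        "X-Spanlens-Prompt-Version": prompt_version,
        "X-Spanlens-User": user,
        "X-Spanlens-Session": session,
    }
    return {name: value for name, value in candidates.items() if value}
```

With the openai Python package, the resulting dict can be passed as default_headers when constructing the client, or as extra_headers on a single call.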
Controlling body retention, X-Spanlens-Log-Body
Spanlens stores the full request and response bodies by default (with API-key auto-masking, see below). For PII-sensitive workloads, opt out per call with the X-Spanlens-Log-Body header:
-H "X-Spanlens-Log-Body: full" # default, store bodies (with key masking)
-H "X-Spanlens-Log-Body: meta" # drop bodies; keep tokens/cost/latency/user/session
-H "X-Spanlens-Log-Body: none" # same as meta + drop user_id/session_id
Unknown values fall back to full (the existing behavior) so a malformed header never silently turns logging off. SDK equivalent: withLogBody() / observeOpenAI({ logBody }).
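Because unknown values fall back to full, a typo in the header means bodies are stored when you meant to drop them. One way to guard against that in a hand-rolled client is to validate the mode before sending. The sketch below assumes the three values are matched exactly as documented; log_body_header is a hypothetical helper name.

```python
LOG_BODY_MODES = ("full", "meta", "none")


def log_body_header(mode: str) -> dict[str, str]:
    """Build the header, rejecting values the proxy would treat as 'full'."""
    if mode not in LOG_BODY_MODES:
        raise ValueError(f"unsupported log-body mode: {mode!r}")
    return {"X-Spanlens-Log-Body": mode}
```

Failing loudly on the client side is a design choice for PII-sensitive workloads: the request is never sent with a header that would be ignored.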
Server-side body sanitization
Even in full mode, the server scans request_body, response_body, and error_message for API key patterns before the row is written to ClickHouse. Anything matching one of the patterns below (≥12 characters after the prefix) is replaced with <prefix>***:
- Spanlens: sl_live_*
- Anthropic: sk-ant-*
- OpenAI project keys: sk-proj-*
- OpenAI legacy keys: sk-*
- Google: AIza*
This is pattern-based, not ML-based: it catches keys that slip into prompts, tool output, and error strings, but it does not redact natural-language PII (names, emails, card numbers). For those, use X-Spanlens-Log-Body: meta. See Security docs for full details.
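To make the masking rule concrete, here is a rough client-side approximation. It is a sketch, not the server's implementation: the allowed character set after the prefix (letters, digits, underscore, hyphen) is an assumption, as are the names used.

```python
import re

# Longer prefixes come first so sk-ant- and sk-proj- win over the generic sk-.
_KEY_PATTERN = re.compile(r"(sl_live_|sk-ant-|sk-proj-|sk-|AIza)[A-Za-z0-9_\-]{12,}")


def mask_api_keys(text: str) -> str:
    """Replace anything that looks like a key with '<prefix>***'."""
    return _KEY_PATTERN.sub(lambda m: m.group(1) + "***", text)
```

Running something like this over prompts before they leave your process gives you a second layer of protection, and it makes the limitation visible: an email address or a name passes through untouched.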
About prompt-cache breakdown
When Anthropic returns cache_read_input_tokens / cache_creation_input_tokens or OpenAI returns prompt_tokens_details.cached_tokens, Spanlens parses them out of the response automatically and stores the breakdown in requests.cache_read_tokens / cache_write_tokens. No header from you is required. Cost is billed at each provider's reduced cache rate (≈ 0.1× input on Anthropic, ≈ 0.5× input on OpenAI). See cost tracking for the full formula.
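As a rough feel for what the reduced rate means, the arithmetic for cache reads can be sketched as follows. This covers only the input side and only cache reads; it ignores cache writes and output tokens, so it is not the full formula. The function name and the per-million-token price passed in are illustrative assumptions.

```python
def input_cost_usd(
    uncached_tokens: int,
    cache_read_tokens: int,
    input_price_per_mtok: float,
    cache_read_multiplier: float,
) -> float:
    """Input-side cost when some prompt tokens are served from cache."""
    full_rate = uncached_tokens * input_price_per_mtok / 1_000_000
    cached_rate = (
        cache_read_tokens * input_price_per_mtok * cache_read_multiplier / 1_000_000
    )
    return full_rate + cached_rate
```

For instance, 1,000 uncached plus 9,000 cached tokens at an illustrative $3 per million input tokens and a 0.1 multiplier comes to 0.003 + 0.0027 = $0.0057, against $0.03 for the same 10,000 tokens with no cache hits.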
Rate limits and response headers
Spanlens applies a high per-organization per-minute ceiling on /proxy/* purely to stop a runaway loop, not to throttle normal production traffic. Going over it does not reject your request: the call passes through to your provider and the response carries X-Spanlens-RateLimit-Overage: true so you can spot the spike. Your plan's monthly request quota is the limit that actually gates usage.
Every /proxy/* response carries the standard rate limit headers so a client can read the current window without guessing.
- X-RateLimit-Limit: requests allowed in the current window for your plan.
- X-RateLimit-Remaining: requests left in the current window.
- X-RateLimit-Reset: Unix epoch second at which the window rolls over. Use this directly rather than parsing the server clock from Date, since clock skew costs you retries.
- X-RateLimit-Window: the window length in seconds. Currently always 60s, exposed as a header so we can change it without breaking clients that read it.
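A client that wants to pace itself can read these headers after each response. The sketch below is one possible approach, not a required one: seconds_until_reset is a hypothetical name, and it assumes the headers arrive in a mapping keyed exactly as written above (many HTTP libraries offer case-insensitive lookups instead).

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next call, or 0.0 if budget remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

The now parameter exists only so the function can be exercised with a fixed clock; in normal use it is left unset.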
Customer-configured rate limits
You can set your own limits on a Spanlens key, a project, or an individual end-user from the Projects page. Unlike the platform ceiling above, exceeding one of your own limits does return a 429 to the caller, because you configured it precisely to throttle that traffic. The error body identifies which limit fired:
{
"error": {
"code": "RATE_LIMIT",
"message": "Customer-configured rate limit exceeded (end_user): 60 requests per 60s.",
"details": {
"source": "customer_limit",
"scope": "end_user",
"limit": 60,
"window_seconds": 60,
"end_user_id": "user_123"
}
}
}
The response also carries Retry-After (the window length in seconds) and X-Spanlens-RateLimit-Scope (api_key, project, or end_user). A customer_limit 429 never includes an upgrade link, which is how you tell it apart from a platform or plan limit. Per-end-user limits bucket on the X-Spanlens-User header, so send it (the SDK withUser() helper does this) for those limits to apply.
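On the client side, handling this response could look like the sketch below. It relies only on the fields shown in the error body and the two headers named above; the function name and the shape of the returned dict are hypothetical.

```python
import json
from typing import Mapping, Optional


def customer_limit_retry(status: int, headers: Mapping[str, str], body: str) -> Optional[dict]:
    """Return scope and wait time for a customer_limit 429, else None."""
    if status != 429:
        return None
    details = json.loads(body).get("error", {}).get("details", {})
    if details.get("source") != "customer_limit":
        return None  # some other limit; handle it separately
    return {
        "scope": headers.get("X-Spanlens-RateLimit-Scope", details.get("scope")),
        "retry_after_seconds": int(
            headers.get("Retry-After", details.get("window_seconds", 0))
        ),
    }
```

A caller would typically sleep for retry_after_seconds and retry, and log the scope so it is clear which of your own limits fired.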
Self-hosting
If you're running Spanlens on your own infra, replace the base URL:
https://your-spanlens-domain.com/proxy/openai/v1
See self-hosting for Docker deployment.
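If the same code has to run against both the hosted proxy and your own deployment, the base URL can be derived from a single setting. The sketch below assumes a self-hosted instance exposes the same paths as the hosted one; the SPANLENS_HOST environment variable and the function name are hypothetical.

```python
import os
from typing import Optional

# Path suffixes copied from the proxy URL list earlier on this page.
PROVIDER_PATHS = {
    "openai": "/proxy/openai/v1",
    "anthropic": "/proxy/anthropic",
    "gemini": "/proxy/gemini/v1beta",
    "azure": "/proxy/azure",
    "mistral": "/proxy/mistral/v1",
    "openrouter": "/proxy/openrouter/v1",
}


def proxy_base_url(provider: str, host: Optional[str] = None) -> str:
    """Join the Spanlens host (hosted by default) with a provider's proxy path."""
    if host is None:
        # SPANLENS_HOST is a made-up variable name for this example.
        host = os.environ.get("SPANLENS_HOST", "https://server.spanlens.io")
    return host.rstrip("/") + PROVIDER_PATHS[provider]
```

The returned string is what you would pass as base_url (or the equivalent option) to the provider SDK.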