Tutorial: trace a multi-step agent

Thirty minutes. We trace an agent that classifies a user message, dispatches to two parallel tools, and composes a final response. By the end you can say which step ate the wall-clock time and which call ate the cost, even when steps run concurrently.

What you will end up with

  • One trace per agent invocation in /traces.
  • Parallel branches stacked vertically with their real start / end times.
  • Critical path highlighted (the longest chain that gates the final answer).
  • Per-step cost rolled up to the trace total.

The agent we are tracing

A customer-support agent. Given a user message it:

  1. Classifies intent (LLM call)
  2. Based on intent, fans out two lookups in parallel:
    • Order lookup (Shopify API tool + summarizer LLM)
    • Knowledge base lookup (Pinecone retrieval + answerer LLM)
  3. Composes a final response from both branches (LLM call)

We will write this without LangChain to keep the example portable. The same pattern works for LangGraph; see LangGraph integration for the callback handler version.

Step 1. Set up

Install the SDK:

pnpm add @spanlens/sdk

Then set your Spanlens API key as an environment variable:

SPANLENS_API_KEY=sl_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Register your OpenAI provider key in /projects as usual.

Step 2. Bootstrap the trace

One trace per agent invocation. All work hangs off this trace.

import { SpanlensClient, observe } from '@spanlens/sdk'
import { createOpenAI } from '@spanlens/sdk/openai'

const client = new SpanlensClient()
const openai = createOpenAI()

export async function handleSupportMessage(message: string, userId: string) {
  const trace = client.startTrace({
    name: 'support-agent',
    metadata: { user_id: userId },
  })

  try {
    // ... steps go here ...
  } finally {
    await trace.end()
  }
}

Step 3. Classify the intent

Wrap the classification LLM call so it shows up as a child span of the trace. The proxy already emits an LLM span automatically; observe() here gives us a meaningful name (classify_intent) and groups the call under the trace explicitly.

type Intent = 'order_status' | 'general_question'

const intent = await observe(
  trace,
  { name: 'classify_intent', spanType: 'llm', input: { message } },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Classify the user message. Reply with exactly: order_status OR general_question.' },
        { role: 'user', content: message },
      ],
    })
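    // The model can reply with anything, so validate the label instead of casting blindly.
    // Anything other than order_status is treated as general_question.
    const label = res.choices[0].message.content?.trim()
    const parsed: Intent = label === 'order_status' ? 'order_status' : 'general_question'
    return parsed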
  },
)

Step 4. Fan out two parallel branches

Parallel branches in Spanlens are first-class. Each branch is its own observe() call; Promise.all runs them concurrently, and the waterfall shows them stacked with their real start / end times.

// shopify, pinecone, and embed are assumed to be clients / helpers defined elsewhere in your app.
const [orderInfo, kbAnswer] = await Promise.all([
  observe(trace, { name: 'lookup_order', spanType: 'custom' }, async (span) => {
    // child span: shopify tool call
    const orderRow = await observe(
      span,
      { name: 'shopify.query', spanType: 'tool', input: { userId } },
      async () => shopify.getRecentOrder(userId),
    )

    // child span: summarizer LLM
    const summary = await observe(
      span,
      { name: 'summarize_order', spanType: 'llm' },
      async () => {
        const res = await openai.chat.completions.create({
          model: 'gpt-4o-mini',
          messages: [
            { role: 'system', content: 'Summarize the order in one sentence.' },
            { role: 'user', content: JSON.stringify(orderRow) },
          ],
        })
        return res.choices[0].message.content
      },
    )

    return summary
  }),

  observe(trace, { name: 'lookup_kb', spanType: 'custom' }, async (span) => {
    // child span: pinecone retrieval
    const docs = await observe(
      span,
      { name: 'pinecone.query', spanType: 'retrieval', input: { topK: 5 } },
      async () => pinecone.index('kb').query({ vector: await embed(message), topK: 5 }),
    )

    // child span: answerer LLM
    const answer = await observe(
      span,
      { name: 'answer_from_kb', spanType: 'llm' },
      async () => {
        const res = await openai.chat.completions.create({
          model: 'gpt-4o-mini',
          messages: [
            { role: 'system', content: 'Answer using the context.' },
            { role: 'user', content: `Context: ${docs.matches.map(m => m.metadata?.text).join('\n\n')}\n\nQuestion: ${message}` },
          ],
        })
        return res.choices[0].message.content
      },
    )

    return answer
  }),
])

Two important details:

  • The first argument to observe() is the parent. Passing trace creates a top-level span; passing span inside a callback creates a child of that span. This is how the tree is built.
  • Spanlens does not enforce a foreign key on parent_span_id, so spans may arrive in any order. If one branch finishes before the other, the slower branch's spans still attach to the right parent when they eventually close. You do not need to await branches in order.
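To make the parent rule concrete, here is a minimal in-memory sketch of how a wrapper with the same call shape could record parent links. The names observeSketch, SpanRecord, and Parent are hypothetical and exist only for this illustration; this is not how @spanlens/sdk is implemented.

```typescript
// Hypothetical in-memory stand-in for illustration; NOT the @spanlens/sdk implementation.
interface SpanRecord {
  id: number
  parentId: number | null // null means "attached directly to the trace"
  name: string
  startMs: number
  endMs: number | null
}

interface Parent {
  id: number | null
}

const records: SpanRecord[] = []
let nextId = 1

// Same argument order as in the tutorial: parent first, then options, then the work.
async function observeSketch<T>(
  parent: Parent,
  opts: { name: string },
  fn: (span: Parent) => Promise<T>,
): Promise<T> {
  const record: SpanRecord = {
    id: nextId++,
    parentId: parent.id,
    name: opts.name,
    startMs: Date.now(),
    endMs: null,
  }
  records.push(record)
  try {
    // The callback receives a handle it can pass on as the parent of nested spans.
    return await fn({ id: record.id })
  } finally {
    record.endMs = Date.now()
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function demo(): Promise<SpanRecord[]> {
  const trace: Parent = { id: null }
  await Promise.all([
    observeSketch(trace, { name: 'lookup_order' }, async (span) => {
      await observeSketch(span, { name: 'shopify.query' }, () => sleep(10))
    }),
    observeSketch(trace, { name: 'lookup_kb' }, async (span) => {
      await observeSketch(span, { name: 'pinecone.query' }, () => sleep(30))
    }),
  ])
  return records
}
```

After demo() resolves, shopify.query points at lookup_order and pinecone.query points at lookup_kb, regardless of which branch finished first. The parent link is fixed when the span starts, not when it closes.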

Step 5. Compose the final response

const final = await observe(
  trace,
  { name: 'compose_final', spanType: 'llm' },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Write a friendly reply combining the order info and the answer.' },
        { role: 'user', content: `Order info: ${orderInfo}\n\nAnswer: ${kbAnswer}` },
      ],
    })
    return res.choices[0].message.content
  },
)

return final

Step 6. What you see in /traces

Trace: support-agent  (3.8s, $0.0061, 5 LLM calls + 1 tool + 1 retrieval)
├── classify_intent          (450ms, $0.0008)
├── lookup_order  (parallel) (1.1s,  $0.0011)
│   ├── shopify.query        (840ms)
│   └── summarize_order      (240ms, $0.0005)
├── lookup_kb     (parallel) (2.7s,  $0.0035)   ← critical path
│   ├── pinecone.query       (300ms, 5 docs)
│   └── answer_from_kb       (2.3s,  $0.0023)
└── compose_final            (640ms, $0.0007)

The critical path annotation tells you the wall-clock-blocking chain. Even though lookup_order took 1.1 s, it ran in parallel with the slower lookup_kb branch, so trimming order lookup buys you nothing. Optimizing answer_from_kb would.
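The reasoning behind that claim can be checked with a few lines of arithmetic. The sketch below is a simplified model, not Spanlens's critical-path algorithm: it assumes the three stages run strictly in sequence and that only the slowest parallel branch counts. The durations are the branch-level numbers from the sample trace; wallClockMs and savingMs are hypothetical helper names.

```typescript
// Durations in milliseconds, taken from the sample trace.
interface Step {
  name: string
  durationMs: number
}

// The agent runs: classify -> (parallel branches) -> compose.
const classify: Step = { name: 'classify_intent', durationMs: 450 }
const branches: Step[] = [
  { name: 'lookup_order', durationMs: 1100 },
  { name: 'lookup_kb', durationMs: 2700 },
]
const compose: Step = { name: 'compose_final', durationMs: 640 }

// Sequential steps add up; among parallel branches only the slowest one counts.
function wallClockMs(parallel: Step[]): number {
  const slowest = Math.max(...parallel.map((b) => b.durationMs))
  return classify.durationMs + slowest + compose.durationMs
}

// How much wall-clock time is saved by shaving `cutMs` off one branch?
function savingMs(branchName: string, cutMs: number): number {
  const trimmed = branches.map((b) =>
    b.name === branchName ? { ...b, durationMs: b.durationMs - cutMs } : b,
  )
  return wallClockMs(branches) - wallClockMs(trimmed)
}

const orderSaving = savingMs('lookup_order', 500) // 0: this branch is not on the critical path
const kbSaving = savingMs('lookup_kb', 1000) // 1000: every millisecond saved here counts
const kbBigSaving = savingMs('lookup_kb', 2000) // 1600: capped, lookup_order becomes the slow branch

console.log({ orderSaving, kbSaving, kbBigSaving })
```

The last line is the useful caveat: once lookup_kb drops below 1.1 s, the critical path moves to lookup_order and further KB optimization stops paying off.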

Common variations

Conditional branches

If the intent is general_question, you might skip the order lookup entirely. Express the condition in plain TypeScript:

const branches = intent === 'order_status'
  ? [orderBranch(), kbBranch()]
  : [kbBranch()]

const results = await Promise.all(branches)

Skipped branches simply have no spans. The trace is whatever you actually executed.
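One wrinkle with the array form: the index of the KB result changes with the intent. A sketch that keeps the positions stable follows. The branch bodies are stand-ins (in the tutorial they would wrap observe() calls), and runBranches is a hypothetical helper name.

```typescript
type Intent = 'order_status' | 'general_question'

let orderCalls = 0

// Stand-ins for the real branches; in the tutorial these would wrap observe() calls.
async function orderBranch(): Promise<string> {
  orderCalls++
  return 'Order #1001 shipped yesterday.'
}

async function kbBranch(): Promise<string> {
  return 'Refunds take 5 business days.'
}

async function runBranches(
  intent: Intent,
): Promise<{ orderInfo: string | null; kbAnswer: string }> {
  // When the branch is skipped, orderBranch() is never called,
  // so no work runs and no spans would be created for it.
  const orderPromise: Promise<string | null> =
    intent === 'order_status' ? orderBranch() : Promise.resolve(null)

  // Positions are fixed: order first, KB second, whatever the intent.
  const [orderInfo, kbAnswer] = await Promise.all([orderPromise, kbBranch()])
  return { orderInfo, kbAnswer }
}
```

The compose step can then check orderInfo for null instead of inspecting the array length.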

Tagging the trace with user / session for cross-trace analysis

const trace = client.startTrace({
  name: 'support-agent',
  metadata: {
    user_id: userId,
    session_id: sessionId,
    intent,        // only available here if you classify before starting the trace; otherwise set it afterwards via trace.update(...)
  },
})

Streaming the final response

Pass stream: true to chat.completions.create. The proxy buffers and logs the full response on stream end; the span closes when the stream finishes. First byte still arrives in ~200 ms.
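On the consuming side, a streamed completion in the OpenAI Node SDK is an async iterable of chunks, with the text in chunk.choices[0]?.delta?.content. The sketch below assumes the Spanlens-wrapped client returns the same shape. It uses a fake stream so it runs without network access; fakeStream and collectStream are hypothetical names.

```typescript
// Minimal shape of a streamed chat chunk (only the fields this sketch reads).
interface ChatChunk {
  choices: Array<{ delta: { content?: string | null } }>
}

// Stand-in for: await openai.chat.completions.create({ ..., stream: true })
async function* fakeStream(): AsyncGenerator<ChatChunk> {
  for (const piece of ['Hi ', 'there, ', 'your order shipped.']) {
    yield { choices: [{ delta: { content: piece } }] }
  }
}

// Forward each token to the caller as it arrives and return the full text.
// Called inside an observe() callback, the callback would resolve only
// once the stream has ended.
async function collectStream(
  stream: AsyncIterable<ChatChunk>,
  onToken: (token: string) => void,
): Promise<string> {
  let full = ''
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? ''
    if (token) {
      onToken(token)
      full += token
    }
  }
  return full
}
```

In a real handler, onToken would write to the HTTP response, and the returned string would be the value you return from the compose_final callback.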

What you skipped that you might want later

  • Per-step evals. Score the answer_from_kb step on helpfulness. See the Nightly evals tutorial.
  • LangGraph version. If your agent grows past five or so nodes, LangGraph + the callback handler is less ceremony than threading observe() everywhere. LangGraph integration covers it.
  • Error surfaces. observe() catches throws and sets span.status='error' automatically; failed traces show in red on the /traces list.
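A related point for the parallel step: Promise.all rejects as soon as one branch throws, so a failed order lookup would also discard a perfectly good KB answer. If you would rather degrade gracefully, Promise.allSettled is one option. The sketch below uses stand-in branches and assumes an error thrown inside a branch still reaches the caller after the span is marked as failed, which this page does not state explicitly. failingOrderBranch, workingKbBranch, valueOrNull, and runBranchesSafely are hypothetical names.

```typescript
// Stand-in branches: the order lookup fails, the KB lookup succeeds.
async function failingOrderBranch(): Promise<string> {
  throw new Error('Shopify timeout')
}

async function workingKbBranch(): Promise<string> {
  return 'Refunds take 5 business days.'
}

// Unwrap a settled result, mapping a rejection to null.
function valueOrNull<T>(result: PromiseSettledResult<T>): T | null {
  return result.status === 'fulfilled' ? result.value : null
}

async function runBranchesSafely(): Promise<{
  orderInfo: string | null
  kbAnswer: string | null
}> {
  // allSettled waits for every branch and never rejects.
  const [order, kb] = await Promise.allSettled([failingOrderBranch(), workingKbBranch()])
  return { orderInfo: valueOrNull(order), kbAnswer: valueOrNull(kb) }
}
```

The compose step can then fall back to a KB-only reply when orderInfo is null, while the failed branch should still show up as an error span in the trace.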

Next tutorial: scheduled evals on production prompts.