Autonomous Research

Run multi-source research from a single API call. /research handles source selection, synthesis, and citations. No orchestration code required.

Quickstart

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)

    const cited = event.data.metadata.citedPages ?? []
    console.log(`\nCited ${cited.length} sources:`)
    for (const page of cited) {
      console.log(`- ${page.title ?? '(untitled)'}: ${page.url}`)
    }
  }

  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

from tabstack import Tabstack

client = Tabstack()

# Primary Python pattern: iterate the stream directly. The SDK handles
# SSE framing internally.
for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents?",
    mode="fast",
):
    if event.event == "complete":
        print(event.data.report)

        cited = event.data.metadata.cited_pages or []
        print(f"\nCited {len(cited)} sources:")
        for page in cited:
            print(f"- {page.title or '(untitled)'}: {page.url}")
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)

Understanding the stream

/research always streams via Server-Sent Events. Every call returns a stream; there is no non-streaming mode.

The SDK models the stream as a discriminated union: each event has an event field (a string literal) and a data payload whose shape depends on the event name. Switch on event.event and the SDK narrows event.data to the correct type automatically.

Every event’s data carries a message and a timestamp. Each entry below starts with the event names you switch on; the fields listed after “Adds” are the extra fields that event adds to data. The stream follows a fixed lifecycle, beginning to end:

start fires once, when the run begins.
planning:start and planning:end bracket the planning phase, in which the agent decides which searches to run.
A search loop repeats, once per iteration:
- iteration:start opens the iteration. Adds iteration, maxIterations, and the queries it will run.
- searching:start and searching:end bracket fetching and reading sources.
- iteration:end closes the iteration. Adds isLast and an optional stopReason.
writing:start and writing:end bracket synthesizing the final report.
complete fires once, at the end. Adds report and metadata (detailed in The complete payload).

At any point, error can arrive instead: a task-level failure carrying a nested error object plus an optional activity and iteration. It is delivered inside the stream, not as an HTTP error, so handle it explicitly. If you only listen for complete, a failed run produces no output. See Error handling.

Balanced mode threads a richer set of progress events through the same loop: prefetching:*, analyzing:*, following:*, evaluating:*, outlining:*, and judging:*. The API reference lists every variant.

Handling events

Switch on event.event and the SDK narrows event.data for each case. A minimal progress reporter looks like this (setup as in Quickstart):

for await (const event of stream) {
  switch (event.event) {
    case 'start':
      console.log(event.data.message)
      break
    case 'iteration:start':
      console.log(`iteration ${event.data.iteration}/${event.data.maxIterations}`)
      break
    case 'complete':
      console.log('\n' + event.data.report)
      break
    case 'error':
      throw new Error(event.data.error.message)
  }
}

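# `stream` is the iterator returned by client.agent.research(...), as in Quickstart.
# The match statement requires Python 3.10 or later.
for event in stream: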
    match event.event:
        case "start":
            print(event.data.message)
        case "iteration:start":
            print(f"iteration {event.data.iteration}/{event.data.max_iterations}")
        case "complete":
            print("\n" + event.data.report)
        case "error":
            raise RuntimeError(event.data.error.message)
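The switch above needs a case for each event you want to report on. Because the lifecycle events share a `<family>:start` / `<family>:end` naming scheme, a progress reporter can instead derive a label from the event name itself, which also covers balanced mode's extra families. The sketch below assumes the Python SDK attribute names used in this guide (`event.event`, `event.data.message`, `event.data.iteration`, `event.data.max_iterations`); `describe` and `fake` are hypothetical helpers, and `fake` only builds stand-in events so the sketch runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Optional


def describe(event: Any) -> Optional[str]:
    """Return a one-line progress message for a /research event, or None to stay quiet."""
    name = event.event
    family, _, edge = name.partition(":")
    if name in ("start", "complete"):
        return event.data.message
    if name == "error":
        # `error` can arrive unpopulated (see Error handling), so fall back.
        message = getattr(event.data.error, "message", None) or "unknown error"
        return f"error: {message}"
    if name == "iteration:start":
        return f"iteration {event.data.iteration}/{event.data.max_iterations}"
    if edge == "start":
        # planning, searching, writing, and balanced-only families alike
        return f"{family}..."
    return None  # ":end" events and names this sketch does not recognize


def fake(name: str, **data: Any) -> SimpleNamespace:
    """Hypothetical stand-in for an SDK event, for running this sketch offline."""
    return SimpleNamespace(event=name, data=SimpleNamespace(**data))


# Offline demo with made-up events; a real loop would iterate the SDK stream.
demo = [
    fake("start", message="Research started"),
    fake("planning:start", message="Planning"),
    fake("iteration:start", message="Iteration", iteration=1, max_iterations=3),
    fake("searching:end", message="Searched"),
    fake("complete", message="Research complete"),
]
for ev in demo:
    line = describe(ev)
    if line is not None:
        print(line)
```

In a real loop, call `describe(event)` on each event from `client.agent.research(...)` and print the lines that are not None. Matching on the name pattern trades specific messages for tolerance: an event family this guide does not list still gets a generic label at its `:start` event.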

The complete payload

Everything you need arrives on the single complete event. Here is the full data object, annotated. This is the canonical shape; the citations and worked-example sections below pull straight from it.

{
  // report: the synthesized report, as a markdown string. Always present.
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate...",

  // metadata: always present. citedPages is the field you'll reach for.
  "metadata": {
    // citedPages (cited_pages in Python): every source cited in the report.
    // Present when the agent cited sources; treat a missing value as [] (see Quickstart).
    "citedPages": [
      // one entry per cited source -- see "Working with citations" for a full entry
    ]
  },

  // message: human-readable status string. Always present.
  "message": "Research complete",

  // timestamp: ISO-8601 string for when the event was emitted. Always present.
  "timestamp": "2026-06-02T17:04:11.482Z"
}

report, metadata, message, and timestamp are always present on complete. Inside metadata, citedPages is the only field this guide documents. The pipeline may attach more, but don’t depend on fields that aren’t documented here.

The `mode` parameter

mode controls the depth-vs-speed tradeoff.

Mode	Speed	Sources consulted	Use when
`'fast'`	Faster	Fewer	Default. Time-sensitive queries where a quick answer is sufficient.
`'balanced'`	Slower	More	High-stakes research where breadth matters. Requires a paid plan and emits additional progress events (`prefetching:*`, `analyzing:*`, `following:*`, `evaluating:*`, `outlining:*`, `judging:*`).

Default is 'fast'. Omitting mode produces the same result as setting mode: 'fast'.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

// Quick answer for time-sensitive use cases (default mode)
const fastStream = await client.agent.research({
  query: 'What are the current funding rounds in AI infrastructure?',
  mode: 'fast',
})

for await (const event of fastStream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

// Thorough answer for high-stakes research
const balancedStream = await client.agent.research({
  query: 'What are the main regulatory approaches to AI in the EU and US?',
  mode: 'balanced',
})
// Balanced mode uses the same iteration pattern, plus emits the richer progress events listed above.

from tabstack import Tabstack

client = Tabstack()

# Quick answer for time-sensitive use cases (default mode)
fast_stream = client.agent.research(
    query="What are the current funding rounds in AI infrastructure?",
    mode="fast",
)

for event in fast_stream:
    if event.event == "complete":
        print(event.data.report)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)

# Thorough answer for high-stakes research
balanced_stream = client.agent.research(
    query="What are the main regulatory approaches to AI in the EU and US?",
    mode="balanced",
)
# Balanced mode uses the same iteration pattern, plus emits the richer progress events listed above.

Latency and timeouts

/research runs an agentic loop: it plans, searches, reads sources, and synthesizes, iterating until it has enough to answer. Wall-clock time scales with how many iterations it runs and how many sources it consults, not with the size of the report it returns. A broad question that fans out across many sources takes longer than a narrow one, even when both produce a similar-length report.

As a rough guide:

Mode	Typical duration	Notes
`fast`	Under 60 seconds	Default.
`balanced`	Up to ~4 minutes for the broadest queries	Consults more sources and emits the richer progress events.

There is no server-side timeout on the request as a whole — the agent runs the loop to completion rather than stopping at a fixed ceiling. A broad balanced query can legitimately stream for minutes. Budget for this on the client (see Client-side timeout strategy) rather than assuming the server will cut it off.

What `fetch_timeout` bounds

fetch_timeout (in the parameters table) caps a single per-page fetch, not the whole call. It limits how long the agent waits on one slow source before giving up on it and moving on; it does not cap total research time. Raise it when your sources are slow or heavy (large pages, sluggish origins) and you would rather wait than drop them. Leave it at the default for general queries.

How long a single fetch needs depends on how the source is pulled. A plain markdown or extract fetch usually resolves in 10 seconds or less, so a low fetch_timeout is fine for most queries. A heavy JSON extraction over a large schema can take far longer — up to the server-side fetch ceiling of 10 minutes. A reasonable starting point is 10 seconds; raise it toward that ceiling only when you know your sources are slow or your per-page extraction is expensive.

Client-side timeout strategy

Because the call streams, “time to first event” and “time to complete” are different numbers. The first event (start) arrives quickly; complete arrives only after the whole loop finishes. A fixed total-elapsed timeout treats a healthy long-running query the same as a stalled one, and you will cut off good research to catch the occasional bad run.

Watch for stream silence instead. Reset a timer on every event and fail only when no event has arrived for some interval. That catches a genuinely stuck stream while letting a legitimately long run proceed. The iteration and phase events (iteration:start, searching:start, writing:start) are your heartbeat.
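One way to implement that in Python is to drain the stream on a background thread and wait on a queue with a timeout, as in the sketch below. `with_idle_timeout` and `StreamStalled` are hypothetical names, not part of the SDK, and the sketch assumes only that the SDK stream is an ordinary iterator, as the Python examples in this guide use it.

```python
import queue
import threading
from typing import Any, Iterable, Iterator


class StreamStalled(TimeoutError):
    """Raised when no event arrives within the idle window."""


def with_idle_timeout(events: Iterable[Any], idle_seconds: float) -> Iterator[Any]:
    """Re-yield `events`, raising StreamStalled if the stream goes quiet too long.

    The source is drained on a daemon thread, so a blocking read inside the
    iterator cannot hang the caller. The wait restarts each time the caller
    asks for the next event, so a steady trickle of events never trips it.
    """
    inbox: "queue.Queue[tuple[str, Any]]" = queue.Queue()

    def pump() -> None:
        try:
            for event in events:
                inbox.put(("event", event))
            inbox.put(("done", None))
        except Exception as exc:  # hand producer errors to the consuming thread
            inbox.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            kind, payload = inbox.get(timeout=idle_seconds)
        except queue.Empty:
            raise StreamStalled(f"no event for {idle_seconds}s") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

Wrap the iterator you would otherwise loop over, for example `for event in with_idle_timeout(client.agent.research(query=q, mode="balanced"), idle_seconds=120):`. The 120-second window is an illustrative value, not a recommendation from the API; pick one comfortably longer than the longest quiet spell you expect, and consider raising it if you raise fetch_timeout, on the assumption that a slow page fetch can keep the stream quiet. On a stall, this sketch stops reading but does not close the underlying connection: the daemon thread stays blocked until the stream ends or the process exits.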

Working with citations

The complete event’s data.metadata.citedPages (TypeScript) / data.metadata.cited_pages (Python) lists every source the agent actually cited in its report. Every entry is guaranteed to carry id, url, claims (the specific statements drawn from that page), and sourceQueries / source_queries (the search queries that surfaced it). Fields like title, summary, relevance, and reliability are optional: they appear only when the research pipeline populates them.

Here is a single citedPages entry from that array. All guaranteed fields are populated; of the optional fields, some are present and some are absent:

{
  "id": "pg_a1b2c3",
  "url": "https://example.com/guides/browser-automation",
  "claims": [
    "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol.",
    "CDP-based tools historically struggled with cross-browser support."
  ],
  "sourceQueries": [
    "browser automation approaches for AI agents",
    "playwright vs puppeteer cross-browser"
  ],
  "title": "Approaches to Browser Automation",
  "relevance": 0.92
  // summary and reliability are optional; this source did not populate them, so they are absent
}

id, url, claims, and sourceQueries are always present. title and relevance are optional and shown here; summary and reliability are equally optional and absent for this source. In Python the same entry reads source_queries (and the array is cited_pages).

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function research(query: string) {
  const stream = await client.agent.research({ query, mode: 'fast' })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        report: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }

  throw new Error('Stream ended without a complete event')
}

const result = await research('What are the main approaches to browser automation for AI agents?')

console.log(result.report)
console.log(`\nCited ${result.sources.length} sources:`)
result.sources.forEach((s, i) => console.log(`${i + 1}. ${s.title ?? '(untitled)'}\n   ${s.url}`))

from tabstack import Tabstack

client = Tabstack()

def research(query: str):
    for event in client.agent.research(query=query, mode="fast"):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "report": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
            }

    raise RuntimeError("Stream ended without a complete event")


result = research("What are the main approaches to browser automation for AI agents?")

print(result["report"])
print(f"\nCited {len(result['sources'])} sources:")
for i, s in enumerate(result["sources"], 1):
    print(f"{i}. {s.title or '(untitled)'}\n   {s.url}")
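The SDK hands you typed citedPages objects. If you store complete payloads as raw JSON (for example, to re-render a report later), a loader that keeps the guaranteed/optional split explicit might look like the sketch below. It assumes the camelCase JSON keys shown in the entry above; `CitedPage` and `parse_cited_pages` are hypothetical names, and because this guide does not show types for summary and reliability, they are left as Any.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class CitedPage:
    # Guaranteed on every entry: a missing key raises KeyError on load.
    id: str
    url: str
    claims: list[str]
    source_queries: list[str]
    # Optional: None when the pipeline did not populate them.
    title: Optional[str] = None
    relevance: Optional[float] = None
    summary: Optional[Any] = None  # type not shown in this guide
    reliability: Optional[Any] = None  # type not shown in this guide


def parse_cited_pages(complete_data: Mapping[str, Any]) -> list[CitedPage]:
    """Build CitedPage objects from a raw `complete` data dict; a missing list becomes []."""
    metadata = complete_data.get("metadata") or {}
    return [
        CitedPage(
            id=entry["id"],
            url=entry["url"],
            claims=list(entry["claims"]),
            source_queries=list(entry["sourceQueries"]),
            title=entry.get("title"),
            relevance=entry.get("relevance"),
            summary=entry.get("summary"),
            reliability=entry.get("reliability"),
        )
        for entry in metadata.get("citedPages") or []
    ]
```

Failing loudly on a missing guaranteed field, while defaulting optional ones to None, mirrors the contract described above: a stored payload without an id or url is treated as corrupt rather than silently accepted.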

A worked example

One query, end to end: the call, the report it produces, and the citations that back it. The report is abridged, and the citedPages entries are the same shape documented above.

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
    console.log(event.data.metadata.citedPages)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents?",
    mode="fast",
):
    if event.event == "complete":
        print(event.data.report)
        print(event.data.metadata.cited_pages)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)

The complete payload that comes back, with the report abridged:

{
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate. CDP-based drivers like Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol. WebDriver-based tools like Selenium use the W3C WebDriver standard for broader cross-browser support.\n\n[... report continues ...]",
  "metadata": {
    "citedPages": [
      {
        "id": "pg_a1b2c3",
        "url": "https://example.com/guides/browser-automation",
        "claims": [
          "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol."
        ],
        "sourceQueries": ["browser automation approaches for AI agents"],
        "title": "Approaches to Browser Automation",
        "relevance": 0.92
      },
      {
        "id": "pg_d4e5f6",
        "url": "https://example.com/selenium-webdriver",
        "claims": [
          "Selenium uses the W3C WebDriver standard for broader cross-browser support."
        ],
        "sourceQueries": ["selenium webdriver cross-browser support"],
        "title": "WebDriver Explained"
      }
    ]
  },
  "message": "Research complete",
  "timestamp": "2026-06-02T17:04:11.482Z"
}

The link between report and citations runs through each entry’s claims: those are the specific statements the agent drew from that source, and the report restates them, though not always word for word. Compare a sentence in the report against the claims arrays to trace it back to its origin. Note that the second entry omits relevance, and every entry here omits summary and reliability: those fields are optional and simply weren’t populated for these sources.
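A rough way to do that comparison in code is to score word overlap between each claim and each report sentence, since the report may paraphrase a claim rather than quote it. The sketch below is a simple heuristic, not an API feature: `trace_claims` is a hypothetical helper, the naive sentence splitter and the 0.8 threshold are assumptions you would tune, and it takes plain `(url, claims)` pairs so it works with either SDK objects or stored JSON.

```python
import re
from typing import Iterable, Sequence


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; "cross-browser" becomes {"cross", "browser"}.
    return set(re.findall(r"\w+", text.lower()))


def trace_claims(
    report: str,
    pages: Iterable[tuple[str, Sequence[str]]],
    threshold: float = 0.8,
) -> list[tuple[str, list[str]]]:
    """Pair each report sentence with the URLs whose claims it appears to restate.

    A claim matches a sentence when at least `threshold` of the claim's words
    also occur in that sentence.
    """
    page_list = [(url, [_tokens(c) for c in claims]) for url, claims in pages]
    # Naive split: after ., !, or ? followed by whitespace, and on newlines.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", report) if s.strip()]
    traced = []
    for sentence in sentences:
        words = _tokens(sentence)
        urls = [
            url
            for url, claim_sets in page_list
            if any(c and len(c & words) / len(c) >= threshold for c in claim_sets)
        ]
        if urls:
            traced.append((sentence, urls))
    return traced
```

With the Python SDK, pass `[(p.url, p.claims) for p in event.data.metadata.cited_pages or []]` as `pages`. On the worked-example report, the Playwright sentence traces to the first entry and the Selenium sentence traces to the second, even though the report says "tools like Selenium use" where the claim says "Selenium uses".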

Use cases

Competitive intelligence

Research a competitor’s current pricing and limits without manually visiting their documentation:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getPricingIntel(competitor: string) {
  const stream = await client.agent.research({
    query: `What are ${competitor}'s current pricing plans, rate limits, and free tier details?`,
    mode: 'fast',
    nocache: true, // pricing changes frequently; skip cache
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        summary: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
        retrievedAt: new Date().toISOString(),
      }
    }
  }

  throw new Error('No result returned')
}

from datetime import datetime, timezone
from tabstack import Tabstack

client = Tabstack()

def get_pricing_intel(competitor: str):
    for event in client.agent.research(
        query=f"What are {competitor}'s current pricing plans, rate limits, and free tier details?",
        mode="fast",
        nocache=True,  # pricing changes frequently; skip cache
    ):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "summary": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
                "retrieved_at": datetime.now(timezone.utc).isoformat(),
            }

    raise RuntimeError("No result returned")

Prospect research

Pull together recent activity on a company before an outreach or sales call:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getCompanyBriefing(company: string) {
  const stream = await client.agent.research({
    query: `What has ${company} announced or shipped in the last 90 days? Include funding, product launches, and hiring signals.`,
    mode: 'fast',
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        briefing: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }

  throw new Error('No result returned')
}

from tabstack import Tabstack

client = Tabstack()

def get_company_briefing(company: str):
    for event in client.agent.research(
        query=f"What has {company} announced or shipped in the last 90 days? Include funding, product launches, and hiring signals.",
        mode="fast",
    ):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "briefing": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
            }

    raise RuntimeError("No result returned")

Market landscape questions

Answer open-ended questions about a space where the answer spans many sources. This example also shows a simple progress indicator using the iteration events:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents, and how do they differ?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'iteration:start') {
    process.stdout.write(`\rIteration ${event.data.iteration}/${event.data.maxIterations}...`)
  }

  if (event.event === 'complete') {
    console.log('\n\n' + event.data.report)
  }

  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

from tabstack import Tabstack

client = Tabstack()

for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents, and how do they differ?",
    mode="fast",
):
    if event.event == "iteration:start":
        print(
            f"\rIteration {event.data.iteration}/{event.data.max_iterations}...",
            end="",
            flush=True,
        )
    elif event.event == "complete":
        print("\n\n" + event.data.report)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)

When to use `/research` vs `/extract/json`

Situation	Use
You know the exact URL and want specific fields from it	`client.extract.json()`
You have a question that requires synthesizing multiple sources	`client.agent.research()`
You want clean markdown from one page	`client.extract.markdown()`
You need to answer a question about a topic, not a specific page	`client.agent.research()`
You want AI to transform content from a known URL	`client.generate.json()`

The key distinction: /research is for questions where you don’t know which sources hold the answer. /extract/json is for structured extraction when you already have the URL.

Parameters

Parameter	Type	Default	Description
`query`	`string`	required	The research question
`mode`	`'fast' | 'balanced'`	`'fast'`	Controls depth vs. speed. `'balanced'` requires a paid plan.
`nocache`	`boolean`	`false`	Force fresh results, bypass cache
`fetch_timeout`	`number`	—	Timeout in seconds for fetching individual web pages
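If you assemble these arguments from user input or config, a small client-side check can catch mistakes before a call is spent. The sketch below is hypothetical (`build_research_params` is not an SDK function); it encodes the table above plus the 10-minute per-fetch ceiling from What `fetch_timeout` bounds, and it assumes the Python SDK accepts each parameter as a keyword argument under the names shown (the guide's own examples confirm this for query, mode, and nocache, but not for fetch_timeout).

```python
from typing import Any, Literal, Optional

Mode = Literal["fast", "balanced"]

# Per-fetch ceiling this guide gives for fetch_timeout (10 minutes), in seconds.
FETCH_TIMEOUT_CEILING_S = 600


def build_research_params(
    query: str,
    mode: Mode = "fast",
    nocache: bool = False,
    fetch_timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Validate /research arguments client-side and return keyword arguments.

    Optional parameters are omitted unless set, so server defaults apply.
    """
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if mode not in ("fast", "balanced"):
        raise ValueError(f"mode must be 'fast' or 'balanced', got {mode!r}")
    params: dict[str, Any] = {"query": query, "mode": mode}
    if nocache:
        params["nocache"] = True
    if fetch_timeout is not None:
        # Client-side choice: reject values outside (0, documented ceiling].
        if not 0 < fetch_timeout <= FETCH_TIMEOUT_CEILING_S:
            raise ValueError(
                f"fetch_timeout must be in (0, {FETCH_TIMEOUT_CEILING_S}] seconds"
            )
        params["fetch_timeout"] = fetch_timeout
    return params
```

Usage would look like `client.agent.research(**build_research_params(q, mode="balanced", fetch_timeout=30))`. Leaving unset optional parameters out, rather than sending explicit defaults, keeps default selection on the server side.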

Error handling

Two failure modes to distinguish:

HTTP-level errors (bad API key, rate limit, permission denied) throw SDK exceptions before the stream opens. Catch them with try/catch (TypeScript) or try/except (Python) around the call.
Task-level failures arrive as error events inside the stream. event.data.error is an object with message, name, and optional stack; event.data.activity tells you which phase failed.

In rare cases the error event may arrive without a populated error field, so fall back defensively. The example below uses optional chaining (TypeScript) and getattr (Python) so an unpopulated error doesn’t crash the handler.

import Tabstack, { RateLimitError, AuthenticationError } from '@tabstack/sdk'

const client = new Tabstack()

try {
  const stream = await client.agent.research({
    query: 'What are the current pricing models for cloud browser APIs?',
    mode: 'fast',
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      // Task-level failure -- the agent could not complete the research.
      // The `error` field is typed as required but can arrive unpopulated; fall back defensively.
      const message = event.data.error?.message ?? 'unknown error'
      throw new Error(
        `Research failed during ${event.data.activity ?? 'unknown phase'}: ${message}`,
      )
    }

    if (event.event === 'complete') {
      console.log(event.data.report)
    }
  }
} catch (err) {
  if (err instanceof RateLimitError) {
    console.error('Rate limit hit -- retry after a pause')
  } else if (err instanceof AuthenticationError) {
    console.error('Invalid API key -- check TABSTACK_API_KEY')
  } else {
    throw err
  }
}

from tabstack import Tabstack
from tabstack import RateLimitError, AuthenticationError

client = Tabstack()

try:
    for event in client.agent.research(
        query="What are the current pricing models for cloud browser APIs?",
        mode="fast",
    ):
        if event.event == "error":
            # Task-level failure -- the agent could not complete the research.
            # The `error` field is typed as required but can arrive as None; tolerate it.
            activity = event.data.activity or "unknown phase"
            message = getattr(event.data.error, "message", None) or "unknown error"
            raise RuntimeError(f"Research failed during {activity}: {message}")

        if event.event == "complete":
            print(event.data.report)

except RateLimitError:
    print("Rate limit hit -- retry after a pause")
except AuthenticationError:
    print("Invalid API key -- check TABSTACK_API_KEY")
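The handlers above only report a RateLimitError. To actually retry after a pause, one option is a small backoff wrapper around a function that runs the whole research call, such as the `research()` helper from Working with citations. `retry_with_backoff` is a hypothetical helper, not part of the SDK; the attempt count, base delay, and 10% jitter are illustrative defaults, and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying only on `retry_on` errors with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts, each plus up to 10%
    random jitter so concurrent clients do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * 2**attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Usage would look like `result = retry_with_backoff(lambda: research(q), retry_on=(RateLimitError,))`. Keep `retry_on` narrow: an AuthenticationError will not fix itself, and a task-level error event means the run itself failed, so retrying it repeats the whole research loop.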

Next steps

Extract JSON

Pull structured fields from a known URL when you already know the source.

Automate Events

The full streaming event model behind agentic endpoints.

API Reference

Every /research event variant and payload field.