Autonomous Research

Run multi-source research from a single API call. /research handles source selection, synthesis, and citations. No orchestration code required.


Quickstart

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()
const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
    // citedPages may be missing when nothing was cited; treat that as an empty list
    const cited = event.data.metadata.citedPages ?? []
    console.log(`\nCited ${cited.length} sources:`)
    for (const page of cited) {
      console.log(`- ${page.title ?? '(untitled)'}: ${page.url}`)
    }
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

/research always streams via Server-Sent Events. Every call returns a stream; there is no non-streaming mode.

The SDK models the stream as a discriminated union: each event has an event field (a string literal) and a data payload whose shape depends on the event name. Switch on event.event and the SDK narrows event.data to the correct type automatically.

Every event’s data carries a message and a timestamp. In the list below, each item leads with the event name or names you switch on, followed by any extra fields that event adds to data. The stream follows a fixed lifecycle, beginning to end:

  1. start fires once, when the run begins.
  2. planning:start and planning:end bracket the planning phase, where the agent decides which searches to run.
  3. A search loop repeats, once per iteration:
    • iteration:start opens the iteration. Adds iteration, maxIterations, and the queries it will run.
    • searching:start and searching:end bracket fetching and reading sources.
    • iteration:end closes the iteration. Adds isLast and an optional stopReason.
  4. writing:start and writing:end bracket synthesizing the final report.
  5. complete fires once, at the end. Adds report and metadata, the complete payload.

At any point, error can arrive instead: a task-level failure carrying a nested error object plus an optional activity and iteration. It is delivered inside the stream, not as an HTTP error, so handle it explicitly. If you only listen for complete, a failed run produces no output. See Error handling.
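
The lifecycle above can be mirrored in a small local type sketch. The types below are a hand-written approximation of the shapes this guide describes, not the SDK's exported types. The names `ResearchEvent` and `describeEvent` are hypothetical, and the types of `queries` and `stopReason` are assumptions.

```typescript
// Hypothetical local types approximating the event shapes described in this guide.
// They are NOT the SDK's exported types; fields marked "assumed" are guesses.
type Base = { message: string; timestamp: string }

type ResearchEvent =
  | { event: 'start'; data: Base }
  | { event: 'iteration:start'; data: Base & { iteration: number; maxIterations: number; queries: string[] /* assumed */ } }
  | { event: 'iteration:end'; data: Base & { isLast: boolean; stopReason?: string /* assumed */ } }
  | { event: 'complete'; data: Base & { report: string; metadata: { citedPages?: { url: string }[] } } }
  | { event: 'error'; data: Base & { error?: { message: string }; activity?: string; iteration?: number } }

// Switching on `e.event` narrows `e.data` to the matching member in each branch.
function describeEvent(e: ResearchEvent): string {
  switch (e.event) {
    case 'start':
      return `started: ${e.data.message}`
    case 'iteration:start':
      return `iteration ${e.data.iteration}/${e.data.maxIterations} (${e.data.queries.length} queries)`
    case 'iteration:end':
      return e.data.isLast ? `last iteration (${e.data.stopReason ?? 'no reason given'})` : 'iteration done'
    case 'complete':
      return `complete: ${(e.data.metadata.citedPages ?? []).length} sources`
    case 'error':
      return `error during ${e.data.activity ?? 'unknown phase'}: ${e.data.error?.message ?? 'unknown error'}`
  }
}
```

Because the union is closed, the compiler can flag a branch that reads a field the event does not carry, which is the same narrowing the guide attributes to the SDK's own types.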

Balanced mode threads a richer set of progress events through the same loop: prefetching:*, analyzing:*, following:*, evaluating:*, outlining:*, and judging:*. The API reference lists every variant.

Switch on event.event and the SDK narrows event.data for each case. A minimal progress reporter looks like this (setup as in Quickstart):

for await (const event of stream) {
  switch (event.event) {
    case 'start':
      console.log(event.data.message)
      break
    case 'iteration:start':
      console.log(`iteration ${event.data.iteration}/${event.data.maxIterations}`)
      break
    case 'complete':
      console.log('\n' + event.data.report)
      break
    case 'error':
      throw new Error(event.data.error.message)
  }
}

Everything you need arrives on the single complete event. Here is the full data object, annotated. This is the canonical shape; the citations and worked-example sections below pull straight from it.

{
  // report: the synthesized report, as a markdown string. Always present.
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate...",
  // metadata: always present. citedPages is the field you'll reach for.
  "metadata": {
    // citedPages (cited_pages in Python): every source cited in the report.
    // Present when the agent cited sources; treat a missing value as [] (see Quickstart).
    "citedPages": [
      // one entry per cited source -- see "Working with citations" for a full entry
    ]
  },
  // message: human-readable status string. Always present.
  "message": "Research complete",
  // timestamp: ISO-8601 string for when the event was emitted. Always present.
  "timestamp": "2026-06-02T17:04:11.482Z"
}

report, metadata, message, and timestamp are always present on complete. Inside metadata, citedPages is the only field this guide documents. The pipeline may attach more, but don’t depend on fields you can’t see here.


mode controls the depth-vs-speed tradeoff.

| Mode | Speed | Sources consulted | Use when |
| --- | --- | --- | --- |
| 'fast' | Faster | Fewer | Default. Time-sensitive queries where a quick answer is sufficient. |
| 'balanced' | More thorough | More | High-stakes research where breadth matters. Requires a paid plan and emits additional progress events (prefetching:*, analyzing:*, following:*, evaluating:*, outlining:*, judging:*). |

Default is 'fast'. Omitting mode produces the same result as setting mode: 'fast'.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

// Quick answer for time-sensitive use cases (default mode)
const fastStream = await client.agent.research({
  query: 'What are the current funding rounds in AI infrastructure?',
  mode: 'fast',
})

for await (const event of fastStream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

// Thorough answer for high-stakes research
const balancedStream = await client.agent.research({
  query: 'What are the main regulatory approaches to AI in the EU and US?',
  mode: 'balanced',
})
// Balanced mode uses the same iteration pattern,
// plus emits the richer progress events listed above.

/research runs an agentic loop: it plans, searches, reads sources, and synthesizes, iterating until it has enough to answer. Wall-clock time scales with how many iterations it runs and how many sources it consults, not with the size of the report it returns. A broad question that fans out across many sources takes longer than a narrow one, even when both produce a similar-length report.

As a rough guide:

| Mode | Typical query time | Notes |
| --- | --- | --- |
| fast | Under 60 seconds | Default. |
| balanced | Up to ~4 minutes for the broadest queries | Consults more sources and emits the richer progress events. |

There is no server-side timeout on the request as a whole — the agent runs the loop to completion rather than stopping at a fixed ceiling. A broad balanced query can legitimately stream for minutes. Budget for this on the client (see Client-side timeout strategy) rather than assuming the server will cut it off.

fetch_timeout (in the parameters table) caps a single per-page fetch, not the whole call. It limits how long the agent waits on one slow source before giving up on it and moving on; it does not cap total research time. Raise it when your sources are slow or heavy (large pages, sluggish origins) and you would rather wait than drop them. Leave it at the default for general queries.

How long a single fetch needs depends on how the source is pulled. A plain markdown or extract fetch usually resolves in 10 seconds or less, so a low fetch_timeout is fine for most queries. A heavy JSON extraction over a large schema can take far longer — up to the server-side fetch ceiling of 10 minutes. A reasonable starting point is 10 seconds; raise it toward that ceiling only when you know your sources are slow or your per-page extraction is expensive.

Client-side timeout strategy

Because the call streams, “time to first event” and “time to complete” are different numbers. The first event (start) arrives quickly; complete arrives only after the whole loop finishes. A fixed total-elapsed timeout treats a healthy long-running query the same as a stalled one, and you will cut off good research to catch the occasional bad run.

Watch for stream silence instead. Reset a timer on every event and fail only when no event has arrived for some interval. That catches a genuinely stuck stream while letting a legitimately long run proceed. The iteration and phase events (iteration:start, searching:start, writing:start) are your heartbeat.
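
One way to implement the silence check is a small wrapper around the stream. The sketch below is a minimal version that assumes the stream is a standard async iterable (the guide consumes it with for await); `withIdleTimeout` is a hypothetical helper, not part of the SDK.

```typescript
// Wraps any async iterable and throws if no item arrives within `idleMs`.
// The timer restarts for every event, so a long but active run is never cut off.
async function* withIdleTimeout<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]()
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined
      const silence = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`No event for ${idleMs} ms`)), idleMs)
      })
      // Whichever settles first wins: the next event or the silence timer.
      const result = await Promise.race([iterator.next(), silence]).finally(() => clearTimeout(timer))
      if (result.done) return
      yield result.value
    }
  } finally {
    // Best-effort cleanup of the underlying stream. Not awaited, because a
    // stalled source may never settle.
    void iterator.return?.().catch(() => {})
  }
}
```

With the SDK stream this would read `for await (const event of withIdleTimeout(stream, 60_000))`. The 60-second interval is an arbitrary starting point rather than a documented recommendation; pick a value comfortably longer than the gaps you observe between events in healthy runs.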


Working with citations

The complete event’s data.metadata.citedPages (TypeScript) / data.metadata.cited_pages (Python) lists every source the agent actually cited in its report. Each entry is guaranteed to have id, url, claims (the specific statements drawn from that page), and sourceQueries / source_queries (the search queries that surfaced it). Fields like title, summary, relevance, and reliability are optional: they are present only when the research pipeline populates them.

Here is a single citedPages entry from that array. The guaranteed fields are populated, and only some of the optional fields are present:

{
  "id": "pg_a1b2c3",
  "url": "https://example.com/guides/browser-automation",
  "claims": [
    "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol.",
    "CDP-based tools historically struggled with cross-browser support."
  ],
  "sourceQueries": [
    "browser automation approaches for AI agents",
    "playwright vs puppeteer cross-browser"
  ],
  "title": "Approaches to Browser Automation",
  "relevance": 0.92
  // summary and reliability are optional; this source did not populate them, so they are absent
}

id, url, claims, and sourceQueries are always present. title and relevance are optional and shown here; summary and reliability are equally optional and absent for this source. In Python the same entry reads source_queries (and the array is cited_pages).
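
The guaranteed/optional split can be captured in a local type so that code consuming citations handles missing fields explicitly. This is a sketch based only on the fields listed above: `CitedPage`, `byRelevance`, and `formatCitation` are hypothetical names, and because this guide does not state the types of summary and reliability, they are left as `unknown`.

```typescript
// Hypothetical local shape for one citedPages entry, based on the fields this guide lists.
// The types of `summary` and `reliability` are not documented here, so they stay loose.
interface CitedPage {
  id: string
  url: string
  claims: string[]
  sourceQueries: string[]
  title?: string
  relevance?: number
  summary?: unknown
  reliability?: unknown
}

// Orders sources by relevance, highest first; entries without a score go last.
function byRelevance(pages: CitedPage[]): CitedPage[] {
  return [...pages].sort((a, b) => {
    if (a.relevance === undefined) return b.relevance === undefined ? 0 : 1
    if (b.relevance === undefined) return -1
    return b.relevance - a.relevance
  })
}

// Renders one line per source, falling back when optional fields are absent.
function formatCitation(page: CitedPage): string {
  const score = page.relevance === undefined ? '' : ` (relevance ${page.relevance.toFixed(2)})`
  return `${page.title ?? '(untitled)'}${score}: ${page.url}`
}
```

Sorting a copy rather than the original array keeps the order the API returned available to other callers.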

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function research(query: string) {
  const stream = await client.agent.research({ query, mode: 'fast' })
  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }
    if (event.event === 'complete') {
      return {
        report: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }
  throw new Error('Stream ended without a complete event')
}

const result = await research('What are the main approaches to browser automation for AI agents?')
console.log(result.report)
console.log(`\nCited ${result.sources.length} sources:`)
result.sources.forEach((s, i) =>
  console.log(`${i + 1}. ${s.title ?? '(untitled)'}\n ${s.url}`),
)

One query, end to end: the call, the report it produces, and the citations that back it. The report is abridged, and the citedPages entries are the same shape documented above.

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
    console.log(event.data.metadata.citedPages)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

The complete payload that comes back, with the report abridged:

{
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate. CDP-based drivers like Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol. WebDriver-based tools like Selenium use the W3C WebDriver standard for broader cross-browser support.\n\n[... report continues ...]",
  "metadata": {
    "citedPages": [
      {
        "id": "pg_a1b2c3",
        "url": "https://example.com/guides/browser-automation",
        "claims": [
          "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol."
        ],
        "sourceQueries": ["browser automation approaches for AI agents"],
        "title": "Approaches to Browser Automation",
        "relevance": 0.92
      },
      {
        "id": "pg_d4e5f6",
        "url": "https://example.com/selenium-webdriver",
        "claims": [
          "Selenium uses the W3C WebDriver standard for broader cross-browser support."
        ],
        "sourceQueries": ["selenium webdriver cross-browser support"],
        "title": "WebDriver Explained"
      }
    ]
  },
  "message": "Research complete",
  "timestamp": "2026-06-02T17:04:11.482Z"
}

The link between report and citations runs through each entry’s claims: those are the exact statements the agent drew from that source, and you’ll find them in the report text. Match a sentence in the report against the claims arrays to trace it back to its origin. Note the second entry omits relevance (and every entry here omits summary and reliability) — those are optional and simply weren’t populated for those sources.
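
Tracing a report sentence back to its sources can be sketched as a word-overlap match. In the payload above the second claim appears in the report with slightly different wording ("use" rather than "uses"), so this sketch scores the fraction of a claim's words found in a sentence instead of requiring an exact substring. `traceSentences`, the `Source` type, the naive sentence splitter, and the 0.8 threshold are all assumptions made for illustration.

```typescript
// Minimal stand-in for a citedPages entry; only the fields this sketch needs.
type Source = { url: string; claims: string[] }

// Lowercased word tokens, so punctuation and casing do not affect matching.
const tokens = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? []

// For each report sentence, list the sources holding a claim whose words mostly appear in it.
// `threshold` is the fraction of a claim's words that must be present (0.8 is arbitrary).
function traceSentences(report: string, sources: Source[], threshold = 0.8) {
  // Naive splitter: breaks on ., !, ? and newlines, so it also splits decimals and URLs.
  const sentences = (report.match(/[^.!?\n]+[.!?]?/g) ?? []).map((s) => s.trim()).filter(Boolean)
  return sentences.map((sentence) => {
    const words = new Set(tokens(sentence))
    const urls = sources
      .filter((src) =>
        src.claims.some((claim) => {
          const claimWords = tokens(claim)
          if (claimWords.length === 0) return false
          const hits = claimWords.filter((w) => words.has(w)).length
          return hits / claimWords.length >= threshold
        }),
      )
      .map((src) => src.url)
    return { sentence, urls }
  })
}
```

A lower threshold catches heavier paraphrasing at the cost of false matches, so treat the result as a hint for review rather than proof of attribution.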


Competitive intelligence

Research a competitor’s current pricing and limits without manually visiting their documentation:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getPricingIntel(competitor: string) {
  const stream = await client.agent.research({
    query: `What are ${competitor}'s current pricing plans, rate limits, and free tier details?`,
    mode: 'fast',
    nocache: true, // pricing changes frequently; skip cache
  })
  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }
    if (event.event === 'complete') {
      return {
        summary: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
        retrievedAt: new Date().toISOString(),
      }
    }
  }
  throw new Error('No result returned')
}

Prospect research

Pull together recent activity on a company before an outreach or sales call:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getCompanyBriefing(company: string) {
  const stream = await client.agent.research({
    query: `What has ${company} announced or shipped in the last 90 days? Include funding, product launches, and hiring signals.`,
    mode: 'fast',
  })
  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }
    if (event.event === 'complete') {
      return {
        briefing: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }
  throw new Error('No result returned')
}

Market landscape questions

Answer open-ended questions about a space where the answer spans many sources. This example also shows a simple progress indicator using the iteration events:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()
const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents, and how do they differ?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'iteration:start') {
    // \r rewrites the same terminal line on each iteration
    process.stdout.write(`\rIteration ${event.data.iteration}/${event.data.maxIterations}...`)
  }
  if (event.event === 'complete') {
    console.log('\n\n' + event.data.report)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

| Situation | Use |
| --- | --- |
| You know the exact URL and want specific fields from it | client.extract.json() |
| You have a question that requires synthesizing multiple sources | client.agent.research() |
| You want clean markdown from one page | client.extract.markdown() |
| You need to answer a question about a topic, not a specific page | client.agent.research() |
| You want AI to transform content from a known URL | client.generate.json() |

The key distinction: /research is for questions where you don’t know which sources hold the answer. /extract/json is for structured extraction when you already have the URL.


| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | The research question |
| mode | 'fast' or 'balanced' | 'fast' | Controls depth vs. speed. 'balanced' requires a paid plan. |
| nocache | boolean | false | Force fresh results, bypass cache |
| fetch_timeout | number | | Timeout in seconds for fetching individual web pages |

Error handling

Two failure modes to distinguish:

  • HTTP-level errors (bad API key, rate limit, permission denied) throw SDK exceptions before the stream opens. Catch them with try/catch (TypeScript) or try/except (Python) around the call.
  • Task-level failures arrive as error events inside the stream. event.data.error is an object with message, name, and optional stack; event.data.activity tells you which phase failed.

In rare cases the error event may arrive without a populated error field, so fall back defensively when that happens. The example below uses optional chaining (TS) / getattr (Python) so an unpopulated error doesn’t crash the handler.

import Tabstack, { RateLimitError, AuthenticationError } from '@tabstack/sdk'

const client = new Tabstack()

try {
  const stream = await client.agent.research({
    query: 'What are the current pricing models for cloud browser APIs?',
    mode: 'fast',
  })
  for await (const event of stream) {
    if (event.event === 'error') {
      // Task-level failure -- the agent could not complete the research.
      // The `error` field is typed as required but can arrive unpopulated; fall back defensively.
      const message = event.data.error?.message ?? 'unknown error'
      throw new Error(
        `Research failed during ${event.data.activity ?? 'unknown phase'}: ${message}`,
      )
    }
    if (event.event === 'complete') {
      console.log(event.data.report)
    }
  }
} catch (err) {
  if (err instanceof RateLimitError) {
    console.error('Rate limit hit -- retry after a pause')
  } else if (err instanceof AuthenticationError) {
    console.error('Invalid API key -- check TABSTACK_API_KEY')
  } else {
    throw err
  }
}
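
The "retry after a pause" advice for rate limits can be sketched as a generic retry helper with exponential backoff. `withRetry` and its defaults (three attempts, a one-second base delay) are hypothetical and not part of the SDK; the caller decides which errors are retryable, so the helper itself runs without the SDK installed.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Runs `run`, retrying when `isRetryable(err)` is true and doubling the pause each time.
// Non-retryable errors, and a retryable error on the final attempt, are rethrown.
async function withRetry<T>(
  run: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run()
    } catch (err) {
      if (!isRetryable(err) || attempt >= attempts) throw err
      await sleep(baseDelayMs * 2 ** (attempt - 1))
    }
  }
}
```

With the SDK this might be called as `withRetry(() => research(query), (err) => err instanceof RateLimitError)`, where `research` is a function that opens and consumes the stream. Task-level error events are left out of the retry predicate here on purpose; whether re-running a failed research task is worthwhile depends on the failure.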