Run multi-source research from a single API call. /research handles source selection, synthesis, and citations. No orchestration code required.
Quickstart

```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)

    const cited = event.data.metadata.citedPages ?? []
    console.log(`\nCited ${cited.length} sources:`)
    for (const page of cited) {
      console.log(`- ${page.title ?? '(untitled)'}: ${page.url}`)
    }
  }

  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}
```

```python
from tabstack import Tabstack

client = Tabstack()

# Primary Python pattern: iterate the stream directly. The SDK handles
# SSE framing internally.
for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents?",
    mode="fast",
):
    if event.event == "complete":
        print(event.data.report)

        cited = event.data.metadata.cited_pages or []
        print(f"\nCited {len(cited)} sources:")
        for page in cited:
            print(f"- {page.title or '(untitled)'}: {page.url}")
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)
```

/research always streams via Server-Sent Events. Every call returns a stream; there is no non-streaming mode.
The SDK models the stream as a discriminated union: each event has an event field (a string literal) and a data payload whose shape depends on the event name. Switch on event.event and the SDK narrows event.data to the correct type automatically.
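The same idea can be sketched in plain Python to show what a discriminated union gives you. The classes below (IterationStartEvent, CompleteEvent and their payloads) are hypothetical stand-ins, not the SDK's real types. They assume only what this page describes: a literal event name paired with a data payload whose fields depend on that name. Checking the name is enough to know which fields data carries, and a type checker that understands tagged unions can use the Literal field to narrow ev in each branch.

```python
from dataclasses import dataclass
from typing import Literal, Union


# Hypothetical payload shapes. Every payload has message and timestamp;
# each event adds its own fields on top.
@dataclass
class IterationStartData:
    message: str
    timestamp: str
    iteration: int
    max_iterations: int


@dataclass
class CompleteData:
    message: str
    timestamp: str
    report: str


# Each event pairs a literal name (the discriminant) with one payload type.
@dataclass
class IterationStartEvent:
    event: Literal["iteration:start"]
    data: IterationStartData


@dataclass
class CompleteEvent:
    event: Literal["complete"]
    data: CompleteData


ResearchEvent = Union[IterationStartEvent, CompleteEvent]


def describe(ev: ResearchEvent) -> str:
    if ev.event == "iteration:start":
        # Only IterationStartEvent has this name, so ev.data has the iteration fields.
        return f"iteration {ev.data.iteration}/{ev.data.max_iterations}"
    # The only remaining variant is CompleteEvent, so ev.data.report exists.
    return ev.data.report
```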
Every event's data carries a message and a timestamp. In the list below, each item starts with the event name you switch on; anything after "Adds" is the extra fields that event contributes to data. The stream follows a fixed lifecycle, beginning to end:

1. start fires once, when the run begins.
2. planning:start and planning:end bracket the agent's planning of which searches to run.
3. iteration:start opens an iteration. Adds iteration, maxIterations, and the queries it will run.
4. searching:start and searching:end bracket fetching and reading sources.
5. iteration:end closes the iteration. Adds isLast and an optional stopReason.
6. writing:start and writing:end bracket synthesizing the final report.
7. complete fires once, at the end. Adds report and metadata, the complete payload.

At any point, error can arrive instead. It is a task-level failure carrying a nested error object plus an optional activity and iteration. It is delivered inside the stream, not as an HTTP error, so handle it explicitly. If you only listen for complete, a failed run produces no output. See Error handling.
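Because the order is fixed, you can check a recorded sequence of event names mechanically. This helps in tests or when debugging logged runs. The sketch below assumes only the ordering listed above, and check_lifecycle is a hypothetical helper, not an SDK function. It pair-checks only the four core phases (planning, iteration, searching, writing) and ignores any other event names. It also accepts error as a terminal event, and in that case it does not report phases left open.

```python
# The four phases that the lifecycle brackets with :start / :end pairs.
CORE_PHASES = {"planning", "iteration", "searching", "writing"}


def check_lifecycle(names: list[str]) -> list[str]:
    """Return problems found in a sequence of event names (an empty list means it looks fine)."""
    if not names:
        return ["stream produced no events"]

    problems: list[str] = []
    if names[0] != "start":
        problems.append(f"first event is {names[0]!r}, expected 'start'")
    if names[-1] not in ("complete", "error"):
        problems.append(f"last event is {names[-1]!r}, expected 'complete' or 'error'")
    for once in ("start", "complete"):
        if names.count(once) > 1:
            problems.append(f"{once!r} fired more than once")

    open_phases: set[str] = set()
    for name in names:
        phase, _, edge = name.partition(":")
        if phase not in CORE_PHASES:
            continue  # start, complete, error, and balanced-mode extras are not pair-checked
        if edge == "start":
            if phase in open_phases:
                problems.append(f"{phase}:start while {phase} was already open")
            open_phases.add(phase)
        elif edge == "end":
            if phase not in open_phases:
                problems.append(f"{phase}:end without a matching {phase}:start")
            open_phases.discard(phase)

    # A failed run may stop mid-phase; only a completed run should have closed everything.
    if names[-1] == "complete" and open_phases:
        problems.append(f"phases never closed: {sorted(open_phases)}")
    return problems
```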
Balanced mode threads a richer set of progress events through the same loop: prefetching:*, analyzing:*, following:*, evaluating:*, outlining:*, and judging:*. The API reference lists every variant.
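For a coarse progress display, you can route on the part of the event name before the colon instead of listing every variant. This is a sketch under assumptions: KNOWN_FAMILIES, progress_label, and phase_changes are hypothetical names, and the families come from this page. The suffixes after the colon (beyond :start and :end) are not documented here, so the helper ignores them.

```python
from typing import Iterable, Iterator, Optional

# Event families named on this page: core lifecycle phases plus balanced mode's extras.
KNOWN_FAMILIES = {
    "planning", "iteration", "searching", "writing",
    "prefetching", "analyzing", "following", "evaluating", "outlining", "judging",
}


def progress_label(event_name: str) -> Optional[str]:
    """Return the family of a 'family:suffix' event name, or None for anything else."""
    family, sep, _suffix = event_name.partition(":")
    if sep and family in KNOWN_FAMILIES:
        return family
    return None


def phase_changes(names: Iterable[str]) -> Iterator[str]:
    """Yield a family label only when it differs from the previous one."""
    last = None
    for name in names:
        label = progress_label(name)
        if label is not None and label != last:
            yield label
            last = label
```

In a real loop you would pass event.event to progress_label and print the label whenever it changes.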
Switch on event.event and the SDK narrows event.data for each case. A minimal progress reporter looks like this (setup as in Quickstart):
```typescript
// stream = await client.agent.research({ ... }) as in Quickstart
for await (const event of stream) {
  switch (event.event) {
    case 'start':
      console.log(event.data.message)
      break
    case 'iteration:start':
      console.log(`iteration ${event.data.iteration}/${event.data.maxIterations}`)
      break
    case 'complete':
      console.log('\n' + event.data.report)
      break
    case 'error':
      throw new Error(event.data.error.message)
  }
}
```

```python
# stream = client.agent.research(...) as in Quickstart; match requires Python 3.10+
for event in stream:
    match event.event:
        case "start":
            print(event.data.message)
        case "iteration:start":
            print(f"iteration {event.data.iteration}/{event.data.max_iterations}")
        case "complete":
            print("\n" + event.data.report)
        case "error":
            raise RuntimeError(event.data.error.message)
```

Everything you need arrives on the single complete event. Here is the full data object, annotated. This is the canonical shape; the citations and worked-example sections below pull straight from it.
```jsonc
{
  // report: the synthesized report, as a markdown string. Always present.
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate...",

  // metadata: always present. citedPages is the field you'll reach for.
  "metadata": {
    // citedPages (cited_pages in Python): every source cited in the report.
    // Present when the agent cited sources; treat a missing value as [] (see Quickstart).
    "citedPages": [
      // one entry per cited source -- see "Working with citations" for a full entry
    ]
  },

  // message: human-readable status string. Always present.
  "message": "Research complete",

  // timestamp: ISO-8601 string for when the event was emitted. Always present.
  "timestamp": "2026-06-02T17:04:11.482Z"
}
```

report, metadata, message, and timestamp are always present on complete. Inside metadata, citedPages is the only field this guide documents. The pipeline may attach more, but don't depend on fields not shown here.
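If you log or forward the raw JSON of a complete event instead of using the SDK's typed objects, the same rule applies. Read the always-present fields directly, and default a missing citedPages to an empty list. This is a minimal sketch that assumes the camelCase JSON shape shown above; summarize_complete is a hypothetical helper.

```python
import json
from typing import Any


def summarize_complete(payload: dict[str, Any]) -> dict[str, Any]:
    """Pull the fields documented above out of a raw `complete` payload."""
    # metadata is always present; citedPages may be missing, so treat that as [].
    cited = payload["metadata"].get("citedPages") or []
    return {
        "report": payload["report"],
        "source_urls": [page["url"] for page in cited],
        "timestamp": payload["timestamp"],
    }


raw = """{
  "report": "# Browser automation for AI agents",
  "metadata": {},
  "message": "Research complete",
  "timestamp": "2026-06-02T17:04:11.482Z"
}"""
print(summarize_complete(json.loads(raw)))
```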
mode parameter

mode controls the depth-vs-speed tradeoff.
| Mode | Speed | Sources consulted | Use when |
|---|---|---|---|
| 'fast' | Faster | Fewer | Default. Time-sensitive queries where a quick answer is sufficient. |
| 'balanced' | Slower, more thorough | More | High-stakes research where breadth matters. Requires a paid plan and emits additional progress events (prefetching:*, analyzing:*, following:*, evaluating:*, outlining:*, judging:*). |
Default is 'fast'. Omitting mode produces the same result as setting mode: 'fast'.
```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

// Quick answer for time-sensitive use cases (default mode)
const fastStream = await client.agent.research({
  query: 'What are the current funding rounds in AI infrastructure?',
  mode: 'fast',
})

for await (const event of fastStream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}

// Thorough answer for high-stakes research
const balancedStream = await client.agent.research({
  query: 'What are the main regulatory approaches to AI in the EU and US?',
  mode: 'balanced',
})
// Balanced mode uses the same iteration pattern, plus emits the richer progress events listed above.
```

```python
from tabstack import Tabstack

client = Tabstack()

# Quick answer for time-sensitive use cases (default mode)
fast_stream = client.agent.research(
    query="What are the current funding rounds in AI infrastructure?",
    mode="fast",
)

for event in fast_stream:
    if event.event == "complete":
        print(event.data.report)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)

# Thorough answer for high-stakes research
balanced_stream = client.agent.research(
    query="What are the main regulatory approaches to AI in the EU and US?",
    mode="balanced",
)
# Balanced mode uses the same iteration pattern, plus emits the richer progress events listed above.
```

/research runs an agentic loop: it plans, searches, reads sources, and synthesizes, iterating until it has enough to answer. Wall-clock time scales with how many iterations it runs and how many sources it consults, not with the size of the report it returns. A broad question that fans out across many sources takes longer than a narrow one, even when both produce a similar-length report.
As a rough guide:
| Mode | Typical duration | Notes |
|---|---|---|
| fast | Under 60 seconds | Default. |
| balanced | Up to ~4 minutes for the broadest queries | Consults more sources and emits the richer progress events. |
There is no server-side timeout on the request as a whole: the agent runs the loop to completion rather than stopping at a fixed ceiling. A broad balanced query can legitimately stream for minutes. Budget for this on the client (see Client-side timeout strategy) rather than assuming the server will cut it off.
fetch_timeout bounds

fetch_timeout (in the parameters table) caps a single per-page fetch, not the whole call. It limits how long the agent waits on one slow source before giving up on it and moving on. It does not cap total research time. Raise it when your sources are slow or heavy (large pages, sluggish origins) and you would rather wait than drop them. Leave it at the default for general queries.

How long a single fetch needs depends on how the source is pulled. A plain markdown or extract fetch usually resolves in 10 seconds or less, so a low fetch_timeout is fine for most queries. A heavy JSON extraction over a large schema can take far longer, up to the server-side fetch ceiling of 10 minutes (600 seconds, since fetch_timeout is in seconds). A reasonable starting point is 10 seconds. Raise it toward that ceiling only when you know your sources are slow or your per-page extraction is expensive.
Client-side timeout strategy

Because the call streams, "time to first event" and "time to complete" are different numbers. The first event (start) arrives quickly; complete arrives only after the whole loop finishes. A fixed total-elapsed timeout treats a healthy long-running query the same as a stalled one, so you will cut off good research just to catch the occasional bad run.
Watch for stream silence instead. Reset a timer on every event and fail only when no event has arrived for some interval. That catches a genuinely stuck stream while letting a legitimately long run proceed. The iteration and phase events (iteration:start, searching:start, writing:start) are your heartbeat.
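One way to implement the stream-silence check in Python is to drain the event iterator on a background thread and wait on a queue with a timeout. Each wait starts fresh, so the idle clock effectively resets every time an event arrives. This is a minimal sketch under assumptions:

- It works with any blocking Python iterator, such as the one client.agent.research returns in the examples above.
- idle_timeout_iter and StreamStalled are hypothetical names.
- The 120-second default is an arbitrary placeholder. This page does not state how far apart heartbeat events can be, so choose a window longer than the longest gap you observe.

When the helper gives up, the worker thread stays blocked on the stalled stream. It is a daemon thread, so it won't keep the process alive. Close the underlying connection if your SDK exposes a way to do so.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar, cast

T = TypeVar("T")


class StreamStalled(TimeoutError):
    """Raised when the stream goes silent for longer than the idle window."""


class _Done:
    """Marker placed on the queue when the source iterator finishes normally."""


class _Failed:
    """Wrapper carrying an exception raised by the source iterator."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def idle_timeout_iter(source: Iterable[T], idle_timeout: float = 120.0) -> Iterator[T]:
    """Yield items from `source`; raise StreamStalled if no item arrives for `idle_timeout` seconds.

    Total run time is not limited: only the gap between consecutive items is.
    """
    q: "queue.Queue[object]" = queue.Queue()

    def pump() -> None:
        # Runs on a background thread so the consumer can wait with a timeout.
        try:
            for item in source:
                q.put(item)
        except Exception as exc:
            q.put(_Failed(exc))
        else:
            q.put(_Done())

    # daemon=True: a stalled worker will not keep the process alive on exit.
    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            # Each call waits afresh, so the idle clock effectively resets per event.
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            raise StreamStalled(f"no event received for {idle_timeout} seconds") from None
        if isinstance(item, _Done):
            return
        if isinstance(item, _Failed):
            raise item.exc
        yield cast(T, item)
```

Usage would be to wrap the stream, for example for event in idle_timeout_iter(client.agent.research(query=q, mode="balanced"), idle_timeout=120.0):, and keep your existing event handling inside the loop. A long run that keeps emitting iteration and phase events is never cut off.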
Working with citations

The complete event's data.metadata.citedPages (TypeScript) / data.metadata.cited_pages (Python) lists every source the agent actually cited in its report. Each entry always has id, url, claims (the specific statements drawn from that page), and sourceQueries / source_queries (the search queries that surfaced it). Fields like title, summary, relevance, and reliability are optional: they are present only when the research pipeline populates them.
Here is a single citedPages entry from that array, with the guaranteed fields populated and the optional split made visible in the data:
```jsonc
{
  "id": "pg_a1b2c3",
  "url": "https://example.com/guides/browser-automation",
  "claims": [
    "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol.",
    "CDP-based tools historically struggled with cross-browser support."
  ],
  "sourceQueries": [
    "browser automation approaches for AI agents",
    "playwright vs puppeteer cross-browser"
  ],
  "title": "Approaches to Browser Automation",
  "relevance": 0.92
  // summary and reliability are optional; this source did not populate them, so they are absent
}
```

id, url, claims, and sourceQueries are always present. title and relevance are optional and shown here; summary and reliability are equally optional and absent for this source. In Python the same entry reads source_queries (and the array is cited_pages).
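When you consume the raw JSON rather than the SDK's typed objects, it can help to encode the split between guaranteed and optional fields in a type. Missing optional fields then become None, while a missing guaranteed field fails loudly with a KeyError. This sketch assumes the camelCase JSON shape above. CitedPage and from_json are hypothetical names. The type of relevance is inferred from the example value 0.92, and reliability is left as Any because no example on this page shows it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CitedPage:
    # Always present on every entry.
    id: str
    url: str
    claims: list[str]
    source_queries: list[str]
    # Optional: present only when the pipeline populates them.
    title: Optional[str] = None
    summary: Optional[str] = None
    relevance: Optional[float] = None
    reliability: Optional[Any] = None

    @classmethod
    def from_json(cls, entry: dict[str, Any]) -> "CitedPage":
        """Build from one raw citedPages entry; KeyError if a guaranteed field is missing."""
        return cls(
            id=entry["id"],
            url=entry["url"],
            claims=list(entry["claims"]),
            source_queries=list(entry["sourceQueries"]),
            title=entry.get("title"),
            summary=entry.get("summary"),
            relevance=entry.get("relevance"),
            reliability=entry.get("reliability"),
        )
```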
```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function research(query: string) {
  const stream = await client.agent.research({ query, mode: 'fast' })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        report: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }

  throw new Error('Stream ended without a complete event')
}

const result = await research('What are the main approaches to browser automation for AI agents?')

console.log(result.report)
console.log(`\nCited ${result.sources.length} sources:`)
result.sources.forEach((s, i) => console.log(`${i + 1}. ${s.title ?? '(untitled)'}\n ${s.url}`))
```

```python
from tabstack import Tabstack

client = Tabstack()


def research(query: str):
    for event in client.agent.research(query=query, mode="fast"):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "report": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
            }

    raise RuntimeError("Stream ended without a complete event")


result = research("What are the main approaches to browser automation for AI agents?")

print(result["report"])
print(f"\nCited {len(result['sources'])} sources:")
for i, s in enumerate(result["sources"], 1):
    print(f"{i}. {s.title or '(untitled)'}\n {s.url}")
```

One query, end to end: the call, the report it produces, and the citations that back it. The report is abridged, and the citedPages entries are the same shape documented above.
```typescript
// client set up as in Quickstart
const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'complete') {
    console.log(event.data.report)
    console.log(event.data.metadata.citedPages)
  }
  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}
```

```python
# client set up as in Quickstart
for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents?",
    mode="fast",
):
    if event.event == "complete":
        print(event.data.report)
        print(event.data.metadata.cited_pages)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)
```

The complete payload that comes back, with the report abridged:
```json
{
  "report": "# Browser automation for AI agents\n\nThree families of tooling dominate. CDP-based drivers like Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol. WebDriver-based tools like Selenium use the W3C WebDriver standard for broader cross-browser support.\n\n[... report continues ...]",
  "metadata": {
    "citedPages": [
      {
        "id": "pg_a1b2c3",
        "url": "https://example.com/guides/browser-automation",
        "claims": [
          "Playwright and Puppeteer drive a real browser over the Chrome DevTools Protocol."
        ],
        "sourceQueries": ["browser automation approaches for AI agents"],
        "title": "Approaches to Browser Automation",
        "relevance": 0.92
      },
      {
        "id": "pg_d4e5f6",
        "url": "https://example.com/selenium-webdriver",
        "claims": [
          "Selenium uses the W3C WebDriver standard for broader cross-browser support."
        ],
        "sourceQueries": ["selenium webdriver cross-browser support"],
        "title": "WebDriver Explained"
      }
    ]
  },
  "message": "Research complete",
  "timestamp": "2026-06-02T17:04:11.482Z"
}
```

The link between report and citations runs through each entry's claims. These are the statements the agent drew from that source, and they appear in the report text, though not always word for word: compare the second claim ("Selenium uses...") with the report's "Selenium use...". Match a sentence in the report against the claims arrays to trace it back to its origin. The second entry omits relevance, and every entry here omits summary and reliability; those fields are optional and simply weren't populated for these sources.
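The payload above shows that a claim can be lightly reworded in the report, so exact substring matching would miss the Selenium sentence. A word-overlap heuristic is one simple alternative. This sketch works on the raw JSON shape, and trace_report is a hypothetical helper, not an SDK feature. The sentence splitting is naive, and the 0.8 threshold is an arbitrary starting point you would tune against your own reports.

```python
import re
from typing import Any

_WORD = re.compile(r"[a-z0-9]+")


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words; 'cross-browser' becomes {'cross', 'browser'}."""
    return set(_WORD.findall(text.lower()))


def trace_report(
    report: str, cited_pages: list[dict[str, Any]], threshold: float = 0.8
) -> dict[str, list[str]]:
    """Map each report sentence to the URLs of cited pages with a claim it (nearly) contains.

    A claim matches a sentence when at least `threshold` of the claim's distinct words
    appear in that sentence.
    """
    traced: dict[str, list[str]] = {}
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        sentence_words = _words(sentence)
        urls = []
        for page in cited_pages:
            for claim in page["claims"]:
                claim_words = _words(claim)
                if claim_words and len(claim_words & sentence_words) / len(claim_words) >= threshold:
                    urls.append(page["url"])
                    break  # one matching claim is enough to credit this page
        if urls:
            traced[sentence.strip()] = urls
    return traced
```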
Research a competitor’s current pricing and limits without manually visiting their documentation:
```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getPricingIntel(competitor: string) {
  const stream = await client.agent.research({
    query: `What are ${competitor}'s current pricing plans, rate limits, and free tier details?`,
    mode: 'fast',
    nocache: true, // pricing changes frequently; skip cache
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        summary: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
        retrievedAt: new Date().toISOString(),
      }
    }
  }

  throw new Error('No result returned')
}
```

```python
from datetime import datetime, timezone

from tabstack import Tabstack

client = Tabstack()


def get_pricing_intel(competitor: str):
    for event in client.agent.research(
        query=f"What are {competitor}'s current pricing plans, rate limits, and free tier details?",
        mode="fast",
        nocache=True,  # pricing changes frequently; skip cache
    ):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "summary": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
                "retrieved_at": datetime.now(timezone.utc).isoformat(),
            }

    raise RuntimeError("No result returned")
```

Pull together recent activity on a company before an outreach or sales call:
```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function getCompanyBriefing(company: string) {
  const stream = await client.agent.research({
    query: `What has ${company} announced or shipped in the last 90 days? Include funding, product launches, and hiring signals.`,
    mode: 'fast',
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      throw new Error(event.data.error.message)
    }

    if (event.event === 'complete') {
      return {
        briefing: event.data.report,
        sources: event.data.metadata.citedPages ?? [],
      }
    }
  }

  throw new Error('No result returned')
}
```

```python
from tabstack import Tabstack

client = Tabstack()


def get_company_briefing(company: str):
    for event in client.agent.research(
        query=f"What has {company} announced or shipped in the last 90 days? Include funding, product launches, and hiring signals.",
        mode="fast",
    ):
        if event.event == "error":
            raise RuntimeError(event.data.error.message)

        if event.event == "complete":
            return {
                "briefing": event.data.report,
                "sources": event.data.metadata.cited_pages or [],
            }

    raise RuntimeError("No result returned")
```

Answer open-ended questions about a space where the answer spans many sources. This example also shows a simple progress indicator using the iteration events:
```typescript
import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const stream = await client.agent.research({
  query: 'What are the main approaches to browser automation for AI agents, and how do they differ?',
  mode: 'fast',
})

for await (const event of stream) {
  if (event.event === 'iteration:start') {
    process.stdout.write(`\rIteration ${event.data.iteration}/${event.data.maxIterations}...`)
  }

  if (event.event === 'complete') {
    console.log('\n\n' + event.data.report)
  }

  if (event.event === 'error') {
    throw new Error(event.data.error.message)
  }
}
```

```python
from tabstack import Tabstack

client = Tabstack()

for event in client.agent.research(
    query="What are the main approaches to browser automation for AI agents, and how do they differ?",
    mode="fast",
):
    if event.event == "iteration:start":
        print(
            f"\rIteration {event.data.iteration}/{event.data.max_iterations}...",
            end="",
            flush=True,
        )
    elif event.event == "complete":
        print("\n\n" + event.data.report)
    elif event.event == "error":
        raise RuntimeError(event.data.error.message)
```

/research vs /extract/json

| Situation | Use |
|---|---|
| You know the exact URL and want specific fields from it | client.extract.json() |
| You have a question that requires synthesizing multiple sources | client.agent.research() |
| You want clean markdown from one page | client.extract.markdown() |
| You need to answer a question about a topic, not a specific page | client.agent.research() |
| You want AI to transform content from a known URL | client.generate.json() |
The key distinction: /research is for questions where you don’t know which sources hold the answer. /extract/json is for structured extraction when you already have the URL.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | The research question |
| mode | 'fast' \| 'balanced' | 'fast' | Controls depth vs. speed. 'balanced' requires a paid plan. |
| nocache | boolean | false | Force fresh results and bypass the cache |
| fetch_timeout | number | server default | Timeout in seconds for fetching an individual web page (per page, not the whole call) |
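To catch mistakes before a request goes out, you can validate arguments against the table above on the client. This is a minimal sketch with a few assumptions:

- research_params is a hypothetical helper.
- It assumes the Python SDK accepts these names as keyword arguments. query, mode, and nocache appear in the examples above, but fetch_timeout as a Python keyword is taken from this table and not confirmed elsewhere on this page.
- The 600-second cap mirrors the 10-minute server-side fetch ceiling described in the fetch_timeout discussion.
- The paid-plan requirement for 'balanced' can only be enforced by the server.

```python
from typing import Any, Literal, Optional

Mode = Literal["fast", "balanced"]

# 10-minute server-side per-page fetch ceiling, expressed in seconds.
MAX_FETCH_TIMEOUT_S = 600


def research_params(
    query: str,
    mode: Mode = "fast",
    nocache: bool = False,
    fetch_timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Build keyword arguments for a research call, checked against the parameters table."""
    if not query.strip():
        raise ValueError("query is required")
    if mode not in ("fast", "balanced"):
        raise ValueError(f"mode must be 'fast' or 'balanced', got {mode!r}")
    params: dict[str, Any] = {"query": query, "mode": mode, "nocache": nocache}
    if fetch_timeout is not None:
        # Per-page fetch limit in seconds, not a limit on the whole call.
        if not 0 < fetch_timeout <= MAX_FETCH_TIMEOUT_S:
            raise ValueError(f"fetch_timeout must be in (0, {MAX_FETCH_TIMEOUT_S}] seconds")
        params["fetch_timeout"] = fetch_timeout
    return params
```

Usage would look like client.agent.research(**research_params("What changed in the EU AI Act?", fetch_timeout=10)). Omitting fetch_timeout leaves it out of the request, so the server default applies.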
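Error handling

Two failure modes to distinguish:

- Request-level failures, such as an invalid API key or a rate limit, are raised by the SDK as exceptions (AuthenticationError, RateLimitError). Catch them with try/catch (TypeScript) or try/except (Python) around the call.
- Task-level failures arrive as error events inside the stream. event.data.error is an object with message, name, and optional stack; event.data.activity tells you which phase failed.

In rare cases the error event may arrive without a populated error field, so fall back defensively when that happens. The example below uses optional chaining (TypeScript) and getattr (Python) so an unpopulated error doesn't crash the handler.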
```typescript
import Tabstack, { RateLimitError, AuthenticationError } from '@tabstack/sdk'

const client = new Tabstack()

try {
  const stream = await client.agent.research({
    query: 'What are the current pricing models for cloud browser APIs?',
    mode: 'fast',
  })

  for await (const event of stream) {
    if (event.event === 'error') {
      // Task-level failure -- the agent could not complete the research.
      // The `error` field is typed as required but can arrive unpopulated; fall back defensively.
      const message = event.data.error?.message ?? 'unknown error'
      throw new Error(
        `Research failed during ${event.data.activity ?? 'unknown phase'}: ${message}`,
      )
    }

    if (event.event === 'complete') {
      console.log(event.data.report)
    }
  }
} catch (err) {
  if (err instanceof RateLimitError) {
    console.error('Rate limit hit -- retry after a pause')
  } else if (err instanceof AuthenticationError) {
    console.error('Invalid API key -- check TABSTACK_API_KEY')
  } else {
    throw err
  }
}
```

```python
from tabstack import AuthenticationError, RateLimitError, Tabstack

client = Tabstack()

try:
    for event in client.agent.research(
        query="What are the current pricing models for cloud browser APIs?",
        mode="fast",
    ):
        if event.event == "error":
            # Task-level failure -- the agent could not complete the research.
            # The `error` field is typed as required but can arrive as None; tolerate it.
            activity = event.data.activity or "unknown phase"
            message = getattr(event.data.error, "message", None) or "unknown error"
            raise RuntimeError(f"Research failed during {activity}: {message}")

        if event.event == "complete":
            print(event.data.report)
except RateLimitError:
    print("Rate limit hit -- retry after a pause")
except AuthenticationError:
    print("Invalid API key -- check TABSTACK_API_KEY")
```

Extract JSON
Pull structured fields from a known URL when you already know the source.
Automate Events
The full streaming event model behind agentic endpoints.
API Reference