
Troubleshooting and FAQ

Common failure modes across Tabstack endpoints and how to fix them: empty extractions, timeouts, JS-heavy pages, auth errors, and schema debugging.

This page collects the failure modes people hit most often and the levers that fix them. It is a starting point, not a status-code dictionary: for the exact meaning of every HTTP status and SDK exception, see the Error Reference, and for caching, retries, and timeout behavior in production, see Production Reliability.

Empty or partial extractions

You get a 200 response but the data is missing, incomplete, or wrong.

The content needs JavaScript to appear. The single most common cause. If the page is a React, Vue, Angular, or Next.js app, or loads content after the initial HTML, min and standard effort will not see it. Re-run with effort: 'max', which does full headless browser rendering. See Effort Levels.

The schema is too vague. The extraction AI reads your property description fields as instructions. Without them it only has the property name to go on. Add specific descriptions, and use enum for categorical fields. See Schema Design.

The schema depth doesn’t match the page. A flat schema pointed at a nested page (categories containing products) merges items and loses hierarchy. Mirror the page structure in your schema. See Schema Design.
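As a rough illustration of the two schema points above, here is a sketch of a nested JSON Schema, written as a Python dict, that mirrors a "categories containing products" page and uses description and enum. Every field name, description, and enum value is hypothetical; adapt them to the page you are extracting.

```python
# Illustrative schema only: field names, descriptions, and enum values are
# assumptions made for this example, not part of any Tabstack contract.
product_schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "Product name exactly as displayed in the listing.",
        },
        "availability": {
            "type": "string",
            # enum keeps a categorical field to a fixed set of values.
            "enum": ["in_stock", "out_of_stock", "preorder"],
            "description": "Stock status shown next to the product.",
        },
    },
}

catalog_schema = {
    "type": "object",
    "properties": {
        "categories": {
            "type": "array",
            "description": "One entry per category heading on the page.",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Category heading as displayed.",
                    },
                    # Nesting products under their category mirrors the page,
                    # so items are not merged into one flat list.
                    "products": {
                        "type": "array",
                        "description": "Products listed under this category only.",
                        "items": product_schema,
                    },
                },
            },
        },
    },
}
```

A flat version of the same schema (a single top-level products array) would give the extractor nowhere to record which category each product belongs to.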

You’re seeing a stale cached result. Tabstack caches by URL. If you just changed your schema or the page changed, set nocache: true to force a fresh fetch and confirm the cache isn’t masking the real result.

You get a 500. The request was accepted but something failed server-side: the page failed to fetch, was too large, or extraction could not complete. If the cause is rendering (the page was fetched but needs a full browser to produce content), retry with effort: 'max' and nocache: true first; the pattern is shown in Production Reliability. A genuine fetch failure (DNS, connection, SSL, 404, robots) is not fixed by a retry: correct or change the URL. A 422 is a separate case: the URL itself is malformed or points to an inaccessible or private resource, so fix the URL rather than retrying. See the Error Reference for the exact handling.
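The escalation described above can be sketched without committing to any particular client. In this minimal example, call is whatever function you use to run an extraction, and ServerSideFailure is a hypothetical stand-in for the exception your client raises on a 500; neither name comes from the SDK. The canonical pattern is the one in Production Reliability.

```python
from typing import Any, Callable


class ServerSideFailure(Exception):
    """Hypothetical stand-in for the exception your client raises on a 500."""


def extract_with_escalation(call: Callable[..., Any], url: str) -> Any:
    """Run the extraction with default settings; on a server-side failure,
    retry exactly once with full rendering and the cache bypassed."""
    try:
        return call(url)
    except ServerSideFailure:
        # One escalated attempt only. If this also fails, the error
        # propagates so the caller can inspect the URL instead of looping.
        return call(url, effort="max", nocache=True)
```

The helper cannot tell a rendering problem from a genuine fetch failure, which is why it retries once and then gives up rather than backing off indefinitely.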

An array comes back with only some of the expected items. There is no hidden per-call item cap, and maxItems acts only as a ceiling, never a floor. Short arrays come from one of three things: the missing items aren’t present in the fetched HTML (raise effort for JS rendering), the result set is large enough to be truncated at the token limit, or the schema is too tight and required fields drop items that don’t satisfy them.

Timeouts and slow pages

The default request timeout is 60 seconds. With effort: 'max', complex single-page apps and slow pages can approach this limit.

Raise the per-request timeout. Pass a longer timeout for heavy pages rather than raising it globally. The exact pattern (TypeScript request options, with_options(timeout=...) in Python) is in Production Reliability.

Narrow the work. A more specific starting URL, a tighter schema, or dropping nocache (so a cached result can be reused) all reduce time per request.

Streaming endpoints are different. For /automate and /research, the timeout governs the initial connection, not the length of the stream. A long task can legitimately stream well past 60 seconds. Budget for this on the client rather than assuming the server will cut it off. See Streaming Patterns.
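One way to budget for a long stream on the client is to wrap the event iterator in your own deadline. This is a minimal sketch that assumes your client exposes the stream as a plain Python iterable; with_deadline and budget_seconds are names invented for the example.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_deadline(events: Iterable[T], budget_seconds: float) -> Iterator[T]:
    """Yield events until the client-side budget is spent, then raise."""
    deadline = time.monotonic() + budget_seconds
    for event in events:
        # The check runs each time an event arrives, so a stream that goes
        # completely silent is not interrupted by this wrapper alone.
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"stream exceeded the {budget_seconds}s client-side budget"
            )
        yield event
```

Because the deadline is only checked when an event arrives, pair it with a socket-level read timeout if you also need to catch a stream that stalls without sending anything.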

JS-heavy pages and single-page apps

If a page renders its content with JavaScript, raise the effort level.

Reach for effort: 'max' when:

The page is a React, Vue, Angular, or client-side Next.js app.
Content loads lazily or after interaction.
Pricing tables, product listings, or data grids are rendered by JavaScript.
You get empty fields with standard.

max uses full headless browser rendering: it executes JavaScript and waits for dynamic content. It is slower and costs more, so use it where you need it rather than everywhere. See Effort Levels for the full tradeoff.

To confirm a page is JS-dependent before reaching for max, open the URL in your browser with JavaScript disabled. What you see is roughly what the extractor sees at min / standard.
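The same check can be scripted as a rough local approximation: fetch the raw HTML without running any JavaScript and look for text you expect to extract. The helper names and the User-Agent string below are made up for this sketch, and a plain fetch is only an approximation of what min / standard see.

```python
from typing import List
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 30.0) -> str:
    """Fetch the page without executing any JavaScript."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (raw-html-check)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_from_raw_html(html: str, expected_snippets: List[str]) -> List[str]:
    """Return the expected snippets that do not appear in the raw HTML."""
    lowered = html.lower()
    return [s for s in expected_snippets if s.lower() not in lowered]
```

If a price or product name you can see in your browser shows up in the list returned by missing_from_raw_html(fetch_raw_html(url), [...]), the page is probably rendering that content with JavaScript.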

Authentication errors

A 401 (AuthenticationError) almost always means one of two things:

The key isn’t set in the environment that’s running. Both SDKs read TABSTACK_API_KEY from the environment. In production this commonly means the variable was set locally but never added to the deployment. Confirm it is present where the code actually runs.
The key was rotated or revoked. Generate a fresh key in the console and update the environment.

401 is not retried automatically; it needs intervention, not backoff. See the Error Reference for the full auth error shape, and the Quickstart for key setup.
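A startup check makes the first cause fail loudly at boot instead of surfacing later as a 401. This is a small sketch: require_api_key is a hypothetical helper, and only the TABSTACK_API_KEY variable name comes from the text above.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail fast if it is missing or blank."""
    key = env.get("TABSTACK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TABSTACK_API_KEY is not set in the environment running this code"
        )
    return key
```

Calling require_api_key() once during application startup confirms the variable is present where the code actually runs. It cannot detect a rotated or revoked key; that still shows up as a 401.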

Schema debugging

When /extract/json or /generate/json results are incomplete or inaccurate, work through these in order:

Add more specific descriptions. The most common fix. Tell the AI what each field means and how to handle edge cases.
Upgrade effort to max. The content may not be in the initial HTML.
Simplify the schema. Remove fields you don’t need. Fewer fields means less for the AI to get wrong.
Add nocache: true. Confirms the issue isn’t a stale cached result.
Check for JS rendering. Open the URL with JavaScript disabled to see what the extractor sees at min / standard.

Be explicit about null cases: declaring a field as ["number", "null"] with a description of when it’s null produces consistent null values instead of missing fields or invented data. Full patterns are in Schema Design.
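As an illustration, a nullable field might be declared like this. The field name and the wording of the description are hypothetical; only the ["number", "null"] type comes from the guidance above.

```python
# Hypothetical field: "price" and its description are assumptions made for
# this example. The point is the explicit null case in both type and text.
nullable_price_schema = {
    "type": "object",
    "properties": {
        "price": {
            "type": ["number", "null"],
            "description": (
                "Listed price as a plain number, without a currency symbol. "
                "Use null when the page shows no price or says 'Contact sales'. "
                "Never estimate a price."
            ),
        },
    },
}
```

The description does the real work here: it states when null is the right answer, so the extractor has an explicit alternative to guessing.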

FAQ

Why is my extraction empty when the page looks fine in my browser? Almost always JavaScript rendering. Your browser runs JS; min and standard effort do not. Re-run with effort: 'max'. See Empty or partial extractions above.

How do I force fresh data instead of a cached result? Pass nocache: true. See Production Reliability for what is cached and for how long.

Which effort level should I use? Start with standard (the default). Drop to min for static, server-rendered pages where speed matters; move to max for JS-heavy pages. See Effort Levels.

What should I do about 429 (rate limit) errors? The SDKs retry 429 automatically with backoff. If you hit them consistently you’re exceeding your plan’s per-minute limit. See Rate Limits and Production Reliability.

How do I extract content from a page that requires login? Use /automate with interactive: true. The agent pauses on a login form and emits an interactive:form_data:request event; you then POST the credentials to /v1/automate/{requestId}/input. Credentials are filled into the page directly and are never sent to the LLM. Not supported: stored or pre-authenticated credentials, sessions that persist across requests, and login on /extract or /generate. See Interactive Mode.

An endpoint returned a status code I don’t recognize. The Error Reference is the canonical list of every status code, error message, and SDK exception across all endpoints.