---
title: AI Transformation | Tabstack
description: Transform the content of any page into a JSON schema you define, with AI, in a single call.
---

Extract returns data that already exists on the page. Generate produces data that does not exist on the page until the model creates it.

That one distinction decides which endpoint you reach for: if the value is already sitting in the HTML, extract it. If a model has to read the content and produce something new, generate it.

---

## Same URL, different answer

Point both endpoints at the same article and the difference shows up in what comes back.

Extract: `/extract/json` lifts fields that are already on the page.

```
{
  "title": "Understanding Microservices",
  "author": "Jane Doe",
  "published_date": "2026-05-12"
}
```

Every field here is present in the page HTML. Extract reads it and hands it back.

Generate: `/generate/json` returns fields the model produced from the content.

```
{
  "title": "Understanding Microservices",
  "category": "engineering",
  "sentiment": "neutral",
  "relevance_score": 8
}
```

`category`, `sentiment`, and `relevance_score` appear nowhere in the page. The model read the article and created them.

`relevance_score` is the proof. Nothing in the page contains an 8, so it can only have come from generate.

---

## Does the data exist on the page?

That is the whole decision. If the field is already in the HTML, use extract. If the model has to read, reason, and produce it, use generate.

The Hacker News homepage makes the split obvious. Each story’s `title` is text on the page, so extract handles it. A `category` and a one-line `summary` are not on the page at all, so generate has to create them:

```
{
  "summaries": [
    {
      "title": "New AI Model Released",
      "category": "tech",
      "summary": "A research lab announced a new language model that performs better on reasoning tasks."
    }
  ]
}
```

`title` was extracted. `category` and `summary` did not exist until the model wrote them.
The same response carries both kinds of field, which is exactly what generate is for: a schema where some values are read and others are produced.

---

## A field that cannot be read

Here is one generate call built around a value the page does not contain. The model reads each article and assigns a `relevance_score`, then returns it inside your schema.

TypeScript:

```
import Tabstack from "@tabstack/sdk";

const client = new Tabstack();

const result = await client.generate.json({
  url: "https://competitor.example.com/blog",
  json_schema: {
    type: "object",
    properties: {
      articles: {
        type: "array",
        items: {
          type: "object",
          properties: {
            title: { type: "string" },
            target_audience: { type: "string" },
            relevance_score: {
              type: "number",
              description: "1-10 score for a developer focused on AI agents",
            },
          },
        },
      },
    },
  },
  instructions:
    "For each article, identify the target audience and assign a relevance_score from 1-10 for a developer focused on AI agents.",
});

console.log(JSON.stringify(result, null, 2));
```

Python:

```
import json

from tabstack import Tabstack

client = Tabstack()

result = client.generate.json(
    url="https://competitor.example.com/blog",
    json_schema={
        "type": "object",
        "properties": {
            "articles": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "target_audience": {"type": "string"},
                        "relevance_score": {
                            "type": "number",
                            "description": "1-10 score for a developer focused on AI agents",
                        },
                    },
                },
            },
        },
    },
    instructions="For each article, identify the target audience and assign a relevance_score from 1-10 for a developer focused on AI agents.",
)

print(json.dumps(result, indent=2))
```

The response carries the generated fields back inside the shape you asked for:

```
{
  "articles": [
    {
      "title": "Shipping faster with feature flags",
      "target_audience": "engineering leads",
      "relevance_score": 6
    },
    {
      "title": "Building autonomous agents with tool use",
      "target_audience": "AI engineers",
      "relevance_score": 9
    }
  ]
}
```

`title` is on the page. `target_audience` and `relevance_score` are not. The model produced both.

---

## When not to use generate

If extract can get it, use extract. Both endpoints are AI-guided, but generate adds an explicit transformation step: you hand it instructions, and it produces content that is not on the page. That extra step usually makes generate slower and more expensive than extract, so reach for it only when the value has to be created.

Generate earns that cost when the value has to be reasoned into existence: scoring, summarizing, categorizing, sentiment, rewriting. It is not for pulling data that is already structured on the page. A product price, a published date, a table of specs: that is extract. A relevance score, a one-line summary, a sentiment label: that is generate.

This still matters when read and produced fields could share one schema. A `generate.json` call is billed as a generate request no matter how many of its fields are read-only, so folding easy-to-read fields into it saves nothing on those fields. When cost matters, split the work: pull the read-only fields with `extract.json` and use `generate.json` only for the derived outputs.

For the parameters that tune a generate call, see the how-to and the SDK reference rather than this page.

---

## Next steps

- [How to generate JSON](/guides/how-to-generate-json/index.md) for the step-by-step mechanics: parameters, instructions, and error handling.
- [Generate Features (SDK reference)](/sdks/typescript/generate/index.md) for the `generate.json` surface in the SDKs.
- [How to extract JSON](/guides/how-to-extract-json/index.md) for the other side of the contrast.