How to Generate JSON Data with AI

Learn how to use the Tabstack API `/v1/generate/json` endpoint to generate JSON data with AI.

Often, extracting existing data from a web page isn’t enough. You need to transform that content — summarize it, categorize it, translate it, or restructure it into a new format. This is where /v1/generate/json comes in.

Unlike /v1/extract/json, which pulls existing data, /v1/generate/json uses AI to generate new content based on your instructions.

This process is driven by two key inputs:

  1. json_schema: The “what.” A precise blueprint defining the shape of the JSON you want.
  2. instructions: The “how.” Natural language instructions telling the AI how to process the source content to populate your schema.

By combining a target URL, a schema, and clear instructions, you can build workflows for content summarization, sentiment analysis, data categorization, competitive intelligence, and more.


You need a Tabstack API key. Get one at tabstack.ai, then set it as an environment variable:

export TABSTACK_API_KEY="your-api-key-here"

Install the SDK:

npm install @tabstack/sdk

The goal: Analyze the Hacker News homepage (https://news.ycombinator.com). For each story, have the AI generate a category (e.g., “tech,” “business,” “science”) and write a new one-sentence summary.

Define the shape of your output using json_schema. We want an object containing a summaries array, where each item has title, category, and summary string properties.

{
  "json_schema": {
    "type": "object",
    "properties": {
      "summaries": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "category": { "type": "string" },
            "summary": { "type": "string" }
          }
        }
      }
    }
  }
}

Tell the AI how to populate the schema:

"For each story on the page, find its title. Then, categorize it as tech/business/science/other and write a one-sentence summary in simple terms."

Combine the URL, schema, and instructions in a single SDK call:

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

// Generate a category and a one-sentence summary for each story on the page
const result = await client.generate.json({
  url: 'https://news.ycombinator.com',
  json_schema: {
    type: 'object',
    properties: {
      summaries: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            category: { type: 'string' },
            summary: { type: 'string' },
          },
        },
      },
    },
  },
  instructions:
    'For each story, categorize it as tech/business/science/other and write a one-sentence summary in simple terms',
})

console.log(JSON.stringify(result, null, 2))

Note: The examples throughout this guide use placeholder URLs like https://competitor.example.com/pricing, https://jobs.example.com/senior-backend-engineer, and https://example.com. Replace them with real URLs of pages you want to process.

A successful request returns a 200 OK with JSON matching your schema exactly:

{
  "summaries": [
    {
      "title": "New AI Model Released",
      "category": "tech",
      "summary": "A research lab announced a new language model that performs better on reasoning tasks."
    },
    {
      "title": "Database Performance Tips",
      "category": "tech",
      "summary": "An engineer shares techniques that reduced database query times by 90%."
    },
    {
      "title": "Climate Tech Startup Raises Funding",
      "category": "business",
      "summary": "A carbon capture company secured $50M in Series B funding."
    }
  ]
}

The title was extracted from the page. The category and summary fields did not exist in the source: the AI generated them from your instructions.
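Because the category and summary values are AI-generated, it can be worth checking the shape of the result before using it. The sketch below assumes the result arrives as untyped data; `StorySummaries`, `isStorySummaries`, and `countByCategory` are hypothetical names introduced for illustration and are not part of the SDK.

```typescript
// Shape of the output described by the schema in this walkthrough.
interface StorySummary {
  title: string
  category: string
  summary: string
}

interface StorySummaries {
  summaries: StorySummary[]
}

// Runtime check: true only when `value` has the expected structure.
function isStorySummaries(value: unknown): value is StorySummaries {
  if (typeof value !== 'object' || value === null) return false
  const summaries = (value as { summaries?: unknown }).summaries
  if (!Array.isArray(summaries)) return false
  return summaries.every((item: unknown) => {
    if (typeof item !== 'object' || item === null) return false
    const story = item as Record<string, unknown>
    return (
      typeof story['title'] === 'string' &&
      typeof story['category'] === 'string' &&
      typeof story['summary'] === 'string'
    )
  })
}

// Hypothetical helper: count how many stories fall into each category.
function countByCategory(data: StorySummaries): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const story of data.summaries) {
    counts[story.category] = (counts[story.category] ?? 0) + 1
  }
  return counts
}
```

Calling `isStorySummaries(result)` before reading `result.summaries` narrows the type for the compiler and guards against an unexpected shape at runtime.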


url

  • Type: string
  • Description: The fully qualified URL of the web page to process.
  • Must be a valid, publicly accessible URL. Cannot be localhost or a private IP.

json_schema

  • Type: object
  • Description: A valid JSON Schema object defining the exact structure, types, and constraints for your output. The AI strictly adheres to this schema.
  • Tips:
    • Use string, number, boolean, array, and object types.
    • Add description fields to properties; the AI uses them as hints.
    • Use enum to constrain a field to a specific set of values.

{
  "json_schema": {
    "type": "object",
    "properties": {
      "summary": {
        "type": "string",
        "description": "Overall summary of the content"
      },
      "sentiment": {
        "type": "string",
        "enum": ["positive", "negative", "neutral"],
        "description": "Overall sentiment of the article"
      }
    },
    "required": ["summary"]
  }
}

instructions

  • Type: string (max 20,000 characters)
  • Description: Natural language instructions telling the AI how to generate data to fit your schema.
  • Tips:
    • Be specific: “Write a 3-sentence summary” beats “Summarize this.”
    • Reference schema properties by name.
    • Define edge cases: “If no author is found, set author to null.”

  • Type: 'min' | 'standard' | 'max'
  • Default: standard
  • Controls the speed vs. capability tradeoff. Use max for JS-heavy SPAs or complex pages.

nocache

  • Type: boolean
  • Default: false
  • Set to true to bypass cache and force a fresh fetch. Use for real-time content or when testing with different instructions on the same URL.

  • Type: { country: string } (ISO 3166-1 alpha-2 code, e.g. 'US', 'GB')
  • Fetches the URL from a specific geographic location.
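
The constraints listed above (a public URL, instructions of at most 20,000 characters) can be checked locally before a request is sent. This is only a rough client-side sketch: `GenerateRequest`, `isPrivateHost`, and `checkRequest` are hypothetical names, the http/https requirement is an assumption, and the server remains the authority on what it accepts.

```typescript
interface GenerateRequest {
  url: string
  json_schema: Record<string, unknown>
  instructions: string
}

const MAX_INSTRUCTIONS_LENGTH = 20_000

// True for hostnames that are obviously local or in a private IPv4 range.
// IPv6 literals and DNS names that resolve to private addresses are not covered.
function isPrivateHost(hostname: string): boolean {
  if (hostname === 'localhost' || hostname.endsWith('.localhost')) return true
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname)
  if (!match) return false
  const a = Number(match[1])
  const b = Number(match[2])
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  )
}

// Returns a list of problems; an empty list means the request looks sendable.
function checkRequest(request: GenerateRequest): string[] {
  const problems: string[] = []
  let parsed: URL | null = null
  try {
    parsed = new URL(request.url)
  } catch {
    problems.push('url is not a valid absolute URL')
  }
  if (parsed) {
    // Assumption: only http and https pages can be processed.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      problems.push('url must use http or https')
    }
    if (isPrivateHost(parsed.hostname)) {
      problems.push('url must not point to localhost or a private IP')
    }
  }
  if (request.instructions.length > MAX_INSTRUCTIONS_LENGTH) {
    problems.push(`instructions exceed ${MAX_INSTRUCTIONS_LENGTH} characters`)
  }
  return problems
}
```

Running `checkRequest` first turns an avoidable 400 or 422 response into an immediate, local error message.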

Example 1: Content Summarization and Audience Analysis

Goal: Generate a “TL;DR,” a longer summary, and identify the target audience for a blog post.

Request payload:

{
  "url": "https://techblog.example.com/long-article",
  "json_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "tldr": { "type": "string", "description": "One-sentence summary" },
      "summary": { "type": "string", "description": "3-4 sentence summary" },
      "targetAudience": { "type": "string", "description": "Who this is for" }
    }
  },
  "instructions": "Create a TLDR (one sentence) and a longer summary (3-4 sentences) of this article. Also identify who the target audience is (e.g., developers, business leaders, general audience)."
}

Example response:

{
  "title": "Understanding Microservices Architecture",
  "tldr": "Microservices break large applications into small, independent services that communicate via APIs, offering better scalability and flexibility.",
  "summary": "This article explains microservices architecture, where applications are built as a collection of small, loosely coupled services. Each service handles a specific business function and can be developed, deployed, and scaled independently. The approach offers better fault isolation, technology flexibility, and team autonomy, though it introduces complexity in service coordination and data consistency.",
  "targetAudience": "Software developers and architects"
}

Example 2: Content Categorization and Tagging

Goal: For each post on a blog homepage, generate a category, a list of tags, and a difficulty level.

Request payload:

{
  "url": "https://blog.example.com",
  "json_schema": {
    "type": "object",
    "properties": {
      "posts": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "category": {
              "type": "string",
              "description": "One of: tutorial, opinion, news, case-study"
            },
            "tags": { "type": "array", "items": { "type": "string" } },
            "difficultyLevel": {
              "type": "string",
              "enum": ["beginner", "intermediate", "advanced"]
            }
          }
        }
      }
    }
  },
  "instructions": "For each blog post, assign it to a category (tutorial/opinion/news/case-study), generate 3-5 relevant tags, and determine the technical difficulty level (beginner/intermediate/advanced)."
}

Example response:

{
  "posts": [
    {
      "title": "Getting Started with React Hooks",
      "category": "tutorial",
      "tags": ["react", "hooks", "javascript", "frontend", "beginner-friendly"],
      "difficultyLevel": "beginner"
    },
    {
      "title": "Optimizing Database Queries at Scale",
      "category": "tutorial",
      "tags": ["database", "performance", "optimization", "sql", "indexing"],
      "difficultyLevel": "advanced"
    },
    {
      "title": "Why We Migrated to Kubernetes",
      "category": "case-study",
      "tags": ["kubernetes", "devops", "infrastructure", "migration", "containers"],
      "difficultyLevel": "intermediate"
    }
  ]
}
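
Once posts are tagged, the structured output is straightforward to index. The following is a minimal sketch of one way to consume the response above; the `Post` type mirrors the example schema, and `buildTagIndex` and `filterByDifficulty` are hypothetical helpers, not SDK functions.

```typescript
type Difficulty = 'beginner' | 'intermediate' | 'advanced'

interface Post {
  title: string
  category: string
  tags: string[]
  difficultyLevel: Difficulty
}

// Map each (lowercased) tag to the titles of the posts that carry it.
function buildTagIndex(posts: Post[]): Map<string, string[]> {
  const index = new Map<string, string[]>()
  for (const post of posts) {
    for (const tag of post.tags) {
      const key = tag.toLowerCase()
      const titles = index.get(key) ?? []
      titles.push(post.title)
      index.set(key, titles)
    }
  }
  return index
}

// Keep only the posts at the requested difficulty level.
function filterByDifficulty(posts: Post[], level: Difficulty): Post[] {
  return posts.filter((post) => post.difficultyLevel === level)
}
```

Lowercasing the tags guards against the AI returning the same tag with different capitalization across posts.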

Example 3: Sentiment and Trend Analysis from Reviews

Goal: Analyze a product review page to find overall sentiment, common themes, and a final recommendation.

Request payload:

{
  "url": "https://reviews.example.com/product/xyz",
  "json_schema": {
    "type": "object",
    "properties": {
      "overallSentiment": {
        "type": "string",
        "enum": ["very positive", "positive", "neutral", "negative", "very negative"]
      },
      "rating": {
        "type": "number",
        "description": "Estimated average rating out of 5"
      },
      "commonPraises": { "type": "array", "items": { "type": "string" } },
      "commonComplaints": { "type": "array", "items": { "type": "string" } },
      "recommendation": { "type": "string" }
    }
  },
  "instructions": "Analyze all customer reviews on this page and determine: the overall sentiment, an estimated rating (1-5), the top 3 most common praises, the top 3 most common complaints, and write a final recommendation (yes/no with a brief reason)."
}

Example response:

{
  "overallSentiment": "positive",
  "rating": 4.2,
  "commonPraises": [
    "Excellent build quality and durability",
    "Great battery life lasting 2-3 days",
    "Intuitive and easy-to-use interface"
  ],
  "commonComplaints": [
    "Price is higher than competitors",
    "Limited color options available",
    "Charging cable is too short"
  ],
  "recommendation": "Yes - the product excels in quality and performance, making it worth the premium price for users who prioritize reliability over cost."
}
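
Since both overallSentiment and rating are estimates produced by the AI, a cheap cross-check can flag responses where the two disagree. The sketch below is one possible sanity check; the rating bands in `EXPECTED_RATING` are hypothetical thresholds chosen for illustration, and `findInconsistencies` is not part of the SDK.

```typescript
type Sentiment = 'very positive' | 'positive' | 'neutral' | 'negative' | 'very negative'

interface ReviewAnalysis {
  overallSentiment: Sentiment
  rating: number
  commonPraises: string[]
  commonComplaints: string[]
  recommendation: string
}

// Hypothetical inclusive rating bands considered plausible for each label.
const EXPECTED_RATING: Record<Sentiment, [number, number]> = {
  'very positive': [4.5, 5],
  positive: [3.5, 5],
  neutral: [2.5, 3.5],
  negative: [1, 2.5],
  'very negative': [1, 1.5],
}

// Returns human-readable issues; an empty array means nothing looked off.
function findInconsistencies(analysis: ReviewAnalysis): string[] {
  const issues: string[] = []
  if (analysis.rating < 1 || analysis.rating > 5) {
    issues.push(`rating ${analysis.rating} is outside the 1-5 range`)
    return issues
  }
  const [min, max] = EXPECTED_RATING[analysis.overallSentiment]
  if (analysis.rating < min || analysis.rating > max) {
    issues.push(
      `rating ${analysis.rating} does not fit sentiment "${analysis.overallSentiment}"`
    )
  }
  return issues
}
```

A flagged response is a candidate for a retry with nocache set to true or for tighter wording in the instructions.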

These examples show generate.json as an intelligence call inside an agent — fetching a URL, transforming its content, and returning structured data the agent can act on.

Fetch a competitor’s pricing page and transform it into structured competitive intelligence.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const competitiveSignals = await client.generate.json({
  url: 'https://competitor.example.com/pricing',
  json_schema: {
    type: 'object',
    properties: {
      tiers: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            name: { type: 'string', description: 'Plan or tier name' },
            price_monthly: {
              type: ['number', 'null'],
              description: 'Monthly price in USD, null if custom or contact-us',
            },
            price_annual: {
              type: ['number', 'null'],
              description: 'Annual price in USD, null if custom or contact-us',
            },
            key_features: {
              type: 'array',
              items: { type: 'string' },
              description: 'Top 3-5 features highlighted for this tier',
            },
            target_customer: {
              type: 'string',
              description: 'Who this tier appears to be aimed at',
            },
            value_rating: {
              type: 'string',
              enum: ['good', 'fair', 'poor'],
              description:
                "Heuristic judgement of this tier's value based on features vs price. 'good' = rich feature set relative to price; 'poor' = thin feature set for the cost.",
            },
          },
        },
      },
      pricing_model: {
        type: 'string',
        enum: ['per-seat', 'usage-based', 'flat-rate', 'hybrid', 'unknown'],
        description: 'Overall pricing structure',
      },
      free_tier_available: { type: 'boolean' },
    },
  },
  instructions:
    'Extract each pricing tier. For each tier, capture the name, monthly and annual prices (null if not shown), the top 3-5 highlighted features, and who it seems targeted at. Then judge the tier\'s value_rating (good/fair/poor) by weighing its feature set against its price. Also identify the overall pricing model and whether a free tier exists.',
  // Pricing pages change often, so bypass the cache and fetch a fresh copy
  nocache: true,
})

console.log(JSON.stringify(competitiveSignals, null, 2))
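
The nullable price fields need care downstream. As a minimal sketch, the hypothetical helper below derives the discount for paying annually from a tier in the shape of the schema above. It assumes price_annual is the total for twelve months rather than a per-month rate, which the schema description does not specify.

```typescript
interface PricingTier {
  name: string
  price_monthly: number | null
  price_annual: number | null
}

// Percentage saved by paying annually instead of monthly, rounded to the
// nearest whole percent. Returns null when either price is unknown or the
// monthly price is zero (for example, a free tier).
// Assumption: price_annual is the full-year total, not a per-month figure.
function annualDiscountPercent(tier: PricingTier): number | null {
  if (tier.price_monthly === null || tier.price_annual === null) return null
  const yearlyAtMonthlyRate = tier.price_monthly * 12
  if (yearlyAtMonthlyRate <= 0) return null
  const saved = yearlyAtMonthlyRate - tier.price_annual
  return Math.round((saved / yearlyAtMonthlyRate) * 100)
}

const proTier: PricingTier = { name: 'Pro', price_monthly: 10, price_annual: 100 }
console.log(annualDiscountPercent(proTier)) // 17
```

Returning null instead of 0 for contact-us tiers keeps "no discount" distinct from "price unknown" in later comparisons.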

Convert a job listing into structured hiring intent signals — useful for sales intelligence, market research, or agent-driven lead qualification.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const hiringIntent = await client.generate.json({
  url: 'https://jobs.example.com/senior-backend-engineer',
  json_schema: {
    type: 'object',
    properties: {
      role: { type: 'string', description: 'Job title as listed' },
      seniority: {
        type: 'string',
        enum: ['intern', 'junior', 'mid', 'senior', 'staff', 'principal', 'manager', 'director'],
      },
      tech_stack: {
        type: 'array',
        items: { type: 'string' },
        description: 'Technologies, frameworks, and tools mentioned in requirements',
      },
      team_size_signals: {
        type: 'string',
        description:
          'Any indicators of team or company size (headcount, growth stage, etc.)',
      },
      pain_points: {
        type: 'array',
        items: { type: 'string' },
        description:
          'Problems or challenges implied by the job description (e.g., "scaling infrastructure", "improving developer experience")',
      },
      remote_policy: {
        type: 'string',
        enum: ['remote', 'hybrid', 'on-site', 'unknown'],
      },
      urgency_level: {
        type: 'string',
        enum: ['low', 'medium', 'high'],
        description:
          "Rate the hiring urgency. 'high' = the listing implies backfill or a launched-but-incomplete project; 'low' = speculative or long-term planning.",
      },
      company_name: { type: 'string' },
    },
  },
  instructions:
    'Parse this job listing and extract: the exact role title, seniority level, all technologies and tools mentioned in requirements or responsibilities, any signals about team or company size, implied pain points or challenges the hire is meant to solve, the remote work policy, and the company name. Then judge the urgency_level (low/medium/high) from language cues about timelines, backfill, shipping pressure, or speculative planning.',
})

console.log(JSON.stringify(hiringIntent, null, 2))
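
For lead qualification, an agent typically reduces these signals to a single score. The sketch below shows one simple way to do that; the weights in `URGENCY_POINTS` and the `scoreLead` helper are hypothetical and would need tuning against real outcomes.

```typescript
type Urgency = 'low' | 'medium' | 'high'

interface HiringIntent {
  company_name: string
  tech_stack: string[]
  pain_points: string[]
  urgency_level: Urgency
}

// Hypothetical weights: how much each urgency level contributes to the score.
const URGENCY_POINTS: Record<Urgency, number> = { low: 0, medium: 2, high: 4 }

// Score = urgency points + 1 point per tech_stack entry that appears in
// `relevantTech` (compared case-insensitively).
function scoreLead(intent: HiringIntent, relevantTech: string[]): number {
  const relevant = new Set(relevantTech.map((tech) => tech.toLowerCase()))
  const overlap = intent.tech_stack.filter((tech) =>
    relevant.has(tech.toLowerCase())
  ).length
  return URGENCY_POINTS[intent.urgency_level] + overlap
}
```

Because urgency_level is constrained by an enum in the schema, the lookup in `URGENCY_POINTS` needs no fallback for unexpected labels.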

This example fetches an article, asks the AI to generate key insights and action items, then processes the structured output.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function analyzeAndProcess(url: string) {
  const data = await client.generate.json({
    url,
    json_schema: {
      type: 'object',
      properties: {
        mainTopic: { type: 'string' },
        keyInsights: {
          type: 'array',
          items: { type: 'string' },
        },
        actionItems: {
          type: 'array',
          items: { type: 'string' },
        },
      },
    },
    instructions:
      'Identify the main topic, extract 3-5 key insights, and suggest 2-3 action items for someone reading this.',
  })

  // Print the generated fields as a short report
  console.log(`Main Topic: ${data.mainTopic}\n`)
  console.log('Key Insights:')
  ;(data.keyInsights as string[]).forEach((insight, i) => {
    console.log(` ${i + 1}. ${insight}`)
  })
  console.log('\nRecommended Actions:')
  ;(data.actionItems as string[]).forEach((action, i) => {
    console.log(` ${i + 1}. ${action}`)
  })
  return data
}

// Surface failures instead of leaving the promise rejection unhandled
analyzeAndProcess('https://blog.example.com/article').catch(console.error)

The SDK throws typed errors you can catch by class.

| Status | SDK error type | Description |
| --- | --- | --- |
| 400 | BadRequestError | Missing or malformed url, json_schema, or instructions. |
| 401 | AuthenticationError | API key missing, invalid, or expired. |
| 403 | PermissionDeniedError | Your key doesn’t have access to this resource. |
| 422 | UnprocessableEntityError | The URL was malformed or points to an inaccessible resource. |
| 429 | RateLimitError | Too many requests. The SDK retries automatically with backoff. |
| 500 | InternalServerError | Server-side failure: page too large, fetch failed, or AI transformation failed. |

The SDK automatically retries 408, 409, 429, and 500+ errors twice with exponential backoff.

import Tabstack, {
  AuthenticationError,
  BadRequestError,
  RateLimitError,
  UnprocessableEntityError,
  InternalServerError,
} from '@tabstack/sdk'

const client = new Tabstack()

async function generateWithErrorHandling(
  url: string,
  schema: unknown,
  instructions: string
) {
  try {
    const data = await client.generate.json({
      url,
      json_schema: schema,
      instructions,
    })
    return data
  } catch (error) {
    // Translate each typed SDK error into an actionable message
    if (error instanceof AuthenticationError) {
      throw new Error('Authentication failed. Check your TABSTACK_API_KEY.')
    }
    if (error instanceof BadRequestError) {
      throw new Error(`Bad request: ${error.message}`)
    }
    if (error instanceof UnprocessableEntityError) {
      throw new Error(`Invalid URL: ${error.message}`)
    }
    if (error instanceof RateLimitError) {
      // The SDK has already retried by the time this error reaches you
      throw new Error('Rate limit exceeded. Retries exhausted.')
    }
    if (error instanceof InternalServerError) {
      throw new Error(
        `Server error: ${error.message}. Try simplifying your instructions or using a different URL.`
      )
    }
    // Unknown error type: rethrow unchanged
    throw error
  }
}

// Usage
const schema = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
  },
}

generateWithErrorHandling('https://example.com', schema, 'Create a brief summary')
  .then((data) => console.log(data))
  .catch((error: unknown) =>
    console.error(error instanceof Error ? error.message : error)
  )
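
To make the retry behavior described above more concrete (408, 409, 429, and 500+ retried twice with exponential backoff), here is a generic sketch of such a policy. It is not the SDK's implementation: the delay values are illustrative assumptions, and `isRetryableStatus`, `backoffDelayMs`, and `withRetries` are hypothetical names.

```typescript
// Status codes listed in this guide as automatically retried.
function isRetryableStatus(status: number): boolean {
  return status === 408 || status === 409 || status === 429 || status >= 500
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at `maxMs`. The default values are assumptions for illustration.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs)
}

// Run `task`, retrying up to `maxRetries` times when `getStatus` reports a
// retryable status for the thrown error. Other errors are rethrown at once.
async function withRetries<T>(
  task: () => Promise<T>,
  getStatus: (error: unknown) => number | undefined,
  maxRetries = 2,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task()
    } catch (error) {
      const status = getStatus(error)
      const retryable = status !== undefined && isRetryableStatus(status)
      if (!retryable || attempt >= maxRetries) throw error
      await sleep(backoffDelayMs(attempt))
    }
  }
}
```

With two retries, a request is attempted at most three times, which is why a RateLimitError that reaches your catch block means the retries are already exhausted.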

Output quality depends heavily on how specific your instructions are. Compare three versions of the same request:

Vague: "instructions": "Summarize this"

Better: "instructions": "Write a 2-sentence summary of this article"

Best: "instructions": "Create a concise summary for the 'summary' field. The summary should be 2-3 sentences, written in a professional but accessible style, and focus on the main argument and key findings."
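
One way to keep instructions this specific as a schema grows is to assemble them from per-field rules, so each schema property is referenced by name. This is a minimal sketch; `FieldRule` and `buildInstructions` are hypothetical helpers, and the line-per-field layout is just one reasonable format for natural-language instructions.

```typescript
interface FieldRule {
  field: string // property name in your json_schema
  rule: string // what the AI should write into that property
}

// Join a one-line task statement with one explicit line per schema field.
function buildInstructions(task: string, rules: FieldRule[]): string {
  const lines = rules.map(
    ({ field, rule }) => `- For the '${field}' field: ${rule}`
  )
  return [task, ...lines].join('\n')
}

const instructions = buildInstructions('Analyze this article.', [
  { field: 'summary', rule: 'write 2-3 sentences focused on the main argument.' },
  { field: 'author', rule: 'use the byline; if no author is found, set it to null.' },
])
console.log(instructions)
```

Keeping the rules in a list also makes it easy to add one field at a time and to see which rule changed when output quality shifts.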

The AI reads the description fields in your json_schema. Use them to provide context and constraints for each property.

Without descriptions (less reliable):

{
  "type": "object",
  "properties": {
    "technicalLevel": {
      "type": "string",
      "enum": ["beginner", "intermediate", "advanced"]
    }
  }
}

With descriptions (more reliable):

{
  "type": "object",
  "properties": {
    "technicalLevel": {
      "type": "string",
      "enum": ["beginner", "intermediate", "advanced"],
      "description": "Assess the technical complexity of the content. 'Beginner' means no prior knowledge needed. 'Advanced' means deep technical expertise is required."
    }
  }
}
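
Missing descriptions are easy to overlook in a large schema. The sketch below walks a schema and lists the properties that lack one; the `JsonSchema` type covers only the keywords used in this guide, and `findUndescribed` is a hypothetical helper rather than an SDK feature.

```typescript
// Minimal subset of JSON Schema used in this guide.
type JsonSchema = {
  type?: string | string[]
  description?: string
  properties?: Record<string, JsonSchema>
  items?: JsonSchema
  enum?: unknown[]
}

// Collect dotted paths of properties without a `description`.
// Array item schemas are visited with a `[]` suffix on the path.
function findUndescribed(schema: JsonSchema, path = ''): string[] {
  const missing: string[] = []
  for (const [name, prop] of Object.entries(schema.properties ?? {})) {
    const propPath = path ? `${path}.${name}` : name
    if (!prop.description) missing.push(propPath)
    // Recurse into nested objects and into array item schemas
    missing.push(...findUndescribed(prop, propPath))
    if (prop.items) missing.push(...findUndescribed(prop.items, `${propPath}[]`))
  }
  return missing
}

const bare: JsonSchema = {
  type: 'object',
  properties: {
    technicalLevel: { type: 'string', enum: ['beginner', 'intermediate', 'advanced'] },
  },
}
console.log(findUndescribed(bare)) // [ 'technicalLevel' ]
```

Running a check like this in a test suite keeps new schema fields from shipping without the hints the AI relies on.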

Start with a simple schema and simple instructions. Get that working, then add complexity.

  1. Start simple: "Summarize this article in 2 sentences."
  2. Add a field: "Summarize... and identify the target audience."
  3. Add more: "Summarize... identify the target audience... list 3 key takeaways... and rate the technical difficulty from 1-5."

This is easier to debug than writing a complex prompt from scratch.

For production systems, log your requests and responses. This lets you monitor output quality, identify edge cases, and refine your instructions over time.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

async function generateWithLogging(
  url: string,
  schema: unknown,
  instructions: string
) {
  const startTime = Date.now()
  try {
    const data = await client.generate.json({ url, json_schema: schema, instructions })
    const duration = Date.now() - startTime
    // One structured log line per successful request
    console.log(
      JSON.stringify({
        level: 'info',
        timestamp: new Date().toISOString(),
        url,
        status: 'success',
        durationMs: duration,
        instructionsLength: instructions.length,
        responseSize: JSON.stringify(data).length,
      })
    )
    return data
  } catch (error) {
    const duration = Date.now() - startTime
    // Log the failure, then rethrow so callers can still handle it
    console.error(
      JSON.stringify({
        level: 'error',
        timestamp: new Date().toISOString(),
        url,
        status: 'error',
        durationMs: duration,
        errorMessage: (error as Error).message,
      })
    )
    throw error
  }
}

await generateWithLogging(
  'https://example.com',
  { type: 'object', properties: { summary: { type: 'string' } } },
  'Write a 1-sentence summary of the page.',
)