Developer's Guide

How to Generate JSON Data with AI

Introduction

Often, simply extracting existing data from a web page isn't enough. You need to transform that content—summarize it, categorize it, translate it, or restructure it into a new format. This is where the TABS API /v1/generate/json endpoint shines.

Unlike the /v1/extract/json endpoint, which pulls existing data, the /v1/generate/json endpoint uses AI to generate new content based on your needs.

This process is driven by two key inputs you provide:

  1. json_schema: The "what." This is a precise blueprint defining the shape of the final JSON you want.
  2. instructions: The "how." These are natural language instructions that tell the AI how to process the source content to populate your schema.

By combining a target URL, a schema, and clear instructions, you can build powerful workflows for content summarization, sentiment analysis, data categorization, and much more.

This guide will walk you through the entire process, from your first request to advanced, production-ready patterns.


Prerequisites

Before you can use the JSON generation endpoint, you'll need a valid TABS API key.

  1. Sign up at https://tabstack.ai to get your API key.
  2. The API uses Bearer Token authentication, so you'll pass your key in an Authorization header.

We recommend storing your API key as an environment variable for security and convenience.

This command shows how to set an environment variable in a Bash-compatible shell.

export TABS_API_KEY="your-api-key-here"

The export command makes the variable available to any processes or scripts you run from this shell session. TABS_API_KEY is the name of the environment variable that our code examples will look for. You should replace "your-api-key-here" with your actual, secret API key.

How to Run:

  1. Copy this command.
  2. Paste it into your terminal.
  3. Replace the placeholder with your key and press Enter.

Once the variable is set, the examples in this guide can read it from the same shell session.
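
As a small illustration of how a script can consume this variable, the sketch below reads TABS_API_KEY once and stops early with a clear message when it is missing. The helper names getApiKey and authHeader are hypothetical and not part of any TABS SDK.

```javascript
// Hypothetical helper: reads the key set with `export TABS_API_KEY=...`.
// `env` defaults to process.env; it is a parameter only to make testing easy.
function getApiKey(env = process.env) {
  const key = env.TABS_API_KEY;
  if (!key || key.trim() === '') {
    throw new Error('TABS_API_KEY is not set. Run: export TABS_API_KEY="your-api-key-here"');
  }
  return key.trim();
}

// Builds the Authorization header value for Bearer Token authentication.
function authHeader(env = process.env) {
  return `Bearer ${getApiKey(env)}`;
}
```

Failing at startup is usually easier to debug than a 401 response later in the run.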

Step-by-Step: From URL to Generated JSON

Let's walk through a complete, practical example.

The Goal: We want to analyze the Hacker News homepage (https://news.ycombinator.com). We don't just want to scrape data; we want the AI to analyze each story and generate a category (e.g., "tech," "business," "science") and write a new one-sentence summary for it.

Step 1: Define the Schema

First, we define the shape of our desired output using json_schema. We want a top-level object containing a key called summaries, which should be an array. Each object in that array should have three string properties: title, category, and summary.

{
  "json_schema": {
    "type": "object",
    "properties": {
      "summaries": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "category": {"type": "string"},
            "summary": {"type": "string"}
          }
        }
      }
    }
  }
}

Step 2: Write the Instructions

Next, we write the instructions to tell the AI how to populate this schema.

"instructions": "For each story on the page, find its title. Then, categorize it as tech/business/science/other and write a one-sentence summary in simple terms."

This instruction is critical: it explicitly tells the AI to generate the new category and summary data.

Step 3: Assemble and Run the Request

Now, let's combine the target url, our json_schema, and our instructions into a single POST request to the https://api.tabstack.ai/v1/generate/json endpoint.

curl -X POST https://api.tabstack.ai/v1/generate/json \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "json_schema": {
      "type": "object",
      "properties": {
        "summaries": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "category": {"type": "string"},
              "summary": {"type": "string"}
            }
          }
        }
      }
    },
    "instructions": "For each story on the page, find its title. Then, categorize it as tech/business/science/other and write a one-sentence summary in simple terms."
  }'

Code Explanation (Step-by-Step):

We send a POST request to the endpoint https://api.tabstack.ai/v1/generate/json. For authentication, we include the Authorization: Bearer $TABS_API_KEY header, which uses the environment variable we set earlier. The Content-Type: application/json header tells the server that we are sending a JSON payload. The request body contains three key parameters: "url" specifies the target web page to fetch and analyze, "json_schema" provides the structured blueprint for the output, and "instructions" provides the natural language logic for the AI to follow.
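
For readers following along in JavaScript, the same request might look like the sketch below. It assumes Node.js 18 or later (for the built-in fetch). The helper name buildRequestOptions is hypothetical. The script sends the request only when TABS_API_KEY is set.

```javascript
// Sketch of the curl request above in Node.js 18+ (global fetch assumed).
const ENDPOINT = 'https://api.tabstack.ai/v1/generate/json';

const payload = {
  url: 'https://news.ycombinator.com',
  json_schema: {
    type: 'object',
    properties: {
      summaries: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            category: { type: 'string' },
            summary: { type: 'string' }
          }
        }
      }
    }
  },
  instructions: 'For each story on the page, find its title. Then, categorize it as tech/business/science/other and write a one-sentence summary in simple terms.'
};

// Hypothetical helper: assembles the method, headers, and JSON body.
function buildRequestOptions(apiKey) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  };
}

async function main() {
  const response = await fetch(ENDPOINT, buildRequestOptions(process.env.TABS_API_KEY));
  const body = await response.text();
  if (!response.ok) {
    throw new Error(`Request failed (${response.status}): ${body}`);
  }
  console.log(JSON.stringify(JSON.parse(body), null, 2));
}

// Only call the API when a key is available.
if (process.env.TABS_API_KEY) {
  main().catch((error) => {
    console.error(error.message);
    process.exitCode = 1;
  });
} else {
  console.error('Set TABS_API_KEY before running this script.');
}
```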

How to Run:

  • curl: You can run this command directly in your terminal, assuming you've set the TABS_API_KEY variable.
  • JavaScript: Save the code as generate.js. Make sure you're in a Node.js project. Run it from your terminal using node generate.js.

Step 4: Analyze the Response

A successful request returns a 200 OK status and a JSON payload that matches your schema.

{
  "summaries": [
    {
      "title": "New AI Model Released",
      "category": "tech",
      "summary": "A research lab announced a new language model that performs better on reasoning tasks."
    },
    {
      "title": "Database Performance Tips",
      "category": "tech",
      "summary": "An engineer shares techniques that reduced database query times by 90%."
    },
    {
      "title": "Climate Tech Startup Raises Funding",
      "category": "business",
      "summary": "A carbon capture company secured $50M in Series B funding."
    }
  ]
}

Response Explanation: The key difference from extraction is that the AI generated the category and summary fields based on your instructions. The title was extracted, but the other fields are newly created, AI-derived content. This is what makes the generate endpoint powerful: it creates data rather than just finding it.
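
Once the response is parsed, the generated fields behave like any other data. As a minimal sketch, the hypothetical helper groupByCategory below groups story titles by the AI-assigned category, using the sample response shown above as input.

```javascript
// Sample data copied from the example response in this guide.
const sampleResponse = {
  summaries: [
    { title: 'New AI Model Released', category: 'tech', summary: 'A research lab announced a new language model that performs better on reasoning tasks.' },
    { title: 'Database Performance Tips', category: 'tech', summary: 'An engineer shares techniques that reduced database query times by 90%.' },
    { title: 'Climate Tech Startup Raises Funding', category: 'business', summary: 'A carbon capture company secured $50M in Series B funding.' }
  ]
};

// Hypothetical helper: maps each category to the titles filed under it.
// Stories without a category are placed under "other".
function groupByCategory(response) {
  const groups = {};
  for (const story of response.summaries || []) {
    const key = story.category || 'other';
    if (!groups[key]) groups[key] = [];
    groups[key].push(story.title);
  }
  return groups;
}

console.log(groupByCategory(sampleResponse));
```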


Request Parameters

Here is a detailed breakdown of the POST request body parameters.

url (required)

  • Type: string
  • Description: The fully qualified URL of the web page you want to process.
  • Validation:
    • Must be a valid, publicly accessible URL.
    • Cannot be a localhost address or a private/internal IP.

{
  "url": "https://techblog.example.com/article"
}
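
A client-side pre-check can catch obviously unusable URLs before a request is sent. The sketch below is an approximation of the validation rules listed above, not the API's actual logic. The function name isLikelyPublicUrl is hypothetical, and accepting both http and https is an assumption. The server's response remains authoritative.

```javascript
// Hypothetical pre-check: rejects malformed URLs, non-HTTP schemes,
// localhost, and common private or loopback IPv4 ranges.
function isLikelyPublicUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return false; // Not a parseable absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;

  const host = parsed.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost') || host === '[::1]') return false;

  const ipv4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (ipv4) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    if (a === 0 || a === 10 || a === 127) return false;   // this-host, private, loopback
    if (a === 172 && b >= 16 && b <= 31) return false;    // private 172.16.0.0/12
    if (a === 192 && b === 168) return false;             // private 192.168.0.0/16
    if (a === 169 && b === 254) return false;             // link-local
  }
  return true;
}
```

A hostname that resolves to a private address through DNS would still pass this check, which is one reason to treat it only as a convenience.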

json_schema (required)

  • Type: object
  • Description: A valid JSON Schema object defining the exact structure, data types, and constraints for your desired output. The AI will strictly adhere to this schema.
  • Tips for Effective Schemas:
    • Be Specific: Use string, number, boolean, array, and object types.
    • Use Descriptions: Add description fields to properties. The AI uses these as hints to generate better, more relevant data.
    • Use Enums: To constrain a field to a specific list of values (e.g., for categories), use the enum keyword.

{
  "json_schema": {
    "type": "object",
    "properties": {
      "summary": {
        "type": "string",
        "description": "Overall summary of the content"
      },
      "sentiment": {
        "type": "string",
        "enum": ["positive", "negative", "neutral"],
        "description": "Overall sentiment of the article"
      }
    },
    "required": ["summary"]
  }
}
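
Even when output is expected to follow the schema, a light check on the client can surface surprises early. The sketch below is not a full JSON Schema validator. It only inspects the top-level required and enum keywords, and the name checkTopLevel is hypothetical.

```javascript
// The schema from the example above.
const exampleSchema = {
  type: 'object',
  properties: {
    summary: { type: 'string', description: 'Overall summary of the content' },
    sentiment: {
      type: 'string',
      enum: ['positive', 'negative', 'neutral'],
      description: 'Overall sentiment of the article'
    }
  },
  required: ['summary']
};

// Hypothetical helper: returns a list of human-readable problems.
// An empty list means the top-level required and enum checks passed.
function checkTopLevel(data, schema) {
  const problems = [];
  for (const name of schema.required || []) {
    if (!(name in data)) problems.push(`missing required property: ${name}`);
  }
  for (const [name, spec] of Object.entries(schema.properties || {})) {
    if (!(name in data)) continue;
    if (spec.enum && !spec.enum.includes(data[name])) {
      problems.push(`${name} is not one of: ${spec.enum.join(', ')}`);
    }
  }
  return problems;
}
```

For stricter guarantees, a dedicated JSON Schema validation library would be a better fit than this sketch.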

instructions (required)

  • Type: string
  • Description: Natural language instructions that tell the AI how to generate the data to fit your schema. This is where you define the transformation logic.
  • Tips for Effective Instructions:
    • Be Clear and Specific: "Write a 3-sentence summary" is better than "Summarize this."
    • Reference Schema Properties: Mention property names (e.g., "For the sentiment field, determine if the tone is...").
    • Define Edge Cases: "If no author is found, set the author field to null."
    • Specify Format: "Extract the key points as a list of bullet points."

{
  "instructions": "Read the article and create a concise summary (2-3 sentences). Determine if the overall tone is positive, negative, or neutral and assign it to the 'sentiment' field."
}

nocache (optional)

  • Type: boolean
  • Default: false
  • Description: When set to true, this forces the API to bypass any cached version of the URL and re-fetch and re-process the content.
  • When to Use:
    • Analyzing real-time or frequently updated content (e.g., news homepages, stock tickers).
    • Debugging a request with different instructions on the same URL.

{
  "url": "https://news.example.com",
  "json_schema": { ... },
  "instructions": "Summarize the news",
  "nocache": true
}
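
One way to keep nocache out of routine requests is to set it only when fresh content is explicitly requested. In the sketch below, the helper buildPayload and its fresh option are hypothetical names. The resulting object uses only the four parameters documented above.

```javascript
// Hypothetical helper: assembles the request body and adds "nocache": true
// only when the caller asks for a fresh fetch. Omitting the field relies on
// the documented default of false.
function buildPayload({ url, schema, instructions, fresh = false }) {
  const payload = { url, json_schema: schema, instructions };
  if (fresh) {
    payload.nocache = true;
  }
  return payload;
}

const schema = { type: 'object', properties: { headline: { type: 'string' } } };

// Default request: a cached version of the page may be used.
console.log(buildPayload({ url: 'https://news.example.com', schema, instructions: 'Summarize the news' }));

// Frequently updated page: force a re-fetch.
console.log(buildPayload({ url: 'https://news.example.com', schema, instructions: 'Summarize the news', fresh: true }));
```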

Real-World Examples

These examples show the request payload and the corresponding response. The key is to see how the instructions and json_schema work together.

Example 1: Content Summarization and Audience Analysis

Goal: Generate a "TL;DR," a longer summary, and identify the target audience for a blog post.

Request Payload:

{
  "url": "https://techblog.example.com/long-article",
  "json_schema": {
    "type": "object",
    "properties": {
      "title": {"type": "string"},
      "tldr": {"type": "string", "description": "One-sentence summary"},
      "summary": {"type": "string", "description": "3-4 sentence summary"},
      "targetAudience": {"type": "string", "description": "Who this is for"}
    }
  },
  "instructions": "Create a TLDR (one sentence) and a longer summary (3-4 sentences) of this article. Also identify who the target audience is (e.g., developers, business leaders, general audience)."
}

Example Response:

{
  "title": "Understanding Microservices Architecture",
  "tldr": "Microservices break large applications into small, independent services that communicate via APIs, offering better scalability and flexibility.",
  "summary": "This article explains microservices architecture, where applications are built as a collection of small, loosely coupled services. Each service handles a specific business function and can be developed, deployed, and scaled independently. The approach offers better fault isolation, technology flexibility, and team autonomy, though it introduces complexity in service coordination and data consistency.",
  "targetAudience": "Software developers and architects"
}

Example 2: Content Categorization and Tagging

Goal: Scrape a blog homepage, and for each post, generate a category, a list of tags, and a difficulty level.

Request Payload:

{
  "url": "https://blog.example.com",
  "json_schema": {
    "type": "object",
    "properties": {
      "posts": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "category": {"type": "string", "description": "One of: tutorial, opinion, news, case-study"},
            "tags": {"type": "array", "items": {"type": "string"}},
            "difficultyLevel": {"type": "string", "enum": ["beginner", "intermediate", "advanced"]}
          }
        }
      }
    }
  },
  "instructions": "For each blog post, assign it to a category (tutorial/opinion/news/case-study), generate 3-5 relevant tags, and determine the technical difficulty level (beginner/intermediate/advanced)."
}

Example Response:

{
  "posts": [
    {
      "title": "Getting Started with React Hooks",
      "category": "tutorial",
      "tags": ["react", "hooks", "javascript", "frontend", "beginner-friendly"],
      "difficultyLevel": "beginner"
    },
    {
      "title": "Optimizing Database Queries at Scale",
      "category": "tutorial",
      "tags": ["database", "performance", "optimization", "sql", "indexing"],
      "difficultyLevel": "advanced"
    },
    {
      "title": "Why We Migrated to Kubernetes",
      "category": "case-study",
      "tags": ["kubernetes", "devops", "infrastructure", "migration", "containers"],
      "difficultyLevel": "intermediate"
    }
  ]
}
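
Generated tags and difficulty levels become useful once they are indexed. The sketch below builds a tag-to-titles lookup and a difficulty filter from a response shaped like the one above. The helper names indexByTag and filterByDifficulty are hypothetical.

```javascript
// Abbreviated sample in the same shape as the example response.
const sampleResponse = {
  posts: [
    { title: 'Getting Started with React Hooks', category: 'tutorial', tags: ['react', 'hooks', 'javascript'], difficultyLevel: 'beginner' },
    { title: 'Optimizing Database Queries at Scale', category: 'tutorial', tags: ['database', 'performance', 'sql'], difficultyLevel: 'advanced' },
    { title: 'Why We Migrated to Kubernetes', category: 'case-study', tags: ['kubernetes', 'devops', 'performance'], difficultyLevel: 'intermediate' }
  ]
};

// Hypothetical helper: maps each tag to the titles of posts that carry it.
function indexByTag(response) {
  const index = new Map();
  for (const post of response.posts || []) {
    for (const tag of post.tags || []) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(post.title);
    }
  }
  return index;
}

// Hypothetical helper: returns only the posts at the given difficulty level.
function filterByDifficulty(response, level) {
  return (response.posts || []).filter((post) => post.difficultyLevel === level);
}

console.log(indexByTag(sampleResponse).get('performance'));
console.log(filterByDifficulty(sampleResponse, 'beginner').map((post) => post.title));
```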

Example 3: Sentiment and Trend Analysis from Reviews

Goal: Analyze a page of product reviews to find the overall sentiment, common themes, and a final recommendation.

Request Payload:

{
  "url": "https://reviews.example.com/product/xyz",
  "json_schema": {
    "type": "object",
    "properties": {
      "overallSentiment": {"type": "string", "enum": ["very positive", "positive", "neutral", "negative", "very negative"]},
      "rating": {"type": "number", "description": "Estimated average rating out of 5"},
      "commonPraises": {"type": "array", "items": {"type": "string"}},
      "commonComplaints": {"type": "array", "items": {"type": "string"}},
      "recommendation": {"type": "string"}
    }
  },
  "instructions": "Analyze all customer reviews on this page and determine: the overall sentiment, an estimated rating (1-5), the top 3 most common praises, the top 3 most common complaints, and write a final recommendation (yes/no with a brief reason)."
}

Example Response:

{
  "overallSentiment": "positive",
  "rating": 4.2,
  "commonPraises": [
    "Excellent build quality and durability",
    "Great battery life lasting 2-3 days",
    "Intuitive and easy-to-use interface"
  ],
  "commonComplaints": [
    "Price is higher than competitors",
    "Limited color options available",
    "Charging cable is too short"
  ],
  "recommendation": "Yes - the product excels in quality and performance, making it worth the premium price for users who prioritize reliability over cost."
}

Working with Responses

Here is a complete, runnable example showing how to call the API and then process the AI-generated data in your application.

Example: Processing Key Insights

This script fetches an article, asks the AI to generate key insights and action items, and then prints them in a formatted way.

async function analyzeAndProcess(url) {
  // Request body: target URL, output schema, and natural language instructions
  const payload = {
    url,
    json_schema: {
      type: 'object',
      properties: {
        mainTopic: { type: 'string' },
        keyInsights: {
          type: 'array',
          items: { type: 'string' }
        },
        actionItems: {
          type: 'array',
          items: { type: 'string' }
        }
      }
    },
    instructions: 'Identify the main topic, extract 3-5 key insights, and suggest 2-3 action items for someone reading this.'
  };

  // Uses the built-in fetch available in Node.js 18 and later
  const response = await fetch('https://api.tabstack.ai/v1/generate/json', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.TABS_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status} ${response.statusText}`);
  }

  const data = await response.json();

  // Process the AI-generated content
  console.log(`Main Topic: ${data.mainTopic}\n`);

  // The schema marks no property as required, so guard against missing arrays
  console.log('Key Insights:');
  (data.keyInsights ?? []).forEach((insight, i) => {
    console.log(` ${i + 1}. ${insight}`);
  });

  console.log('\nRecommended Actions:');
  (data.actionItems ?? []).forEach((action, i) => {
    console.log(` ${i + 1}. ${action}`);
  });

  return data;
}

// Call the function and report failures instead of leaving the promise unhandled
analyzeAndProcess('https://blog.example.com/article').catch((error) => {
  console.error(error.message);
  process.exitCode = 1;
});

Code Explanation (JavaScript):

This reusable function takes a URL, sends the generation request with your schema and instructions, handles errors, and processes the structured response. The AI returns data matching your schema, which you can then display, store, or use however needed.

How to Run:

  • JavaScript: Save as analyze.js. Run with node analyze.js.

Error Handling

Well-structured error handling is essential for a robust application. The API returns standard HTTP status codes.

Common Error Status Codes

| Status Code | Error Message | Description |
|---|---|---|
| 400 | url is required | The url parameter was missing from your request body. |
| 400 | json schema is required | The json_schema parameter was missing. |
| 400 | instructions are required | The instructions parameter was missing. |
| 400 | json schema must be a valid object | The json_schema you provided was malformed or not valid. |
| 401 | Unauthorized - Invalid token | Your API key is missing, invalid, or expired. |
| 422 | url is invalid | The url was malformed or pointed to an inaccessible resource. |
| 500 | failed to fetch URL | The server had a problem fetching the provided URL. |
| 500 | web page is too large | The target page's content exceeded the maximum processing size. |
| 500 | failed to transform data | The AI failed to generate data. This can be due to overly complex instructions, a schema mismatch, or bad source content. |

All error responses return a JSON object with an error field.

{
"error": "instructions are required"
}
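
Because the message lives in the error field, a small helper can pull it out of a raw response body. The sketch below uses the hypothetical name extractErrorMessage and falls back to the raw text when the body is not JSON, for example if an intermediate proxy returns an HTML error page. That scenario is an assumption, not documented API behavior.

```javascript
// Hypothetical helper: returns the "error" string from a JSON error body,
// or the raw text when the body cannot be parsed as JSON.
function extractErrorMessage(rawBody) {
  try {
    const parsed = JSON.parse(rawBody);
    if (parsed && typeof parsed.error === 'string') {
      return parsed.error;
    }
  } catch {
    // Body was not valid JSON; fall through to the raw text
  }
  const text = String(rawBody || '').trim();
  return text !== '' ? text : 'unknown error';
}

console.log(extractErrorMessage('{"error": "instructions are required"}'));
```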

Error Handling Example Code

This example expands our previous function with robust try...catch blocks and status-specific error messages.

async function generateWithErrorHandling(url, schema, instructions) {
  try {
    const response = await fetch('https://api.tabstack.ai/v1/generate/json', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.TABS_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        url,
        json_schema: schema,
        instructions
      })
    });

    // Parse the body first so error messages are available.
    // Read it as text so a non-JSON body does not throw here.
    const raw = await response.text();
    let data = null;
    try {
      data = JSON.parse(raw);
    } catch {
      // Leave data as null; handled below
    }

    if (!response.ok) {
      // Error responses carry the message in the "error" field
      const message = data && typeof data.error === 'string'
        ? data.error
        : (raw || response.statusText);

      // Handle API-level errors
      switch (response.status) {
        case 400:
          throw new Error(`Bad request: ${message}`);
        case 401:
          throw new Error('Authentication failed. Check your API key.');
        case 422:
          throw new Error(`Invalid URL: ${message}`);
        case 500:
          if (message.includes('too large')) {
            throw new Error('Page is too large. Try a different URL.');
          } else if (message.includes('transform')) {
            throw new Error('AI transformation failed. Try simplifying your instructions.');
          }
          throw new Error(`Server error: ${message}`);
        default:
          throw new Error(`Request failed (${response.status}): ${message}`);
      }
    }

    if (data === null) {
      throw new Error('Response body was not valid JSON.');
    }

    return data;
  } catch (error) {
    // Handle network errors or thrown exceptions
    console.error('Error generating JSON:', error.message);
    throw error;
  }
}

// Usage
const schema = {
  type: 'object',
  properties: {
    summary: { type: 'string' }
  }
};

generateWithErrorHandling(
  'https://example.com',
  schema,
  'Create a brief summary'
)
  .then(data => console.log(data))
  .catch(() => {
    // The error was already logged inside the function
    process.exitCode = 1;
  });

Code Explanation (JavaScript):

This version adds robust error handling. Parse the response first so you can access error messages, then check response.ok. Use a switch statement to provide specific error messages for different status codes. This makes debugging easier—you know immediately if the issue is authentication, a bad URL, or an AI generation failure.

How to Run:

  • This code is designed to be part of a larger application. You can test it by running the file. To trigger an error, try passing an invalid URL (e.g., http://invalid-url-123.xyz) or an empty instructions string.

Best Practices

Follow these practices to get the most reliable and accurate results from the AI.

1. Write Clear, Specific Instructions

The quality of your output depends heavily on the quality of your instructions. Vague instructions lead to vague results.

Vague: "instructions": "Summarize this"

Better: "instructions": "Write a 2-sentence summary of this article"

Best: "instructions": "Create a concise summary of this article for the 'summary' field. The summary should be 2-3 sentences, written in a professional but accessible style, and focus on the main argument and key findings."
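
When a schema has several properties, instructions can be assembled field by field so that each property name is mentioned explicitly. The sketch below is one possible approach. The helper composeInstructions is hypothetical, and the output is simply a string for the instructions parameter.

```javascript
// Hypothetical helper: turns per-field guidance into one instruction string
// that references each schema property by name, then appends edge-case rules.
function composeInstructions(fieldGuidance, edgeCases = []) {
  const parts = Object.entries(fieldGuidance).map(
    ([field, guidance]) => `For the '${field}' field: ${guidance}`
  );
  return [...parts, ...edgeCases].join(' ');
}

const instructions = composeInstructions(
  {
    summary: 'Write 2-3 sentences in a professional but accessible style.',
    sentiment: 'Determine if the overall tone is positive, negative, or neutral.'
  },
  ['If no author is found, set the author field to null.']
);

console.log(instructions);
```

Keeping guidance per field also makes it easier to see which sentence to adjust when one property comes back wrong.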

2. Use Schema Descriptions Effectively

The AI reads the description fields in your json_schema. Use them to provide context and hints for each property.

Schema without descriptions (less reliable):

{
  "type": "object",
  "properties": {
    "technicalLevel": {
      "type": "string",
      "enum": ["beginner", "intermediate", "advanced"]
    }
  }
}

Schema with descriptions (more reliable):

{
  "type": "object",
  "properties": {
    "technicalLevel": {
      "type": "string",
      "enum": ["beginner", "intermediate", "advanced"],
      "description": "Assess the technical complexity of the content. 'Beginner' means no prior knowledge needed. 'Advanced' means deep technical expertise is required."
    }
  }
}
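
To spot properties that lack a description before sending a request, a schema can be walked recursively. The sketch below handles only the properties and items keywords. The function name findUndescribedProperties is hypothetical.

```javascript
// Hypothetical helper: returns dotted paths of properties with no
// "description". Array items are marked with "[]" in the path.
function findUndescribedProperties(schema, path = '') {
  if (schema.type === 'array' && schema.items) {
    return findUndescribedProperties(schema.items, `${path}[]`);
  }
  const missing = [];
  for (const [name, spec] of Object.entries(schema.properties || {})) {
    const here = path ? `${path}.${name}` : name;
    if (!spec.description) missing.push(here);
    missing.push(...findUndescribedProperties(spec, here));
  }
  return missing;
}

// The schema without descriptions from the example above.
const bareSchema = {
  type: 'object',
  properties: {
    technicalLevel: { type: 'string', enum: ['beginner', 'intermediate', 'advanced'] }
  }
};

console.log(findUndescribedProperties(bareSchema));
```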

3. Test Instructions Iteratively

Start with a simple schema and simple instructions. Get that working, and then gradually add complexity.

  1. Start Simple: "instructions": "Summarize this article in 2 sentences."
  2. Test and Refine: Check the output. Once it looks right, add more.
  3. Add Complexity: "instructions": "Summarize this article in 2 sentences and identify the target audience."
  4. Test Again, Then Extend: "instructions": "Summarize... identify the target audience... list 3 key takeaways... and rate the technical difficulty from 1-5."

This iterative process is much easier to debug than writing a highly complex prompt from scratch.

4. Monitor and Log Generations

For production systems, log your requests and the AI's successful responses. This lets you monitor quality, identify edge cases where the AI struggles, and build a dataset for refining your instructions.

This function logs key metadata about each request.

async function generateWithLogging(url, schema, instructions) {
  const startTime = Date.now();

  try {
    const data = await generateWithErrorHandling(url, schema, instructions); // Re-use our error handler
    const duration = Date.now() - startTime;

    console.log(JSON.stringify({
      level: 'info',
      timestamp: new Date().toISOString(),
      url,
      status: 'success',
      durationMs: duration,
      instructionsLength: instructions.length,
      responseSize: JSON.stringify(data).length
    }));

    return data;
  } catch (error) {
    const duration = Date.now() - startTime;

    console.error(JSON.stringify({
      level: 'error',
      timestamp: new Date().toISOString(),
      url,
      status: 'error',
      durationMs: duration,
      errorMessage: error.message
    }));
    throw error;
  }
}

Code Explanation (JavaScript):

We record the time before the request using const startTime = Date.now(). Inside the try block, we call our generateWithErrorHandling function to make the API request. After the request completes, we calculate the request duration. We then log a structured JSON object using console.log(JSON.stringify(...)). In a real application, you would send this to a logging service like Datadog, Sentry, or your own database. The catch (error) block logs a structured error message on failure, so both successful and failed requests are tracked.

How to Run:

  • Use this generateWithLogging function as your new primary entry point for making API calls.