How to Generate JSON Data with AI
Learn how to use the Tabstack API `/v1/generate/json` endpoint to generate JSON data with AI.
Often, simply extracting existing data from a web page isn’t enough. You need to transform that content—summarize it, categorize it, translate it, or restructure it into a new format. This is where the Tabstack API /v1/generate/json endpoint shines.
Unlike the /v1/extract/json endpoint, which pulls existing data, the /v1/generate/json endpoint uses AI to generate new content based on your needs.
This process is driven by two key inputs you provide:
- `json_schema`: The “what.” This is a precise blueprint defining the shape of the final JSON you want.
- `instructions`: The “how.” These are natural language instructions that tell the AI how to process the source content to populate your schema.
By combining a target URL, a schema, and clear instructions, you can build powerful workflows for content summarization, sentiment analysis, data categorization, and much more.
This guide will walk you through the entire process, from your first request to advanced, production-ready patterns.
Prerequisites
Before you can use the JSON generation endpoint, you’ll need a valid Tabstack API key.
- Sign up at https://tabstack.ai to get your API key.
- The API uses Bearer Token authentication, so you’ll pass your key in an `Authorization` header.
We recommend storing your API key as an environment variable for security and convenience.
This command shows how to set an environment variable in a Bash-compatible shell.
export TABSTACK_API_KEY="your-api-key-here"
The `export` command makes the variable available to any processes or scripts you run from this shell session. `TABSTACK_API_KEY` is the name of the environment variable that our code examples will look for. Replace `"your-api-key-here"` with your actual, secret API key.
How to Run:
- Copy this command.
- Paste it into your terminal.
- Replace the placeholder with your key and press Enter.
- The `curl`, `javascript` (Node.js), and `python` examples in this guide will now work by reading this variable.
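In Python, the variable set above could be read and checked up front, so that a missing key fails with a clear message before any request is sent. This is a minimal sketch; `build_auth_headers` is a hypothetical helper name, not part of any Tabstack SDK.

```python
import os


def build_auth_headers(env=os.environ):
    """Build request headers from the TABSTACK_API_KEY environment variable.

    Illustrative helper only; the name is an assumption, not a Tabstack API.
    """
    api_key = env.get("TABSTACK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "TABSTACK_API_KEY is not set. Export it in your shell first."
        )
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing the environment as a parameter keeps the helper easy to test without touching the real shell environment.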
Step-by-Step: From URL to Generated JSON
Let’s walk through a complete, practical example.
The Goal: We want to analyze the Hacker News homepage (https://news.ycombinator.com). We don’t just want to scrape data; we want the AI to analyze each story and generate a category (e.g., “tech,” “business,” “science”) and write a new one-sentence summary for it.
Step 1: Define the Schema
First, we define the shape of our desired output using `json_schema`. We want a top-level object containing a key called `summaries`, which should be an array. Each object in that array should have three string properties: `title`, `category`, and `summary`.
{ "json_schema": { "type": "object", "properties": { "summaries": { "type": "array", "items": { "type": "object", "properties": { "title": { "type": "string" }, "category": { "type": "string" }, "summary": { "type": "string" } } } } } } }
Step 2: Write the Instructions
Next, we write the instructions to tell the AI how to populate this schema.
"instructions": "For each story on the page, find its title. Then, categorize it as tech/business/science/other and write a one-sentence summary in simple terms."
This instruction is critical: it explicitly tells the AI to generate the new `category` and `summary` data.
Step 3: Assemble and Run the Request
Now, let’s combine the target `url`, our `json_schema`, and our `instructions` into a single POST request to the https://api.tabstack.ai/v1/generate/json endpoint.
# curl
curl -X POST https://api.tabstack.ai/v1/generate/json \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "json_schema": {
      "type": "object",
      "properties": {
        "summaries": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "category": {"type": "string"},
              "summary": {"type": "string"}
            }
          }
        }
      }
    },
    "instructions": "For each story, categorize it as tech/business/science/other and write a one-sentence summary in simple terms"
  }'

// JavaScript (Node.js)
async function generateJson() {
  const response = await fetch("https://api.tabstack.ai/v1/generate/json", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://news.ycombinator.com",
      json_schema: {
        type: "object",
        properties: {
          summaries: {
            type: "array",
            items: {
              type: "object",
              properties: {
                title: { type: "string" },
                category: { type: "string" },
                summary: { type: "string" },
              },
            },
          },
        },
      },
      instructions:
        "For each story, categorize it as tech/business/science/other and write a one-sentence summary in simple terms",
    }),
  });

  const data = await response.json();
  console.log(JSON.stringify(data, null, 2));
  return data;
}

generateJson();

# Python
import requests
import os
import json

response = requests.post(
    'https://api.tabstack.ai/v1/generate/json',
    headers={
        'Authorization': f'Bearer {os.environ["TABSTACK_API_KEY"]}',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://news.ycombinator.com',
        'json_schema': {
            'type': 'object',
            'properties': {
                'summaries': {
                    'type': 'array',
                    'items': {
                        'type': 'object',
                        'properties': {
                            'title': {'type': 'string'},
                            'category': {'type': 'string'},
                            'summary': {'type': 'string'}
                        }
                    }
                }
            }
        },
        'instructions': 'For each story, categorize it as tech/business/science/other and write a one-sentence summary in simple terms'
    }
)

data = response.json()
print(json.dumps(data, indent=2))

Code Explanation (Step-by-Step):
We send a POST request to the endpoint https://api.tabstack.ai/v1/generate/json. For authentication, we include the Authorization: Bearer $TABSTACK_API_KEY header, which uses the environment variable we set earlier. The Content-Type: application/json header tells the server that we are sending a JSON payload. The request body contains three key parameters: "url" specifies the target web page to fetch and analyze, "json_schema" provides the structured blueprint for the output, and "instructions" provides the natural language logic for the AI to follow.
How to Run:
- curl: You can run this command directly in your terminal, assuming you’ve set the `TABSTACK_API_KEY` variable.
- JavaScript: Save the code as `generate.js`. The example relies on the global `fetch`, which is available in Node.js 18 and later. Run it from your terminal using `node generate.js`.
- Python: Save the code as `generate.py`. Ensure you have the `requests` library installed (`pip install requests`). Run it using `python generate.py`.
Step 4: Analyze the Response
A successful request returns a 200 OK status and a JSON payload that matches your schema.
{ "summaries": [ { "title": "New AI Model Released", "category": "tech", "summary": "A research lab announced a new language model that performs better on reasoning tasks." }, { "title": "Database Performance Tips", "category": "tech", "summary": "An engineer shares techniques that reduced database query times by 90%." }, { "title": "Climate Tech Startup Raises Funding", "category": "business", "summary": "A carbon capture company secured $50M in Series B funding." } ] }
Response Explanation:
The key difference from extraction: the AI generated the category and summary fields based on your instructions. The title was extracted, but the other fields are newly created, AI-derived content. This is what makes the generate endpoint powerful—it creates data, not just finds it.
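Once the response is parsed, the generated `category` field can drive ordinary application logic. The sketch below groups story titles by category using the sample response above; `group_by_category` is a hypothetical helper, and the `.get()` fallbacks reflect the assumption that a field could be missing, since the example schema marks nothing as `required`.

```python
from collections import defaultdict


def group_by_category(response_body):
    """Group generated story titles by their AI-assigned category."""
    grouped = defaultdict(list)
    for story in response_body.get("summaries", []):
        # Fall back to placeholders if a field is absent (an assumption,
        # since the example schema declares no required properties).
        category = story.get("category", "uncategorized")
        grouped[category].append(story.get("title", "(untitled)"))
    return dict(grouped)


sample = {
    "summaries": [
        {"title": "New AI Model Released", "category": "tech"},
        {"title": "Database Performance Tips", "category": "tech"},
        {"title": "Climate Tech Startup Raises Funding", "category": "business"},
    ]
}
print(group_by_category(sample))
```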
Request Parameters
Here is a detailed breakdown of the POST request body parameters.
url (required)
- Type: `string`
- Description: The fully qualified URL of the web page you want to process.
- Validation:
  - Must be a valid, publicly accessible URL.
  - Cannot be a `localhost` address or a private/internal IP.
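The validation rules above can be approximated on the client before a request is sent. This is only a sketch under stated assumptions: it accepts `http` and `https` schemes only, it does not resolve DNS, and the server's own validation remains the source of truth. The function name `looks_publicly_addressable` is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_addressable(url):
    """Cheap pre-check that rejects localhost and private IP literals."""
    parsed = urlparse(url)
    # Assumption: only http(s) URLs are useful targets for this endpoint.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal; nothing more to check locally.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

A hostname that resolves to an internal address would still pass this check, so treat a `True` result as "not obviously invalid" rather than as a guarantee.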
{ "url": "https://techblog.example.com/article" }
json_schema (required)
- Type: `object`
- Description: A valid JSON Schema object defining the exact structure, data types, and constraints for your desired output. The AI will strictly adhere to this schema.
- Tips for Effective Schemas:
  - Be Specific: Use `string`, `number`, `boolean`, `array`, and `object` types.
  - Use Descriptions: Add `description` fields to properties. The AI uses these as hints to generate better, more relevant data.
  - Use Enums: To constrain a field to a specific list of values (e.g., for categories), use the `enum` keyword.
{ "json_schema": { "type": "object", "properties": { "summary": { "type": "string", "description": "Overall summary of the content" }, "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"], "description": "Overall sentiment of the article" } }, "required": ["summary"] } }
instructions (required)
- Type: `string`
- Description: Natural language instructions that tell the AI how to generate the data to fit your schema. This is where you define the transformation logic.
- Tips for Effective Instructions:
  - Be Clear and Specific: “Write a 3-sentence summary” is better than “Summarize this.”
  - Reference Schema Properties: Mention property names (e.g., “For the `sentiment` field, determine if the tone is…”).
  - Define Edge Cases: “If no author is found, set the `author` field to `null`.”
  - Specify Format: “Extract the key points as a list of bullet points.”
{ "instructions": "Read the article and create a concise summary (2-3 sentences). Determine if the overall tone is positive, negative, or neutral and assign it to the 'sentiment' field." }
nocache (optional)
- Type: `boolean`
- Default: `false`
- Description: When set to `true`, this forces the API to bypass any cached version of the URL and re-fetch and re-process the content.
- When to Use:
  - Analyzing real-time or frequently updated content (e.g., news homepages, stock tickers).
  - Debugging a request with different instructions on the same URL.
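The four parameters described in this section can be assembled by one small function. The sketch below checks the three required parameters locally and adds `nocache` only when it is enabled, since the documented default is `false`. `build_payload` is a hypothetical helper, and its local checks are an approximation of the server's validation.

```python
def build_payload(url, json_schema, instructions, nocache=False):
    """Assemble a request body for /v1/generate/json (illustrative helper)."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(json_schema, dict):
        raise ValueError("json schema must be a valid object")
    if not instructions:
        raise ValueError("instructions are required")

    payload = {
        "url": url,
        "json_schema": json_schema,
        "instructions": instructions,
    }
    if nocache:
        # Left out otherwise, because the documented default is false.
        payload["nocache"] = True
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.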
{ "url": "https://news.example.com", "json_schema": { ... }, "instructions": "Summarize the news", "nocache": true }
Real-World Examples
These examples show the request payload and the corresponding response. The key is to see how the `instructions` and `json_schema` work together.
Example 1: Content Summarization and Audience Analysis
Goal: Generate a “TL;DR,” a longer summary, and identify the target audience for a blog post.
Request Payload:
{ "url": "https://techblog.example.com/long-article", "json_schema": { "type": "object", "properties": { "title": { "type": "string" }, "tldr": { "type": "string", "description": "One-sentence summary" }, "summary": { "type": "string", "description": "3-4 sentence summary" }, "targetAudience": { "type": "string", "description": "Who this is for" } } }, "instructions": "Create a TLDR (one sentence) and a longer summary (3-4 sentences) of this article. Also identify who the target audience is (e.g., developers, business leaders, general audience)." }
Example Response:
{ "title": "Understanding Microservices Architecture", "tldr": "Microservices break large applications into small, independent services that communicate via APIs, offering better scalability and flexibility.", "summary": "This article explains microservices architecture, where applications are built as a collection of small, loosely coupled services. Each service handles a specific business function and can be developed, deployed, and scaled independently. The approach offers better fault isolation, technology flexibility, and team autonomy, though it introduces complexity in service coordination and data consistency.", "targetAudience": "Software developers and architects" }
Example 2: Content Categorization and Tagging
Goal: Scrape a blog homepage, and for each post, generate a category, a list of tags, and a difficulty level.
Request Payload:
{ "url": "https://blog.example.com", "json_schema": { "type": "object", "properties": { "posts": { "type": "array", "items": { "type": "object", "properties": { "title": { "type": "string" }, "category": { "type": "string", "description": "One of: tutorial, opinion, news, case-study" }, "tags": { "type": "array", "items": { "type": "string" } }, "difficultyLevel": { "type": "string", "enum": ["beginner", "intermediate", "advanced"] } } } } } }, "instructions": "For each blog post, assign it to a category (tutorial/opinion/news/case-study), generate 3-5 relevant tags, and determine the technical difficulty level (beginner/intermediate/advanced)." }
Example Response:
{ "posts": [ { "title": "Getting Started with React Hooks", "category": "tutorial", "tags": ["react", "hooks", "javascript", "frontend", "beginner-friendly"], "difficultyLevel": "beginner" }, { "title": "Optimizing Database Queries at Scale", "category": "tutorial", "tags": ["database", "performance", "optimization", "sql", "indexing"], "difficultyLevel": "advanced" }, { "title": "Why We Migrated to Kubernetes", "category": "case-study", "tags": [ "kubernetes", "devops", "infrastructure", "migration", "containers" ], "difficultyLevel": "intermediate" } ] }
Example 3: Sentiment and Trend Analysis from Reviews
Goal: Analyze a page of product reviews to find the overall sentiment, common themes, and a final recommendation.
Request Payload:
{ "url": "https://reviews.example.com/product/xyz", "json_schema": { "type": "object", "properties": { "overallSentiment": { "type": "string", "enum": [ "very positive", "positive", "neutral", "negative", "very negative" ] }, "rating": { "type": "number", "description": "Estimated average rating out of 5" }, "commonPraises": { "type": "array", "items": { "type": "string" } }, "commonComplaints": { "type": "array", "items": { "type": "string" } }, "recommendation": { "type": "string" } } }, "instructions": "Analyze all customer reviews on this page and determine: the overall sentiment, an estimated rating (1-5), the top 3 most common praises, the top 3 most common complaints, and write a final recommendation (yes/no with a brief reason)." }
Example Response:
{ "overallSentiment": "positive", "rating": 4.2, "commonPraises": [ "Excellent build quality and durability", "Great battery life lasting 2-3 days", "Intuitive and easy-to-use interface" ], "commonComplaints": [ "Price is higher than competitors", "Limited color options available", "Charging cable is too short" ], "recommendation": "Yes - the product excels in quality and performance, making it worth the premium price for users who prioritize reliability over cost." }
Working with Responses
Here are complete, runnable examples showing how to call the API and then process the AI-generated data in your application.
Example: Processing Key Insights
This script fetches an article, asks the AI to generate key insights and action items, and then prints them in a formatted way.
// JavaScript (Node.js)
async function analyzeAndProcess(url) {
  const payload = {
    url,
    json_schema: {
      type: "object",
      properties: {
        mainTopic: { type: "string" },
        keyInsights: {
          type: "array",
          items: { type: "string" },
        },
        actionItems: {
          type: "array",
          items: { type: "string" },
        },
      },
    },
    instructions:
      "Identify the main topic, extract 3-5 key insights, and suggest 2-3 action items for someone reading this.",
  };

  const response = await fetch("https://api.tabstack.ai/v1/generate/json", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.statusText}`);
  }

  const data = await response.json();

  // Process the AI-generated content
  console.log(`Main Topic: ${data.mainTopic}\n`);

  console.log("Key Insights:");
  data.keyInsights.forEach((insight, i) => {
    console.log(`  ${i + 1}. ${insight}`);
  });

  console.log("\nRecommended Actions:");
  data.actionItems.forEach((action, i) => {
    console.log(`  ${i + 1}. ${action}`);
  });

  return data;
}

// Call the function
analyzeAndProcess("https://blog.example.com/article");

# Python
import requests
import os
import json

def analyze_and_process(url):
    payload = {
        'url': url,
        'json_schema': {
            'type': 'object',
            'properties': {
                'mainTopic': {'type': 'string'},
                'keyInsights': {
                    'type': 'array',
                    'items': {'type': 'string'}
                },
                'actionItems': {
                    'type': 'array',
                    'items': {'type': 'string'}
                }
            }
        },
        'instructions': 'Identify the main topic, extract 3-5 key insights, and suggest 2-3 action items for someone reading this.'
    }

    response = requests.post(
        'https://api.tabstack.ai/v1/generate/json',
        headers={
            'Authorization': f'Bearer {os.environ["TABSTACK_API_KEY"]}',
            'Content-Type': 'application/json'
        },
        json=payload
    )

    response.raise_for_status()  # Raises an HTTPError for bad responses
    data = response.json()

    # Process the AI-generated content
    print(f"Main Topic: {data['mainTopic']}\n")

    print('Key Insights:')
    for i, insight in enumerate(data['keyInsights'], 1):
        print(f"  {i}. {insight}")

    print('\nRecommended Actions:')
    for i, action in enumerate(data['actionItems'], 1):
        print(f"  {i}. {action}")

    return data

# Call the function
if __name__ == "__main__":
    analyze_and_process('https://blog.example.com/article')

Code Explanation (JavaScript):
This reusable function takes a URL, sends the generation request with your schema and instructions, handles errors, and processes the structured response. The AI returns data matching your schema, which you can then display, store, or use however needed.
How to Run:
- JavaScript: Save as `analyze.js`. Run with `node analyze.js`.
- Python: Save as `analyze.py`. Run with `python analyze.py`. (Remember to `pip install requests`.)
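The schema in this example declares no `required` properties, so it may be prudent to format the result defensively rather than index keys directly. The sketch below assumes a field could be absent or `null`; that is an assumption about possible responses, not documented API behavior, and `format_insights` is a hypothetical helper.

```python
def format_insights(data):
    """Render a generated response as text, tolerating missing fields."""
    lines = [
        f"Main Topic: {data.get('mainTopic') or 'unknown'}",
        "",
        "Key Insights:",
    ]
    # `or []` also covers an explicit null value for the array fields.
    for i, insight in enumerate(data.get("keyInsights") or [], 1):
        lines.append(f"  {i}. {insight}")
    lines.append("")
    lines.append("Recommended Actions:")
    for i, action in enumerate(data.get("actionItems") or [], 1):
        lines.append(f"  {i}. {action}")
    return "\n".join(lines)
```

Returning a string instead of printing keeps the formatting reusable for logs, emails, or tests.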
Error Handling
Well-structured error handling is essential for a robust application. The API returns standard HTTP status codes.
Common Error Status Codes
| Status Code | Error Message | Description |
|---|---|---|
| 400 | url is required | The url parameter was missing from your request body. |
| 400 | json schema is required | The json_schema parameter was missing. |
| 400 | instructions are required | The instructions parameter was missing. |
| 400 | json schema must be a valid object | The json_schema you provided was malformed or not valid. |
| 401 | Unauthorized - Invalid token | Your API key is missing, invalid, or expired. |
| 422 | url is invalid | The url was malformed or pointed to an inaccessible resource. |
| 500 | failed to fetch URL | The server had a problem fetching the provided URL. |
| 500 | web page is too large | The target page’s content exceeded the maximum processing size. |
| 500 | failed to transform data | The AI failed to generate data. This can be due to overly complex instructions, a schema mismatch, or bad source content. |
All error responses return a JSON object with an error field.
{ "error": "instructions are required" }
Error Handling Example Code
This example expands our previous function with robust try...catch blocks and status-specific error messages.
// JavaScript (Node.js)
async function generateWithErrorHandling(url, schema, instructions) {
  try {
    const response = await fetch("https://api.tabstack.ai/v1/generate/json", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url,
        json_schema: schema,
        instructions,
      }),
    });

    const data = await response.json();

    if (!response.ok) {
      // Handle API-level errors
      switch (response.status) {
        case 400:
          throw new Error(`Bad request: ${data.error}`);
        case 401:
          throw new Error("Authentication failed. Check your API key.");
        case 422:
          throw new Error(`Invalid URL: ${data.error}`);
        case 500:
          if (data.error.includes("too large")) {
            throw new Error("Page is too large. Try a different URL.");
          } else if (data.error.includes("transform")) {
            throw new Error(
              "AI transformation failed. Try simplifying your instructions."
            );
          }
          throw new Error(`Server error: ${data.error}`);
        default:
          throw new Error(`Request failed (${response.status}): ${data.error}`);
      }
    }

    return data;
  } catch (error) {
    // Handle network errors or thrown exceptions
    console.error("Error generating JSON:", error.message);
    throw error;
  }
}

// Usage
const schema = {
  type: "object",
  properties: {
    summary: { type: "string" },
  },
};

generateWithErrorHandling(
  "https://example.com",
  schema,
  "Create a brief summary"
).then((data) => console.log(data));

# Python
import requests
import os
import json

def generate_with_error_handling(url, schema, instructions):
    try:
        response = requests.post(
            'https://api.tabstack.ai/v1/generate/json',
            headers={
                'Authorization': f'Bearer {os.environ["TABSTACK_API_KEY"]}',
                'Content-Type': 'application/json'
            },
            json={
                'url': url,
                'json_schema': schema,
                'instructions': instructions
            },
            timeout=30
        )

        data = response.json()

        if not response.ok:
            error_msg = data.get('error', 'Unknown error')

            if response.status_code == 400:
                raise ValueError(f'Bad request: {error_msg}')
            elif response.status_code == 401:
                raise PermissionError('Authentication failed. Check your API key.')
            elif response.status_code == 422:
                raise ValueError(f'Invalid URL: {error_msg}')
            elif response.status_code == 500:
                if 'too large' in error_msg:
                    raise RuntimeError('Page is too large. Try a different URL.')
                elif 'transform' in error_msg:
                    raise RuntimeError('AI transformation failed. Try simplifying instructions.')
                raise RuntimeError(f'Server error: {error_msg}')
            else:
                response.raise_for_status()

        return data

    except requests.exceptions.Timeout:
        raise TimeoutError('The request timed out.')
    except requests.exceptions.HTTPError as e:
        print(f'HTTP Error: {e}')
        raise
    except requests.exceptions.RequestException as e:
        print(f'Network error: {e}')
        raise

# Usage
schema = {
    'type': 'object',
    'properties': {
        'summary': {'type': 'string'}
    }
}

try:
    data = generate_with_error_handling(
        'https://example.com',
        schema,
        'Create a brief summary'
    )
    print(json.dumps(data, indent=2))
except Exception as e:
    print(f'Failed to generate: {e}')

Code Explanation (JavaScript):
This version adds robust error handling. Parse the response first so you can access error messages, then check response.ok. Use a switch statement to provide specific error messages for different status codes. This makes debugging easier—you know immediately if the issue is authentication, a bad URL, or an AI generation failure.
How to Run:
- This code is designed to be part of a larger application. You can test it by running the file. To trigger an error, try passing an invalid URL (e.g., `http://invalid-url-123.xyz`) or an empty `instructions` string.
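Parsing the body before checking the status works when every error body is JSON with an `error` field, as documented. As a hedge against a body that is not JSON (for example, an HTML page from an intermediate proxy, which is an assumed scenario rather than documented behavior), the error message can be extracted defensively. `extract_error_message` is a hypothetical helper.

```python
import json


def extract_error_message(body_text):
    """Return the `error` string from a response body, or a generic fallback."""
    try:
        body = json.loads(body_text)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (or not text), so there is no `error` field to read.
        return "Unknown error (response body was not JSON)"
    if isinstance(body, dict) and isinstance(body.get("error"), str):
        return body["error"]
    return "Unknown error"
```

With `requests`, this could be called as `extract_error_message(response.text)` inside the `if not response.ok` branch.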
Best Practices
Follow these practices to get the most reliable and accurate results from the AI.
1. Write Clear, Specific Instructions
The quality of your output is directly proportional to the quality of your instructions. Vague instructions lead to vague results.
Vague:
"instructions": "Summarize this"
Better:
"instructions": "Write a 2-sentence summary of this article"
Best:
"instructions": "Create a concise summary of this article for the 'summary' field. The summary should be 2-3 sentences, written in a professional but accessible style, and focus on the main argument and key findings."
2. Use Schema Descriptions Effectively
The AI reads the `description` fields in your `json_schema`. Use them to provide context and hints for each property.
Schema without descriptions (less reliable):
{ "type": "object", "properties": { "technicalLevel": { "type": "string", "enum": ["beginner", "intermediate", "advanced"] } } }
Schema with descriptions (more reliable):
{ "type": "object", "properties": { "technicalLevel": { "type": "string", "enum": ["beginner", "intermediate", "advanced"], "description": "Assess the technical complexity of the content. 'Beginner' means no prior knowledge needed. 'Advanced' means deep technical expertise is required." } } }
3. Test Instructions Iteratively
Start with a simple schema and simple instructions. Get that working, and then gradually add complexity.
- Start Simple: `"instructions": "Summarize this article in 2 sentences."`
- Test and Refine: Does it work? Good. Now add more.
- Add Complexity: `"instructions": "Summarize this article in 2 sentences and identify the target audience."`
- Test Again: `"instructions": "Summarize... identify the target audience... list 3 key takeaways... and rate the technical difficulty from 1-5."`
This iterative process is much easier to debug than writing a highly complex prompt from scratch.
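The loop described above can be automated with a small harness that tries each instruction in order and reports the first one that fails. This is a sketch: `first_failing_step` is a hypothetical helper, and `generate` stands for any callable you supply that takes an instructions string and returns the parsed response, such as a wrapper around your API call.

```python
STEPS = [
    "Summarize this article in 2 sentences.",
    "Summarize this article in 2 sentences and identify the target audience.",
]


def first_failing_step(instruction_steps, generate):
    """Run each instruction in order; return the index of the first failure.

    Returns None when every step succeeds.
    """
    for index, instructions in enumerate(instruction_steps):
        try:
            generate(instructions)
        except Exception as exc:  # Broad on purpose: any failure ends the run.
            print(f"Step {index + 1} failed: {exc}")
            return index
        print(f"Step {index + 1} ok")
    return None
```

Because the API call is injected, the harness can be exercised with a stub function before spending real requests.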
4. Monitor and Log Generations
For production systems, log your requests and the AI’s (successful) responses. This allows you to monitor for quality, identify edge cases where the AI struggles, and build a dataset for refining your instructions.
This function logs key metadata about each request.
// JavaScript (Node.js)
async function generateWithLogging(url, schema, instructions) {
  const startTime = Date.now();

  try {
    // Re-use our error handler
    const data = await generateWithErrorHandling(url, schema, instructions);
    const duration = Date.now() - startTime;

    console.log(
      JSON.stringify({
        level: "info",
        timestamp: new Date().toISOString(),
        url,
        status: "success",
        durationMs: duration,
        instructionsLength: instructions.length,
        responseSize: JSON.stringify(data).length,
      })
    );

    return data;
  } catch (error) {
    const duration = Date.now() - startTime;

    console.error(
      JSON.stringify({
        level: "error",
        timestamp: new Date().toISOString(),
        url,
        status: "error",
        durationMs: duration,
        errorMessage: error.message,
      })
    );
    throw error;
  }
}

# Python
import json
import logging
import time

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')

def generate_with_logging(url, schema, instructions):
    start_time = time.time()

    try:
        # Re-use error handler
        data = generate_with_error_handling(url, schema, instructions)
        duration_ms = (time.time() - start_time) * 1000

        logging.info(json.dumps({
            'url': url,
            'status': 'success',
            'duration_ms': duration_ms,
            'instructions_length': len(instructions),
            'response_size': len(json.dumps(data))
        }))

        return data
    except Exception as e:
        duration_ms = (time.time() - start_time) * 1000

        logging.error(json.dumps({
            'url': url,
            'status': 'error',
            'duration_ms': duration_ms,
            'error_message': str(e)
        }))
        raise

Code Explanation (JavaScript):
We record the time before the request using const startTime = Date.now(). Inside the try block, we call our robust generateWithErrorHandling function to make the API request. After the request completes, we calculate the request duration. We then log a structured JSON object using console.log(JSON.stringify(...)). In a real application, you would send this to a logging service like DataDog, Sentry, or your own database. The catch (error) block also logs a structured error message on failure, ensuring that both successful and failed requests are properly tracked.
How to Run:
- Use this `generateWithLogging` function as your new primary entry point for making API calls.