Extract Features

The Extract operator converts web pages into structured, usable data. It provides three powerful methods for working with web content: converting to Markdown, generating JSON schemas, and extracting structured JSON data.

Overview

The extract operator is accessed through your TABStack client instance:

TypeScript
JavaScript

import { TABStack } from '@tabstack/sdk';

const tabs = new TABStack({
  apiKey: process.env.TABSTACK_API_KEY!
});

// Access extract methods
tabs.extract.markdown(url, options);
tabs.extract.schema(url, options);
tabs.extract.json(url, schema, options);

const { TABStack } = require('@tabstack/sdk');

const tabs = new TABStack({
  apiKey: process.env.TABSTACK_API_KEY
});

// Access extract methods
tabs.extract.markdown(url, options);
tabs.extract.schema(url, options);
tabs.extract.json(url, schema, options);

Extract Markdown

Convert any web page to clean, readable Markdown format. This is perfect for:

Creating readable content from web pages
Feeding content to LLMs
Archiving web content
Building documentation from web sources

Basic Usage

TypeScript
JavaScript

const result = await tabs.extract.markdown('https://example.com/blog/article');
console.log(result.content);

const result = await tabs.extract.markdown('https://example.com/blog/article');
console.log(result.content);

With Metadata

Extract additional page metadata (title, description, author, etc.) alongside the content:

TypeScript
JavaScript

const result = await tabs.extract.markdown('https://example.com/blog/article', {
  metadata: true
});

console.log('Content:', result.content);
console.log('Title:', result.metadata?.title);
console.log('Description:', result.metadata?.description);
console.log('Author:', result.metadata?.author);
console.log('Image:', result.metadata?.image);

const result = await tabs.extract.markdown('https://example.com/blog/article', {
  metadata: true
});

console.log('Content:', result.content);
console.log('Title:', result.metadata?.title);
console.log('Description:', result.metadata?.description);
console.log('Author:', result.metadata?.author);
console.log('Image:', result.metadata?.image);

Bypass Cache

Force a fresh fetch when you need the latest content:

TypeScript
JavaScript

const result = await tabs.extract.markdown('https://example.com/live-prices', {
  nocache: true
});

const result = await tabs.extract.markdown('https://example.com/live-prices', {
  nocache: true
});

Response Type

The markdown method returns a MarkdownResponse object:

interface MarkdownResponse {
  url: string;           // Source URL
  content: string;       // Markdown content
  metadata?: Metadata;   // Optional page metadata
}

interface Metadata {
  title?: string;
  description?: string;
  author?: string;
  publisher?: string;
  image?: string;
  siteName?: string;
  url?: string;
  type?: string;
}

Generate Schema

Automatically generate a JSON schema from any web page using AI. This is the fastest way to create schemas without writing them manually.

Basic Usage

TypeScript
JavaScript

const schema = await tabs.extract.schema('https://news.ycombinator.com', {
  instructions: 'extract top stories with title, points, and author'
});

console.log(JSON.stringify(schema, null, 2));

const schema = await tabs.extract.schema('https://news.ycombinator.com', {
  instructions: 'extract top stories with title, points, and author'
});

console.log(JSON.stringify(schema, null, 2));

Generated Schema:

{
  "type": "object",
  "properties": {
    "stories": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "points": { "type": "number" },
          "author": { "type": "string" }
        },
        "required": ["title", "points", "author"]
      }
    }
  },
  "required": ["stories"]
}

Use Generated Schema Immediately

Chain schema generation with extraction:

TypeScript
JavaScript

// Step 1: Generate the schema
const schema = await tabs.extract.schema('https://news.ycombinator.com', {
  instructions: 'extract top 5 stories with title, points, and author'
});

// Step 2: Use it to extract data
const result = await tabs.extract.json('https://news.ycombinator.com', schema);
console.log(result.data);

// Step 1: Generate the schema
const schema = await tabs.extract.schema('https://news.ycombinator.com', {
  instructions: 'extract top 5 stories with title, points, and author'
});

// Step 2: Use it to extract data
const result = await tabs.extract.json('https://news.ycombinator.com', schema);
console.log(result.data);

Extract JSON

Extract structured data from web pages that matches your JSON schema. This is the most powerful extraction method.

Basic Usage

TypeScript
JavaScript

const schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    inStock: { type: 'boolean' }
  },
  required: ['name', 'price', 'inStock']
};

const result = await tabs.extract.json('https://shop.com/product/123', schema);
console.log(result.data);

const schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    inStock: { type: 'boolean' }
  },
  required: ['name', 'price', 'inStock']
};

const result = await tabs.extract.json('https://shop.com/product/123', schema);
console.log(result.data);

Type-Safe Extraction (TypeScript)

Use TypeScript generics for type-safe data access:

interface Product {
  name: string;
  price: number;
  inStock: boolean;
  features?: string[];
}

const schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    inStock: { type: 'boolean' },
    features: { type: 'array', items: { type: 'string' } }
  },
  required: ['name', 'price', 'inStock']
};

// Type-safe response
const result = await tabs.extract.json<Product>('https://shop.com/product', schema);

// TypeScript knows the structure
console.log(result.data.name);        // string
console.log(result.data.price);       // number
console.log(result.data.inStock);     // boolean
console.log(result.data.features);    // string[] | undefined

Real-World Examples

Example 1: News Scraping

Extract articles from a news site:

TypeScript
JavaScript

interface Article {
  title: string;
  summary: string;
  url: string;
  publishedAt: string;
  category?: string;
}

interface NewsData {
  articles: Article[];
}

const schema = {
  type: 'object',
  properties: {
    articles: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          summary: { type: 'string' },
          url: { type: 'string' },
          publishedAt: { type: 'string' },
          category: { type: 'string' }
        },
        required: ['title', 'url']
      }
    }
  }
};

const result = await tabs.extract.json<NewsData>('https://news.example.com', schema);

result.data.articles.forEach(article => {
  console.log(`${article.title} (${article.category})`);
  console.log(`  ${article.url}`);
  console.log(`  Published: ${article.publishedAt}`);
});

const schema = {
  type: 'object',
  properties: {
    articles: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          summary: { type: 'string' },
          url: { type: 'string' },
          publishedAt: { type: 'string' },
          category: { type: 'string' }
        },
        required: ['title', 'url']
      }
    }
  }
};

const result = await tabs.extract.json('https://news.example.com', schema);

result.data.articles.forEach(article => {
  console.log(`${article.title} (${article.category})`);
  console.log(`  ${article.url}`);
  console.log(`  Published: ${article.publishedAt}`);
});

Example 2: E-commerce Product Extraction

Extract product information from an online store:

TypeScript
JavaScript

interface ProductCatalog {
  products: Array<{
    name: string;
    price: number;
    currency: string;
    inStock: boolean;
    rating: number;
  }>;
}

const schema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' },
          currency: { type: 'string' },
          inStock: { type: 'boolean' },
          rating: { type: 'number' }
        },
        required: ['name', 'price']
      }
    }
  }
};

const result = await tabs.extract.json<ProductCatalog>(
  'https://shop.example.com/category/laptops',
  schema
);

// Filter in-stock products sorted by rating
const topProducts = result.data.products
  .filter(p => p.inStock)
  .sort((a, b) => (b.rating || 0) - (a.rating || 0))
  .slice(0, 5);

console.log('Top 5 In-Stock Products:');
topProducts.forEach((product, i) => {
  console.log(`${i + 1}. ${product.name} - ${product.currency}${product.price}`);
  console.log(`   Rating: ${product.rating} stars`);
});

const schema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' },
          currency: { type: 'string' },
          inStock: { type: 'boolean' },
          rating: { type: 'number' }
        },
        required: ['name', 'price']
      }
    }
  }
};

const result = await tabs.extract.json(
  'https://shop.example.com/category/laptops',
  schema
);

// Filter in-stock products sorted by rating
const topProducts = result.data.products
  .filter(p => p.inStock)
  .sort((a, b) => (b.rating || 0) - (a.rating || 0))
  .slice(0, 5);

console.log('Top 5 In-Stock Products:');
topProducts.forEach((product, i) => {
  console.log(`${i + 1}. ${product.name} - ${product.currency}${product.price}`);
  console.log(`   Rating: ${product.rating} stars`);
});

Example 3: Multi-Page Data Extraction

Extract data from multiple pages:

TypeScript
JavaScript

const urls = [
  'https://example.com/products/page-1',
  'https://example.com/products/page-2',
  'https://example.com/products/page-3'
];

const schema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' }
        }
      }
    }
  }
};

// Extract from all pages
const allProducts = [];
for (const url of urls) {
  const result = await tabs.extract.json(url, schema);
  allProducts.push(...result.data.products);

  // Be respectful - add a small delay between requests
  await new Promise(resolve => setTimeout(resolve, 500));
}

console.log(`Extracted ${allProducts.length} total products`);

const urls = [
  'https://example.com/products/page-1',
  'https://example.com/products/page-2',
  'https://example.com/products/page-3'
];

const schema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' }
        }
      }
    }
  }
};

// Extract from all pages
const allProducts = [];
for (const url of urls) {
  const result = await tabs.extract.json(url, schema);
  allProducts.push(...result.data.products);

  // Be respectful - add a small delay between requests
  await new Promise(resolve => setTimeout(resolve, 500));
}

console.log(`Extracted ${allProducts.length} total products`);

Options Reference

ExtractMarkdownOptions

Option	Type	Default	Description
`metadata`	`boolean`	`false`	Include page metadata (title, description, author, etc.)
`nocache`	`boolean`	`false`	Bypass cache and force fresh fetch

ExtractSchemaOptions

Option	Type	Default	Description
`instructions`	`string`	-	Natural language description of what data to extract
`nocache`	`boolean`	`false`	Bypass cache and force fresh fetch

ExtractJsonOptions

Option	Type	Default	Description
`nocache`	`boolean`	`false`	Bypass cache and force fresh fetch

Best Practices

1. Generate Schemas First

Instead of writing complex schemas manually, use extract.schema() to generate them automatically:

// ✅ Good: Generate schema automatically
const schema = await tabs.extract.schema(url, {
  instructions: 'extract product details'
});

// ❌ Tedious: Writing complex schemas by hand
const schema = { /* hundreds of lines */ };

2. Use Type Safety in TypeScript

Define interfaces for your data and use generic types:

// ✅ Good: Type-safe with interfaces
interface MyData {
  title: string;
  items: Array<{ name: string; value: number }>;
}
const result = await tabs.extract.json<MyData>(url, schema);
// result.data is typed as MyData

// ❌ Loses type safety
const result = await tabs.extract.json(url, schema);
// result.data is typed as unknown

3. Handle Optional Fields

Make schemas flexible for real-world data:

const schema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    price: { type: ['number', 'null'] },  // Can be null
    rating: { type: 'number' }            // Optional (not in required)
  },
  required: ['title']  // Only title is required
};

4. Use Cache Strategically

Default (cached): Fast and efficient for most use cases
nocache: true: Only when you need real-time data or debugging

// ✅ Good: Cache for static content
const blog = await tabs.extract.markdown('https://blog.com/article');

// ✅ Good: No cache for live data
const prices = await tabs.extract.json('https://stocks.com/live', schema, {
  nocache: true
});

Next Steps

Generate Features: Transform and analyze extracted data with AI
Automate Features: Execute complex browser automation tasks
Error Handling: Build robust applications with proper error handling
REST API Reference: See the underlying REST API endpoints

Overview​

Extract Markdown​

Basic Usage​

With Metadata​

Bypass Cache​

Response Type​

Generate Schema​

Basic Usage​

Use Generated Schema Immediately​

Extract JSON​

Basic Usage​

Type-Safe Extraction (TypeScript)​

Real-World Examples​

Example 1: News Scraping​

Example 2: E-commerce Product Extraction​

Example 3: Multi-Page Data Extraction​

Options Reference​

ExtractMarkdownOptions​

ExtractSchemaOptions​

ExtractJsonOptions​

Best Practices​

1. Generate Schemas First​

2. Use Type Safety in TypeScript​

3. Handle Optional Fields​

4. Use Cache Strategically​

Next Steps​

Overview

Extract Markdown

Basic Usage

With Metadata

Bypass Cache

Response Type

Generate Schema

Basic Usage

Use Generated Schema Immediately

Extract JSON

Basic Usage

Type-Safe Extraction (TypeScript)

Real-World Examples

Example 1: News Scraping

Example 2: E-commerce Product Extraction

Example 3: Multi-Page Data Extraction

Options Reference

ExtractMarkdownOptions

ExtractSchemaOptions

ExtractJsonOptions

Best Practices

1. Generate Schemas First

2. Use Type Safety in TypeScript

3. Handle Optional Fields

4. Use Cache Strategically

Next Steps