
Mastering the Markdown Endpoint

Web pages are full of noise — navigation, ads, sidebars, and boilerplate. When all you need is the content, the /extract/markdown endpoint fetches any public URL, strips the cruft, and returns clean, well-formatted Markdown.

Use it for:

Building content aggregation or read-it-later apps.
Preparing web content for LLM processing and RAG pipelines.
Converting blog posts or articles into a stable, storable format.
Powering documentation and content management systems.

Prerequisites & Authentication

You’ll need a Tabstack API key. Get one by signing up at https://tabstack.ai.

Install the SDK:

npm install @tabstack/sdk

pip install tabstack

Then set your API key as an environment variable:

export TABSTACK_API_KEY="your-api-key-here"

The SDK reads TABSTACK_API_KEY from your environment automatically.

The Basic Request

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://example.com/blog/article'
})

console.log(result.content)

from tabstack import Tabstack

client = Tabstack()

result = client.extract.markdown(url="https://example.com/blog/article")
print(result.content)

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article"
  }'

Note: The examples in this guide use placeholder URLs like https://example.com/blog/article. Replace them with the URL of the page you want to convert.
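
If installing the SDK is not an option, the curl call can be mirrored with Python's standard library. The following is a minimal sketch, not an official client: the endpoint, headers, and body come from the curl example, while the function names `build_markdown_request` and `fetch_markdown` and the 30-second timeout are illustrative choices.

```python
import json
import os
import urllib.request

API_URL = "https://api.tabstack.ai/v1/extract/markdown"


def build_markdown_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl example issues."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_markdown(url: str) -> str:
    """Send the request and return the `content` field of the JSON response."""
    request = build_markdown_request(url, os.environ["TABSTACK_API_KEY"])
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["content"]
```

Keeping request construction separate from sending makes the headers and body easy to inspect or unit-test without touching the network.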

Default Response: Content with Frontmatter

A successful request returns a JSON object. By default, the API embeds all extracted metadata (title, author, etc.) as YAML frontmatter at the top of the content.

{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}

The response includes the processed URL and the content with YAML frontmatter embedded — perfect for static site generators like Hugo or Jekyll that expect this format.
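
If you do need to pull the frontmatter apart yourself, a small splitter is enough for simple cases. This is a rough sketch under a strong assumption: the frontmatter holds only flat `key: value` lines, as in the sample response. It is not a YAML parser, and the helper name `split_frontmatter` is hypothetical.

```python
def split_frontmatter(content: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' delimited block from the Markdown body.

    Only flat `key: value` lines are handled; quoting, lists, and
    multi-line values are left to a real YAML parser.
    """
    lines = content.split("\n")
    if not lines or lines[0].strip() != "---":
        return {}, content  # no frontmatter present
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, content  # no closing delimiter: treat everything as body

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body
```

Calling `split_frontmatter(result.content)` returns the parsed fields and the Markdown body as a pair. Values that contain quotes or span several lines will not survive this approach.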

Getting Separate Metadata

YAML frontmatter is convenient, but when you need metadata as a structured JSON object — for populating databases or feeding other systems — add metadata: true to your request.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://example.com/blog/article',
  metadata: true
})

console.log(result.content)           // pure markdown, no frontmatter
console.log(result.metadata?.title)   // "Example Article Title"

from tabstack import Tabstack

client = Tabstack()

result = client.extract.markdown(
    url="https://example.com/blog/article",
    metadata=True
)

print(result.content)
print(result.metadata.title if result.metadata else None)

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article",
    "metadata": true
  }'

Response: Clean Content + Metadata Object

With metadata: true, the response separates content and metadata into distinct fields.

{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}

content contains pure Markdown without frontmatter, and metadata is a structured JSON object. No YAML parsing needed.

Note: Use metadata: true for most programmatic use cases. It’s more reliable than parsing YAML, which can break if titles or descriptions contain special characters.
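
To illustrate the database use case, here is one way the separated response could be stored with Python's built-in `sqlite3` module. The table layout, the `save_article` helper, and the choice of columns are assumptions made for this sketch; the `response` dictionary is a trimmed copy of the sample response.

```python
import sqlite3

# Trimmed copy of the sample response shown in this guide.
response = {
    "url": "https://example.com/blog/article",
    "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
    "metadata": {
        "title": "Example Article Title",
        "author": "Example Author",
        "site_name": "Example Blog",
    },
}


def save_article(conn: sqlite3.Connection, response: dict) -> None:
    """Insert or update one article row, tolerating missing metadata fields."""
    meta = response.get("metadata") or {}
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, title, author, site_name, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            response["url"],
            meta.get("title"),      # None becomes NULL when the page lacks a field
            meta.get("author"),
            meta.get("site_name"),
            response["content"],
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "url TEXT PRIMARY KEY, title TEXT, author TEXT, site_name TEXT, content TEXT)"
)
save_article(conn, response)
row = conn.execute(
    "SELECT title, author FROM articles WHERE url = ?", (response["url"],)
).fetchone()
print(row)
```

Using the URL as the primary key means a repeated fetch of the same page overwrites the earlier row instead of duplicating it.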

Forcing a Fresh Fetch

The Tabstack API caches results for a short period. For static content like blog posts, this is a performance win. For breaking news or live feeds, use nocache: true to bypass the cache.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://news-site.com/breaking-news',
  nocache: true
})

console.log(result.content)

from tabstack import Tabstack

client = Tabstack()

result = client.extract.markdown(
    url="https://news-site.com/breaking-news",
    nocache=True
)

print(result.content)

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/breaking-news",
    "nocache": true
  }'

Note: Use nocache sparingly. A forced fresh fetch is slightly slower because the API cannot serve the response from its cache.
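
One way to keep nocache limited to fast-changing sources is to decide per host. The sketch below is illustrative only: `LIVE_HOSTS` and `request_options` are hypothetical names, and the host list is something you would maintain yourself.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts whose pages change too quickly to serve from cache.
LIVE_HOSTS = {"news-site.com"}


def request_options(url: str, live_hosts: set[str] = LIVE_HOSTS) -> dict:
    """Return keyword arguments for an extract.markdown call.

    nocache is enabled only when the URL's host is a listed live host
    or a subdomain of one.
    """
    host = (urlparse(url).hostname or "").lower()
    is_live = host in live_hosts or any(host.endswith("." + h) for h in live_hosts)
    return {"url": url, "nocache": is_live}
```

With the Python SDK, the result can be unpacked straight into the call: `client.extract.markdown(**request_options(url))`.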

Production-Ready Error Handling

The SDK raises typed exceptions for API errors, so you can handle specific failure modes without inspecting raw HTTP status codes.

Exception	Status	Cause
`BadRequestError`	400	Missing or malformed request body
`AuthenticationError`	401	API key missing, invalid, or expired
`UnprocessableEntityError`	422	Invalid URL or access to private resources
`RateLimitError`	429	Rate limit exceeded
`InternalServerError`	500+	Target server down or internal conversion failure

The SDK automatically retries on 429 and 500+ errors (2 attempts with exponential backoff).
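
For readers unfamiliar with exponential backoff, the general pattern looks roughly like this. It is a generic sketch, not the SDK's implementation: the base delay, the growth factor, and the helper names `backoff_delays` and `call_with_retries` are assumptions, and this guide does not state the SDK's actual timings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base_delay: float = 0.5, factor: float = 2.0) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base_delay * factor**i for i in range(attempts)]


def call_with_retries(
    fn: Callable[[], T],
    retryable: tuple[type[BaseException], ...],
    attempts: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to `attempts` more times on retryable errors."""
    for delay in backoff_delays(attempts, base_delay):
        try:
            return fn()
        except retryable:
            sleep(delay)  # wait longer after each consecutive failure
    return fn()  # final try: any error now propagates to the caller
```

Because the SDK already retries 429 and 500+ responses, a wrapper like this is mainly useful around your own code paths, or when you want a longer retry window than the built-in one.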

Robust Error Handling Examples

import Tabstack, {
  AuthenticationError,
  BadRequestError,
  UnprocessableEntityError,
  RateLimitError,
  InternalServerError,
} from '@tabstack/sdk'

const client = new Tabstack()

async function getMarkdownFromUrl(url: string, forceFresh = false) {
  try {
    const result = await client.extract.markdown({
      url,
      metadata: true,
      nocache: forceFresh,
    })
    return result
  } catch (err) {
    if (err instanceof AuthenticationError) {
      console.error('Invalid API key — check TABSTACK_API_KEY')
    } else if (err instanceof UnprocessableEntityError) {
      console.error(`Invalid URL: ${url}`)
    } else if (err instanceof RateLimitError) {
      console.error('Rate limit exceeded — back off and retry')
    } else if (err instanceof BadRequestError) {
      console.error('Bad request — missing or malformed parameters')
    } else if (err instanceof InternalServerError) {
      console.error(`Server error fetching ${url}`)
    } else {
      throw err
    }
    return null
  }
}

const result = await getMarkdownFromUrl('https://example.com/article')
if (result) {
  console.log(`Title: ${result.metadata?.title}`)
  console.log(result.content)
}

import logging
from tabstack import Tabstack
from tabstack import (
    AuthenticationError,
    BadRequestError,
    UnprocessableEntityError,
    RateLimitError,
    InternalServerError,
)

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

client = Tabstack()

def get_markdown_from_url(url: str, force_fresh: bool = False):
    try:
        result = client.extract.markdown(
            url=url,
            metadata=True,
            nocache=force_fresh,
        )
        return result
    except AuthenticationError:
        logging.error("Invalid API key — check TABSTACK_API_KEY")
    except UnprocessableEntityError:
        logging.error(f"Invalid URL: {url}")
    except RateLimitError:
        logging.error("Rate limit exceeded — back off and retry")
    except BadRequestError:
        logging.error("Bad request — missing or malformed parameters")
    except InternalServerError:
        logging.error(f"Server error fetching {url}")
    return None

result = get_markdown_from_url("https://example.com/article")
if result:
    logging.info(f"Title: {result.metadata.title if result.metadata else None}")
    print(result.content)

#!/bin/bash
# Requires: curl, jq

API_KEY="$TABSTACK_API_KEY"
URL_TO_FETCH="$1"

if [ -z "$API_KEY" ]; then
    echo "Error: TABSTACK_API_KEY environment variable not set." >&2
    exit 1
fi

if [ -z "$URL_TO_FETCH" ]; then
    echo "Usage: $0 <url-to-fetch>" >&2
    exit 1
fi

# Build the JSON body with jq so quotes or backslashes in the URL are escaped correctly.
payload=$(jq -n --arg url "$URL_TO_FETCH" \
  '{url: $url, metadata: true, nocache: false}')

response=$(curl -s -w "\n%{http_code}" \
  -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --connect-timeout 10 \
  --max-time 30 \
  -d "$payload")

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$d')

if [ "$http_code" -eq 200 ]; then
  echo "Success:"
  echo "$response_body" | jq .
else
  echo "Error (HTTP $http_code):" >&2
  echo "$response_body" | jq .error 2>/dev/null || echo "$response_body" >&2
  exit 1
fi

Quick Reference

Request Parameters

Parameter	Type	Required	Default	Description
`url`	string	Yes		The publicly accessible URL to convert.
`effort`	string	No	`standard`	Rendering effort: `min`, `standard`, or `max`.
`metadata`	boolean	No	`false`	If `true`, returns metadata as a separate object. If `false`, embeds metadata as YAML frontmatter in `content`.
`nocache`	boolean	No	`false`	If `true`, bypasses the cache and forces a fresh fetch.
`geo_target`	object	No		Object with a `country` field holding an ISO 3166-1 alpha-2 country code, e.g. `{ country: 'US' }`.
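
The parameters above can be checked on the client side before a request is sent. The following sketch assumes only what the table states; `build_payload` and `ALLOWED_EFFORT` are hypothetical names, and the country check validates the two-letter format only, not whether the code is actually assigned.

```python
import re
from typing import Optional

ALLOWED_EFFORT = {"min", "standard", "max"}


def build_payload(
    url: str,
    effort: str = "standard",
    metadata: bool = False,
    nocache: bool = False,
    country: Optional[str] = None,
) -> dict:
    """Assemble a request body, rejecting values the table does not allow."""
    if not url:
        raise ValueError("url is required")
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"effort must be one of {sorted(ALLOWED_EFFORT)}")
    payload = {"url": url, "effort": effort, "metadata": metadata, "nocache": nocache}
    if country is not None:
        # Format check only: two uppercase letters, as in ISO 3166-1 alpha-2.
        if not re.fullmatch(r"[A-Z]{2}", country):
            raise ValueError("country must be a two-letter uppercase code")
        payload["geo_target"] = {"country": country}
    return payload
```

Failing fast on an invalid `effort` or country code avoids spending a request on something the API would reject.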

Metadata Object Fields

When metadata: true is set, these fields may be present. Availability depends on what the source page provides.

Field	Type	Description
`title`	string	Page title from Open Graph or HTML `<title>`.
`description`	string	Page description from Open Graph or HTML meta tags.
`author`	string	Author information from HTML metadata.
`publisher`	string	Publisher name from Open Graph.
`image`	string	Featured image URL from Open Graph.
`site_name`	string	Website name from Open Graph.
`url`	string	Canonical URL from Open Graph.
`type`	string	Content type from Open Graph (e.g., `"article"`).
`created_at`	string	Publication date, if available.
`modified_at`	string	Last modified date, if available.
`keywords`	string[]	Keywords or tags, if available.
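
Since any of these fields may be absent, it can help to load the metadata into a container where every field is optional. This is a sketch for working with the raw JSON response; the `PageMetadata` class is hypothetical and is not the type the SDK returns.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PageMetadata:
    """Typed view of the metadata object; every field may be missing."""
    title: Optional[str] = None
    description: Optional[str] = None
    author: Optional[str] = None
    publisher: Optional[str] = None
    image: Optional[str] = None
    site_name: Optional[str] = None
    url: Optional[str] = None
    type: Optional[str] = None
    created_at: Optional[str] = None
    modified_at: Optional[str] = None
    keywords: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Optional[dict]) -> "PageMetadata":
        """Build from a response's metadata, ignoring unknown or null keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{
            k: v for k, v in (data or {}).items() if k in known and v is not None
        })
```

`PageMetadata.from_dict(response.get("metadata"))` then yields an object whose missing fields are `None` (or an empty list for `keywords`), so downstream code does not need key-existence checks.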