Developer's Guide

Mastering the Markdown Endpoint

Scraping web content is often messy. You're left with a complex tangle of HTML, boilerplate, and ads, when all you really want is the clean, structured content.

The TABS API Markdown Endpoint solves this. It's a single POST request that fetches any public URL, extracts the main content from its HTML, and returns it as clean, well-formatted Markdown. It's well suited for:

Building content aggregation or "read-it-later" apps.
Preparing web content for AI/LLM processing and RAG pipelines.
Converting blog posts or articles into a stable, storable format.
Powering documentation and content management systems.

This guide will walk you through setting up your environment, making your first request, and building a production-ready function to handle content conversion robustly.

Prerequisites & Authentication

Before you begin, you'll need a TABS API key. You can get yours by signing up at https://tabstack.ai.

The API uses Bearer token authentication. We strongly recommend storing your key as an environment variable rather than hardcoding it in your application.

First, set the variable in your terminal session.

export TABS_API_KEY="your-api-key-here"

The export command sets an environment variable named TABS_API_KEY for your current shell session. Any application you launch from that session (e.g., in Python or Node.js) can then read the variable, which keeps your secret key out of your source code.
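On the application side, a small helper can fail fast when the variable is missing, so you never send a request with an empty Bearer token. The following is a minimal Python sketch. The helper name require_api_key is hypothetical and is not part of any TABS SDK.

```python
import os


def require_api_key(var_name: str = "TABS_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Point the user at the export command from the step above
        raise RuntimeError(
            f"{var_name} is not set; run: export {var_name}='your-api-key-here'"
        )
    return key
```

Call require_api_key() once at startup and pass the result to your HTTP client. A misconfigured deployment then fails immediately instead of producing a stream of 401 errors.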

The Basic Request

Let's start by converting a URL with the simplest possible request. The endpoint lives at https://api.tabstack.ai/v1/extract/markdown and expects a POST request with a JSON body.

In curl, JavaScript, and Python:

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article"
  }'

const response = await fetch('https://api.tabstack.ai/v1/extract/markdown', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TABS_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/blog/article'
  })
});

const data = await response.json();
console.log(data);

import requests
import os

api_key = os.environ.get("TABS_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article"
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data)

All three examples make an authenticated POST request with a JSON body containing the target URL. The API fetches the page, extracts the main content, and returns it as clean Markdown with metadata.
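You can also make the same call with only Python's standard library, which avoids the dependency on requests. The following is a minimal sketch using urllib.request. The function names build_markdown_request and fetch_markdown are hypothetical. Note that, unlike requests, urlopen raises urllib.error.HTTPError for 4xx/5xx responses.

```python
import json
import urllib.request

ENDPOINT = "https://api.tabstack.ai/v1/extract/markdown"


def build_markdown_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for the Markdown endpoint."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_markdown(url: str, api_key: str, timeout: float = 30) -> dict:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx; its .read() returns the error body.
    """
    request = build_markdown_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```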

Default Response: Content with Frontmatter

A successful request returns a JSON object. By default, the API embeds the extracted metadata (such as title, description, author, and image) as YAML frontmatter at the top of the content string.

{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}

The response includes the processed URL and the content with YAML frontmatter embedded. This is the format that static site generators such as Hugo and Jekyll expect.
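If you keep the default format but still need the metadata in your code, you must separate the frontmatter from the body yourself. The following is a minimal, standard-library-only sketch. It assumes the content starts with a --- line and that the frontmatter ends at the next --- line, as in the sample response. Its key: value parsing is deliberately naive: it does not handle YAML quoting, lists, or multi-line values. For real YAML, use a proper parser such as PyYAML's yaml.safe_load.

```python
def split_frontmatter(content: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body.

    Naive: treats each frontmatter line as 'key: value' (split at the first colon,
    so URLs in values survive) and does not handle YAML quoting or nesting.
    """
    lines = content.split("\n")
    if lines[0].strip() != "---":
        return {}, content  # no frontmatter present
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, content  # unterminated block: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```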

Getting Separate Metadata

YAML frontmatter is great, but sometimes you want metadata as a clean, parsable JSON object, separate from the content. This is essential for populating databases or feeding structured data to other systems.

To do this, simply add the metadata: true parameter to your request.

In curl, JavaScript, and Python:

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article",
    "metadata": true
  }'

const response = await fetch('https://api.tabstack.ai/v1/extract/markdown', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TABS_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/blog/article',
    metadata: true  // Request separate metadata
  })
});

const data = await response.json();
console.log(data);

import requests
import os

api_key = os.environ.get("TABS_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article",
    "metadata": True  # Request separate metadata
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data)

Adding metadata: true to the request changes the response format to separate content and metadata into distinct fields.

New Response: Clean Content + Metadata Object

By setting metadata: true, the response structure now includes a separate metadata object.

{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}

Now content contains pure Markdown without frontmatter, and metadata is a structured JSON object. This format is easier to work with programmatically because no YAML parsing is needed.

✨ Pro Tip: We recommend using metadata: true for most programmatic use cases. It's more reliable and easier than parsing YAML, which can be brittle if descriptions or titles contain special characters.
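The metadata object only contains the fields that the source page provides, so read it defensively. The following sketch flattens a metadata: true response into a record. When title is missing, it falls back to the first "# " heading in content. The ArticleRecord type and the heading fallback are illustrative choices, not API behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleRecord:
    url: str
    title: Optional[str]
    author: Optional[str]
    image: Optional[str]
    markdown: str


def to_record(data: dict) -> ArticleRecord:
    """Flatten a metadata:true response, tolerating missing metadata fields."""
    meta = data.get("metadata") or {}
    content = data.get("content", "")
    title = meta.get("title")
    if not title:
        # Fallback: use the first level-1 Markdown heading in the content, if any
        title = next(
            (line[2:].strip() for line in content.splitlines() if line.startswith("# ")),
            None,
        )
    return ArticleRecord(
        url=meta.get("url") or data.get("url", ""),  # prefer the canonical URL
        title=title,
        author=meta.get("author"),
        image=meta.get("image"),
        markdown=content,
    )
```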

Forcing a Fresh Fetch

For performance, the TABS API caches results for a short period. This works well for content that rarely changes, such as blog posts. If you're scraping a breaking news site or a live feed, however, you'll want to bypass the cache.

You can do this using the nocache: true parameter.

In curl, JavaScript, and Python:

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/breaking-news",
    "nocache": true
  }'

const response = await fetch('https://api.tabstack.ai/v1/extract/markdown', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TABS_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://news-site.com/breaking-news',
    nocache: true  // Force a fresh fetch
  })
});

const data = await response.json();
console.log(data.content);

import requests
import os

api_key = os.environ.get("TABS_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://news-site.com/breaking-news",
    "nocache": True  # Force a fresh fetch
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data['content'])

Setting nocache: true bypasses the cache and fetches the page fresh. Use it for content that changes minute to minute, such as live feeds.

Note: Use nocache judiciously. Forcing a fresh fetch will result in slightly slower response times, as the API cannot serve the request from its cache.
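One way to use nocache judiciously is to throttle forced refreshes on the client side. With this approach, you send nocache: true for a URL only if you haven't forced a fresh fetch of that URL recently. The following is a minimal sketch. The RefreshPolicy class and its 60-second default are hypothetical. They are unrelated to how long the API itself caches results, which this guide doesn't specify.

```python
import time
from typing import Callable, Dict


class RefreshPolicy:
    """Send nocache: true for a given URL at most once per min_interval seconds."""

    def __init__(self, min_interval: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._last_forced: Dict[str, float] = {}

    def should_force(self, url: str) -> bool:
        now = self.clock()
        last = self._last_forced.get(url)
        if last is None or now - last >= self.min_interval:
            self._last_forced[url] = now  # record this forced refresh
            return True
        return False  # refreshed recently: let the API cache answer

    def payload(self, url: str) -> dict:
        return {"url": url, "nocache": self.should_force(url)}
```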

Production-Ready Error Handling

In a real-world application, you can't assume every request will succeed. URLs may be invalid, sites may be down, or your API key might be wrong. A robust application must handle these failures gracefully.

The API uses standard HTTP status codes to indicate errors.

Status Code	Error Message	Description
400	`url is required`	The JSON body is missing the `url` parameter.
401	`Unauthorized - Invalid token`	Your API key is missing, invalid, or expired.
422	`url is invalid`	The provided URL is malformed.
422	`access to internal resources...`	You tried to access `localhost` or a private IP.
500	`failed to fetch URL`	The target server is down or blocked our request.
500	`failed to convert HTML...`	An internal error occurred during conversion.

All error responses return a simple JSON object:

{
  "error": "url is invalid"
}
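The status code alone can't distinguish the two 422 cases or the two 500 cases in the table, so you may also want to branch on the error string. The following sketch maps an error response to a coarse category. The category names are our own reading of the table, not documented API behavior. So is the assumption that only the 500 "failed to fetch URL" case is worth retrying later. The prefix matching also assumes that the messages stay as listed.

```python
from typing import Optional


def classify_error(status: int, body: Optional[dict]) -> str:
    """Map an error response to one of: 'auth', 'blocked_url', 'bad_request', 'retryable', 'server'."""
    message = (body or {}).get("error", "")
    if status == 401:
        return "auth"  # fix or rotate the API key
    if status == 422 and message.startswith("access to internal resources"):
        return "blocked_url"  # localhost / private IPs are rejected
    if status in (400, 422):
        return "bad_request"  # missing or malformed url: fix the input, don't retry
    if status == 500 and message.startswith("failed to fetch URL"):
        return "retryable"  # target site may be temporarily down (assumption)
    return "server"  # conversion failures and anything unexpected
```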

Robust Error Handling Examples

Here are production-ready examples that encapsulate the logic, set timeouts, and handle potential errors correctly.

In JavaScript, Python, and Bash (curl):

import "dotenv/config"; // Optional: loads TABS_API_KEY from a .env file (requires the dotenv package)
// No import needed for fetch or AbortController: Node.js 18+ provides both as globals.

async function getMarkdownFromUrl(url, forceFresh = false) {
  const apiKey = process.env.TABS_API_KEY;
  if (!apiKey) {
    console.error("TABS_API_KEY environment variable not set.");
    return null;
  }

  const endpoint = 'https://api.tabstack.ai/v1/extract/markdown';

  // Abort the request (including reading the body) after 30 seconds
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30000);

  try {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        url: url,
        metadata: true, // Always use the reliable metadata object
        nocache: forceFresh
      }),
      signal: controller.signal // Pass the AbortSignal
    });

    if (!response.ok) {
      // Handle API errors (4xx, 5xx). Error bodies should be {"error": "..."},
      // but a proxy may return non-JSON, so don't let parsing throw here.
      const errorData = await response.json().catch(() => ({}));
      console.warn(
        `API Error (HTTP ${response.status}) for ${url}: ${errorData.error || 'Unknown error'}`
      );
      return null;
    }

    return await response.json();

  } catch (error) {
    if (error.name === 'AbortError') {
      console.error(`Request timed out for ${url}`);
    } else {
      // Network failures and invalid JSON in a success response end up here
      console.error(`Request failed for ${url}: ${error.message}`);
    }
    return null;
  } finally {
    clearTimeout(timeoutId); // Always clear the timer, on success or failure
  }
}

// --- Usage ---
// (async () => {
//   const data = await getMarkdownFromUrl("https://example.com/article");
//   if (data) {
//     console.log(`Title: ${data.metadata?.title}`);
//     // console.log(data.content);
//   }
// })();

import requests
import os
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

def get_markdown_from_url(url: str, force_fresh: bool = False) -> dict | None:
    """
    Fetches clean markdown from a URL using the TABS API.
    Returns the parsed JSON data or None on failure.
    """
    api_key = os.environ.get("TABS_API_KEY")
    if not api_key:
        logging.error("TABS_API_KEY environment variable not set.")
        return None

    endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "metadata": True,  # Always use the reliable metadata object
        "nocache": force_fresh
    }

    try:
        response = requests.post(
            endpoint_url,
            headers=headers,
            json=payload,
            timeout=30  # Always set a 30-second timeout!
        )

        # Check for HTTP errors (4xx or 5xx)
        if not response.ok:
            error_data = response.json()
            logging.warning(
                f"API Error (HTTP {response.status_code}) for {url}: {error_data.get('error')}"
            )
            return None

        return response.json()

    except requests.exceptions.Timeout:
        logging.error(f"Request timed out for {url}")
        return None
    except requests.exceptions.JSONDecodeError:
        # Must come before RequestException, which it subclasses (requests >= 2.27).
        # Raised if a success or error body is not valid JSON.
        logging.error(f"Failed to decode JSON response from API. Status: {response.status_code}")
        return None
    except requests.exceptions.RequestException as e:
        # Catch-all for other network/connection errors
        logging.error(f"Network error for {url}: {e}")
        return None

# --- Usage ---
# good_url = "https://your-blog.com/some-article"
# data = get_markdown_from_url(good_url)
#
# if data:
#     logging.info(f"Title: {data['metadata'].get('title')}")
#     # print(data['content'])
#
# bad_url = "not-a-real-url"
# get_markdown_from_url(bad_url)

#!/bin/bash
# A robust bash script for error handling with curl

# Requires: curl, jq

API_KEY="$TABS_API_KEY"
URL_TO_FETCH="$1"

if [ -z "$API_KEY" ]; then
    echo "Error: TABS_API_KEY environment variable not set." >&2
    exit 1
fi

if [ -z "$URL_TO_FETCH" ]; then
    echo "Usage: $0 <url-to-fetch>" >&2
    exit 1
fi

# Build the JSON body with jq so URLs containing quotes or backslashes are escaped safely
payload=$(jq -n --arg url "$URL_TO_FETCH" '{url: $url, metadata: true, nocache: false}')

# -s: silent
# -w "\n%{http_code}": write the HTTP status code on its own final line
response=$(curl -s -w "\n%{http_code}" \
  -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --connect-timeout 10 \
  --max-time 30 \
  -d "$payload")

# Split response body and status code
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$d')

if [ "$http_code" -eq 200 ]; then
  echo "Success:"
  echo "$response_body" | jq .
else
  # http_code is typically 000 if curl could not connect or timed out before a response
  echo "Error (HTTP $http_code):" >&2
  # Print the API's "error" field if the body is JSON, otherwise the raw body
  error_msg=$(echo "$response_body" | jq -r '.error // empty' 2>/dev/null)
  echo "${error_msg:-$response_body}" >&2
  exit 1
fi

Each of these examples adds three safeguards: an upfront check that the API key is set, a 30-second timeout so a slow target page can't hang your application, and error handling for both HTTP error responses and network failures. On failure, the JavaScript and Python functions log a helpful message and return null (None in Python) instead of throwing. The Bash script prints the error to stderr and exits with status 1.

Quick Reference

Endpoint

URL: https://api.tabstack.ai/v1/extract/markdown
Method: POST
Authentication: Authorization: Bearer YOUR_API_KEY

Request Parameters (JSON Body)

Parameter	Type	Required	Default	Description
`url`	string	Yes		The publicly accessible URL to convert.
`metadata`	boolean	No	`false`	If `true`, returns metadata as a separate `metadata` object. If `false`, embeds metadata as YAML frontmatter in the `content`.
`nocache`	boolean	No	`false`	If `true`, bypasses the cache and forces a fresh fetch of the URL.
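This parameter table maps directly onto a small client-side check. The check can catch the 400 "url is required" case before you spend a request. The following is a sketch. validate_body is a hypothetical helper. Rejecting unknown keys is our own conservative choice, not behavior the API is documented to have.

```python
def validate_body(body: dict) -> list[str]:
    """Check a request body against the parameter table; return a list of problems."""
    problems = []
    allowed = {"url": str, "metadata": bool, "nocache": bool}
    url = body.get("url")
    if not isinstance(url, str) or not url.strip():
        problems.append("url is required and must be a non-empty string")
    for key, value in body.items():
        expected = allowed.get(key)
        if expected is None:
            problems.append(f"unknown parameter: {key}")
        elif key != "url" and not isinstance(value, expected):
            problems.append(f"{key} must be a boolean")
    return problems
```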

Metadata Object Fields

These are the common metadata fields you can expect, whether they are returned in the metadata object (metadata: true) or embedded as YAML frontmatter (the default).

Note: Not all fields will be present for every URL. Availability depends entirely on the metadata provided by the source website.

Field	Type	Description
`title`	string	Page title from Open Graph or HTML `<title>`.
`description`	string	Page description from Open Graph or HTML meta tags.
`author`	string	Author information from HTML metadata.
`publisher`	string	Publisher name from Open Graph.
`image`	string	Featured image URL from Open Graph.
`site_name`	string	Website name from Open Graph.
`url`	string	Canonical URL from Open Graph.
`type`	string	Content type from Open Graph (e.g., "article").

Best Practices Review

To recap, follow these rules for a smooth integration:

Secure Your Key: Never hardcode API keys. Use environment variables.
- JS: process.env.TABS_API_KEY
- Python: os.environ.get("TABS_API_KEY")
Use metadata: true: Prefer the separate metadata object for programmatic access. It's more reliable than parsing YAML.

Set Timeouts: Always set a reasonable timeout on your HTTP requests.

In JavaScript and Python:

// Use AbortController for fetch timeouts
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000);

fetch(url, {
  signal: controller.signal
}).finally(() => {
  clearTimeout(timeoutId);
});

import requests

# The 'requests' library accepts a timeout in seconds directly
try:
    response = requests.post(url, json=data, timeout=30)
except requests.exceptions.Timeout:
    print("Request timed out")

Handle Errors: Check for non-2xx HTTP status codes (!response.ok) and wrap your network calls in try...catch / try...except blocks.
Validate URLs: If possible, validate that a string is a valid http/https URL on your end before sending it to the API to save a request.
Use Caching: Don't use nocache: true unless you absolutely need real-time data. Let the API's cache work for you to get faster responses.
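The following sketch shows a client-side URL check in the spirit of the "Validate URLs" tip. It uses only urllib.parse and ipaddress. It accepts http/https URLs that have a host. It rejects localhost and private or loopback IP literals, which mirrors the 422 errors listed earlier. Because it does not resolve hostnames, a public name that points at a private address still passes. The API remains the final authority, and the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_fetchable_url(candidate: str) -> bool:
    """Return True if candidate looks like a public http(s) URL worth sending to the API."""
    try:
        parts = urlsplit(candidate.strip())
        host = parts.hostname  # lowercased, with IPv6 brackets stripped
    except ValueError:
        return False  # e.g. a malformed IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname rather than an IP literal: accept it
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_unspecified)
```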
