Developer's Guide

Mastering the Markdown Endpoint

Scraping web content is often messy. You're left with a complex tangle of HTML, boilerplate, and ads, when all you really want is the clean, structured content.

The TABS API Markdown Endpoint solves this. It's a single POST request that fetches any public URL, intelligently parses the HTML, and returns clean, well-formatted Markdown. It's the perfect tool for:

  • Building content aggregation or "read-it-later" apps.
  • Preparing web content for AI/LLM processing and RAG pipelines.
  • Converting blog posts or articles into a stable, storable format.
  • Powering documentation and content management systems.

This guide will walk you through setting up your environment, making your first request, and building a production-ready function to handle content conversion robustly.

Prerequisites & Authentication

Before you begin, you'll need a TABS API key. You can get yours by signing up at https://tabstack.ai.

The API uses Bearer token authentication. We strongly recommend storing your key as an environment variable rather than hardcoding it in your application.

First, set the variable in your terminal session.

export TABS_API_KEY="your-api-key-here"

The export command sets an environment variable named TABS_API_KEY for your current shell session. Your application code (e.g., in Python or Node.js) can then read this variable, which keeps your secret key out of your source code.
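As a minimal sketch of the reading side in Node.js, the helper below fails fast when the variable is missing. The name requireApiKey and its optional env parameter are illustrative choices, not part of the TABS API.

```javascript
// Hypothetical helper: read the key from the environment and fail fast
// with a clear message if it is missing or blank.
function requireApiKey(env = process.env) {
  const key = env.TABS_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("TABS_API_KEY environment variable is not set.");
  }
  return key.trim();
}

// Usage:
// const apiKey = requireApiKey();
```

Throwing at startup, rather than at the first request, tends to surface a missing key earlier.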

The Basic Request

Let's start by converting a URL with the simplest possible request. The endpoint lives at https://api.tabstack.ai/v1/extract/markdown and expects a POST request with a JSON body.

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article"
  }'

This example makes an authenticated POST request with a JSON body containing the target URL. The API fetches the page, extracts the main content, and returns it as clean Markdown with metadata.
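For Node.js, the same request might be sketched with fetch as follows. The buildMarkdownRequest helper and its return shape are illustrative names, not part of the TABS API; the endpoint, headers, and body mirror the cURL example above.

```javascript
const ENDPOINT = "https://api.tabstack.ai/v1/extract/markdown";

// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildMarkdownRequest(url, apiKey, options = {}) {
  return {
    endpoint: ENDPOINT,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Extra body parameters can be passed through options.
      body: JSON.stringify({ url, ...options }),
    },
  };
}

// Usage (assumes Node.js 18+ for the global fetch, inside an async function):
// const { endpoint, init } = buildMarkdownRequest(
//   "https://example.com/blog/article",
//   process.env.TABS_API_KEY
// );
// const response = await fetch(endpoint, init);
// const data = await response.json();
```

Separating request construction from sending keeps the network call to a single line and makes the request easy to inspect.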

Default Response: Content with Frontmatter

A successful request returns a JSON object. By default, the API embeds the extracted metadata (title, author, and so on) as YAML frontmatter at the top of the content.

{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}

The response includes the processed URL and the content with YAML frontmatter embedded—perfect for static site generators like Hugo or Jekyll that expect this format.
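If you need to separate the frontmatter from the body yourself, a rough sketch could look like the one below. It assumes the delimiters appear exactly as in the sample response (a leading "---" line and a closing "---" line); splitFrontmatter is an illustrative name, and parsing the YAML itself would still need a YAML library.

```javascript
// Hypothetical helper: splits "---\n<yaml>\n---\n<body>" into its two parts.
// Returns frontmatter: null when the content does not start with a frontmatter block.
function splitFrontmatter(content) {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(content);
  if (!match) {
    return { frontmatter: null, body: content };
  }
  return {
    frontmatter: match[1], // raw YAML text, not parsed
    body: content.slice(match[0].length).replace(/^\n+/, ""),
  };
}

// Usage:
// const { frontmatter, body } = splitFrontmatter(data.content);
```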

Getting Separate Metadata

YAML frontmatter is great, but sometimes you want metadata as a clean, parsable JSON object, separate from the content. This is essential for populating databases or feeding structured data to other systems.

To do this, simply add the metadata: true parameter to your request.

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article",
    "metadata": true
  }'

Adding metadata: true to the request changes the response format to separate content and metadata into distinct fields.

New Response: Clean Content + Metadata Object

With metadata: true set, the response includes a separate metadata object:

{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}

Now content contains pure Markdown without frontmatter, and metadata is a structured JSON object. This format is easier to work with programmatically—no YAML parsing needed.

✨ Pro Tip: We recommend using metadata: true for most programmatic use cases. It's more reliable and easier than parsing YAML, which can be brittle if descriptions or titles contain special characters.
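To illustrate why the separate object is convenient, here is a small sketch that maps a response onto a record you might store in a database. The record shape and the name toArticleRecord are assumptions made for illustration; only the response field names come from the sample above.

```javascript
// Hypothetical mapping from an API response to a storage record.
// Keys on the left are our own; values are read from the response.
function toArticleRecord(response) {
  const meta = response.metadata ?? {};
  return {
    sourceUrl: meta.url ?? response.url,
    title: meta.title ?? null, // metadata fields may be absent
    author: meta.author ?? null,
    description: meta.description ?? null,
    imageUrl: meta.image ?? null,
    markdown: response.content ?? "",
  };
}
```

Defaulting missing fields to null keeps the record shape stable even when a page provides little metadata.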

Forcing a Fresh Fetch

For performance, the TABS API caches results for a short period. This is perfect for static content like blog posts. However, if you're scraping a breaking news site or a live feed, you'll want to bypass the cache.

You can do this using the nocache: true parameter.

curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/breaking-news",
    "nocache": true
  }'

Setting nocache: true bypasses the cache and fetches fresh content. Use it for real-time data.

Note: Use nocache judiciously. Forcing a fresh fetch results in slower responses, because the API cannot serve the request from its cache.
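One way to keep nocache limited to the sources that need it is to decide per host. The sketch below is only an illustration: buildBody and the liveHosts list are hypothetical, and the policy (exact hostname match) is an assumption rather than anything the API prescribes.

```javascript
// Hypothetical policy: only bypass the cache for hosts the caller marks as "live".
function buildBody(url, liveHosts = []) {
  const host = new URL(url).hostname;
  const body = { url };
  if (liveHosts.includes(host)) {
    body.nocache = true; // fresh fetch, slower response
  }
  return body;
}

// Usage:
// buildBody("https://news-site.com/breaking-news", ["news-site.com"]);
```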

Production-Ready Error Handling

In a real-world application, you can't assume every request will succeed. URLs may be invalid, sites may be down, or your API key might be wrong. A robust application must handle these failures gracefully.

The API uses standard HTTP status codes to indicate errors.

Status Code | Error Message | Description
400 | url is required | The JSON body is missing the url parameter.
401 | Unauthorized - Invalid token | Your API key is missing, invalid, or expired.
422 | url is invalid | The provided URL is malformed.
422 | access to internal resources... | You tried to access localhost or a private IP.
500 | failed to fetch URL | The target server is down or blocked our request.
500 | failed to convert HTML... | An internal error occurred during conversion.

All error responses return a simple JSON object:

{
  "error": "url is invalid"
}
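The status codes listed above can be folded into a small classifier so that calling code reacts consistently. This is a sketch: classifyApiError and the kind labels are illustrative, and treating 5xx responses as retryable is an assumption, not documented API behavior.

```javascript
// Hypothetical classifier built from the documented status codes.
function classifyApiError(status, body) {
  const message = (body && body.error) || "Unknown error";
  if (status === 401) {
    return { kind: "auth", retryable: false, message };
  }
  if (status === 400 || status === 422) {
    return { kind: "bad_request", retryable: false, message };
  }
  if (status >= 500) {
    // Assumption: server-side failures may succeed on a later attempt.
    return { kind: "server", retryable: true, message };
  }
  return { kind: "unknown", retryable: false, message };
}
```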

Robust Error Handling Examples

Here is a production-ready example that encapsulates the logic, sets a timeout, and handles potential errors correctly.

import "dotenv/config"; // Loads variables from a .env file into process.env

// fetch and AbortController are globals in Node.js 18+, so no other imports are needed.

async function getMarkdownFromUrl(url, forceFresh = false) {
  const apiKey = process.env.TABS_API_KEY;
  if (!apiKey) {
    console.error("TABS_API_KEY environment variable not set.");
    return null;
  }

  const endpoint = 'https://api.tabstack.ai/v1/extract/markdown';

  // Set a 30-second timeout
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30000);

  try {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        url: url,
        metadata: true, // Always use the reliable metadata object
        nocache: forceFresh
      }),
      signal: controller.signal // Pass the AbortSignal
    });

    clearTimeout(timeoutId); // Clear the timeout if fetch succeeds

    const data = await response.json();

    if (!response.ok) {
      // Handle API errors (4xx, 5xx)
      console.warn(
        `API Error (HTTP ${response.status}) for ${url}: ${data.error || 'Unknown error'}`
      );
      return null;
    }

    return data;

  } catch (error) {
    clearTimeout(timeoutId); // Clear timeout on error
    if (error.name === 'AbortError') {
      console.error(`Request timed out for ${url}`);
    } else {
      console.error(`Network error for ${url}: ${error.message}`);
    }
    return null;
  }
}

// --- Usage ---
// (async () => {
//   const data = await getMarkdownFromUrl("https://example.com/article");
//   if (data) {
//     console.log(`Title: ${data.metadata?.title}`);
//     // console.log(data.content);
//   }
// })();

This production-ready function adds three key safety features: a guard clause that checks for the API key upfront, a 30-second timeout to prevent hanging requests, and comprehensive error handling for both HTTP errors and network failures. On errors, it logs helpful messages and returns null rather than crashing.
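Because the function returns null on failure, callers can process a list of URLs without try/catch around each call. The sketch below shows one possible consumer; convertAll is an illustrative name, and the converter is passed in as a parameter (for example getMarkdownFromUrl) so the loop can be exercised without network access.

```javascript
// Hypothetical batch helper: runs a converter over each URL in sequence and
// separates successes from failures based on the null-on-error contract.
async function convertAll(urls, convert) {
  const results = { converted: [], failed: [] };
  for (const url of urls) {
    const data = await convert(url);
    if (data) {
      results.converted.push({ url, data });
    } else {
      results.failed.push(url);
    }
  }
  return results;
}

// Usage:
// const { converted, failed } = await convertAll(
//   ["https://example.com/article"],
//   getMarkdownFromUrl
// );
```

Processing sequentially keeps the example simple; whether to run requests in parallel depends on limits this guide does not specify.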

Quick Reference

Endpoint

  • URL: https://api.tabstack.ai/v1/extract/markdown
  • Method: POST
  • Authentication: Authorization: Bearer YOUR_API_KEY

Request Parameters (JSON Body)

Parameter | Type | Required | Default | Description
url | string | Yes | - | The publicly accessible URL to convert.
metadata | boolean | No | false | If true, returns metadata as a separate metadata object. If false, embeds metadata as YAML frontmatter in the content.
nocache | boolean | No | false | If true, bypasses the cache and forces a fresh fetch of the URL.

Metadata Object Fields

These are the common fields you can expect, whether in the separate metadata object (metadata: true) or in the YAML frontmatter.

Note: Not all fields will be present for every URL. Availability depends entirely on the metadata provided by the source website.

Field | Type | Description
title | string | Page title from Open Graph or HTML <title>.
description | string | Page description from Open Graph or HTML meta tags.
author | string | Author information from HTML metadata.
publisher | string | Publisher name from Open Graph.
image | string | Featured image URL from Open Graph.
site_name | string | Website name from Open Graph.
url | string | Canonical URL from Open Graph.
type | string | Content type from Open Graph (e.g., "article").

Best Practices Review

To recap, follow these rules for a smooth integration:

  1. Secure Your Key: Never hardcode API keys. Use environment variables.

    • JS: process.env.TABS_API_KEY
    • Python: os.environ.get("TABS_API_KEY")
  2. Use metadata: true: Prefer the separate metadata object for programmatic access. It's more reliable than parsing YAML.

  3. Set Timeouts: Always set a reasonable timeout on your HTTP requests.

    // Use AbortController for fetch timeouts
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), 30000);

    fetch(url, {
      signal: controller.signal
    }).finally(() => {
      clearTimeout(timeoutId);
    });
  4. Handle Errors: Check for non-2xx HTTP status codes (!response.ok) and wrap your network calls in try...catch / try...except blocks.

  5. Validate URLs: If possible, validate that a string is a valid http/https URL on your end before sending it to the API to save a request.

  6. Use Caching: Don't use nocache: true unless you absolutely need real-time data. Let the API's cache work for you to get faster responses.
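The URL validation step from the list above can be sketched with the built-in URL class. The name isValidHttpUrl is illustrative, and the check is purely syntactic: it does not detect localhost or private IP addresses, which the API rejects on its side.

```javascript
// Returns true only for strings that parse as http: or https: URLs.
function isValidHttpUrl(value) {
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return false; // not parseable as an absolute URL
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

// Usage:
// if (!isValidHttpUrl(input)) { /* skip the API call */ }
```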