Mastering the Markdown Endpoint
Convert any public URL to clean, LLM-ready Markdown with the Tabstack /extract/markdown endpoint, with optional metadata and cache control.
Web pages are full of noise: navigation, ads, sidebars, and boilerplate. When all you need is the content, the /extract/markdown endpoint fetches any public URL, strips the cruft, and returns clean, well-formatted Markdown.
Use it for:
- Building content aggregation or read-it-later apps.
- Preparing web content for LLM processing and RAG pipelines.
- Converting blog posts or articles into a stable, storable format.
- Powering documentation and content management systems.
Prerequisites & Authentication
You’ll need a Tabstack API key. Get one by signing up at https://tabstack.ai.
Install the SDK:
    # TypeScript / JavaScript
    npm install @tabstack/sdk

    # Python
    pip install tabstack

    # CLI
    curl -fsSL https://tabstack.ai/install.sh | sh
    # then: tabstack auth login

Then export your API key:

    export TABSTACK_API_KEY="your-api-key-here"

The SDK reads TABSTACK_API_KEY from your environment automatically.
The Basic Request
TypeScript:

    import Tabstack from "@tabstack/sdk";

    const client = new Tabstack();

    const result = await client.extract.markdown({
      url: "https://example.com/blog/article",
    });

    console.log(result.content);

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(url="https://example.com/blog/article")
    print(result.content)

CLI:

    tabstack extract markdown https://example.com/blog/article

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://example.com/blog/article"
      }'

Note: The examples in this guide use placeholder URLs like https://example.com/blog/article. Replace them with the URL of the page you want to convert.
Default Response: Content with Frontmatter
A successful request returns a JSON object. By default, the API embeds all extracted metadata (title, author, etc.) as YAML frontmatter at the top of the content.

    {
      "url": "https://example.com/blog/article",
      "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
    }

The response includes the processed URL and the content with embedded YAML frontmatter, the format that static site generators like Hugo or Jekyll expect.
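If you keep the default response and want to separate the frontmatter from the body yourself, the split can be sketched as follows. This is a deliberately naive illustration using only the standard library: `split_frontmatter` is a hypothetical helper, not part of the Tabstack SDK, and it assumes flat `key: value` lines like those in the sample response above. It does not handle quoting, lists, or multi-line values the way a real YAML parser would.

```python
from __future__ import annotations


def split_frontmatter(content: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' delimited block from the Markdown body.

    Assumption: the block holds only flat `key: value` lines.
    """
    lines = content.split("\n")
    if not lines or lines[0].strip() != "---":
        return {}, content  # no frontmatter present

    # Find the closing '---' delimiter after the opening one.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, content  # unterminated block: treat it all as body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


# Hand-written sample shaped like the default response's `content` field.
sample = (
    "---\ntitle: Example Article Title\nauthor: Example Author\n"
    "image: https://example.com/images/article.jpg\n---\n\n"
    "# Example Article Title\n\nThis is the article content..."
)
meta, body = split_frontmatter(sample)
print(meta["title"])
print(body.splitlines()[0])
```

For anything beyond simple pages, a full YAML parser or the separate metadata object is likely the safer route.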
Getting Separate Metadata
YAML frontmatter is convenient, but when you need metadata as a structured JSON object (for populating databases or feeding other systems), add metadata: true to your request.

TypeScript:

    import Tabstack from "@tabstack/sdk";

    const client = new Tabstack();

    const result = await client.extract.markdown({
      url: "https://example.com/blog/article",
      metadata: true,
    });

    console.log(result.content); // pure markdown, no frontmatter
    console.log(result.metadata?.title); // "Example Article Title"

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(
        url="https://example.com/blog/article",
        metadata=True
    )

    print(result.content)
    print(result.metadata.title if result.metadata else None)

CLI:

    tabstack extract markdown https://example.com/blog/article --metadata

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://example.com/blog/article",
        "metadata": true
      }'

Response: Clean Content + Metadata Object
With metadata: true, the response separates content and metadata into distinct fields.

    {
      "url": "https://example.com/blog/article",
      "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
      "metadata": {
        "title": "Example Article Title",
        "description": "This is an example article description",
        "author": "Example Author",
        "publisher": "Example Publisher",
        "image": "https://example.com/images/article.jpg",
        "site_name": "Example Blog",
        "url": "https://example.com/blog/article",
        "type": "article"
      }
    }

The content field contains pure Markdown without frontmatter, and metadata is a structured JSON object. No YAML parsing needed.
Note: Use metadata: true for most programmatic use cases. It’s more reliable than parsing YAML, which can break if titles or descriptions contain special characters.
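To illustrate why the structured object is easier to work with, here is a minimal sketch that maps a response body onto a typed record ready for storage. `ArticleRecord` and `to_record` are hypothetical names chosen for this example, and the payload is hand-written rather than a captured API response. The title deliberately contains a colon and quotes, characters that tend to trip up ad hoc frontmatter parsing but pass through JSON unchanged.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical storage shape; every metadata field is optional."""
    url: str
    markdown: str
    title: Optional[str] = None
    author: Optional[str] = None
    site_name: Optional[str] = None


def to_record(response_json: str) -> ArticleRecord:
    data = json.loads(response_json)
    # Metadata fields depend on the source page, so read them defensively.
    meta = data.get("metadata") or {}
    return ArticleRecord(
        url=data["url"],
        markdown=data["content"],
        title=meta.get("title"),
        author=meta.get("author"),
        site_name=meta.get("site_name"),
    )


# Hand-written payload shaped like a metadata: true response.
raw = json.dumps({
    "url": "https://example.com/blog/article",
    "content": "# Example Article Title\n\nBody...",
    "metadata": {"title": 'Quotes: "tricky" & colons', "author": "Example Author"},
})
record = to_record(raw)
print(record.title)
print(record.site_name)
```

Fields the page did not provide simply come back as `None`, so downstream code can decide how to treat gaps.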
Forcing a Fresh Fetch
The Tabstack API caches results for a short period. For static content like blog posts, this is a performance win. For breaking news or live feeds, use nocache: true to bypass the cache.

TypeScript:

    import Tabstack from "@tabstack/sdk";

    const client = new Tabstack();

    const result = await client.extract.markdown({
      url: "https://news-site.com/breaking-news",
      nocache: true,
    });

    console.log(result.content);

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(
        url="https://news-site.com/breaking-news",
        nocache=True
    )

    print(result.content)

CLI:

    tabstack extract markdown https://news-site.com/breaking-news --nocache

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://news-site.com/breaking-news",
        "nocache": true
      }'

Note: Use nocache judiciously. Forcing a fresh fetch will result in slightly slower response times, as the API cannot serve the request from its cache.
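One way to apply nocache selectively is to decide per URL rather than per call site. The sketch below is an assumption about how an application might organize this, not something the API prescribes: `LIVE_HOSTS` and `build_request` are hypothetical names, and the host list is a placeholder you would replace with your own fast-changing sources.

```python
from urllib.parse import urlparse

# Hypothetical set of hosts whose pages change too quickly to serve from cache.
LIVE_HOSTS = frozenset({"news-site.com"})


def build_request(url: str, live_hosts: frozenset = LIVE_HOSTS) -> dict:
    """Build a request body, adding nocache only for fast-changing hosts."""
    host = urlparse(url).hostname or ""
    body = {"url": url}
    if host in live_hosts:
        body["nocache"] = True  # bypass the cache for live content
    return body


print(build_request("https://news-site.com/breaking-news"))
print(build_request("https://example.com/blog/article"))
```

The resulting dictionary can be sent as the JSON body of the request, or its keys passed as keyword arguments to the SDK call shown above.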
Production-Ready Error Handling
The SDK raises typed exceptions for API errors, so you can handle specific failure modes without inspecting raw HTTP status codes.
| Exception | Status | Cause |
|---|---|---|
| BadRequestError | 400 | Missing or malformed request body |
| AuthenticationError | 401 | API key missing, invalid, or expired |
| UnprocessableEntityError | 422 | Invalid URL or access to private resources |
| RateLimitError | 429 | Rate limit exceeded |
| InternalServerError | 500+ | Target server down or internal conversion failure |
The SDK automatically retries on 429 and 500+ errors (2 attempts with exponential backoff).
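If you call the endpoint without the SDK (for example over raw HTTP), you would need to supply similar retry behavior yourself. The following is a generic exponential backoff sketch, not the SDK's actual implementation: `RetryableError` and `call_with_backoff` are hypothetical names, the 0.5 second base delay is an assumed value, and "2 attempts" is read here as two retries after the first call.

```python
import time


class RetryableError(Exception):
    """Stand-in for a 429 or 5xx failure; not an SDK class."""


def call_with_backoff(func, retries=2, base_delay=0.5, sleep=time.sleep):
    """Call func(); on RetryableError wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))


# Usage with a fake call that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError("simulated 429")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the helper easy to test. Production code would typically also add jitter and honor any retry hints the server sends.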
Robust Error Handling Examples
TypeScript:

    import Tabstack, {
      AuthenticationError,
      BadRequestError,
      UnprocessableEntityError,
      RateLimitError,
      InternalServerError,
    } from "@tabstack/sdk";

    const client = new Tabstack();

    async function getMarkdownFromUrl(url: string, forceFresh = false) {
      try {
        const result = await client.extract.markdown({
          url,
          metadata: true,
          nocache: forceFresh,
        });
        return result;
      } catch (err) {
        if (err instanceof AuthenticationError) {
          console.error("Invalid API key: check TABSTACK_API_KEY");
        } else if (err instanceof UnprocessableEntityError) {
          console.error(`Invalid URL: ${url}`);
        } else if (err instanceof RateLimitError) {
          console.error("Rate limit exceeded: back off and retry");
        } else if (err instanceof BadRequestError) {
          console.error("Bad request: missing or malformed parameters");
        } else if (err instanceof InternalServerError) {
          console.error(`Server error fetching ${url}`);
        } else {
          throw err;
        }
        return null;
      }
    }

    const result = await getMarkdownFromUrl("https://example.com/article");
    if (result) {
      console.log(`Title: ${result.metadata?.title}`);
      console.log(result.content);
    }

Python:

    import logging
    from tabstack import Tabstack
    from tabstack import (
        AuthenticationError,
        BadRequestError,
        UnprocessableEntityError,
        RateLimitError,
        InternalServerError,
    )

    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

    client = Tabstack()

    def get_markdown_from_url(url: str, force_fresh: bool = False):
        try:
            result = client.extract.markdown(
                url=url,
                metadata=True,
                nocache=force_fresh,
            )
            return result
        except AuthenticationError:
            logging.error("Invalid API key: check TABSTACK_API_KEY")
        except UnprocessableEntityError:
            logging.error(f"Invalid URL: {url}")
        except RateLimitError:
            logging.error("Rate limit exceeded: back off and retry")
        except BadRequestError:
            logging.error("Bad request: missing or malformed parameters")
        except InternalServerError:
            logging.error(f"Server error fetching {url}")
        return None

    result = get_markdown_from_url("https://example.com/article")
    if result:
        logging.info(f"Title: {result.metadata.title if result.metadata else None}")
        print(result.content)

curl (Bash):

    #!/bin/bash
    # Requires: curl, jq

    API_KEY="$TABSTACK_API_KEY"
    URL_TO_FETCH="$1"

    if [ -z "$API_KEY" ]; then
      echo "Error: TABSTACK_API_KEY environment variable not set." >&2
      exit 1
    fi

    if [ -z "$URL_TO_FETCH" ]; then
      echo "Usage: $0 <url-to-fetch>" >&2
      exit 1
    fi

    response=$(curl -s -w "\n%{http_code}" \
      -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      --connect-timeout 10 \
      --max-time 30 \
      -d '{
        "url": "'"$URL_TO_FETCH"'",
        "metadata": true,
        "nocache": false
      }')

    http_code=$(echo "$response" | tail -n1)
    response_body=$(echo "$response" | sed '$d')

    if [ "$http_code" -eq 200 ]; then
      echo "Success:"
      echo "$response_body" | jq .
    else
      echo "Error (HTTP $http_code):" >&2
      echo "$response_body" | jq .error 2>/dev/null || echo "$response_body" >&2
      exit 1
    fi

Quick Reference
Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | | The publicly accessible URL to convert. |
| effort | string | No | standard | Rendering effort: min, standard, or max. |
| metadata | boolean | No | false | If true, returns metadata as a separate object. If false, embeds metadata as YAML frontmatter in content. |
| nocache | boolean | No | false | If true, bypasses the cache and forces a fresh fetch. |
| geo_target | object | No | | Object holding an ISO 3166-1 alpha-2 country code, e.g. { country: 'US' }. |
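The parameter table translates naturally into a client-side check you can run before sending a request. The sketch below is one possible pre-flight validator, not the API's own validation logic: `validate_request` is a hypothetical helper, and the http(s) scheme check and the uppercase two-letter pattern for the country code are assumptions layered on top of the table.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

ALLOWED_EFFORT = {"min", "standard", "max"}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []

    url = body.get("url")
    # Assumption: a "publicly accessible URL" means an http or https URL.
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        problems.append("url: required, must be an http(s) URL")

    if "effort" in body and body["effort"] not in ALLOWED_EFFORT:
        problems.append("effort: must be min, standard, or max")

    for flag in ("metadata", "nocache"):
        if flag in body and not isinstance(body[flag], bool):
            problems.append(f"{flag}: must be a boolean")

    if "geo_target" in body:
        geo = body["geo_target"]
        country = geo.get("country") if isinstance(geo, dict) else None
        # Assumption: alpha-2 codes are written as two uppercase letters.
        if not isinstance(country, str) or not re.fullmatch(r"[A-Z]{2}", country):
            problems.append("geo_target: expected an object like {'country': 'US'}")

    return problems


good = {"url": "https://example.com/blog/article", "effort": "max",
        "geo_target": {"country": "US"}}
bad = {"effort": "turbo", "nocache": "yes"}
print(validate_request(good))
print(validate_request(bad))
```

Catching these mistakes locally avoids spending a request on something the API would likely reject with a 400 or 422.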
Metadata Object Fields
When metadata: true is set, these fields may be present. Availability depends on what the source page provides.
| Field | Type | Description |
|---|---|---|
| title | string | Page title from Open Graph or HTML <title>. |
| description | string | Page description from Open Graph or HTML meta tags. |
| author | string | Author information from HTML metadata. |
| publisher | string | Publisher name from Open Graph. |
| image | string | Featured image URL from Open Graph. |
| site_name | string | Website name from Open Graph. |
| url | string | Canonical URL from Open Graph. |
| type | string | Content type from Open Graph (e.g., "article"). |
| created_at | string | Publication date, if available. |
| modified_at | string | Last modified date, if available. |
| keywords | string[] | Keywords or tags, if available. |