Mastering the Markdown Endpoint
Web pages are full of noise — navigation, ads, sidebars, and boilerplate. When all you need is the content, the /extract/markdown endpoint fetches any public URL, strips the cruft, and returns clean, well-formatted Markdown.
Use it for:
- Building content aggregation or read-it-later apps.
- Preparing web content for LLM processing and RAG pipelines.
- Converting blog posts or articles into a stable, storable format.
- Powering documentation and content management systems.
Prerequisites & Authentication
You’ll need a Tabstack API key. Get one by signing up at https://tabstack.ai.
Install the SDK:

    # TypeScript / JavaScript
    npm install @tabstack/sdk

    # Python
    pip install tabstack

Set your API key:

    export TABSTACK_API_KEY="your-api-key-here"

The SDK reads TABSTACK_API_KEY from your environment automatically.
The Basic Request
TypeScript:

    import Tabstack from '@tabstack/sdk'

    const client = new Tabstack()

    const result = await client.extract.markdown({
      url: 'https://example.com/blog/article'
    })

    console.log(result.content)

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(url="https://example.com/blog/article")
    print(result.content)

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{ "url": "https://example.com/blog/article" }'

Note: The examples in this guide use placeholder URLs like https://example.com/blog/article. Replace them with the URL of the page you want to convert.
Default Response: Content with Frontmatter
A successful request returns a JSON object. By default, the API embeds all extracted metadata (title, author, etc.) as YAML frontmatter at the top of the content.

    {
      "url": "https://example.com/blog/article",
      "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
    }

The response includes the processed URL and the content with YAML frontmatter embedded, which suits static site generators like Hugo or Jekyll that expect this format.
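If you keep the default response, you will eventually need to separate the frontmatter from the body. The following is a minimal sketch of that step using only the Python standard library. The helper name `split_frontmatter` is hypothetical, and it deliberately handles only flat `key: value` lines rather than full YAML.

```python
def split_frontmatter(content: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body.

    Naive on purpose: handles flat `key: value` lines only, not full YAML.
    """
    if not content.startswith("---\n"):
        return {}, content  # no frontmatter present

    header, sep, body = content[4:].partition("\n---\n")
    if not sep:
        return {}, content  # opening marker without a closing one

    fields: dict[str, str] = {}
    for line in header.splitlines():
        # Split on the first colon only, so URLs in values stay intact.
        key, colon, value = line.partition(":")
        if colon:
            fields[key.strip()] = value.strip()
    return fields, body.lstrip("\n")


# Sample `content` value copied from the example response.
sample = (
    "---\n"
    "title: Example Article Title\n"
    "description: This is an example article...\n"
    "author: Example Author\n"
    "image: https://example.com/images/article.jpg\n"
    "---\n\n"
    "# Example Article Title\n\n"
    "This is the article content converted to markdown..."
)

fields, body = split_frontmatter(sample)
print(fields["title"])
print(body.splitlines()[0])
```

A real YAML parser is the safer choice for anything beyond simple values, since quoted strings, multi-line values, and lists are all ignored here.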
Getting Separate Metadata
YAML frontmatter is convenient, but when you need metadata as a structured JSON object (for populating databases or feeding other systems), add metadata: true to your request.

TypeScript:

    import Tabstack from '@tabstack/sdk'

    const client = new Tabstack()

    const result = await client.extract.markdown({
      url: 'https://example.com/blog/article',
      metadata: true
    })

    console.log(result.content) // pure markdown, no frontmatter
    console.log(result.metadata?.title) // "Example Article Title"

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(
        url="https://example.com/blog/article",
        metadata=True
    )

    print(result.content)
    print(result.metadata.title if result.metadata else None)

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{ "url": "https://example.com/blog/article", "metadata": true }'

Response: Clean Content + Metadata Object
With metadata: true, the response separates content and metadata into distinct fields.

    {
      "url": "https://example.com/blog/article",
      "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
      "metadata": {
        "title": "Example Article Title",
        "description": "This is an example article description",
        "author": "Example Author",
        "publisher": "Example Publisher",
        "image": "https://example.com/images/article.jpg",
        "site_name": "Example Blog",
        "url": "https://example.com/blog/article",
        "type": "article"
      }
    }

content contains pure Markdown without frontmatter, and metadata is a structured JSON object. No YAML parsing needed.
Note: Use metadata: true for most programmatic use cases. It’s more reliable than parsing YAML, which can break if titles or descriptions contain special characters.
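To illustrate the "populating databases" use case, here is a rough sketch that stores one response in SQLite. The table layout, the `save_article` helper, and the trimmed sample response are assumptions made for this example. They are not part of the Tabstack API.

```python
import json
import sqlite3

# Trimmed copy of the sample `metadata: true` response.
response = {
    "url": "https://example.com/blog/article",
    "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
    "metadata": {
        "title": "Example Article Title",
        "author": "Example Author",
        "site_name": "Example Blog",
        "type": "article",
    },
}


def save_article(conn: sqlite3.Connection, response: dict) -> None:
    """Insert or overwrite one extraction result, keyed by URL."""
    meta = response.get("metadata") or {}
    conn.execute(
        "INSERT OR REPLACE INTO articles "
        "(url, title, author, metadata_json, content) VALUES (?, ?, ?, ?, ?)",
        (
            response["url"],
            meta.get("title"),   # may be None if the page lacks it
            meta.get("author"),
            json.dumps(meta),    # keep the full object for later use
            response["content"],
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "url TEXT PRIMARY KEY, title TEXT, author TEXT, "
    "metadata_json TEXT, content TEXT)"
)
save_article(conn, response)

row = conn.execute(
    "SELECT title, author FROM articles WHERE url = ?", (response["url"],)
).fetchone()
print(row)
```

Because the metadata arrives as a JSON object, values bind directly as query parameters with no YAML parsing in between.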
Forcing a Fresh Fetch
The Tabstack API caches results for a short period. For static content like blog posts, this is a performance win. For breaking news or live feeds, use nocache: true to bypass the cache.

TypeScript:

    import Tabstack from '@tabstack/sdk'

    const client = new Tabstack()

    const result = await client.extract.markdown({
      url: 'https://news-site.com/breaking-news',
      nocache: true
    })

    console.log(result.content)

Python:

    from tabstack import Tabstack

    client = Tabstack()

    result = client.extract.markdown(
        url="https://news-site.com/breaking-news",
        nocache=True
    )

    print(result.content)

curl:

    curl -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $TABSTACK_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{ "url": "https://news-site.com/breaking-news", "nocache": true }'

Note: Use nocache judiciously. Forcing a fresh fetch results in slightly slower response times, as the API cannot serve the request from its cache.
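One way to apply nocache selectively is to decide per host. This is only a sketch: `FAST_CHANGING_HOSTS` and `build_params` are hypothetical names, and the host list is an assumption you would maintain yourself.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts whose pages change too quickly to serve from cache.
FAST_CHANGING_HOSTS = {"news-site.com"}


def build_params(url: str) -> dict:
    """Return request parameters, adding nocache only for fast-changing hosts."""
    host = urlparse(url).hostname or ""
    params: dict = {"url": url}
    if host in FAST_CHANGING_HOSTS:
        params["nocache"] = True
    return params


print(build_params("https://news-site.com/breaking-news"))
print(build_params("https://example.com/blog/article"))
```

The returned dictionary has the same keys as the request body, so it can be serialized as JSON for a raw HTTP call or unpacked as keyword arguments.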
Production-Ready Error Handling
The SDK raises typed exceptions for API errors, so you can handle specific failure modes without inspecting raw HTTP status codes.

| Exception | Status | Cause |
|---|---|---|
| BadRequestError | 400 | Missing or malformed request body |
| AuthenticationError | 401 | API key missing, invalid, or expired |
| UnprocessableEntityError | 422 | Invalid URL or access to private resources |
| RateLimitError | 429 | Rate limit exceeded |
| InternalServerError | 500+ | Target server down or internal conversion failure |
The SDK automatically retries on 429 and 500+ errors (2 attempts with exponential backoff).
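If you call the endpoint over raw HTTP instead of through the SDK, you may want similar retry behavior yourself. The sketch below is not the SDK's implementation. `HttpError`, `is_retryable`, and `call_with_backoff` are hypothetical names, the 0.5 second base delay is an assumption, and "2 attempts" is read here as two retries after the first call.

```python
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    """Retry only rate limits (429) and server errors (500+)."""
    return status == 429 or status >= 500


def call_with_backoff(fn, retries=2, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a retryable HttpError, wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except HttpError as err:
            # Give up on the last attempt or on errors a retry cannot fix.
            if attempt == retries or not is_retryable(err.status):
                raise
            sleep(base_delay * 2 ** attempt)


# Usage: a fake call that is rate limited twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise HttpError(429)
    return "ok"


delays = []  # record the waits instead of actually sleeping
print(call_with_backoff(flaky, sleep=delays.append))
print(delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Client errors such as 400, 401, and 422 are raised immediately, since retrying them would not change the outcome.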
Robust Error Handling Examples
TypeScript:

    import Tabstack, {
      AuthenticationError,
      BadRequestError,
      UnprocessableEntityError,
      RateLimitError,
      InternalServerError,
    } from '@tabstack/sdk'

    const client = new Tabstack()

    async function getMarkdownFromUrl(url: string, forceFresh = false) {
      try {
        const result = await client.extract.markdown({
          url,
          metadata: true,
          nocache: forceFresh,
        })
        return result
      } catch (err) {
        if (err instanceof AuthenticationError) {
          console.error('Invalid API key - check TABSTACK_API_KEY')
        } else if (err instanceof UnprocessableEntityError) {
          console.error(`Invalid URL: ${url}`)
        } else if (err instanceof RateLimitError) {
          console.error('Rate limit exceeded - back off and retry')
        } else if (err instanceof BadRequestError) {
          console.error('Bad request - missing or malformed parameters')
        } else if (err instanceof InternalServerError) {
          console.error(`Server error fetching ${url}`)
        } else {
          throw err
        }
        return null
      }
    }

    const result = await getMarkdownFromUrl('https://example.com/article')
    if (result) {
      console.log(`Title: ${result.metadata?.title}`)
      console.log(result.content)
    }

Python:

    import logging
    from tabstack import Tabstack
    from tabstack import (
        AuthenticationError,
        BadRequestError,
        UnprocessableEntityError,
        RateLimitError,
        InternalServerError,
    )

    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

    client = Tabstack()

    def get_markdown_from_url(url: str, force_fresh: bool = False):
        try:
            result = client.extract.markdown(
                url=url,
                metadata=True,
                nocache=force_fresh,
            )
            return result
        except AuthenticationError:
            logging.error("Invalid API key - check TABSTACK_API_KEY")
        except UnprocessableEntityError:
            logging.error(f"Invalid URL: {url}")
        except RateLimitError:
            logging.error("Rate limit exceeded - back off and retry")
        except BadRequestError:
            logging.error("Bad request - missing or malformed parameters")
        except InternalServerError:
            logging.error(f"Server error fetching {url}")
        return None

    result = get_markdown_from_url("https://example.com/article")
    if result:
        logging.info(f"Title: {result.metadata.title if result.metadata else None}")
        print(result.content)

Bash (curl):

    #!/bin/bash
    # Requires: curl, jq

    API_KEY="$TABSTACK_API_KEY"
    URL_TO_FETCH="$1"

    if [ -z "$API_KEY" ]; then
      echo "Error: TABSTACK_API_KEY environment variable not set." >&2
      exit 1
    fi

    if [ -z "$URL_TO_FETCH" ]; then
      echo "Usage: $0 <url-to-fetch>" >&2
      exit 1
    fi

    response=$(curl -s -w "\n%{http_code}" \
      -X POST https://api.tabstack.ai/v1/extract/markdown \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      --connect-timeout 10 \
      --max-time 30 \
      -d '{ "url": "'"$URL_TO_FETCH"'", "metadata": true, "nocache": false }')

    http_code=$(echo "$response" | tail -n1)
    response_body=$(echo "$response" | sed '$d')

    if [ "$http_code" -eq 200 ]; then
      echo "Success:"
      echo "$response_body" | jq .
    else
      echo "Error (HTTP $http_code):" >&2
      echo "$response_body" | jq .error 2>/dev/null || echo "$response_body" >&2
      exit 1
    fi

Quick Reference
Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | | The publicly accessible URL to convert. |
| effort | string | No | standard | Rendering effort: min, standard, or max. |
| metadata | boolean | No | false | If true, returns metadata as a separate object. If false, embeds metadata as YAML frontmatter in content. |
| nocache | boolean | No | false | If true, bypasses the cache and forces a fresh fetch. |
| geo_target | object | No | | Object of the form { country: 'US' }, where country is an ISO 3166-1 alpha-2 country code. |
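The parameter table can be turned into a small client-side check before a request is sent. This is a sketch under assumptions: `build_payload` is a hypothetical helper, the http(s) prefix check is a local convenience rather than a documented rule, and the country check verifies only the two-letter shape, not whether the code is actually assigned.

```python
import re

ALLOWED_EFFORT = {"min", "standard", "max"}


def build_payload(url, effort="standard", metadata=False, nocache=False, country=None):
    """Validate arguments against the parameter table and return the JSON body as a dict."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL")
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"effort must be one of {sorted(ALLOWED_EFFORT)}")

    payload = {"url": url, "effort": effort, "metadata": metadata, "nocache": nocache}

    if country is not None:
        # Shape check only: two uppercase letters, as in ISO 3166-1 alpha-2.
        if not re.fullmatch(r"[A-Z]{2}", country):
            raise ValueError("country must be a two-letter uppercase code")
        payload["geo_target"] = {"country": country}
    return payload


print(build_payload("https://example.com/blog/article", metadata=True, country="US"))
```

Failing fast on a bad effort value or country code saves a round trip that would otherwise end in an API error.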
Metadata Object Fields
When metadata: true is set, these fields may be present. Availability depends on what the source page provides.

| Field | Type | Description |
|---|---|---|
| title | string | Page title from Open Graph or HTML <title>. |
| description | string | Page description from Open Graph or HTML meta tags. |
| author | string | Author information from HTML metadata. |
| publisher | string | Publisher name from Open Graph. |
| image | string | Featured image URL from Open Graph. |
| site_name | string | Website name from Open Graph. |
| url | string | Canonical URL from Open Graph. |
| type | string | Content type from Open Graph (e.g., "article"). |
| created_at | string | Publication date, if available. |
| modified_at | string | Last modified date, if available. |
| keywords | string[] | Keywords or tags, if available. |
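Since any of these fields may be absent, downstream code is simpler if missing values have predictable defaults. The sketch below assumes you are working with the raw JSON metadata object as a Python dict. `PageMetadata` is a hypothetical container for this example, not a type exported by the SDK.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PageMetadata:
    """Mirror of the metadata fields table; every field is optional."""

    title: Optional[str] = None
    description: Optional[str] = None
    author: Optional[str] = None
    publisher: Optional[str] = None
    image: Optional[str] = None
    site_name: Optional[str] = None
    url: Optional[str] = None
    type: Optional[str] = None
    created_at: Optional[str] = None
    modified_at: Optional[str] = None
    keywords: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Optional[dict]) -> "PageMetadata":
        """Build from a raw metadata dict, ignoring unknown keys and null values."""
        known = {f.name for f in fields(cls)}
        data = {k: v for k, v in (raw or {}).items() if k in known and v is not None}
        return cls(**data)


meta = PageMetadata.from_dict({"title": "Example Article Title", "type": "article"})
print(meta.title, meta.author, meta.keywords)
```

Unknown keys are dropped rather than raising, so the container keeps working if the API later adds fields.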