
Mastering the Markdown Endpoint

Web pages are full of noise — navigation, ads, sidebars, and boilerplate. When all you need is the content, the /extract/markdown endpoint fetches any public URL, strips the cruft, and returns clean, well-formatted Markdown.

Use it for:

  • Building content aggregation or read-it-later apps.
  • Preparing web content for LLM processing and RAG pipelines.
  • Converting blog posts or articles into a stable, storable format.
  • Powering documentation and content management systems.

You’ll need a Tabstack API key. Get one by signing up at https://tabstack.ai.

Install the SDK:

npm install @tabstack/sdk

The SDK reads TABSTACK_API_KEY from your environment automatically.
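If you want a missing key to fail loudly at startup rather than on the first request, you can check the variable yourself before creating the client. This is a minimal sketch, not part of the SDK. requireEnv is a hypothetical helper, and it takes the environment as a parameter so you can pass Node's process.env or a test map.

```typescript
// Hypothetical helper (not part of @tabstack/sdk): read a required variable
// from an environment-like map and fail with a clear message if it is absent or blank.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

// Usage in Node, assuming process.env is available:
//   requireEnv("TABSTACK_API_KEY", process.env);
//   const client = new Tabstack();
```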

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://example.com/blog/article'
})
console.log(result.content)

Note: The examples below use placeholder URLs like https://example.com/blog/article. Replace them with the URL of the page you want to convert.

Default Response: Content with Frontmatter


A successful request returns a JSON object. By default, the API embeds all extracted metadata (title, author, etc.) as YAML frontmatter at the top of the content.

{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}

The response includes the processed URL and the content with YAML frontmatter embedded — perfect for static site generators like Hugo or Jekyll that expect this format.
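You can keep the default format and still read a value or two in code by splitting the frontmatter from the body yourself. This minimal sketch assumes three things about the content:

- It opens with a --- line.
- The frontmatter block closes with another --- line.
- The block uses flat, single-line key: value pairs, as in the example response above.

splitFrontmatter is a hypothetical helper. Use a real YAML parser for anything beyond this simple case.

```typescript
// Hypothetical helper: separate a leading YAML frontmatter block from the
// Markdown body. Handles only flat, single-line `key: value` pairs.
// Quoted strings, lists, and multi-line values are NOT parsed correctly.
interface SplitResult {
  frontmatter: Record<string, string>;
  body: string;
}

function splitFrontmatter(content: string): SplitResult {
  const lines = content.split("\n");
  if (lines[0] !== "---") {
    return { frontmatter: {}, body: content };
  }
  const end = lines.indexOf("---", 1);
  if (end === -1) {
    // Unterminated block: treat everything as body.
    return { frontmatter: {}, body: content };
  }
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    // Split on the FIRST colon only, so values like "https://..." survive.
    frontmatter[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  // Drop the blank line(s) between the closing --- and the Markdown body.
  const body = lines.slice(end + 1).join("\n").replace(/^\n+/, "");
  return { frontmatter, body };
}
```

This hand-rolled parsing is fragile, and the metadata: true option avoids it entirely.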

YAML frontmatter is convenient, but when you need metadata as a structured JSON object — for populating databases or feeding other systems — add metadata: true to your request.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://example.com/blog/article',
  metadata: true
})
console.log(result.content) // pure markdown, no frontmatter
console.log(result.metadata?.title) // "Example Article Title"

With metadata: true, the response separates content and metadata into distinct fields.

{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}

The content field contains pure Markdown without frontmatter, and metadata is a structured JSON object, so no YAML parsing is needed.

Note: Use metadata: true for most programmatic use cases. It’s more reliable than parsing YAML, which can break if titles or descriptions contain special characters.
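With metadata as an object, mapping a response onto a storage record is plain property access. This minimal sketch makes two assumptions:

- The response type is hand-written to mirror the JSON above. The SDK ships its own types, which may differ.
- ArticleRow is a hypothetical database row shape.

Missing fields fall back to null, because availability depends on what the source page provides.

```typescript
// Hand-written approximation of the response shape shown above.
interface MarkdownMetadata {
  title?: string;
  description?: string;
  author?: string;
  publisher?: string;
  image?: string;
  site_name?: string;
  url?: string;
  type?: string;
}

interface MarkdownResponse {
  url: string;
  content: string;
  metadata?: MarkdownMetadata;
}

// Hypothetical row shape for a table of saved articles.
interface ArticleRow {
  sourceUrl: string;
  canonicalUrl: string;
  title: string | null;
  author: string | null;
  markdown: string;
}

function toArticleRow(res: MarkdownResponse): ArticleRow {
  const meta: MarkdownMetadata = res.metadata ?? {};
  return {
    sourceUrl: res.url,
    // Prefer the canonical URL when the page declares one.
    canonicalUrl: meta.url ?? res.url,
    title: meta.title ?? null,
    author: meta.author ?? null,
    markdown: res.content,
  };
}
```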

The Tabstack API caches results for a short period. For static content like blog posts, this is a performance win. For breaking news or live feeds, use nocache: true to bypass the cache.

import Tabstack from '@tabstack/sdk'

const client = new Tabstack()

const result = await client.extract.markdown({
  url: 'https://news-site.com/breaking-news',
  nocache: true
})
console.log(result.content)

Note: Use nocache judiciously. Forcing a fresh fetch will result in slightly slower response times, as the API cannot serve the request from its cache.
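One way to keep nocache targeted is to decide per URL. This minimal sketch assumes you can recognize time-sensitive pages by their path. FRESH_PATH_HINTS, pathOf, shouldBypassCache, and markdownParams are hypothetical, and you should tune the hints to the sites you actually process. The returned object uses only the documented url and nocache parameters.

```typescript
// Hypothetical path fragments that suggest time-sensitive content.
const FRESH_PATH_HINTS: readonly string[] = ["/live", "/breaking", "/news"];

// Strip "scheme://host" so only the path and query are matched.
// Without this, a host such as news-site.com would match the "/news" hint.
function pathOf(url: string): string {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]*/i, "").toLowerCase();
}

function shouldBypassCache(url: string): boolean {
  const path = pathOf(url);
  return FRESH_PATH_HINTS.some((hint) => path.includes(hint));
}

// Parameters to pass to client.extract.markdown(...).
function markdownParams(url: string): { url: string; nocache: boolean } {
  return { url, nocache: shouldBypassCache(url) };
}
```

Static pages such as blog posts keep nocache set to false, so they continue to benefit from the cache.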

The SDK throws typed error classes for API errors. You can handle specific failure modes with instanceof checks instead of inspecting raw HTTP status codes.

| Exception | Status | Cause |
|---|---|---|
| BadRequestError | 400 | Missing or malformed request body |
| AuthenticationError | 401 | API key missing, invalid, or expired |
| UnprocessableEntityError | 422 | Invalid URL or access to private resources |
| RateLimitError | 429 | Rate limit exceeded |
| InternalServerError | 500+ | Target server down or internal conversion failure |

The SDK automatically retries on 429 and 500+ errors (2 attempts with exponential backoff).
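If your logs or monitoring record raw status codes, the table and retry rule above translate into a small lookup. describeStatus is a hypothetical helper that restates the documented mapping. It is not part of the SDK.

```typescript
// Restates the documented status-to-exception mapping and retry rule.
interface StatusInfo {
  errorClass: string;
  sdkRetries: boolean; // true when the SDK retries automatically
}

function describeStatus(status: number): StatusInfo | null {
  if (status >= 500) return { errorClass: "InternalServerError", sdkRetries: true };
  switch (status) {
    case 400: return { errorClass: "BadRequestError", sdkRetries: false };
    case 401: return { errorClass: "AuthenticationError", sdkRetries: false };
    case 422: return { errorClass: "UnprocessableEntityError", sdkRetries: false };
    case 429: return { errorClass: "RateLimitError", sdkRetries: true };
    default: return null; // not covered by the table
  }
}
```

Only the retried statuses are transient. A 400, 401, or 422 will keep failing the same way until the request or the API key is fixed.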

import Tabstack, {
  AuthenticationError,
  BadRequestError,
  UnprocessableEntityError,
  RateLimitError,
  InternalServerError,
} from '@tabstack/sdk'

const client = new Tabstack()

async function getMarkdownFromUrl(url: string, forceFresh = false) {
  try {
    const result = await client.extract.markdown({
      url,
      metadata: true,
      nocache: forceFresh,
    })
    return result
  } catch (err) {
    if (err instanceof AuthenticationError) {
      console.error('Invalid API key — check TABSTACK_API_KEY')
    } else if (err instanceof UnprocessableEntityError) {
      console.error(`Invalid URL: ${url}`)
    } else if (err instanceof RateLimitError) {
      console.error('Rate limit exceeded — back off and retry')
    } else if (err instanceof BadRequestError) {
      console.error('Bad request — missing or malformed parameters')
    } else if (err instanceof InternalServerError) {
      console.error(`Server error fetching ${url}`)
    } else {
      throw err
    }
    return null
  }
}

const result = await getMarkdownFromUrl('https://example.com/article')
if (result) {
  console.log(`Title: ${result.metadata?.title}`)
  console.log(result.content)
}
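getMarkdownFromUrl gives up once the SDK's built-in retries are exhausted. In a batch job, you may prefer to pause on a persistent RateLimitError rather than skip the URL. One option is to wrap the client.extract.markdown call itself in your own backoff loop. This minimal sketch assumes a generic async operation and a predicate you supply to decide which errors are retryable. withBackoff is hypothetical, and its attempt count and delays are illustrative choices, not SDK behavior.

```typescript
// Hypothetical application-level retry with exponential backoff.
// The SDK already retries 429 and 5xx; this adds an outer layer on top.
async function withBackoff<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on non-retryable errors or once attempts are exhausted.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Wait 1x, 2x, 4x, ... the base delay between attempts.
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
    }
  }
}

// Usage with the SDK (not run here):
//   const result = await withBackoff(
//     () => client.extract.markdown({ url, metadata: true }),
//     (err) => err instanceof RateLimitError,
//   );
```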
Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | | The publicly accessible URL to convert. |
| effort | string | No | standard | Rendering effort: min, standard, or max. |
| metadata | boolean | No | false | If true, returns metadata as a separate object. If false, embeds metadata as YAML frontmatter in content. |
| nocache | boolean | No | false | If true, bypasses the cache and forces a fresh fetch. |
| geo_target | object | No | | Country to target, for example { country: 'US' }, using an ISO 3166-1 alpha-2 country code. |
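Values for effort and geo_target are easy to get wrong when they come from user input or config files. This minimal client-side check lets you reject bad values before sending a request. It makes the following assumptions:

- The URL must be absolute http(s).
- The country code is written as two uppercase letters, as in the 'US' example.
- RawMarkdownParams and validateParams are hand-written here. They may be stricter or looser than the API itself.

```typescript
// Hand-written mirror of the parameter table; not the SDK's own type.
interface RawMarkdownParams {
  url: string;
  effort?: string; // documented values: "min", "standard" (default), "max"
  metadata?: boolean;
  nocache?: boolean;
  geo_target?: { country: string };
}

const EFFORTS: readonly string[] = ["min", "standard", "max"];

// Returns a list of problems; an empty list means the params look valid.
function validateParams(p: RawMarkdownParams): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(p.url)) {
    problems.push("url must be an absolute http(s) URL");
  }
  if (p.effort !== undefined && !EFFORTS.includes(p.effort)) {
    problems.push(`effort must be one of ${EFFORTS.join(", ")}`);
  }
  // ISO 3166-1 alpha-2 codes are two letters, e.g. "US".
  if (p.geo_target && !/^[A-Z]{2}$/.test(p.geo_target.country)) {
    problems.push("geo_target.country must be a two-letter ISO 3166-1 alpha-2 code");
  }
  return problems;
}
```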

When metadata: true is set, these fields may be present. Availability depends on what the source page provides.

| Field | Type | Description |
|---|---|---|
| title | string | Page title from Open Graph or the HTML `<title>` element. |
| description | string | Page description from Open Graph or HTML meta tags. |
| author | string | Author information from HTML metadata. |
| publisher | string | Publisher name from Open Graph. |
| image | string | Featured image URL from Open Graph. |
| site_name | string | Website name from Open Graph. |
| url | string | Canonical URL from Open Graph. |
| type | string | Content type from Open Graph (e.g., "article"). |
| created_at | string | Publication date, if available. |
| modified_at | string | Last modified date, if available. |
| keywords | string[] | Keywords or tags, if available. |
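Because every field is optional, downstream code should expect gaps. This minimal sketch normalizes the optional date and keyword fields. It assumes created_at and modified_at arrive as strings that JavaScript's Date can parse, since the exact format is not specified here. normalizeMetadata, toDate, and the NormalizedMetadata shape are hypothetical.

```typescript
// Hand-written subset of the metadata fields listed above.
interface RawMetadata {
  title?: string;
  created_at?: string;
  modified_at?: string;
  keywords?: string[];
}

interface NormalizedMetadata {
  title: string | null;
  createdAt: Date | null;
  modifiedAt: Date | null;
  keywords: string[];
}

// Parse a date string, returning null when it is absent or unparseable.
function toDate(value: string | undefined): Date | null {
  if (!value) return null;
  const d = new Date(value);
  return Number.isNaN(d.getTime()) ? null : d;
}

function normalizeMetadata(meta: RawMetadata = {}): NormalizedMetadata {
  return {
    title: meta.title ?? null,
    createdAt: toDate(meta.created_at),
    modifiedAt: toDate(meta.modified_at),
    keywords: meta.keywords ?? [],
  };
}
```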