---
title: Mastering the Markdown Endpoint | Tabstack
---

Scraping web content is often messy. You’re left with a complex tangle of HTML, boilerplate, and ads, when all you really want is the clean, structured content.

The Tabstack API Markdown Endpoint solves this. It’s a single `POST` request that fetches any public URL, intelligently parses the HTML, and returns clean, well-formatted Markdown. It’s the perfect tool for:

- Building content aggregation or “read-it-later” apps.
- Preparing web content for AI/LLM processing and RAG pipelines.
- Converting blog posts or articles into a stable, storable format.
- Powering documentation and content management systems.

This guide will walk you through setting up your environment, making your first request, and building a production-ready function to handle content conversion robustly.

## Prerequisites & Authentication

Before you begin, you’ll need a Tabstack API key. You can get yours by signing up at <https://tabstack.ai>.

The API uses Bearer token authentication. We **strongly recommend** storing your key as an environment variable rather than hardcoding it in your application.

First, set the variable in your terminal session.

```bash
export TABSTACK_API_KEY="your-api-key-here"
```

The `export TABSTACK_API_KEY=...` Bash command sets an environment variable named `TABSTACK_API_KEY` for your current session. Your application code (e.g., in Python or Node.js) can then access this variable, keeping your secret key out of your source code.
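In Python, a minimal sketch of reading the key once and failing fast when it is missing might look like this. The helpers `load_api_key` and `auth_headers` are hypothetical names for illustration; the examples in this guide read the variable inline with `os.environ.get`.

```python
import os


def load_api_key(var_name: str = "TABSTACK_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of any Tabstack SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Bearer-token headers used by every request in this guide."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing at startup with a clear message is usually easier to debug than a `401 Unauthorized` on the first request.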

## The Basic Request

Let’s start by converting a URL with the simplest possible request. The endpoint lives at `https://api.tabstack.ai/v1/extract/markdown` and expects a `POST` request with a JSON body.

```bash
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article"
  }'
```

```javascript
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/blog/article",
  }),
});

const data = await response.json();
console.log(data);
```

```python
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article"
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data)
```

All three examples make an authenticated POST request with a JSON body containing the target URL. The API fetches the page, extracts the main content, and returns it as clean Markdown with metadata.

### Default Response: Content with Frontmatter

A successful request returns a JSON object. By default, the API embeds the extracted metadata (such as title and author) as **YAML frontmatter** at the top of `content`.

```json
{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}
```

The response includes the processed URL and the content with YAML frontmatter embedded, which is the format static site generators such as Hugo and Jekyll expect.
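If you stay with the default format, you have to separate the frontmatter from the Markdown body yourself. Below is a minimal sketch, assuming `content` starts with a `---` line and the frontmatter ends at the next line that is exactly `---` (as in the example above). `split_frontmatter` is a hypothetical helper; it returns the raw YAML text, and turning that into a dictionary still requires a YAML parser such as PyYAML.

```python
from __future__ import annotations


def split_frontmatter(content: str) -> tuple[str, str]:
    """Split '---'-delimited YAML frontmatter from the Markdown body.

    Returns (frontmatter_yaml, body). If no frontmatter is found,
    frontmatter_yaml is an empty string and body is the full content.
    """
    lines = content.split("\n")
    if lines[0].strip() != "---":
        return "", content
    # Look for the closing delimiter after the opening line
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            # Drop blank lines between the closing delimiter and the body
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    # Opening delimiter without a closing one: treat everything as body
    return "", content
```

This naive split breaks if a frontmatter value itself contains a line of just `---`, which is one more reason to prefer `metadata: true` for programmatic use.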

## Getting Separate Metadata

YAML frontmatter is great, but sometimes you want metadata as a clean, parsable JSON object, separate from the content. This is essential for populating databases or feeding structured data to other systems.

To do this, simply add the `metadata: true` parameter to your request.

```bash
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article",
    "metadata": true
  }'
```

```javascript
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/blog/article",
    metadata: true, // Request separate metadata
  }),
});

const data = await response.json();
console.log(data);
```

```python
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article",
    "metadata": True  # Request separate metadata
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data)
```

Adding `metadata: true` to the request changes the response format to separate content and metadata into distinct fields.

### New Response: Clean Content + Metadata Object

By setting `metadata: true`, the response structure now includes a separate `metadata` object.

```json
{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}
```

Now `content` contains pure Markdown without frontmatter, and `metadata` is a structured JSON object. This format is easier to work with programmatically because no YAML parsing is needed.
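Because any metadata field can be absent for a given page (see the note in the Quick Reference), it pays to read fields defensively. A minimal sketch that flattens a `metadata: true` response into one flat record, for example for a database insert; the `to_record` helper and the `source_url` column name are hypothetical.

```python
# Metadata fields listed for this endpoint; any of them may be missing.
METADATA_FIELDS = (
    "title", "description", "author", "publisher",
    "image", "site_name", "url", "type",
)


def to_record(response: dict) -> dict:
    """Flatten a metadata: true response into one dict (hypothetical schema).

    Missing metadata fields become None instead of raising KeyError.
    """
    metadata = response.get("metadata") or {}
    record = {field: metadata.get(field) for field in METADATA_FIELDS}
    # Keep the requested URL separate from the canonical URL in metadata
    record["source_url"] = response.get("url")
    record["content"] = response.get("content", "")
    return record
```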

> **✨ Pro Tip:** We recommend using **`metadata: true`** for most programmatic use cases. It’s more reliable and easier than parsing YAML, which can be brittle if descriptions or titles contain special characters.

## Forcing a Fresh Fetch

For performance, the Tabstack API caches results for a short period. This is perfect for static content like blog posts. However, if you’re scraping a breaking news site or a live feed, you’ll want to bypass the cache.

You can do this using the `nocache: true` parameter.

```bash
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/breaking-news",
    "nocache": true
  }'
```

```javascript
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://news-site.com/breaking-news",
    nocache: true, // Force a fresh fetch
  }),
});

const data = await response.json();
console.log(data.content);
```

```python
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://news-site.com/breaking-news",
    "nocache": True  # Force a fresh fetch
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()

print(data['content'])
```

Setting `nocache: true` bypasses the cache and fetches fresh content, which is what you want for real-time data.

**Note:** Use `nocache` judiciously. Forcing a fresh fetch results in slower responses, because the API cannot serve the request from its cache.
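One way to apply this selectively is to decide `nocache` per request rather than globally. A minimal sketch, assuming you maintain your own list of fast-changing hostnames; `LIVE_HOSTS` and `build_payload` are hypothetical names, and `news-site.com` is the placeholder host used above.

```python
from urllib.parse import urlparse

# Hypothetical: hostnames whose content changes faster than the API's cache window
LIVE_HOSTS = {"news-site.com"}


def build_payload(url: str, metadata: bool = True) -> dict:
    """Build a request body that bypasses the cache only for live hosts."""
    host = (urlparse(url).hostname or "").lower()
    # Match the host itself or any subdomain of it (e.g. www.news-site.com)
    is_live = any(host == h or host.endswith("." + h) for h in LIVE_HOSTS)
    return {"url": url, "metadata": metadata, "nocache": is_live}
```

Everything not on the list keeps the default cached behavior, so static pages stay fast.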

## Production-Ready Error Handling

In a real-world application, you can’t assume every request will succeed. URLs may be invalid, sites may be down, or your API key might be wrong. A robust application must handle these failures gracefully.

The API uses standard HTTP status codes to indicate errors.

| Status Code | Error Message                     | Description                                       |
| ----------- | --------------------------------- | ------------------------------------------------- |
| 400         | `url is required`                 | The JSON body is missing the `url` parameter.     |
| 401         | `Unauthorized - Invalid token`    | Your API key is missing, invalid, or expired.     |
| 422         | `url is invalid`                  | The provided URL is malformed.                    |
| 422         | `access to internal resources...` | You tried to access `localhost` or a private IP.  |
| 500         | `failed to fetch URL`             | The target server is down or blocked our request. |
| 500         | `failed to convert HTML...`       | An internal error occurred during conversion.     |

All error responses return a simple JSON object:

```json
{
  "error": "url is invalid"
}
```
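Since the status code and the `error` string together identify the failure, you can decide how to react based on them. A minimal sketch based only on the table above: it treats 400, 401, and 422 as problems with the request that retrying will not fix, and the 500 `failed to fetch URL` case as possibly transient (the target site may recover). The `classify_error` name and its categories are hypothetical, not part of the API.

```python
def classify_error(status: int, body: dict) -> str:
    """Map an error response to a coarse action (hypothetical categories).

    Returns one of: "fix_credentials", "fix_request", "retry_later", "report".
    """
    message = str(body.get("error", ""))
    if status == 401:
        return "fix_credentials"  # key missing, invalid, or expired
    if status in (400, 422):
        return "fix_request"  # missing/malformed URL or private address
    if status == 500 and message.startswith("failed to fetch URL"):
        return "retry_later"  # target server down or blocking; may recover
    return "report"  # conversion failures and anything unexpected
```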

### Robust Error Handling Examples

Here are production-ready examples that encapsulate the logic, set timeouts, and handle potential errors correctly.

```javascript
import "dotenv/config"; // Optional: loads a .env file (requires the dotenv package)

// fetch and AbortController are globals in Node.js 18+; no extra import is needed.

async function getMarkdownFromUrl(url, forceFresh = false) {
  const apiKey = process.env.TABSTACK_API_KEY;
  if (!apiKey) {
    console.error("TABSTACK_API_KEY environment variable not set.");
    return null;
  }

  const endpoint = "https://api.tabstack.ai/v1/extract/markdown";

  // Abort the request (including reading the body) after 30 seconds
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30000);

  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url: url,
        metadata: true, // Always use the reliable metadata object
        nocache: forceFresh,
      }),
      signal: controller.signal, // Pass the AbortSignal
    });

    // Error bodies are usually JSON, but a proxy or gateway may return HTML
    const text = await response.text();
    let data = null;
    try {
      data = JSON.parse(text);
    } catch {
      // Leave data as null; handled below
    }

    if (!response.ok) {
      // Handle API errors (4xx, 5xx)
      console.warn(
        `API Error (HTTP ${response.status}) for ${url}: ${data?.error || "Unknown error"}`
      );
      return null;
    }

    if (data === null) {
      console.error(`Invalid JSON in response for ${url}`);
      return null;
    }

    return data;
  } catch (error) {
    if (error.name === "AbortError") {
      console.error(`Request timed out for ${url}`);
    } else {
      console.error(`Network error for ${url}: ${error.message}`);
    }
    return null;
  } finally {
    clearTimeout(timeoutId); // Always clear the timer, on success or failure
  }
}

// --- Usage ---
// (async () => {
//   const data = await getMarkdownFromUrl("https://example.com/article");
//   if (data) {
//     console.log(`Title: ${data.metadata?.title}`);
//     // console.log(data.content);
//   }
// })();
```

```python
from __future__ import annotations  # Allows the "dict | None" hint on Python < 3.10

import logging
import os

import requests

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')


def get_markdown_from_url(url: str, force_fresh: bool = False) -> dict | None:
    """
    Fetches clean markdown from a URL using the Tabstack API.
    Returns the parsed JSON data or None on failure.
    """
    api_key = os.environ.get("TABSTACK_API_KEY")
    if not api_key:
        logging.error("TABSTACK_API_KEY environment variable not set.")
        return None

    endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "metadata": True,  # Always use the reliable metadata object
        "nocache": force_fresh
    }

    try:
        response = requests.post(
            endpoint_url,
            headers=headers,
            json=payload,
            timeout=30  # Always set a timeout (in seconds)!
        )

        # Check for HTTP errors (4xx or 5xx)
        if not response.ok:
            try:
                message = response.json().get("error", "Unknown error")
            except requests.exceptions.JSONDecodeError:
                # The error body was not JSON (e.g. an HTML page from a proxy)
                message = "Unknown error (non-JSON response)"
            logging.warning(
                f"API Error (HTTP {response.status_code}) for {url}: {message}"
            )
            return None

        return response.json()

    except requests.exceptions.Timeout:
        logging.error(f"Request timed out for {url}")
        return None
    except requests.exceptions.JSONDecodeError:
        # Successful status but invalid JSON (requests 2.27+). This must come
        # before RequestException, which JSONDecodeError subclasses.
        logging.error(f"Failed to decode JSON response from API. Status: {response.status_code}")
        return None
    except requests.exceptions.RequestException as e:
        # Catch-all for network/connection errors
        logging.error(f"Network error for {url}: {e}")
        return None


# --- Usage ---
# good_url = "https://your-blog.com/some-article"
# data = get_markdown_from_url(good_url)
#
# if data:
#     logging.info(f"Title: {data['metadata'].get('title')}")
#     # print(data['content'])
#
# bad_url = "not-a-real-url"
# get_markdown_from_url(bad_url)
```

```bash
#!/bin/bash
# A robust bash script for error handling with curl
# Requires: curl, jq

API_KEY="$TABSTACK_API_KEY"
URL_TO_FETCH="$1"

if [ -z "$API_KEY" ]; then
    echo "Error: TABSTACK_API_KEY environment variable not set." >&2
    exit 1
fi

if [ -z "$URL_TO_FETCH" ]; then
    echo "Usage: $0 <url-to-fetch>" >&2
    exit 1
fi

# Build the JSON body with jq so quotes or special characters in the URL are escaped safely
payload=$(jq -n --arg url "$URL_TO_FETCH" '{url: $url, metadata: true, nocache: false}')

# -s: silent
# -w "\n%{http_code}": write the HTTP status code on a new line (000 if the connection failed)
response=$(curl -s -w "\n%{http_code}" \
  -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --connect-timeout 10 \
  --max-time 30 \
  -d "$payload")

# Split response body and status code
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$d')

if [ "$http_code" -eq 200 ]; then
  echo "Success:"
  echo "$response_body" | jq .
else
  echo "Error (HTTP $http_code):" >&2
  # Print the error message to stderr; fall back to the raw body if it isn't JSON
  echo "$response_body" | jq -er '.error' >&2 2>/dev/null || echo "$response_body" >&2
  exit 1
fi
```

These examples add three key safety features: a guard clause that checks for the API key up front, a 30-second timeout to prevent hanging requests, and error handling for both HTTP errors and network failures. On failure, the JavaScript and Python functions log a helpful message and return `null` (or `None`) instead of throwing, while the Bash script prints the error to stderr and exits with a non-zero status.

## Quick Reference

### Endpoint

- **URL:** `https://api.tabstack.ai/v1/extract/markdown`
- **Method:** `POST`
- **Authentication:** `Authorization: Bearer YOUR_API_KEY`

### Request Parameters (JSON Body)

| Parameter  | Type    | Required | Default | Description                                                                                                                    |
| ---------- | ------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `url`      | string  | **Yes**  |         | The publicly accessible URL to convert.                                                                                        |
| `metadata` | boolean | No       | `false` | If `true`, returns metadata as a separate `metadata` object. If `false`, embeds metadata as YAML frontmatter in the `content`. |
| `nocache`  | boolean | No       | `false` | If `true`, bypasses the cache and forces a fresh fetch of the URL.                                                             |
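To catch type mistakes (such as sending the string `"true"` instead of a boolean) before they reach the API, you could mirror this table in a client-side check. A minimal sketch; `validate_request_body` is a hypothetical helper, and the API's own validation remains authoritative.

```python
from __future__ import annotations


def validate_request_body(body: dict) -> list[str]:
    """Return a list of problems with a request body (empty list if valid).

    Mirrors the parameter table: url is a required string; metadata and
    nocache are optional booleans. Hypothetical client-side check only.
    """
    problems = []
    url = body.get("url")
    if not isinstance(url, str) or not url:
        problems.append("url is required and must be a non-empty string")
    for flag in ("metadata", "nocache"):
        # isinstance(..., bool) rejects strings like "true" and integers like 1
        if flag in body and not isinstance(body[flag], bool):
            problems.append(f"{flag} must be a boolean")
    return problems
```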

### Metadata Object Fields

These are the common fields you can expect, either in the `metadata` object (when `metadata: true`) or in the YAML frontmatter (the default).

> **Note:** Not all fields will be present for every URL. Availability depends entirely on the metadata provided by the source website.

| Field         | Type   | Description                                         |
| ------------- | ------ | --------------------------------------------------- |
| `title`       | string | Page title from Open Graph or HTML `<title>`.       |
| `description` | string | Page description from Open Graph or HTML meta tags. |
| `author`      | string | Author information from HTML metadata.              |
| `publisher`   | string | Publisher name from Open Graph.                     |
| `image`       | string | Featured image URL from Open Graph.                 |
| `site_name`   | string | Website name from Open Graph.                       |
| `url`         | string | Canonical URL from Open Graph.                      |
| `type`        | string | Content type from Open Graph (e.g., “article”).     |

## Best Practices Review

To recap, follow these rules for a smooth integration:

1. **Secure Your Key:** Never hardcode API keys. Use environment variables.

   - **JS:** `process.env.TABSTACK_API_KEY`
   - **Python:** `os.environ.get("TABSTACK_API_KEY")`

2. **Use `metadata: true`:** Prefer the separate `metadata` object for programmatic access. It’s more reliable than parsing YAML.

3. **Set Timeouts:** Always set a reasonable timeout on your HTTP requests.

   ```javascript
   // Use AbortController for fetch timeouts
   const controller = new AbortController();
   const timeoutId = setTimeout(() => controller.abort(), 30000);

   fetch(url, {
     signal: controller.signal,
   }).finally(() => {
     clearTimeout(timeoutId);
   });
   ```

   ```python
   # The 'requests' library makes this easy
   try:
       response = requests.post(url, json=data, timeout=30)
   except requests.exceptions.Timeout:
       print("Request timed out")
   ```

4. **Handle Errors:** Check for non-2xx HTTP status codes (`!response.ok`) and wrap your network calls in `try...catch` / `try...except` blocks.

5. **Validate URLs:** If possible, validate that a string is a valid `http/https` URL on your end *before* sending it to the API to save a request.

6. **Use Caching:** Don’t use `nocache: true` unless you absolutely need real-time data. Let the API’s cache work for you to get faster responses.
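For the URL check in point 5, here is a minimal sketch using Python's standard `urllib.parse` and `ipaddress` modules. `is_fetchable_url` is a hypothetical helper: it accepts only `http`/`https` URLs with a host and rejects `localhost` and literal private or loopback IPs, but hostnames that *resolve* to private addresses are not caught, so the API's 422 check still applies.

```python
import ipaddress
from urllib.parse import urlparse


def is_fetchable_url(value: str) -> bool:
    """Cheap client-side check before calling the API (not exhaustive)."""
    try:
        parsed = urlparse(value)
        host = parsed.hostname  # lowercased; may raise on malformed IPv6 brackets
    except ValueError:
        return False
    if parsed.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # A hostname, not an IP literal
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```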

---

## Related Resources

- [API Reference: Extract Markdown Endpoint](/api/resources/extract/methods/convert_to_markdown/index.md)
- [Quick Start Guide](/getting-started/quick-start/index.md)
- [Build Your First Tabstack App](/getting-started/build-your-first-tabs-app/index.md)