---
title: Mastering the Markdown Endpoint | Tabstack
---

Scraping web content is often messy. You’re left with a complex tangle of HTML, boilerplate, and ads, when all you really want is the clean, structured content.

The Tabstack API Markdown Endpoint solves this. It’s a single `POST` request that fetches any public URL, intelligently parses the HTML, and returns clean, well-formatted Markdown.

It’s the perfect tool for:

- Building content aggregation or “read-it-later” apps.
- Preparing web content for AI/LLM processing and RAG pipelines.
- Converting blog posts or articles into a stable, storable format.
- Powering documentation and content management systems.

This guide will walk you through setting up your environment, making your first request, and building a production-ready function to handle content conversion robustly.

## Prerequisites & Authentication

Before you begin, you’ll need a Tabstack API key. You can get yours by signing up for a Tabstack account.

The API uses Bearer token authentication. We **strongly recommend** storing your key as an environment variable rather than hardcoding it in your application.

First, set the variable in your terminal session.

```
export TABSTACK_API_KEY="your-api-key-here"
```

The `export TABSTACK_API_KEY=...` Bash command sets an environment variable named `TABSTACK_API_KEY` for your current session. Your application code (e.g., in Python or Node.js) can then access this variable, keeping your secret key out of your source code.

## The Basic Request

Let’s start by converting a URL with the simplest possible request. The endpoint lives at `https://api.tabstack.ai/v1/extract/markdown` and expects a `POST` request with a JSON body.

**curl**

```
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article"
  }'
```

**JavaScript**

```
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/blog/article",
  }),
});

const data = await response.json();
console.log(data);
```

**Python**

```
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article"
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()
print(data)
```

All three examples make an authenticated POST request with a JSON body containing the target URL. The API fetches the page, extracts the main content, and returns it as clean Markdown with metadata.

### Default Response: Content with Frontmatter

A successful request returns a JSON object. By default, the API embeds all extracted metadata (like title, author, etc.) as **YAML frontmatter** at the top of the content.

```
{
  "url": "https://example.com/blog/article",
  "content": "---\ntitle: Example Article Title\ndescription: This is an example article...\nauthor: Example Author\nimage: https://example.com/images/article.jpg\n---\n\n# Example Article Title\n\nThis is the article content converted to markdown..."
}
```

The response includes the processed URL and the content with YAML frontmatter embedded, which suits static site generators like Hugo or Jekyll that expect this format.
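
If you keep the default frontmatter format, you have to separate the metadata from the body yourself. The following is a minimal sketch and not part of the Tabstack API: `split_frontmatter` is a hypothetical helper that assumes the frontmatter is a flat block of `key: value` lines between two `---` delimiters, as in the sample response above. It does not handle quoted, multi-line, or nested YAML values, so a real YAML parser is the safer choice when those can occur.

```python
def split_frontmatter(content: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body.

    Deliberately naive: only flat `key: value` lines are understood.
    """
    lines = content.split("\n")
    if lines[0].strip() != "---":
        return {}, content  # no frontmatter present

    # Locate the closing '---' delimiter.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, content  # unterminated block: treat everything as body

    metadata = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


sample = (
    "---\ntitle: Example Article Title\nauthor: Example Author\n"
    "image: https://example.com/images/article.jpg\n---\n\n"
    "# Example Article Title\n\nThis is the article content..."
)
meta, body = split_frontmatter(sample)
print(meta["title"])         # Example Article Title
print(body.splitlines()[0])  # # Example Article Title
```

Splitting on the first colon keeps values such as image URLs whole, but it is exactly the kind of shortcut that breaks on unusual titles or descriptions.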

## Getting Separate Metadata

YAML frontmatter is great, but sometimes you want metadata as a clean, parsable JSON object, separate from the content. This is essential for populating databases or feeding structured data to other systems.

To do this, simply add the `metadata: true` parameter to your request.

**curl**

```
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/article",
    "metadata": true
  }'
```

**JavaScript**

```
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/blog/article",
    metadata: true, // Request separate metadata
  }),
});

const data = await response.json();
console.log(data);
```

**Python**

```
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://example.com/blog/article",
    "metadata": True  # Request separate metadata
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()
print(data)
```

Adding `metadata: true` to the request changes the response format to separate content and metadata into distinct fields.

### New Response: Clean Content + Metadata Object

By setting `metadata: true`, the response structure now includes a separate `metadata` object.

```
{
  "url": "https://example.com/blog/article",
  "content": "# Example Article Title\n\nThis is the article content converted to markdown...",
  "metadata": {
    "title": "Example Article Title",
    "description": "This is an example article description",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "image": "https://example.com/images/article.jpg",
    "site_name": "Example Blog",
    "url": "https://example.com/blog/article",
    "type": "article"
  }
}
```

Now `content` contains pure Markdown without frontmatter, and `metadata` is a structured JSON object. This format is easier to work with programmatically, with no YAML parsing needed.

> **✨ Pro Tip:** We recommend using **`metadata: true`** for most programmatic use cases. It’s more reliable and easier than parsing YAML, which can be brittle if descriptions or titles contain special characters.

## Forcing a Fresh Fetch

For performance, the Tabstack API caches results for a short period. This is perfect for static content like blog posts.

However, if you’re scraping a breaking news site or a live feed, you’ll want to bypass the cache. You can do this using the `nocache: true` parameter.

**curl**

```
curl -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $TABSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/breaking-news",
    "nocache": true
  }'
```

**JavaScript**

```
const response = await fetch("https://api.tabstack.ai/v1/extract/markdown", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TABSTACK_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://news-site.com/breaking-news",
    nocache: true, // Force a fresh fetch
  }),
});

const data = await response.json();
console.log(data.content);
```

**Python**

```
import requests
import os

api_key = os.environ.get("TABSTACK_API_KEY")
endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://news-site.com/breaking-news",
    "nocache": True  # Force a fresh fetch
}

response = requests.post(endpoint_url, headers=headers, json=payload)
data = response.json()
print(data['content'])
```

Setting `nocache: true` bypasses the cache and fetches fresh content, which is what you want for real-time data.

**Note:** Use `nocache` judiciously. Forcing a fresh fetch results in slightly slower response times, as the API cannot serve the request from its cache.

## Production-Ready Error Handling

In a real-world application, you can’t assume every request will succeed. URLs may be invalid, sites may be down, or your API key might be wrong. A robust application must handle these failures gracefully.

The API uses standard HTTP status codes to indicate errors.

| Status Code | Error Message                     | Description                                       |
| ----------- | --------------------------------- | ------------------------------------------------- |
| 400         | `url is required`                 | The JSON body is missing the `url` parameter.     |
| 401         | `Unauthorized - Invalid token`    | Your API key is missing, invalid, or expired.     |
| 422         | `url is invalid`                  | The provided URL is malformed.                    |
| 422         | `access to internal resources...` | You tried to access `localhost` or a private IP.  |
| 500         | `failed to fetch URL`             | The target server is down or blocked our request. |
| 500         | `failed to convert HTML...`       | An internal error occurred during conversion.     |

All error responses return a simple JSON object:

```
{
  "error": "url is invalid"
}
```

### Robust Error Handling Examples

Here are production-ready examples that encapsulate the logic, set timeouts, and handle potential errors correctly.

**JavaScript**

```
import "dotenv/config"; // Loads variables from a .env file

// fetch and AbortController are built-in globals in Node.js 18+,
// so the timeout logic needs no extra import.

async function getMarkdownFromUrl(url, forceFresh = false) {
  const apiKey = process.env.TABSTACK_API_KEY;
  if (!apiKey) {
    console.error("TABSTACK_API_KEY environment variable not set.");
    return null;
  }

  const endpoint = "https://api.tabstack.ai/v1/extract/markdown";

  // Set a 30-second timeout
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30000);

  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url: url,
        metadata: true, // Always use the reliable metadata object
        nocache: forceFresh,
      }),
      signal: controller.signal, // Pass the AbortSignal
    });

    clearTimeout(timeoutId); // Clear the timeout once the response arrives

    const data = await response.json();

    if (!response.ok) {
      // Handle API errors (4xx, 5xx)
      console.warn(
        `API Error (HTTP ${response.status}) for ${url}: ${data.error || "Unknown error"}`
      );
      return null;
    }

    return data;
  } catch (error) {
    clearTimeout(timeoutId); // Clear timeout on error
    if (error.name === "AbortError") {
      console.error(`Request timed out for ${url}`);
    } else {
      // Network failures and unparsable response bodies both land here
      console.error(`Network error for ${url}: ${error.message}`);
    }
    return null;
  }
}

// --- Usage ---
// (async () => {
//   const data = await getMarkdownFromUrl("https://example.com/article");
//   if (data) {
//     console.log(`Title: ${data.metadata?.title}`);
//     // console.log(data.content);
//   }
// })();
```

**Python**

```
import requests
import os
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

def get_markdown_from_url(url: str, force_fresh: bool = False) -> dict | None:
    """
    Fetches clean markdown from a URL using the Tabstack API.
    Returns the parsed JSON data or None on failure.
    """
    api_key = os.environ.get("TABSTACK_API_KEY")
    if not api_key:
        logging.error("TABSTACK_API_KEY environment variable not set.")
        return None

    endpoint_url = "https://api.tabstack.ai/v1/extract/markdown"

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    payload = {
        "url": url,
        "metadata": True,  # Always use the reliable metadata object
        "nocache": force_fresh
    }

    try:
        response = requests.post(
            endpoint_url,
            headers=headers,
            json=payload,
            timeout=30  # Always set a 30-second timeout!
        )

        # Check for HTTP errors (4xx or 5xx)
        if not response.ok:
            error_data = response.json()
            logging.warning(
                f"API Error (HTTP {response.status_code}) for {url}: {error_data.get('error')}"
            )
            return None

        return response.json()

    except requests.exceptions.Timeout:
        logging.error(f"Request timed out for {url}")
        return None
    except requests.exceptions.JSONDecodeError:
        # Must come before RequestException, which is its parent class.
        # Raised when the response body is not valid JSON.
        logging.error(f"Failed to decode JSON response from API. Status: {response.status_code}")
        return None
    except requests.exceptions.RequestException as e:
        # Catch-all for network/connection errors
        logging.error(f"Network error for {url}: {e}")
        return None

# --- Usage ---
# good_url = "https://your-blog.com/some-article"
# data = get_markdown_from_url(good_url)
#
# if data:
#     logging.info(f"Title: {data['metadata'].get('title')}")
#     # print(data['content'])
#
# bad_url = "not-a-real-url"
# get_markdown_from_url(bad_url)
```

**curl (Bash)**

```
#!/bin/bash
# A robust bash script for error handling with curl
# Requires: curl, jq

API_KEY="$TABSTACK_API_KEY"
URL_TO_FETCH="$1"

if [ -z "$API_KEY" ]; then
  echo "Error: TABSTACK_API_KEY environment variable not set." >&2
  exit 1
fi

if [ -z "$URL_TO_FETCH" ]; then
  echo "Usage: $0 <url>" >&2
  exit 1
fi

# Build the JSON body with jq so special characters in the URL are escaped
payload=$(jq -n --arg url "$URL_TO_FETCH" \
  '{url: $url, metadata: true, nocache: false}')

# -s: silent
# -w "\n%{http_code}": write the http code on a new line
response=$(curl -s -w "\n%{http_code}" \
  -X POST https://api.tabstack.ai/v1/extract/markdown \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --connect-timeout 10 \
  --max-time 30 \
  -d "$payload")

# Split response body and status code
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$d')

if [ "$http_code" -eq 200 ]; then
  echo "Success:"
  echo "$response_body" | jq .
else
  echo "Error (HTTP $http_code):" >&2
  # Try to parse error with jq, fall back to plain echo; both go to stderr
  { echo "$response_body" | jq -r '.error' 2>/dev/null || echo "$response_body"; } >&2
  exit 1
fi
```

These production-ready examples add three key safety features: a guard clause that checks for the API key upfront, a 30-second timeout to prevent hanging requests, and comprehensive error handling for both HTTP errors and network failures. On errors, the JavaScript and Python functions log a helpful message and return `null` (`None` in Python) rather than crashing, while the Bash script prints the error to stderr and exits with a non-zero status.
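
The two 422 errors in the table above (a malformed URL, or a URL pointing at `localhost` or a private IP) can often be caught before a request is sent. The sketch below is one way to do that with Python's standard library. `is_fetchable_url` is a hypothetical helper, not part of the Tabstack API, and it only approximates the server-side rules, which are not spelled out in this guide. It does not resolve DNS, so a hostname that points to a private address still passes.

```python
import ipaddress
from urllib.parse import urlparse


def is_fetchable_url(url: str) -> bool:
    """Cheap client-side check; the API remains the final authority."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # e.g. a malformed IPv6 literal

    # Only absolute http/https URLs with a host are accepted.
    if parsed.scheme not in ("http", "https") or not host:
        return False

    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal; DNS is not checked here

    # Reject loopback, private, and link-local address literals.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


for candidate in (
    "https://example.com/blog/article",
    "not-a-real-url",
    "http://localhost:8080/",
    "http://192.168.1.10/admin",
):
    print(candidate, "->", is_fetchable_url(candidate))
```

Calling a check like this before `get_markdown_from_url` saves a round trip for obviously bad input, but the API's own error responses should still be handled as shown above.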

## Quick Reference

### Endpoint

- **URL:** `https://api.tabstack.ai/v1/extract/markdown`
- **Method:** `POST`
- **Authentication:** `Authorization: Bearer YOUR_API_KEY`

### Request Parameters (JSON Body)

| Parameter  | Type    | Required | Default | Description                                                                                                                    |
| ---------- | ------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `url`      | string  | **Yes**  |         | The publicly accessible URL to convert.                                                                                        |
| `metadata` | boolean | No       | `false` | If `true`, returns metadata as a separate `metadata` object. If `false`, embeds metadata as YAML frontmatter in the `content`. |
| `nocache`  | boolean | No       | `false` | If `true`, bypasses the cache and forces a fresh fetch of the URL.                                                             |

### Metadata Object Fields

These are the common fields you can expect, whether they arrive in the `metadata` object (`metadata: true`) or as YAML frontmatter.

> **Note:** Not all fields will be present for every URL. Availability depends entirely on the metadata provided by the source website.

| Field         | Type   | Description                                         |
| ------------- | ------ | --------------------------------------------------- |
| `title`       | string | Page title from Open Graph or HTML `<title>`.       |
| `description` | string | Page description from Open Graph or HTML meta tags. |
| `author`      | string | Author information from HTML metadata.              |
| `publisher`   | string | Publisher name from Open Graph.                     |
| `image`       | string | Featured image URL from Open Graph.                 |
| `site_name`   | string | Website name from Open Graph.                       |
| `url`         | string | Canonical URL from Open Graph.                      |
| `type`        | string | Content type from Open Graph (e.g., “article”).     |

## Best Practices Review

To recap, follow these rules for a smooth integration:

1. **Secure Your Key:** Never hardcode API keys. Use environment variables.
   - **JS:** `process.env.TABSTACK_API_KEY`
   - **Python:** `os.environ.get("TABSTACK_API_KEY")`
2. **Use `metadata: true`:** Prefer the separate `metadata` object for programmatic access. It’s more reliable than parsing YAML.
3. **Set Timeouts:** Always set a reasonable timeout on your HTTP requests.

   **JavaScript**

   ```
   // Use AbortController for fetch timeouts
   const controller = new AbortController();
   const timeoutId = setTimeout(() => controller.abort(), 30000);

   fetch(url, {
     signal: controller.signal,
   }).finally(() => {
     clearTimeout(timeoutId);
   });
   ```

   **Python**

   ```
   # The 'requests' library makes this easy
   try:
       response = requests.post(url, json=data, timeout=30)
   except requests.exceptions.Timeout:
       print("Request timed out")
   ```

4. **Handle Errors:** Check for non-2xx HTTP status codes (`!response.ok`) and wrap your network calls in `try...catch` / `try...except` blocks.
5. **Validate URLs:** If possible, validate that a string is a valid `http/https` URL on your end *before* sending it to the API to save a request.
6. **Use Caching:** Don’t use `nocache: true` unless you absolutely need real-time data. Let the API’s cache work for you to get faster responses.

---

## Related Resources

- [API Reference: Extract Markdown Endpoint](/api/resources/extract/methods/convert_to_markdown/index.md)
- [Quick Start Guide](/getting-started/quick-start/index.md)
- [Build Your First Tabstack App](/getting-started/build-your-first-tabs-app/index.md)