# Extract

## Json

`extract.json(ExtractJsonParams**kwargs) -> ExtractJsonResponse`

**post** `/extract/json`

Fetches a URL and extracts structured data according to a provided JSON schema

### Parameters

- `json_schema: object`

  JSON schema definition that describes the structure of data to extract.

- `url: str`

  URL to fetch and extract data from

- `effort: Optional[Literal["min", "standard", "max"]]`

  Fetch effort level controlling speed vs. capability tradeoff. "min": fastest, no fallback (1-5s). "standard": balanced with enhanced reliability (default, 3-15s). "max": full browser rendering for JS-heavy sites (15-60s).

  - `"min"`

  - `"standard"`

  - `"max"`

- `geo_target: Optional[GeoTarget]`

  Optional geotargeting parameters for proxy requests

  - `country: Optional[str]`

    Country code using ISO 3166-1 alpha-2 standard (2 letters, e.g., "US", "GB", "JP"). See: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

- `nocache: Optional[bool]`

  Bypass cache and force fresh data retrieval

### Returns

- `Dict[str, object]`

### Example

```python
import os
from tabstack import Tabstack

client = Tabstack(
    api_key=os.environ.get("TABSTACK_API_KEY"),  # This is the default and can be omitted
)
response = client.extract.json(
    json_schema={
        "properties": {
            "stories": {
                "items": {
                    "properties": {
                        "author": {
                            "description": "Author username",
                            "type": "string",
                        },
                        "points": {
                            "description": "Story points",
                            "type": "number",
                        },
                        "title": {
                            "description": "Story title",
                            "type": "string",
                        },
                    },
                    "type": "object",
                },
                "type": "array",
            }
        },
        "type": "object",
    },
    url="https://news.ycombinator.com",
)
print(response)
```

## Markdown

`extract.markdown(ExtractMarkdownParams**kwargs) -> ExtractMarkdownResponse`

**post** `/extract/markdown`

Fetches a URL and converts its HTML content to clean Markdown format with optional metadata extraction

### Parameters

- `url: str`

  URL to fetch and convert to markdown

- `effort: Optional[Literal["min", "standard", "max"]]`

  Fetch effort level controlling speed vs. capability tradeoff. "min": fastest, no fallback (1-5s). "standard": balanced with enhanced reliability (default, 3-15s). "max": full browser rendering for JS-heavy sites (15-60s).

  - `"min"`

  - `"standard"`

  - `"max"`

- `geo_target: Optional[GeoTarget]`

  Optional geotargeting parameters for proxy requests

  - `country: Optional[str]`

    Country code using ISO 3166-1 alpha-2 standard (2 letters, e.g., "US", "GB", "JP"). See: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

- `metadata: Optional[bool]`

  Include extracted metadata (Open Graph and HTML metadata) as a separate field in the response

- `nocache: Optional[bool]`

  Bypass cache and force fresh data retrieval

### Returns

- `class ExtractMarkdownResponse: …`

  - `content: str`

    The markdown content (includes metadata as YAML frontmatter by default)

  - `url: str`

    The URL that was converted to markdown

  - `metadata: Optional[Metadata]`

    Extracted metadata from the page (only included when metadata parameter is true)

    - `author: Optional[str]`

      Author information from HTML metadata

    - `created_at: Optional[str]`

      Document creation date (ISO 8601)

    - `creator: Optional[str]`

      Creator application (e.g., "Microsoft Word")

    - `description: Optional[str]`

      Page description from Open Graph or HTML

    - `image: Optional[str]`

      Featured image URL from Open Graph

    - `keywords: Optional[List[str]]`

      PDF keywords as array

    - `modified_at: Optional[str]`

      Document modification date (ISO 8601)

    - `page_count: Optional[int]`

      Number of pages (PDF documents)

    - `pdf_version: Optional[str]`

      PDF version (e.g., "1.5")

    - `producer: Optional[str]`

      PDF producer software (e.g., "Adobe PDF Library")

    - `publisher: Optional[str]`

      Publisher information from Open Graph

    - `site_name: Optional[str]`

      Site name from Open Graph

    - `subject: Optional[str]`

      PDF-specific metadata fields (populated for PDF documents). PDF subject or summary

    - `title: Optional[str]`

      Page title from Open Graph or HTML

    - `type: Optional[str]`

      Content type from Open Graph (e.g., article, website)

    - `url: Optional[str]`

      Canonical URL from Open Graph

### Example

```python
import os
from tabstack import Tabstack

client = Tabstack(
    api_key=os.environ.get("TABSTACK_API_KEY"),  # This is the default and can be omitted
)
response = client.extract.markdown(
    url="https://example.com/blog/article",
)
print(response.content)
```
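
The optional parameters above (`effort`, `geo_target`, `metadata`, `nocache`) can be assembled and checked before a request is sent. The following is a minimal sketch, not part of the SDK: `build_markdown_params` and `ALLOWED_EFFORTS` are hypothetical names, the dict shape for `geo_target` is assumed from the `country` field documented above, and upper-casing the country code is an assumption based on the documented examples ("US", "GB", "JP").

```python
from typing import Any, Dict, Optional

# Effort levels listed in the parameter reference above.
ALLOWED_EFFORTS = {"min", "standard", "max"}


def build_markdown_params(
    url: str,
    effort: Optional[str] = None,
    country: Optional[str] = None,
    metadata: Optional[bool] = None,
    nocache: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and build kwargs for extract.markdown."""
    if effort is not None and effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"effort must be one of {sorted(ALLOWED_EFFORTS)}, got {effort!r}"
        )
    if country is not None and not (
        len(country) == 2 and country.isascii() and country.isalpha()
    ):
        raise ValueError(
            f"country must be a 2-letter ISO 3166-1 alpha-2 code, got {country!r}"
        )

    # Only include optional keys that were explicitly set, so server-side
    # defaults (for example effort="standard") still apply.
    params: Dict[str, Any] = {"url": url}
    if effort is not None:
        params["effort"] = effort
    if country is not None:
        # Assumed shape: a mapping with a single "country" key.
        params["geo_target"] = {"country": country.upper()}
    if metadata is not None:
        params["metadata"] = metadata
    if nocache is not None:
        params["nocache"] = nocache
    return params


params = build_markdown_params(
    "https://example.com/blog/article",
    effort="max",
    country="us",
    metadata=True,
)
print(params)
```

With a configured client, the result could then be passed as `client.extract.markdown(**params)`. Leaving unset options out of the dict, rather than sending `None`, keeps the request as close as possible to the defaults described in the parameter list.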