# Extract

## Json

`extract.json(ExtractJsonParams**kwargs)  -> ExtractJsonResponse`

**post** `/extract/json`

Fetches a URL and extracts structured data from it according to the provided JSON schema.

### Parameters

- `json_schema: object`

  JSON schema definition that describes the structure of data to extract.

- `url: str`

  URL to fetch and extract data from

- `effort: Optional[Literal["min", "standard", "max"]]`

  Fetch effort level controlling speed vs. capability tradeoff. "min": fastest, no fallback (1-5s). "standard": balanced with enhanced reliability (default, 3-15s). "max": full browser rendering for JS-heavy sites (15-60s).

  - `"min"`

  - `"standard"`

  - `"max"`

- `geo_target: Optional[GeoTarget]`

  Optional geotargeting parameters for proxy requests

  - `country: Optional[str]`

    Country code using ISO 3166-1 alpha-2 standard (2 letters, e.g., "US", "GB", "JP").
    See: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

- `nocache: Optional[bool]`

  Bypass cache and force fresh data retrieval
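
The optional parameters above can be assembled before the call. The following is a minimal sketch, not part of the SDK: `build_extract_kwargs` is a hypothetical helper, and it assumes that `geo_target` accepts a plain dict with a `country` key. It only builds and validates keyword arguments, so it runs without an API key.

```python
from typing import Any, Dict, Optional

# Values taken from the `effort` parameter documented above.
VALID_EFFORTS = ("min", "standard", "max")


def build_extract_kwargs(
    url: str,
    json_schema: Dict[str, Any],
    effort: Optional[str] = None,
    country: Optional[str] = None,
    nocache: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for client.extract.json."""
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    if country is not None:
        # ISO 3166-1 alpha-2 codes are exactly two ASCII letters.
        if not (len(country) == 2 and country.isascii() and country.isalpha()):
            raise ValueError(f"country must be a 2-letter code, got {country!r}")

    kwargs: Dict[str, Any] = {"url": url, "json_schema": json_schema}
    # Only include optional parameters that were set, so server defaults apply.
    if effort is not None:
        kwargs["effort"] = effort
    if country is not None:
        # Assumption: geo_target accepts a plain dict with a "country" key.
        kwargs["geo_target"] = {"country": country.upper()}
    if nocache is not None:
        kwargs["nocache"] = nocache
    return kwargs


kwargs = build_extract_kwargs(
    url="https://news.ycombinator.com",
    json_schema={"type": "object", "properties": {"title": {"type": "string"}}},
    effort="max",
    country="us",
    nocache=True,
)
print(kwargs)
# With a configured client: response = client.extract.json(**kwargs)
```

Omitting unset options, rather than passing `None`, leaves the choice of defaults (such as `"standard"` effort) to the service.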

### Returns

- `Dict[str, object]`

### Example

```python
import os
from tabstack import Tabstack

client = Tabstack(
    api_key=os.environ.get("TABSTACK_API_KEY"),  # This is the default and can be omitted
)
response = client.extract.json(
    json_schema={
        "properties": {
            "stories": {
                "items": {
                    "properties": {
                        "author": {
                            "description": "Author username",
                            "type": "string",
                        },
                        "points": {
                            "description": "Story points",
                            "type": "number",
                        },
                        "title": {
                            "description": "Story title",
                            "type": "string",
                        },
                    },
                    "type": "object",
                },
                "type": "array",
            }
        },
        "type": "object",
    },
    url="https://news.ycombinator.com",
)
print(response)
```
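
Because the return type is a generic `Dict[str, object]`, it may be worth checking the shape of the result before using it. The sketch below assumes the response mirrors the schema from the example above (a top-level `stories` array); `iter_stories` is a hypothetical helper, and the sample data is illustrative only, not real API output.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_stories(response: Dict[str, Any]) -> Iterator[Tuple[str, str, float]]:
    """Yield (title, author, points) for each well-formed story in the response."""
    stories = response.get("stories")
    if not isinstance(stories, list):
        return  # schema not honoured or nothing extracted
    for story in stories:
        if not isinstance(story, dict):
            continue
        title = story.get("title")
        if not isinstance(title, str):
            continue  # skip entries without a usable title
        author = story.get("author")
        if not isinstance(author, str):
            author = "unknown"
        points = story.get("points")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(points, bool) or not isinstance(points, (int, float)):
            points = 0.0
        yield title, author, float(points)


# Illustrative data only; a real call would pass the value returned by
# client.extract.json(...).
sample = {
    "stories": [
        {"title": "Example story", "author": "alice", "points": 120},
        {"title": "No points yet", "author": "bob"},
        {"author": "carol", "points": 3},  # no title, so it is skipped
    ]
}
rows = list(iter_stories(sample))
for title, author, points in rows:
    print(f"{points:>6.0f}  {title} ({author})")
```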

## Markdown

`extract.markdown(ExtractMarkdownParams**kwargs)  -> ExtractMarkdownResponse`

**post** `/extract/markdown`

Fetches a URL and converts its HTML content to clean Markdown, optionally extracting page metadata.

### Parameters

- `url: str`

  URL to fetch and convert to markdown

- `effort: Optional[Literal["min", "standard", "max"]]`

  Fetch effort level controlling speed vs. capability tradeoff. "min": fastest, no fallback (1-5s). "standard": balanced with enhanced reliability (default, 3-15s). "max": full browser rendering for JS-heavy sites (15-60s).

  - `"min"`

  - `"standard"`

  - `"max"`

- `geo_target: Optional[GeoTarget]`

  Optional geotargeting parameters for proxy requests

  - `country: Optional[str]`

    Country code using ISO 3166-1 alpha-2 standard (2 letters, e.g., "US", "GB", "JP").
    See: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

- `metadata: Optional[bool]`

  Include extracted metadata (Open Graph and HTML metadata) as a separate field in the response

- `nocache: Optional[bool]`

  Bypass cache and force fresh data retrieval
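
Since `effort` trades speed for capability, one possible pattern is to start with the cheapest level and escalate only when the result is unusable. The sketch below is an assumption about usage rather than documented behaviour: `fetch_with_escalation` is a hypothetical helper, failures are assumed to surface as exceptions, and a stand-in `fake_fetch` replaces the real client so the block runs offline.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Ordered from fastest to most capable, per the `effort` parameter above.
EFFORT_LADDER = ("min", "standard", "max")


def fetch_with_escalation(
    fetch: Callable[[str], T],
    is_usable: Callable[[T], bool],
    efforts: Sequence[str] = EFFORT_LADDER,
) -> T:
    """Try each effort level in order and return the first usable result."""
    last_error: Optional[Exception] = None
    for effort in efforts:
        try:
            result = fetch(effort)
        except Exception as exc:  # Assumption: failures surface as exceptions.
            last_error = exc
            continue
        if is_usable(result):
            return result
    raise RuntimeError("no effort level produced usable content") from last_error


# Stand-in for the real client, so the sketch runs without network access.
calls: List[str] = []


def fake_fetch(effort: str) -> str:
    calls.append(effort)
    if effort == "min":
        raise TimeoutError("simulated failure")
    return "" if effort == "standard" else "# Title\n\nBody"


content = fetch_with_escalation(fake_fetch, lambda text: bool(text.strip()))
print(calls, repr(content))

# With a configured client, the callables might look like:
#   fetch = lambda effort: client.extract.markdown(url=url, effort=effort)
#   is_usable = lambda response: bool(response.content.strip())
```

Escalating costs extra requests when the page needs full rendering, so passing `effort="max"` directly may be the better choice for sites known to be JS-heavy.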

### Returns

- `class ExtractMarkdownResponse: …`

  - `content: str`

    The markdown content (includes metadata as YAML frontmatter by default)

  - `url: str`

    The URL that was converted to markdown

  - `metadata: Optional[Metadata]`

    Extracted metadata from the page (only included when the `metadata` parameter is `true`)

    - `author: Optional[str]`

      Author information from HTML metadata

    - `created_at: Optional[str]`

      Document creation date (ISO 8601)

    - `creator: Optional[str]`

      Creator application (e.g., "Microsoft Word")

    - `description: Optional[str]`

      Page description from Open Graph or HTML

    - `image: Optional[str]`

      Featured image URL from Open Graph

    - `keywords: Optional[List[str]]`

      PDF keywords as a list of strings

    - `modified_at: Optional[str]`

      Document modification date (ISO 8601)

    - `page_count: Optional[int]`

      Number of pages (PDF documents)

    - `pdf_version: Optional[str]`

      PDF version (e.g., "1.5")

    - `producer: Optional[str]`

      PDF producer software (e.g., "Adobe PDF Library")

    - `publisher: Optional[str]`

      Publisher information from Open Graph

    - `site_name: Optional[str]`

      Site name from Open Graph

    - `subject: Optional[str]`

      PDF subject or summary (populated for PDF documents)

    - `title: Optional[str]`

      Page title from Open Graph or HTML

    - `type: Optional[str]`

      Content type from Open Graph (e.g., article, website)

    - `url: Optional[str]`

      Canonical URL from Open Graph
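
The `content` field is described as including metadata as YAML frontmatter by default. If only the Markdown body is needed, the frontmatter can be separated first. The sketch below assumes the conventional layout, a block delimited by `---` lines at the very start of the content; `split_frontmatter` is a hypothetical helper and the sample string is illustrative, not real API output.

```python
from typing import Tuple


def split_frontmatter(content: str) -> Tuple[str, str]:
    """Return (frontmatter, body); frontmatter is "" when none is found."""
    lines = content.splitlines()
    # Assumption: frontmatter, when present, starts on the first line with "---".
    if not lines or lines[0].strip() != "---":
        return "", content
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", content


# Illustrative content; a real call would pass response.content.
sample = "---\ntitle: Example\nauthor: Jane\n---\n\n# Example\n\nText."
frontmatter, body = split_frontmatter(sample)
print(frontmatter)
print(body)
```

The frontmatter text is returned unparsed; a YAML parser can be applied to it if structured access is needed, or the `metadata` parameter can be set to receive the fields separately.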

### Example

```python
import os
from tabstack import Tabstack

client = Tabstack(
    api_key=os.environ.get("TABSTACK_API_KEY"),  # This is the default and can be omitted
)
response = client.extract.markdown(
    url="https://example.com/blog/article",
)
print(response.content)
```