
Extract

JSON
client.extract.json(ExtractJsonParams { json_schema, url, effort, 2 more } body, RequestOptions options?): ExtractJsonResponse
POST /extract/json
Markdown
client.extract.markdown(ExtractMarkdownParams { url, effort, geo_target, 2 more } body, RequestOptions options?): ExtractMarkdownResponse { content, url, metadata }
POST /extract/markdown
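
To make the shape of the `POST /extract/markdown` call concrete, here is a minimal sketch that prepares the raw HTTP request without sending it. Only the path, the POST method, and the `url` body field come from the listing above. The helper name `prepareMarkdownRequest`, the base URL, and the JSON `Content-Type` header are assumptions for illustration, and authentication is deliberately left out because this section does not describe it.

```typescript
// Hypothetical helper, not part of the SDK: builds the pieces of a raw
// HTTP request for POST /extract/markdown so they can be passed to any
// HTTP client.
interface MarkdownRequestBody {
  url: string; // the page to convert to markdown
  // effort, geo_target, and the remaining parameters are not typed in this
  // section, so they are accepted here as opaque extra fields.
  [extra: string]: unknown;
}

interface PreparedRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function prepareMarkdownRequest(
  baseUrl: string, // assumption: supplied by the caller, not documented here
  body: MarkdownRequestBody,
): PreparedRequest {
  // Drop trailing slashes so the joined path has exactly one separator.
  const endpoint = baseUrl.replace(/\/+$/, "") + "/extract/markdown";
  return {
    endpoint,
    init: {
      method: "POST",
      // Assumption: the endpoint accepts a JSON-encoded body.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be handed to `fetch(prepared.endpoint, prepared.init)` or an equivalent client once the real base URL and credentials are known.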
Models
ExtractJsonResponse = Record<string, unknown>
ExtractMarkdownResponse { content, url, metadata }
content: string

The markdown content (includes metadata as YAML frontmatter by default)

url: string

The URL that was converted to markdown

format: uri
metadata?: Metadata { author, created_at, creator, 13 more }

Extracted metadata from the page (only included when metadata parameter is true)

author?: string

Author information from HTML metadata

created_at?: string

Document creation date (ISO 8601)

creator?: string

Creator application (e.g., “Microsoft Word”)

description?: string

Page description from Open Graph or HTML

image?: string

Featured image URL from Open Graph

format: uri
keywords?: Array<string>

PDF keywords as array

modified_at?: string

Document modification date (ISO 8601)

page_count?: number

Number of pages (PDF documents)

pdf_version?: string

PDF version (e.g., “1.5”)

producer?: string

PDF producer software (e.g., “Adobe PDF Library”)

publisher?: string

Publisher information from Open Graph

site_name?: string

Site name from Open Graph

subject?: string

PDF subject or summary (PDF-specific metadata field, populated for PDF documents)

title?: string

Page title from Open Graph or HTML

type?: string

Content type from Open Graph (e.g., article, website)

url?: string

Canonical URL from Open Graph

format: uri
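
The response model above can be mirrored as TypeScript interfaces, as in the sketch below. The field names and optionality follow the listing. The two helpers are hypothetical: `splitFrontmatter` assumes the YAML frontmatter mentioned for `content` is delimited by `---` lines at the very start of the string, and `looksLikePdf` is a heuristic based only on the PDF-specific fields. Neither is part of the SDK.

```typescript
// Mirrors the Metadata model listed above; every field is optional.
interface Metadata {
  author?: string;
  created_at?: string; // ISO 8601
  creator?: string;
  description?: string;
  image?: string; // URI
  keywords?: Array<string>;
  modified_at?: string; // ISO 8601
  page_count?: number;
  pdf_version?: string;
  producer?: string;
  publisher?: string;
  site_name?: string;
  subject?: string;
  title?: string;
  type?: string;
  url?: string; // URI
}

interface ExtractMarkdownResponse {
  content: string;
  url: string;
  metadata?: Metadata;
}

// Hypothetical helper: separates leading YAML frontmatter from the markdown
// body. Assumes the frontmatter sits between two `---` lines at the start of
// the content; returns frontmatter: null when no such block is found.
function splitFrontmatter(content: string): { frontmatter: string | null; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(content);
  if (!match) {
    return { frontmatter: null, body: content };
  }
  return { frontmatter: match[1], body: content.slice(match[0].length) };
}

// Hypothetical heuristic: treats the presence of a PDF-specific field as a
// sign that the source document was a PDF.
function looksLikePdf(meta: Metadata | undefined): boolean {
  return meta !== undefined && (meta.page_count !== undefined || meta.pdf_version !== undefined);
}
```

Because `metadata` is only present when the metadata parameter is true, callers should treat it as possibly undefined, which is why `looksLikePdf` accepts `undefined`.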