The /extract endpoint simplifies collecting structured data from any number of URLs or entire domains. Provide a list of URLs, optionally with wildcards (e.g., example.com/*), and a prompt or schema describing the information you want. Firecrawl handles the details of crawling, parsing, and collating large or small datasets.
Extract is billed differently than other endpoints. See the Extract pricing for details.
Using /extract
You can extract structured data from one or multiple URLs, including wildcards:
- Single Page
Example: https://firecrawl.dev/some-page
- Multiple Pages / Full Domain
Example: https://firecrawl.dev/*
When you use /*, Firecrawl will automatically crawl and parse all URLs it can discover in that domain, then extract the requested data. This feature is experimental; email help@firecrawl.com if you have issues.
Example Usage
- urls: An array of one or more URLs. Supports wildcards (/*) for broader crawling.
- prompt (Optional unless no schema): A natural language prompt describing the data you want or specifying how you want that data structured.
- schema (Optional unless no prompt): A more rigid structure if you already know the JSON layout.
- enableWebSearch (Optional): When true, extraction can follow links outside the specified domain.
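The parameter rules above can be sketched as a small request builder. This is a minimal illustration that uses only the Python standard library rather than the Firecrawl SDK. The endpoint URL (including the version path), the bearer-token header, and the helper names build_extract_payload and build_extract_request are assumptions made for this example; check the API reference for the exact values.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Sequence

# Assumed endpoint URL; the version segment may differ for your account.
EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_payload(
    urls: Sequence[str],
    prompt: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    enable_web_search: bool = False,
) -> Dict[str, Any]:
    """Assemble the request body (hypothetical helper, not part of the SDK)."""
    # prompt and schema are each optional, but at least one must be present.
    if prompt is None and schema is None:
        raise ValueError("provide a prompt, a schema, or both")
    payload: Dict[str, Any] = {"urls": list(urls)}
    if prompt is not None:
        payload["prompt"] = prompt
    if schema is not None:
        payload["schema"] = schema
    if enable_web_search:
        payload["enableWebSearch"] = True
    return payload


def build_extract_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Calling build_extract_payload(["https://firecrawl.dev/*"], prompt="...") returns a plain dict, so the body can be inspected or logged before the request is passed to urllib.request.urlopen.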
Response (sdks)
Job status and completion
When you submit an extraction job, either directly via the API or through the starter methods, you’ll receive a Job ID. You can use this ID to:
- Get Job Status: Send a request to the /extract/ endpoint to see if the job is still running or has finished.
- Wait for results: If you use the default extract method (Python/Node), the SDK waits and returns final results.
- Start then poll: If you use the start methods, start_extract (Python) or startExtract (Node), the SDK returns a Job ID immediately. Use get_extract_status (Python) or getExtractStatus (Node) to check progress.
This endpoint only works for jobs in progress or recently completed (within 24 hours).
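For readers calling the API directly instead of through an SDK, a status check might look like the sketch below. It only constructs the GET request. The base URL, the placement of the Job ID as the final path segment, and the bearer-token header are assumptions for illustration, and build_status_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

# Assumed base URL; the version segment may differ.
API_BASE = "https://api.firecrawl.dev/v1"


def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for an extraction job's status (hypothetical helper)."""
    if not job_id:
        raise ValueError("job_id must not be empty")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/extract/{safe_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen would perform the actual call. Because status is only available for jobs in progress or completed within the last 24 hours, callers should expect an error response for older Job IDs.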
Possible States
- completed: The extraction finished successfully.
- processing: Firecrawl is still processing your request.
- failed: An error occurred; data was not fully extracted.
- cancelled: The job was cancelled by the user.
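The four states above suggest a simple polling loop: keep checking while the job is processing and stop on any other listed state. The sketch below is one way to write that loop. It takes the status lookup as a callable so it works with either an SDK call or a raw HTTP request. The assumption that the response is a dict with a "status" key, and the names wait_for_extract and fetch_status, are illustrative rather than taken from the SDK.

```python
import time
from typing import Any, Callable, Dict

# States after which the job will not change again.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_extract(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state, then return the last response.

    fetch_status is any zero-argument callable returning the status response
    as a dict; the "status" key is an assumption about the response shape.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        state = response.get("status")
        if state in TERMINAL_STATES:
            return response
        if state != "processing":
            raise ValueError(f"unexpected job state: {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction job did not finish in time")
        sleep(interval_s)
```

The caller still needs to inspect the returned state, since failed and cancelled end the loop just as completed does.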
Pending Example
Completed Example
Extracting without a Schema
If you prefer not to define a strict structure, you can simply provide a prompt. The underlying model will choose a structure for you, which can be useful for more exploratory or flexible requests.
Improving Results with Web Search
Setting enableWebSearch = true in your request will expand the crawl beyond the provided URL set. This can capture supporting or related information from linked pages.
Here’s an example that extracts information about dash cams, enriching the results with data from related pages:
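A request body for such a dash cam extraction might be put together as in the sketch below. Only the urls, prompt, schema, and enableWebSearch field names come from the parameter list on this page. The URL, the prompt wording, and the schema contents are hypothetical placeholders, and expressing the schema as JSON Schema is an assumption.

```python
import json

# Hypothetical inputs for illustration; replace with your own URL and fields.
payload = {
    "urls": ["https://example.com/dash-cams/*"],
    "prompt": "Extract the name, price, and key features of each dash cam.",
    # Assumed to follow JSON Schema conventions.
    "schema": {
        "type": "object",
        "properties": {
            "dash_cams": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                        "features": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["name"],
                },
            }
        },
        "required": ["dash_cams"],
    },
    # Lets extraction follow links outside the listed domain.
    "enableWebSearch": True,
}

# Serialize to the JSON string that would be sent as the request body.
body = json.dumps(payload, indent=2)
```

Python's True is serialized as the JSON literal true, matching the enableWebSearch = true setting described above.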
Example Response with Web Search
Extracting without URLs
The /extract endpoint now supports extracting structured data using a prompt without needing specific URLs. This is useful for research or when exact URLs are unknown. This feature is currently in Alpha.
Known Limitations (Beta)
- Large-Scale Site Coverage: Full coverage of massive sites (e.g., “all products on Amazon”) in a single request is not yet supported.
- Complex Logical Queries: Requests like “find every post from 2025” may not reliably return all expected data. More advanced query capabilities are in progress.
- Occasional Inconsistencies: Results might differ across runs, particularly for very large or dynamic sites. Core details are usually captured, but some variation is possible.
- Beta State: Since /extract is still in Beta, features and performance will continue to evolve. We welcome bug reports and feedback to help us improve.
Using FIRE-1
FIRE-1 is an AI agent that enhances Firecrawl’s scraping capabilities. It can control browser actions and navigate complex website structures to enable comprehensive data extraction beyond traditional scraping methods. You can leverage the FIRE-1 agent with the /extract endpoint for complex extraction tasks that require navigation across multiple pages or interaction with elements.
Example (cURL):
FIRE-1 is already live and available under preview.