Installation
To install the Firecrawl Python SDK, you can use pip:
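Assuming the package is published on PyPI as `firecrawl-py`:

```bash
pip install firecrawl-py
```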
Usage
- Get an API key from firecrawl.dev
- Set the API key as an environment variable named `FIRECRAWL_API_KEY`, or pass it as a parameter to the `Firecrawl` class.
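A minimal setup sketch; the explicit key value is a placeholder:

```python
from firecrawl import Firecrawl

# Reads FIRECRAWL_API_KEY from the environment when no key is passed.
firecrawl = Firecrawl()

# Or pass the key explicitly (placeholder value shown).
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```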
Scraping a URL
To scrape a single URL, use the `scrape` method. It takes the URL as a parameter and returns the scraped document.
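A sketch of a basic scrape; the `formats` parameter is an assumed option name:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()  # assumes FIRECRAWL_API_KEY is set

# Scrape one URL and print the resulting document.
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown"])
print(doc)
```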
Crawl a Website
To crawl a website, use the `crawl` method. It takes the starting URL and optional options as arguments. The options allow you to specify additional settings for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format. See Pagination for auto/manual pagination and limiting.
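A sketch of a blocking crawl; the `limit` option and the `status`/`data` fields on the result are assumed names:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Blocks until the crawl finishes; auto-paginates and aggregates documents.
result = firecrawl.crawl("https://firecrawl.dev", limit=10)

print(result.status)
for doc in result.data:
    print(doc)
```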
Start a Crawl
Prefer non-blocking? Check out the Async Class section below.
To start a crawl job without blocking, use `start_crawl`. It returns a job ID you can use to check status. Use `crawl` when you want a waiter that blocks until completion. See Pagination for paging behavior and limits.
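A minimal sketch, assuming the returned job exposes an `id` field:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Kick off the crawl without waiting for completion.
job = firecrawl.start_crawl("https://firecrawl.dev", limit=10)

print(job.id)  # use this ID with get_crawl_status / cancel_crawl
```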
Checking Crawl Status
To check the status of a crawl job, use the `get_crawl_status` method. It takes the job ID as a parameter and returns the current status of the crawl job.
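A sketch; the progress fields on the status object are assumptions about the response shape:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Use the job ID returned by start_crawl (placeholder shown).
status = firecrawl.get_crawl_status("crawl-job-id")

print(status.status)
print(status.completed, "/", status.total)
```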
Cancelling a Crawl
To cancel a crawl job, use the `cancel_crawl` method. It takes the job ID returned by `start_crawl` as a parameter and returns the cancellation status.
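A sketch; the shape of the return value is an assumption:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Cancel a running job using the ID returned by start_crawl.
cancelled = firecrawl.cancel_crawl("crawl-job-id")
print(cancelled)
```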
Map a Website
Use `map` to generate a list of URLs from a website. The options let you customize the mapping process, including excluding subdomains or utilizing the sitemap.
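A sketch; the option names (`include_subdomains`, `sitemap`) and the shape of the result are assumptions:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Discover URLs on a site; both options below are assumed names.
result = firecrawl.map(
    "https://firecrawl.dev",
    include_subdomains=False,  # assumed option name
    sitemap="include",         # assumed option name/value
)
print(result)
```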
Crawling a Website with WebSockets
To crawl a website with WebSockets, start the job with `start_crawl` and subscribe using the `watcher` helper. Create a watcher with the job ID and attach handlers (e.g., for page, completed, failed) before calling `start()`.
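A sketch of the flow described above; the watcher constructor and the `add_listener` signature are assumed API shapes, so check the SDK reference for the exact event names:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Start the job, then watch it for live updates.
job = firecrawl.start_crawl("https://firecrawl.dev", limit=5)

# `watcher(...)` and `add_listener(...)` below are assumed names.
watcher = firecrawl.watcher(job.id)

watcher.add_listener("page", lambda doc: print("page:", doc))
watcher.add_listener("completed", lambda _: print("crawl completed"))
watcher.add_listener("failed", lambda err: print("crawl failed:", err))

watcher.start()  # begin receiving events until the job finishes
```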
Pagination
Firecrawl endpoints for crawl and batch scrape return a `next` URL when more data is available. The Python SDK auto-paginates by default and aggregates all documents; in that case `next` will be `None`. You can disable auto-pagination or set limits.
Crawl
Use the waiter method `crawl` for the simplest experience, or start a job and page manually.
Simple crawl (auto-pagination, default)
- See the default flow in Crawl a Website.
Manual crawl with pagination control (single page)
- Start a job, then fetch one page at a time with `auto_paginate=False`.
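A sketch under the assumption that `auto_paginate` is accepted as a direct keyword by `get_crawl_status` (the SDK may instead group pagination settings in a config object):

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

job = firecrawl.start_crawl("https://example.com", limit=100)

# Fetch a single page of results; with auto-pagination off, `next` stays
# set while more data is available.
status = firecrawl.get_crawl_status(job.id, auto_paginate=False)

print(len(status.data), "documents on this page")
print("next page URL:", status.next)  # None when there is no more data
```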
Manual crawl with limits (auto-pagination + early stop)
- Keep auto-pagination on but stop early with `max_pages`, `max_results`, or `max_wait_time`.
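A sketch; passing these limits as direct keywords is an assumption:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

job = firecrawl.start_crawl("https://example.com", limit=100)

# Aggregate results but stop early once any limit is hit.
status = firecrawl.get_crawl_status(
    job.id,
    max_pages=2,       # stop after two pages of results
    max_results=50,    # or after 50 documents
    max_wait_time=15,  # or after ~15 seconds (unit assumed)
)
print(len(status.data))
```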
Batch Scrape
Use the waiter method `batch_scrape`, or start a job and page manually.
Simple batch scrape (auto-pagination, default)
- See the default flow in Batch Scrape.
Manual batch scrape with pagination control (single page)
- Start a job, then fetch one page at a time with `auto_paginate=False`.
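A sketch assuming the non-blocking counterparts are named `start_batch_scrape` and `get_batch_scrape_status`, and that `auto_paginate` is a direct keyword; all three are assumptions:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

# Assumed counterparts to start_crawl / get_crawl_status.
job = firecrawl.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
])

status = firecrawl.get_batch_scrape_status(job.id, auto_paginate=False)
print(len(status.data), "documents on this page")
print("next page URL:", status.next)
```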
Manual batch scrape with limits (auto-pagination + early stop)
- Keep auto-pagination on but stop early with `max_pages`, `max_results`, or `max_wait_time`.
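Continuing from the previous sketch, with the same limit keywords as in the crawl example (again assumed to be direct keywords):

```python
status = firecrawl.get_batch_scrape_status(  # assumed method name
    job.id,
    max_pages=2,
    max_results=100,
    max_wait_time=15,
)
print(len(status.data))
```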
Error Handling
The SDK handles errors returned by the Firecrawl API and raises appropriate exceptions. If an error occurs during a request, an exception is raised with a descriptive error message.
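A minimal sketch; the SDK defines its own exception types, but since their names aren't listed here, this catches the base `Exception`:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl()

try:
    doc = firecrawl.scrape("https://example.com/might-fail")
except Exception as err:  # prefer the SDK's specific exception classes
    print("request failed:", err)
```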
Async Class
For async operations, use the `AsyncFirecrawl` class. Its methods mirror `Firecrawl`, but they don't block the main thread.
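A sketch of the async surface; the awaited `scrape` call mirrors the synchronous example above, and `formats` remains an assumed option name:

```python
import asyncio

from firecrawl import AsyncFirecrawl


async def main() -> None:
    firecrawl = AsyncFirecrawl()  # reads FIRECRAWL_API_KEY from the environment

    # Same surface as Firecrawl, but awaitable.
    doc = await firecrawl.scrape("https://firecrawl.dev", formats=["markdown"])
    print(doc)


asyncio.run(main())
```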