Web Scraper

The web scraper tool uses crawl4ai to extract content from web pages with advanced anti-bot evasion and JavaScript rendering. It's optimised for high-volume scraping of sites that block standard scrapers.

Tool: scrape_url

Scrapes a URL and returns the cleaned text content, including content from dynamically rendered JavaScript.

Arguments

Argument   Type      Description
url        string    The URL to scrape
wait_for   string    Optional CSS selector to wait for before extracting
scroll     boolean   Scroll to bottom to trigger infinite scroll loading
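
As a rough illustration of how the three arguments fit together, the sketch below builds an argument payload for scrape_url. The ScrapeUrlArgs class, the to_payload helper, the default of scroll=False, and the "#content" selector are all assumptions made for this example; the tool's actual invocation format and defaults are not specified here.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ScrapeUrlArgs:
    """Hypothetical container for the three documented scrape_url arguments."""

    url: str
    wait_for: Optional[str] = None  # CSS selector; left out of the payload when None
    scroll: bool = False            # assumed default, not stated in the documentation

    def to_payload(self) -> dict:
        # Reject anything that is not an absolute http(s) URL before sending it on.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {self.url!r}")
        payload = {"url": self.url, "scroll": self.scroll}
        if self.wait_for is not None:
            payload["wait_for"] = self.wait_for
        return payload


# "#content" is an illustrative selector, not one taken from a real page.
args = ScrapeUrlArgs(url="https://news.ycombinator.com", wait_for="#content")
print(json.dumps(args.to_payload(), indent=2))
```

Setting wait_for is mainly useful on pages that render their main content after the initial load; scroll is worth enabling only for pages that load more items as the user scrolls.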

Use cases

  • Scraping news articles, blog posts, documentation
  • Extracting content from sites with bot protection (LinkedIn, financial news sites)
  • Crawling dynamic React/Vue/Angular apps
  • Monitoring competitor pricing or product listings

Anti-bot evasion

The scraper uses crawl4ai's stealth features:

  • Headless browser with realistic browser fingerprints
  • Randomised request timing
  • Cookie and session handling
  • User-agent rotation

This makes it more effective than plain HTTP scrapers (for example, those built on the requests library) on sites that block such clients or that need JavaScript to render their content.
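
Two of the ideas in the list above, randomised request timing and user-agent rotation, can be sketched with the standard library alone. This is a generic illustration of the techniques, not crawl4ai's implementation; the function names, the delay values, and the user-agent strings are placeholders chosen for the example.

```python
import itertools
import random
from typing import List, Optional, Tuple

# Placeholder user-agent strings for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/2.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/3.0",
]


def jittered_delay(base: float, jitter: float, rng: random.Random) -> float:
    """Return `base` seconds plus a uniform random offset in [0, jitter]."""
    return base + rng.uniform(0.0, jitter)


def request_plan(
    urls: List[str],
    base: float = 1.0,
    jitter: float = 2.0,
    seed: Optional[int] = None,
) -> List[Tuple[str, str, float]]:
    """Pair each URL with the next user agent in rotation and a randomised delay."""
    rng = random.Random(seed)
    agents = itertools.cycle(USER_AGENTS)
    return [(url, next(agents), jittered_delay(base, jitter, rng)) for url in urls]


for url, agent, delay in request_plan(["https://example.com/a", "https://example.com/b"]):
    # A real client would sleep for `delay` seconds, then send the request with `agent`.
    print(f"wait {delay:.2f}s, then fetch {url} as {agent}")
```

Spacing requests out this way also reduces the load placed on the target site, which is worth doing regardless of any bot detection.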

Example

Scrape the latest articles from https://news.ycombinator.com and summarise the top 5 stories.
Get the pricing information from https://competitor.com/pricing.

Comparison with Browser tool

See Browser → Comparison for a side-by-side comparison.

Note: Web scraping should comply with a site's Terms of Service and robots.txt. Use this tool responsibly.
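
One way to check robots.txt rules before scraping is Python's standard urllib.robotparser module. The sketch below parses rules from an in-memory string so that it runs without network access; the is_allowed helper, the "my-scraper" user agent, and the sample rules are assumptions for illustration, and the scraper tool itself is not documented as performing this check.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; a real check would fetch the site's own robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for candidate in ("https://example.com/blog/post", "https://example.com/private/data"):
    verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", candidate) else "disallowed"
    print(f"{candidate}: {verdict}")
```

robots.txt covers only crawler access rules; a site's Terms of Service still need to be read separately.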