Web Scraper
The web scraper tool uses crawl4ai to extract content from web pages, applying anti-bot evasion and JavaScript rendering. It's optimised for high-volume scraping of sites that block standard scrapers.
Tool: scrape_url
Scrapes a URL and returns the cleaned text content, including content rendered dynamically by JavaScript.
Arguments
| Argument | Type | Description |
|---|---|---|
| url | string | The URL to scrape |
| wait_for | string | Optional CSS selector to wait for before extracting |
| scroll | boolean | Scroll to bottom to trigger infinite scroll loading |
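As a rough illustration of how these arguments fit together, the sketch below builds and validates an argument payload in Python. The `build_scrape_args` helper and the `.pricing-table` selector are hypothetical and not part of the tool; only the argument names `url`, `wait_for`, and `scroll` come from the table above. Treating `scroll` as defaulting to false is an assumption.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_scrape_args(
    url: str,
    wait_for: Optional[str] = None,
    scroll: bool = False,  # assumed default; the table does not state one
) -> Dict[str, Any]:
    """Build an argument payload for scrape_url (hypothetical helper)."""
    parsed = urlparse(url)
    # Reject anything that is not an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")

    args: Dict[str, Any] = {"url": url, "scroll": scroll}
    # wait_for is optional, so only include it when a selector was given.
    if wait_for is not None:
        args["wait_for"] = wait_for
    return args


if __name__ == "__main__":
    # Illustrative call; ".pricing-table" is a made-up selector.
    print(build_scrape_args("https://competitor.com/pricing", wait_for=".pricing-table"))
```

Leaving `wait_for` out of the payload when it is not set keeps the call minimal for static pages, where there is no specific element to wait for.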
Use cases
- Scraping news articles, blog posts, documentation
- Extracting content from sites with bot protection (LinkedIn, financial news sites)
- Crawling dynamic React/Vue/Angular apps
- Monitoring competitor pricing or product listings
Anti-bot evasion
The scraper uses crawl4ai's stealth features:
- Headless browser with realistic browser fingerprints
- Randomised request timing
- Cookie and session handling
- User-agent rotation
This makes it more effective than requests-based scrapers on sites that block them or that require JavaScript to render content.
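Two of the ideas above, randomised request timing and user-agent rotation, can be sketched generically in Python. This is not crawl4ai's implementation. The names `USER_AGENTS`, `next_headers`, and `jittered_delay`, the delay bounds, and the user-agent strings are all assumptions chosen for illustration.

```python
import itertools
import random
from typing import Dict, Optional

# Illustrative user-agent strings (assumed; a real pool would be larger and kept current).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

# Endless round-robin iterator over the pool.
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers() -> Dict[str, str]:
    """Return request headers carrying the next user agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


def jittered_delay(
    base: float = 2.0,
    jitter: float = 1.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a delay in seconds drawn uniformly from [base, base + jitter]."""
    source = rng if rng is not None else random.Random()
    return base + source.uniform(0.0, jitter)
```

A caller would typically sleep for `jittered_delay()` seconds between requests and attach `next_headers()` to each one, so that neither the request interval nor the client identity stays constant.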
Example
Scrape the latest articles from https://news.ycombinator.com and summarise the top 5 stories.
Get the pricing information from https://competitor.com/pricing.
Comparison with Browser tool
See Browser → Comparison for a side-by-side comparison.
Note: Web scraping should comply with a site's Terms of Service and robots.txt. Use this tool responsibly.
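One way to check robots.txt rules before scraping is Python's standard-library `urllib.robotparser`. The sketch below parses an inline robots.txt so that it runs offline. The sample rules, the `my-scraper` agent name, the `example.com` URLs, and the `is_allowed` helper are made up for illustration and say nothing about how the tool itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post", "https://example.com/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", url))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the real file instead of the inline sample. A robots.txt check does not cover a site's Terms of Service, which still need to be reviewed separately.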