Hacker News

Ask HN: Scaling a targeted web crawler beyond 500M pages/day

22 points by honungsburk

by faangguyindia

If you want to access data from websites that try to block it, you gotta use a headless browser with a residential proxy network like Bright Data (formerly Luminati).

by 4lx87

I'm curious, how do you deal with Cloudflare and similar anti-bot systems? Just keep shopping the job around to different proxies?

by fragmede
