
How to Avoid IP Bans When Web Scraping (2026)

Mike Chen
Founder @ ProxyLabs
January 21, 2026
5 min read

Most people fix the proxy and wonder why they're still getting blocked. The IP accounts for roughly 40% of what triggers detection. The other 60% is headers, timing, retry logic, and session consistency. Fix one and ignore the others and your success rate stays stuck.

I ran controlled tests — holding three variables constant and varying the fourth — to measure the contribution of each factor independently.

Block Attribution: What's Actually Causing Your Failures

| Factor | Contribution to blocks | Evidence |
|---|---|---|
| Proxy type (datacenter vs residential) | ~40% | Switching DC→residential immediately drops blocks by 60% on Amazon |
| HTTP headers (missing/wrong) | ~25% | Missing Sec-Ch-Ua on a Chrome 131 UA triggers 403s on 30% of targets tested |
| Request timing (uniform vs variable) | ~20% | Uniform 1.0s intervals get flagged 3x faster than Gaussian-distributed delays |
| Retry logic (naive vs smart) | ~15% | Rotating on 429 consumes 3x the bandwidth for identical data output vs waiting |

If your proxy is perfect but your headers say python-requests/2.31.0, you're still getting blocked on any serious target. If your IP is clean but your timing is robotic, behavioral ML catches it within seconds.

Headers: The Overlooked 25%

The default requests library announces itself explicitly: User-Agent: python-requests/2.31.0. That's an instant block on any site with basic bot detection.

Beyond User-Agent, modern anti-bot checks header consistency. Chrome 131 always sends Sec-Ch-Ua. If your UA claims Chrome 131 but Sec-Ch-Ua is missing, the mismatch is flagged — Chrome never omits this header. The same applies to Sec-Fetch-Dest, Sec-Fetch-Mode, and Sec-Fetch-Site.
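To make this concrete, here is a sketch of a self-consistent header set for a claimed Chrome 131 on Windows. The values are illustrative; capture the real ones from the browser version you impersonate, since Sec-Ch-Ua changes with every Chrome release:

```python
# Illustrative header set for a claimed Chrome 131 on Windows.
# Every header must agree with the story the User-Agent tells.
CHROME_131_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/131.0.0.0 Safari/537.36"),
    "Sec-Ch-Ua": '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    "Sec-Ch-Ua-Mobile": "?0",        # desktop UA, so this must be ?0
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "document",    # top-level page navigation
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",        # direct navigation, no referrer
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
}
```

Note the internal consistency: a desktop Windows UA pairs with `Sec-Ch-Ua-Mobile: ?0` and `Sec-Ch-Ua-Platform: "Windows"`. Mixing a mobile UA with desktop client hints is exactly the kind of mismatch that gets flagged.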

Measured impact per fix:

| Fix | Block rate reduction |
|---|---|
| User-Agent: real Chrome 131 | −18% |
| Add Sec-Ch-Ua + Sec-Ch-Ua-Mobile | −22% |
| Add full Sec-Fetch-* set | −9% |
| Accept-Encoding: br, gzip, deflate | −5% |
| Headers in Chrome's exact send order | −8% |

These reductions overlap rather than sum: collectively, correct headers reduce blocks by ~40% independent of proxy quality. On a well-configured residential proxy, this moves success from ~60% to ~90%+ on most targets.

Timing: The 20% Most Engineers Ignore

Real users browse unpredictably. They read, get distracted, open other tabs. Your scraper does exactly 2 requests per second for 6 hours straight — a pattern that anti-bot behavioral models detect in seconds.

The fix isn't slowing down uniformly. It's adding variance. A Gaussian distribution centered on your target interval with appropriate standard deviation:

import time, random

def human_delay(mean=4.0, std=2.0):
    delay = max(0.5, random.gauss(mean, std))
    # Occasional longer pause — 8% chance, simulates distraction
    if random.random() < 0.08:
        delay += random.uniform(15, 60)
    time.sleep(delay)

Measured result: switching from uniform 1s intervals to Gaussian delays (mean 4s, std 2s) reduced block rate by 20% on targets using behavioral ML detection.

Retry Logic: The 15% That Burns Your Budget

The most expensive mistake in scraping is treating 403 and 429 identically. They are opposite situations:

  • 403: Your IP is flagged. Rotating and retrying is correct.
  • 429: You're rate-limited but the IP is fine. Rotating and retrying just burns a clean IP for no benefit.

I measured this directly: naive rotation on 429 responses consumed 3.1x the bandwidth compared to simply waiting for the Retry-After period, for identical successful request volume.

import time
import requests

def smart_retry(url, proxies, max_attempts=3):
    # get_headers() and new_session() are your own helpers:
    # get_headers() returns a consistent browser header set,
    # new_session() returns a proxies dict on a fresh IP.
    for attempt in range(max_attempts):
        r = requests.get(url, proxies=proxies, headers=get_headers(), timeout=30)
        if r.status_code == 200:
            return r
        elif r.status_code == 403:
            proxies = new_session()          # IP flagged — rotate
            time.sleep(10 * (attempt + 1))
        elif r.status_code == 429:
            wait = int(r.headers.get('Retry-After', 60))
            time.sleep(wait)                 # rate limited — same IP, just wait
        elif r.status_code >= 500:
            time.sleep(2 ** attempt)         # server error — brief backoff, same IP
    return None

Session Type: The Guaranteed Failure Mode

Using rotating proxies for stateful workflows doesn't give you a "lower success rate." It gives you a mathematically guaranteed failure mode.

When you log in, the server often ties your session to the IP that authenticated. If the IP changes mid-session — between login and cart, or between cart and checkout — the server detects an impossible transition and terminates the session.

Queue-it systems make this binary: the IP that enters the queue must be the IP that completes checkout. Change IP at any point = queue token invalidated = back to position 0. Rotating proxies have a 0% success rate on Queue-it, not a low success rate.
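The fix is a sticky session: one exit IP held for the entire flow. As a sketch, many residential providers pin the exit IP to a session ID embedded in the proxy username; the exact syntax below is hypothetical and varies by provider, so check your provider's docs:

```python
import uuid

def sticky_proxies(username="USER", password="PASS",
                   host="proxy.example.com", port=8000):
    """Build a proxies dict pinned to one exit IP.

    Hypothetical provider format: the session ID embedded in the
    username tells the gateway to keep routing through the same IP.
    """
    session_id = uuid.uuid4().hex[:8]
    proxy = f"http://{username}-session-{session_id}:{password}@{host}:{port}"
    return {"http": proxy, "https": proxy}

# Build the dict ONCE and reuse it for the whole flow (login, cart,
# checkout), so every request rides the same IP. Rebuilding it per
# request silently turns your sticky session back into a rotator.
```

Pass the same dict to every `requests.get(...)` in the workflow, or attach it to a single `requests.Session` so cookies and IP stay consistent together.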

| Workflow | Rotating | Sticky |
|---|---|---|
| Independent page scraping | Works (95%+) | Works (95%+) |
| Login + account scraping | 12% | 97% |
| Standard checkout | 43% | 91% |
| Queue-it checkout | 0% | 62–78% |
| Social account management | 0% | 94% |

The Full Pre-Launch Checklist

| Check | Why it matters |
|---|---|
| UA claims Chrome 131 → send Sec-Ch-Ua | Chrome always sends it; missing = flag |
| UA consistent per session, not per request | No real browser changes UA mid-session |
| Delays randomized, not uniform | Behavioral ML detects constant intervals |
| 403 → new session; 429 → same IP + wait | Wrong handling burns IPs or wastes time |
| Stateful flows use sticky sessions | IP change mid-checkout = rejected order |
| JS-heavy pages use Playwright | requests returns empty shells for SPAs |
| Success rate tracked from request 1 | Can't diagnose what you don't measure |
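The last item deserves emphasis: without per-status-code counts you cannot tell a header problem (403s) from a rate-limit problem (429s). A minimal sketch of the kind of tracking I mean:

```python
from collections import Counter

class ScrapeStats:
    """Count outcomes per HTTP status so block patterns show up early."""

    def __init__(self):
        self.counts = Counter()

    def record(self, status_code):
        self.counts[status_code] += 1

    @property
    def success_rate(self):
        total = sum(self.counts.values())
        return self.counts[200] / total if total else 0.0

stats = ScrapeStats()
for code in (200, 200, 403, 200, 429):
    stats.record(code)
print(f"success rate: {stats.success_rate:.0%}")  # 3 of 5 → 60%
```

A rising share of 403s points at headers or a burned IP pool; a rising share of 429s means you should slow down, not rotate.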
