web scraping · anti-detection · residential proxies

The 8-Layer Anti-Detection Stack: How to Scrape Without Getting Blocked

James Liu
Lead Engineer @ ProxyLabs
January 28, 2026
6 min read

The engineers who built Akamai Bot Manager, Cloudflare's bot protection, and DataDome aren't checking one thing. They're checking 8 stacked layers, each at a different point in the request lifecycle. Fixing your IP while ignoring TLS fingerprinting gets you through Layer 1 and stopped at Layer 3. Most scraping guides cover only one or two of these.

The 8 Layers

| Layer | What it checks | When it runs | Fix |
| --- | --- | --- | --- |
| 1. ASN lookup | Is this IP from a datacenter or a consumer ISP? | First 2ms of request | Residential proxies — can't spoof ASN |
| 2. IP reputation | Has this specific IP been flagged in the last 24h? | Instant, alongside ASN | Private pools — shared pools inherit contamination |
| 3. TLS fingerprint | Do cipher suites match the claimed browser? | TLS handshake | curl_cffi or Playwright (real Chromium TLS stack) |
| 4. HTTP/2 frame patterns | Flow control, frame ordering, header compression | First request | Use libraries with a correct H2 implementation |
| 5. Header consistency | Does a Chrome 131 UA have all Chrome-specific headers? | First request | Send the full Chrome header set with correct ordering |
| 6. Browser markers | navigator.webdriver, WebGL renderer, plugin array | Post-JS execution | Playwright init scripts to patch 6 exposed markers |
| 7. Behavioral ML | Mouse movement, scroll patterns, request timing | First 2–10 seconds | Human-like delays + mouse simulation in Playwright |
| 8. Session consistency | Does the IP change while a cookie session is active? | Mid-session | Sticky sessions for stateful workflows |

If you pass layers 1–5 perfectly but your Playwright setup exposes navigator.webdriver (Layer 6), you're blocked. If you pass 1–6 but have uniform 1s request intervals (Layer 7), behavioral ML flags you within seconds. All 8 layers must be addressed simultaneously.

Layer 3: The TLS Trap Most Engineers Miss

Python's requests library sends a TLS Client Hello that looks nothing like Chrome's. The cipher suite list, its ordering, and the TLS extensions all differ. A sophisticated anti-bot system doesn't need to check your headers — it knows you're not Chrome during the TLS handshake, before a single HTTP byte is sent.

Chrome 131's TLS fingerprint uses a specific GREASE pattern and cipher ordering that requests, httpx, and standard aiohttp all fail to replicate.
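You can see part of the mismatch from Python itself. This sketch inspects the cipher suites a default stdlib TLS context offers (the same machinery requests builds on); both the list and its ordering differ from Chrome's short, GREASE-prefixed offer, which is exactly what JA3-style fingerprinting hashes:

```python
import ssl

# Inspect the cipher suites a default Python TLS context will offer.
# Chrome 131 sends a short, fixed list (three TLS 1.3 suites, then
# ECDHE suites) preceded by a GREASE value; Python's default list is
# longer and ordered differently, and that difference is visible to
# the server before any HTTP data is exchanged.
ctx = ssl.create_default_context()
offered = [c["name"] for c in ctx.get_ciphers()]
print(f"{len(offered)} suites offered, e.g. {offered[:3]}")
```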

Two fixes:

  • curl_cffi: Python library that mimics Chrome's TLS fingerprint exactly via impersonate='chrome131'
  • Playwright: Uses real Chromium binary, which has Chrome's actual TLS stack by definition

Against high-security targets, plain requests without TLS mimicry fails at Layer 3 before any other check even runs.

Layer 5: Header Consistency

Chrome 131 always sends these headers, in this order, on a navigation request: sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform, upgrade-insecure-requests, user-agent, accept, sec-fetch-site, sec-fetch-mode, sec-fetch-user, sec-fetch-dest, accept-encoding, accept-language.

Missing any of them, or sending them in a different order (Python libraries often sort headers alphabetically), is a detectable signature. Anti-bot systems have fingerprints for common scraping libraries built into their rule sets.
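Since Python 3.7, dicts preserve insertion order, so the full set can be pinned once in Chrome's order. A sketch with representative values for Chrome 131 on Windows (keep them consistent with whatever UA you claim, and note that requests merges in its own defaults, so you need a client that sends headers verbatim):

```python
# Chrome 131 navigation-request headers, in Chrome's order.
# Values are representative; keep them consistent with the claimed UA.
CHROME_HEADERS = {
    "sec-ch-ua": '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "upgrade-insecure-requests": "1",
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    ),
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "sec-fetch-site": "none",
    "sec-fetch-mode": "navigate",
    "sec-fetch-user": "?1",
    "sec-fetch-dest": "document",
    "accept-encoding": "gzip, deflate, br, zstd",
    "accept-language": "en-US,en;q=0.9",
}

# Sanity check: insertion order is what goes on the wire.
assert list(CHROME_HEADERS)[0] == "sec-ch-ua"
assert list(CHROME_HEADERS)[-1] == "accept-language"
```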

Layer 6: The 6 Playwright Automation Markers

Most stealth guides fix navigator.webdriver. Playwright exposes 5 others:

| Marker | Default value in headless | What real Chrome sends | Fix |
| --- | --- | --- | --- |
| navigator.webdriver | true | undefined | Override in init script |
| window.chrome | Missing | { runtime: {} } | Add in init script |
| navigator.plugins | Empty array [] | 2+ plugin objects | Add fake plugins |
| WebGL renderer | "SwiftShader" | Real GPU name | Spoof in WebGL context |
| navigator.languages | [] or ["en"] | ["en-US", "en"] | Override in init script |
| Permission query | Automation behavior | Real browser behavior | Override permissions.query |

Fixing only navigator.webdriver while leaving the other 5 gives you 1 out of 6 markers correct. DataDome and Akamai check all 6. The init script that addresses all of them:

```javascript
await page.addInitScript(() => {
  // 1. webdriver — headless reports true; real Chrome reports undefined
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  // 2. chrome object — present in real Chrome, missing in headless
  window.chrome = { runtime: {} };
  // 3. plugins — real Chrome exposes at least the built-in PDF viewers
  Object.defineProperty(navigator, 'plugins', {
    get: () => [
      { name: 'PDF Viewer', filename: 'internal-pdf-viewer' },
      { name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer' },
    ],
  });
  // 4. WebGL — spoof GPU vendor/renderer (37445/37446 are the
  //    UNMASKED_VENDOR_WEBGL / UNMASKED_RENDERER_WEBGL constants)
  const getCtx = HTMLCanvasElement.prototype.getContext;
  HTMLCanvasElement.prototype.getContext = function (...args) {
    const ctx = getCtx.apply(this, args);
    if (args[0] === 'webgl' || args[0] === 'experimental-webgl') {
      const orig = ctx.getParameter.bind(ctx);
      ctx.getParameter = (p) => {
        if (p === 37445) return 'Intel Inc.';
        if (p === 37446) return 'Intel(R) Iris(TM) Graphics 6100';
        return orig(p);
      };
    }
    return ctx;
  };
  // 5. languages
  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
  // 6. permissions — bind `this` to the Permissions object, otherwise
  //    calling the original query() throws an Illegal invocation error
  const origQuery = window.navigator.permissions.query.bind(
    window.navigator.permissions
  );
  window.navigator.permissions.query = (p) =>
    p.name === 'notifications'
      ? Promise.resolve({ state: Notification.permission })
      : origQuery(p);
});
```

When I tested standard Playwright (webdriver fix only) vs this full init script against a DataDome-protected target, the full script raised success rate from 0% to 82% with identical residential proxies.

Layer 7: Behavioral ML

Queue-it and advanced Cloudflare configurations run ML models trained on real user behavior. They're not looking for "bot-like" in a rules-based way — they're scoring your behavior against a distribution of human sessions.

Key signals: mouse movement paths (humans have natural curves and acceleration; bots teleport or move in straight lines), scroll velocity (humans scroll in bursts; bots scroll smoothly), and request timing variance (humans have natural irregularity; bots are often too consistent).

For high-security targets, add behavioral simulation in Playwright (sync API):

```python
import random
import time

def human_interaction(page):
    # Curved mouse movement: more steps = a smoother, more human path
    page.mouse.move(
        random.randint(200, 1700),
        random.randint(100, 900),
        steps=random.randint(8, 25),
    )
    time.sleep(random.uniform(0.3, 1.2))
    # Scroll in short bursts rather than one smooth glide
    for _ in range(random.randint(2, 4)):
        page.mouse.wheel(0, random.randint(60, 280))
        time.sleep(random.uniform(0.2, 0.9))
```
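The timing signal applies between requests as well, not just within a page. A small helper (the name is mine) that replaces fixed one-second sleeps with sampled delays:

```python
import random
import time

def jittered_sleep(base: float = 1.0, spread: float = 0.6) -> float:
    """Sleep for base +/- spread seconds and return the delay used.

    Uniform fixed intervals are a classic Layer 7 signature; sampling
    each delay breaks the regularity that behavioral models score.
    """
    delay = random.uniform(base - spread, base + spread)
    time.sleep(delay)
    return delay
```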

Why You Need All 8

The layers are checked sequentially but independently — failing any one is sufficient for blocking. This is why partial fixes produce partial results but rarely bring success above 80% on serious targets. The path to 90%+ requires addressing every layer:

  • Residential private pool → Layers 1, 2
  • curl_cffi or Playwright → Layer 3
  • Correct H2 library → Layer 4
  • Full Chrome header set in correct order → Layer 5
  • Complete init script → Layer 6
  • Behavioral simulation → Layer 7
  • Sticky sessions for stateful flows → Layer 8
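Layer 8 is the only one without code above. Many residential providers encode a session ID in the proxy username so repeated requests exit from the same IP; the username format below is illustrative only (check your provider's docs):

```python
import uuid

def sticky_proxy_url(user: str, password: str, host: str,
                     port: int, session_id: str) -> str:
    # Illustrative format only: providers differ in how (or whether)
    # they encode session IDs in the proxy username.
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"

# One session ID per stateful workflow: reuse it for every request
# that shares a cookie jar, rotate it when the session ends.
session_id = uuid.uuid4().hex[:8]
proxy = sticky_proxy_url("user123", "secret", "proxy.example.com", 8000, session_id)
```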

James Liu
Lead Engineer @ ProxyLabs

Building proxy infrastructure since 2019. Previously failed at many things, now failing slightly less.