
How to Scrape Amazon Prices in 2026 (Without Getting Blocked)

James Liu
Lead Engineer @ ProxyLabs
March 15, 2026
7 min read

Amazon is one of the hardest sites to scrape reliably. Their anti-bot system combines device fingerprinting, behavioral analysis, IP reputation scoring, and CAPTCHA challenges in a layered stack that catches most scrapers within 20-50 requests. In 2026, a bare requests.get() call to an Amazon product page — even with a residential IP — gets blocked about 70% of the time.

This guide covers what actually works: the specific request patterns, headers, and proxy configurations that maintain a 90%+ success rate across thousands of product pages.

Amazon's Anti-Bot Stack (What You're Up Against)

| Layer | What it checks | Detection speed |
| --- | --- | --- |
| IP reputation | Datacenter IP ranges, known proxy ASNs | Instant — first request |
| TLS fingerprint | JA3/JA4 hash of your TLS handshake | Instant — before HTTP |
| Header analysis | Missing/wrong order of headers, Accept-Language | First request |
| Behavioral signals | Request rate, navigation patterns, mouse events | 10-50 requests |
| Device fingerprint | Canvas, WebGL, fonts (browser only) | First page load |
| CAPTCHA (fallback) | Image/audio challenge when suspicion is moderate | After soft signals |

The key insight: Amazon doesn't block on any single signal. They score your session across all layers and trigger blocks when the cumulative score passes a threshold. This is why some scrapers "work sometimes" — they're hovering near the threshold and randomly tipping over it.
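A toy model makes that threshold behavior concrete. The signal names, weights, and threshold below are invented for illustration; Amazon's real scoring is proprietary:

```python
# Toy model of layered bot scoring. All weights and the threshold are
# invented for illustration; Amazon's actual system is proprietary.
SIGNAL_WEIGHTS = {
    'datacenter_ip': 40,
    'bad_tls_fingerprint': 30,
    'missing_headers': 15,
    'high_request_rate': 20,
    'no_mouse_events': 10,
}
BLOCK_THRESHOLD = 50


def session_score(signals):
    """Sum the weight of every signal the session trips."""
    return sum(SIGNAL_WEIGHTS[s] for s in signals)


# A residential IP with clean headers but a fast request rate sits just
# under the threshold, which is why such scrapers "work sometimes":
print(session_score({'high_request_rate', 'no_mouse_events'}))  # 30 — passes
print(session_score({'datacenter_ip', 'missing_headers'}))      # 55 — blocked
```

In this model no single signal below 50 triggers a block on its own, but two moderate signals together do, which matches the "hovering near the threshold" behavior described above.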

Method 1: requests + Residential Proxies (Simple Pages)

For product pages that don't require JavaScript rendering (most Amazon product pages serve full HTML), requests with the right headers works:

import requests
import random
import time
from bs4 import BeautifulSoup

PROXY = {
    'http': 'http://your-username-country-US:[email protected]:8080',
    'https': 'http://your-username-country-US:[email protected]:8080',
}

# Amazon checks header order — this matches real Chrome
HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.amazon.com',
    'Sec-Ch-Ua': '"Chromium";v="131", "Not_A Brand";v="24"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
}


def scrape_product(asin):
    """Scrape a single Amazon product page by ASIN."""
    url = f'https://www.amazon.com/dp/{asin}'

    try:
        response = requests.get(url, headers=HEADERS, proxies=PROXY, timeout=15)

        if response.status_code == 503:
            return {'asin': asin, 'error': 'captcha_page'}
        if response.status_code != 200:
            return {'asin': asin, 'error': f'status_{response.status_code}'}

        soup = BeautifulSoup(response.text, 'html.parser')

        # Check for CAPTCHA soft block
        if soup.find('form', {'action': '/errors/validateCaptcha'}):
            return {'asin': asin, 'error': 'captcha_page'}

        # Extract price — Amazon uses multiple price containers
        price = None
        price_whole = soup.select_one('.a-price-whole')
        price_fraction = soup.select_one('.a-price-fraction')
        if price_whole:
            whole = price_whole.get_text(strip=True).rstrip('.')
            fraction = price_fraction.get_text(strip=True) if price_fraction else '00'
            price = f'{whole}.{fraction}'

        title = soup.select_one('#productTitle')
        title_text = title.get_text(strip=True) if title else None

        rating = soup.select_one('#acrPopover')
        rating_title = rating.get('title', '') if rating else ''
        rating_text = rating_title.split()[0] if rating_title else None

        availability = soup.select_one('#availability span')
        in_stock = ('in stock' in availability.get_text(strip=True).lower()) if availability else None

        return {
            'asin': asin,
            'title': title_text,
            'price': price,
            'rating': rating_text,
            'in_stock': in_stock,
        }

    except requests.exceptions.RequestException as e:
        return {'asin': asin, 'error': str(e)}


# Scrape with delays
asins = ['B0D1XD1ZV3', 'B0BSHF7WHW', 'B0C8PSRWFM']
results = []
for asin in asins:
    result = scrape_product(asin)
    results.append(result)
    print(result)
    time.sleep(random.uniform(2, 5))  # 2-5 second delay between requests

Why Country Targeting Matters

Amazon serves different prices and availability by region. A US IP sees US pricing; a UK IP sees UK pricing. With ProxyLabs, append -country-US or -country-GB to your username to control which regional store you're hitting. If you scrape amazon.com with a German IP, you'll get redirected to amazon.de — wasting bandwidth and getting wrong data.
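A small helper keeps the proxy country in sync with the marketplace domain. The gateway host, port, and `-country-XX` username suffix follow the ProxyLabs format used in this article; substitute your own provider's format:

```python
# Map each Amazon marketplace to the proxy country it should be scraped
# from. Gateway host/port and the -country-XX username suffix follow the
# ProxyLabs convention described above; adjust for your provider.
MARKETPLACES = {
    'www.amazon.com': 'US',
    'www.amazon.co.uk': 'GB',
    'www.amazon.de': 'DE',
}


def proxy_for(domain, username='your-username', password='your-password'):
    """Return a requests-style proxies dict geo-matched to the domain."""
    country = MARKETPLACES[domain]
    proxy_url = (
        f'http://{username}-country-{country}:{password}'
        '@gate.proxylabs.app:8080'
    )
    return {'http': proxy_url, 'https': proxy_url}
```

Passing `proxy_for('www.amazon.de')` to `requests.get` then guarantees a German exit IP for amazon.de, so you never pay for a redirect.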

Method 2: Browser-Based Scraping (JS-Heavy Pages)

Some Amazon pages (search results, deal pages, recommendation sections) require JavaScript. For these, use Playwright or Puppeteer with residential proxies.

from playwright.sync_api import sync_playwright

INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {} };
Object.defineProperty(navigator, 'plugins', {
  get: () => [
    { name: 'PDF Viewer', filename: 'internal-pdf-viewer' },
    { name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer' },
  ],
});
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
"""


def scrape_amazon_search(keyword, country='US'):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                'server': 'http://gate.proxylabs.app:8080',
                'username': f'your-username-country-{country}',
                'password': 'your-password',
            },
            args=['--disable-blink-features=AutomationControlled'],
        )

        context = browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
            locale='en-US',
        )
        context.add_init_script(INIT_SCRIPT)
        page = context.new_page()

        search_url = f'https://www.amazon.com/s?k={keyword.replace(" ", "+")}'
        page.goto(search_url, wait_until='domcontentloaded', timeout=30000)

        # Wait for product grid to load
        page.wait_for_selector('[data-component-type="s-search-result"]', timeout=10000)

        products = page.query_selector_all('[data-component-type="s-search-result"]')
        results = []

        for product in products[:20]:  # First 20 results
            asin = product.get_attribute('data-asin')
            title_el = product.query_selector('h2 a span')
            price_whole = product.query_selector('.a-price-whole')
            price_fraction = product.query_selector('.a-price-fraction')

            title = title_el.inner_text() if title_el else None
            price = None
            if price_whole:
                whole = price_whole.inner_text().rstrip('.')
                frac = price_fraction.inner_text() if price_fraction else '00'
                price = f'{whole}.{frac}'

            results.append({
                'asin': asin,
                'title': title,
                'price': price,
            })

        browser.close()
        return results


results = scrape_amazon_search('wireless mouse')
for r in results:
    # Some results (sponsored slots, unavailable items) have no price or title
    price = r['price'] or 'n/a'
    title = (r['title'] or '')[:60]
    print(f"${price:>8}  {r['asin']}  {title}")

For complete Playwright anti-detection setup, see our Playwright proxy tutorial.

Request Patterns That Avoid Detection

The biggest factor after IP quality is request pattern. These numbers come from testing ~50K requests against Amazon over a week:

| Pattern | Success rate | Notes |
| --- | --- | --- |
| Sequential, 1 req/sec, same IP | 35% | Too fast, same IP = flagged quickly |
| Sequential, 3-7 sec delay, rotating IP | 85% | Good baseline |
| Sequential, 3-7 sec delay, rotating + geo-match | 92% | Country matches Amazon domain |
| Concurrent (5 threads), rotating IP, 2-5 sec delay | 88% | Slight penalty for concurrency |
| Browser-based, 5-10 sec, rotating IP | 94% | Best rate but slowest |

The sweet spot for most price monitoring jobs: requests with rotating IPs, 3-7 second delays, geo-matched country, and proper Chrome headers. This gives ~90% success at around 10-15 pages per minute — enough to monitor thousands of ASINs daily.
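The concurrent row from the table can be sketched with a thread pool. Here `fetch` is whatever single-page scraper you use (e.g. `scrape_product` from Method 1); each worker keeps its own 2-5 second delay, and a rotating gateway hands every request a fresh IP:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_concurrent(asins, fetch, workers=5, delay=(2, 5)):
    """Run fetch(asin) across `workers` threads, each thread pausing a
    random delay[0]-delay[1] seconds between its own requests."""
    def worker(chunk):
        out = []
        for asin in chunk:
            out.append(fetch(asin))
            time.sleep(random.uniform(*delay))
        return out

    # One interleaved chunk per worker: worker 0 gets asins[0::workers], etc.
    chunks = [asins[i::workers] for i in range(workers)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(worker, chunks):
            results.extend(part)
    return results


# Usage with scrape_product from Method 1:
# results = scrape_concurrent(asins, scrape_product)
```

Five workers at a request every 2-5 seconds lands in the 10-15 pages/minute range mentioned above while keeping each thread's pacing human-plausible.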

Handling CAPTCHAs and Blocks

When Amazon returns a CAPTCHA, don't retry immediately with a new IP. Amazon tracks CAPTCHA-trigger patterns, and rapid retries from different IPs on the same URL sequence actually increase your block rate on subsequent requests.

import time
import random

def scrape_with_backoff(asin, max_retries=3):
    for attempt in range(max_retries):
        result = scrape_product(asin)

        if 'error' not in result:
            return result

        if result['error'] == 'captcha_page':
            # Exponential backoff: 30s, 60s, 120s
            wait = 30 * (2 ** attempt) + random.uniform(0, 10)
            print(f"CAPTCHA on {asin}, waiting {wait:.0f}s before retry")
            time.sleep(wait)
        else:
            # Non-CAPTCHA error — retry faster
            time.sleep(random.uniform(2, 5))

    return {'asin': asin, 'error': 'max_retries_exceeded'}

Production-Scale Architecture

For monitoring 10K+ ASINs daily:

  1. Queue: Redis or SQS to distribute ASINs across workers
  2. Workers: 3-5 worker processes, each running sequential requests with the requests library
  3. Proxy config: Rotating US residential IPs via ProxyLabs (no sticky session needed — each request is independent)
  4. Rate: ~10 requests/minute per worker = up to 50 requests/minute total = ~3K pages/hour
  5. Storage: PostgreSQL for price history, with daily aggregation
  6. Bandwidth: Average Amazon product page is ~500KB. 10K pages/day = ~5GB/day
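The queue-and-workers split can be prototyped in-process with the standard library before wiring in Redis or SQS. A minimal sketch, where `scrape` stands in for `scrape_with_backoff` from the previous section:

```python
import queue
import threading


def run_workers(asins, scrape, num_workers=3):
    """In-process prototype of the queue/worker layout. In production the
    queue.Queue becomes a Redis list (RPUSH to enqueue, BLPOP in workers)
    or an SQS queue, and results go to PostgreSQL instead of a list."""
    jobs = queue.Queue()
    for asin in asins:
        jobs.put(asin)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                asin = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            result = scrape(asin)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each request is independent (no sticky sessions), workers never need to coordinate beyond pulling the next ASIN off the queue, which is what makes this architecture trivial to scale horizontally.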

At ProxyLabs' 100GB tier (£2.50/GB), that's ~£12.50/day or ~£375/month for comprehensive daily price monitoring of 10K products. Significantly cheaper than commercial Amazon scraping APIs that charge $50-200/month for 10K requests.
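The arithmetic behind those figures, as a quick sanity check:

```python
# Back-of-envelope check on the bandwidth cost figures above.
PAGE_KB = 500          # average Amazon product page
PAGES_PER_DAY = 10_000
PRICE_PER_GB = 2.50    # £, the ProxyLabs rate quoted above

gb_per_day = PAGE_KB * PAGES_PER_DAY / 1_000_000
cost_per_day = gb_per_day * PRICE_PER_GB
cost_per_month = cost_per_day * 30

print(f'{gb_per_day:.1f} GB/day, £{cost_per_day:.2f}/day, £{cost_per_month:.0f}/month')
# 5.0 GB/day, £12.50/day, £375/month
```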

For more on building price monitoring systems, see our ecommerce price monitoring case study and general scraping guide.
