Amazon is one of the hardest sites to scrape reliably. Their anti-bot system combines device fingerprinting, behavioral analysis, IP reputation scoring, and CAPTCHA challenges in a layered stack that catches most scrapers within 20-50 requests. In 2026, a bare requests.get() call to an Amazon product page — even with a residential IP — gets blocked about 70% of the time.
This guide covers what actually works: the specific request patterns, headers, and proxy configurations that maintain a 90%+ success rate across thousands of product pages.
## Amazon's Anti-Bot Stack (What You're Up Against)
| Layer | What it checks | Detection speed |
|---|---|---|
| IP reputation | Datacenter IP ranges, known proxy ASNs | Instant — first request |
| TLS fingerprint | JA3/JA4 hash of your TLS handshake | Instant — before HTTP |
| Header analysis | Missing/wrong order of headers, Accept-Language | First request |
| Behavioral signals | Request rate, navigation patterns, mouse events | 10-50 requests |
| Device fingerprint | Canvas, WebGL, fonts (browser only) | First page load |
| CAPTCHA (fallback) | Image/audio challenge when suspicion is moderate | After soft signals |
The key insight: Amazon doesn't block on any single signal. They score your session across all layers and trigger blocks when the cumulative score passes a threshold. This is why some scrapers "work sometimes" — they're hovering near the threshold and randomly tipping over it.
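To make the scoring model concrete, here's an illustrative toy sketch of how a layered, threshold-based system behaves. The weights, signal names, and threshold are invented for illustration; they are not Amazon's actual values.

```python
# Toy model of layered risk scoring (illustrative only, not Amazon's
# real algorithm). Each detection layer contributes a weighted signal;
# a block triggers once the cumulative score crosses a threshold.
SIGNAL_WEIGHTS = {
    'datacenter_ip': 0.6,
    'bad_tls_fingerprint': 0.5,
    'missing_headers': 0.3,
    'high_request_rate': 0.4,
}

BLOCK_THRESHOLD = 1.0

def session_risk(signals):
    """Sum the weights of every signal observed on this session."""
    return sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)

def is_blocked(signals):
    return session_risk(signals) >= BLOCK_THRESHOLD

# One weak signal stays under the threshold...
print(is_blocked({'high_request_rate'}))  # False
# ...but stacking several weak signals tips the same session over it.
print(is_blocked({'missing_headers', 'bad_tls_fingerprint', 'high_request_rate'}))  # True
```

This is exactly the "works sometimes" failure mode: a scraper carrying two weak signals sits just under the threshold, and any third signal (a rate spike, a stale header set) tips it over.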
## Method 1: requests + Residential Proxies (Simple Pages)

For product pages that don't require JavaScript rendering (most Amazon product pages serve full HTML), `requests` with the right headers works:
```python
import requests
import random
import time
from bs4 import BeautifulSoup

PROXY = {
    'http': 'http://your-username-country-US:[email protected]:8080',
    'https': 'http://your-username-country-US:[email protected]:8080',
}

# Amazon checks header order — this matches real Chrome
HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.amazon.com',
    'Sec-Ch-Ua': '"Chromium";v="131", "Not_A Brand";v="24"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
}

def scrape_product(asin):
    """Scrape a single Amazon product page by ASIN."""
    url = f'https://www.amazon.com/dp/{asin}'
    try:
        response = requests.get(url, headers=HEADERS, proxies=PROXY, timeout=15)
        if response.status_code == 503:
            return {'asin': asin, 'error': 'captcha_page'}
        if response.status_code != 200:
            return {'asin': asin, 'error': f'status_{response.status_code}'}
        soup = BeautifulSoup(response.text, 'html.parser')
        # Check for CAPTCHA soft block
        if soup.find('form', {'action': '/errors/validateCaptcha'}):
            return {'asin': asin, 'error': 'captcha_page'}
        # Extract price — Amazon uses multiple price containers
        price = None
        price_whole = soup.select_one('.a-price-whole')
        price_fraction = soup.select_one('.a-price-fraction')
        if price_whole:
            whole = price_whole.get_text(strip=True).rstrip('.')
            fraction = price_fraction.get_text(strip=True) if price_fraction else '00'
            price = f'{whole}.{fraction}'
        title = soup.select_one('#productTitle')
        title_text = title.get_text(strip=True) if title else None
        rating = soup.select_one('#acrPopover')
        rating_text = rating.get('title', '').split()[0] if rating else None
        availability = soup.select_one('#availability span')
        in_stock = 'in stock' in availability.get_text(strip=True).lower() if availability else None
        return {
            'asin': asin,
            'title': title_text,
            'price': price,
            'rating': rating_text,
            'in_stock': in_stock,
        }
    except requests.exceptions.RequestException as e:
        return {'asin': asin, 'error': str(e)}

# Scrape with delays
asins = ['B0D1XD1ZV3', 'B0BSHF7WHW', 'B0C8PSRWFM']
results = []
for asin in asins:
    result = scrape_product(asin)
    results.append(result)
    print(result)
    time.sleep(random.uniform(2, 5))  # 2-5 second delay between requests
```
### Why Country Targeting Matters

Amazon serves different prices and availability by region. A US IP sees US pricing; a UK IP sees UK pricing. With ProxyLabs, append `-country-US` or `-country-GB` to your username to control which regional store you're hitting. If you scrape amazon.com with a German IP, you'll get redirected to amazon.de — wasting bandwidth and getting wrong data.
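A small helper makes the geo-matching explicit. This is a sketch using the username-suffix convention shown above; the gateway host and credentials are placeholders, and the domain-to-country map is an assumption you'd extend for the marketplaces you target.

```python
# Build a requests-style proxies dict for a given country, using the
# username-suffix convention from the examples above (placeholder
# credentials and gateway host).
def proxies_for_country(country_code, username='your-username',
                        password='your-password',
                        gateway='gate.proxylabs.app:8080'):
    proxy_url = f'http://{username}-country-{country_code}:{password}@{gateway}'
    return {'http': proxy_url, 'https': proxy_url}

# Match the proxy country to the Amazon domain you are scraping,
# so you aren't redirected to a different regional store.
DOMAIN_COUNTRY = {
    'www.amazon.com': 'US',
    'www.amazon.co.uk': 'GB',
    'www.amazon.de': 'DE',
}
```

Looking up the country from the target domain before each request keeps the IP geo and the regional store consistent, which is the "geo-match" row in the pattern table later in this guide.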
## Method 2: Browser-Based Scraping (JS-Heavy Pages)
Some Amazon pages (search results, deal pages, recommendation sections) require JavaScript. For these, use Playwright or Puppeteer with residential proxies.
```python
from playwright.sync_api import sync_playwright
import time
import random

INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {} };
Object.defineProperty(navigator, 'plugins', {
  get: () => [
    { name: 'PDF Viewer', filename: 'internal-pdf-viewer' },
    { name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer' },
  ],
});
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
"""

def scrape_amazon_search(keyword, country='US'):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                'server': 'http://gate.proxylabs.app:8080',
                'username': f'your-username-country-{country}',
                'password': 'your-password',
            },
            args=['--disable-blink-features=AutomationControlled'],
        )
        context = browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
            locale='en-US',
        )
        context.add_init_script(INIT_SCRIPT)
        page = context.new_page()
        search_url = f'https://www.amazon.com/s?k={keyword.replace(" ", "+")}'
        page.goto(search_url, wait_until='domcontentloaded', timeout=30000)
        # Wait for product grid to load
        page.wait_for_selector('[data-component-type="s-search-result"]', timeout=10000)
        products = page.query_selector_all('[data-component-type="s-search-result"]')
        results = []
        for product in products[:20]:  # First 20 results
            asin = product.get_attribute('data-asin')
            title_el = product.query_selector('h2 a span')
            price_whole = product.query_selector('.a-price-whole')
            price_fraction = product.query_selector('.a-price-fraction')
            title = title_el.inner_text() if title_el else None
            price = None
            if price_whole:
                whole = price_whole.inner_text().rstrip('.')
                frac = price_fraction.inner_text() if price_fraction else '00'
                price = f'{whole}.{frac}'
            results.append({
                'asin': asin,
                'title': title,
                'price': price,
            })
        browser.close()
        return results

results = scrape_amazon_search('wireless mouse')
for r in results:
    # Sponsored tiles can lack a price or title, so guard against None
    price = r['price'] or '?'
    title = (r['title'] or '')[:60]
    print(f"${price:>8} {r['asin']} {title}")
```
For a complete Playwright anti-detection setup, see our Playwright proxy tutorial.
## Request Patterns That Avoid Detection
The biggest factor after IP quality is request pattern. These numbers come from testing ~50K requests against Amazon over a week:
| Pattern | Success rate | Notes |
|---|---|---|
| Sequential, 1 req/sec, same IP | 35% | Too fast, same IP = flagged quickly |
| Sequential, 3-7 sec delay, rotating IP | 85% | Good baseline |
| Sequential, 3-7 sec delay, rotating + geo-match | 92% | Country matches Amazon domain |
| Concurrent (5 threads), rotating IP, 2-5 sec delay | 88% | Slight penalty for concurrency |
| Browser-based, 5-10 sec, rotating IP | 94% | Best rate but slowest |
The sweet spot for most price monitoring jobs: `requests` with rotating IPs, 3-7 second delays, geo-matched country, and proper Chrome headers. This gives ~90% success at around 10-15 pages per minute — enough to monitor thousands of ASINs daily.
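The concurrent row from the table can be sketched as a small fixed thread pool where each thread paces its own requests. This is a sketch: the `scrape_fn` parameter stands in for a per-ASIN scraper like the `scrape_product()` function from Method 1, and the pool size and delays come from the table above.

```python
# Concurrent pattern: N threads, each with its own randomized delay
# between requests. scrape_fn is any callable taking an ASIN and
# returning a result dict.
import random
import time
from concurrent.futures import ThreadPoolExecutor

def scrape_concurrently(scrape_fn, asins, threads=5, delay=(2, 5)):
    # One chunk of ASINs per thread, assigned round-robin
    chunks = [asins[i::threads] for i in range(threads)]

    def worker(chunk):
        results = []
        for asin in chunk:
            results.append(scrape_fn(asin))
            time.sleep(random.uniform(*delay))  # per-thread pacing
        return results

    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = pool.map(worker, chunks)
    # Flatten the per-thread result lists into one list
    return [r for chunk in per_thread for r in chunk]
```

Note the delay is per thread, not global: 5 threads at one request every 2-5 seconds is roughly 60-150 requests per minute in aggregate, which is where the slight concurrency penalty in the table starts to show.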
## Handling CAPTCHAs and Blocks

When Amazon returns a CAPTCHA, don't retry immediately with a new IP. Amazon tracks CAPTCHA-trigger patterns, and rapid retries from different IPs on the same URL sequence actually increase your block rate on subsequent requests.
```python
import time
import random

def scrape_with_backoff(asin, max_retries=3):
    for attempt in range(max_retries):
        result = scrape_product(asin)
        if 'error' not in result:
            return result
        if result['error'] == 'captcha_page':
            # Exponential backoff: 30s, 60s, 120s
            wait = 30 * (2 ** attempt) + random.uniform(0, 10)
            print(f"CAPTCHA on {asin}, waiting {wait:.0f}s before retry")
            time.sleep(wait)
        else:
            # Non-CAPTCHA error — retry faster
            time.sleep(random.uniform(2, 5))
    return {'asin': asin, 'error': 'max_retries_exceeded'}
```
## Production-Scale Architecture
For monitoring 10K+ ASINs daily:
- Queue: Redis or SQS to distribute ASINs across workers
- Workers: 3-5 worker processes, each running sequential requests with the `requests` library
- Proxy config: Rotating US residential IPs via ProxyLabs (no sticky session needed; each request is independent)
- Rate: ~10 requests/minute per worker = 50 requests/minute total = ~3K pages/hour
- Storage: PostgreSQL for price history, with daily aggregation
- Bandwidth: Average Amazon product page is ~500KB. 10K pages/day = ~5GB/day
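The queue/worker split above can be sketched in-process with the stdlib, swapping `queue.Queue` in for Redis or SQS; the push/pop pattern is the same. The `scrape_fn` parameter stands in for a per-ASIN scraper like `scrape_product()` from Method 1.

```python
# Minimal in-process sketch of the queue/worker architecture: a shared
# job queue feeding a fixed set of worker threads. In production the
# queue would be Redis or SQS and the workers separate processes.
import queue
import threading

def run_workers(asins, scrape_fn, num_workers=3):
    jobs = queue.Queue()
    for asin in asins:
        jobs.put(asin)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                asin = jobs.get_nowait()  # pull the next ASIN
            except queue.Empty:
                return  # queue drained, worker exits
            result = scrape_fn(asin)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each ASIN is an independent job, a failed page (CAPTCHA, timeout) can simply be re-queued rather than blocking the worker, which is what makes rotating per-request IPs (no sticky sessions) sufficient here.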
At ProxyLabs' 100GB tier (£2.50/GB), that's ~£12.50/day or ~£375/month for comprehensive daily price monitoring of 10K products. Significantly cheaper than commercial Amazon scraping APIs that charge $50-200/month for 10K requests.
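The cost math above is easy to parameterize for your own volume. This snippet just restates the article's figures (500KB average page, £2.50/GB) as a function:

```python
# Sanity-check the bandwidth arithmetic: pages/day -> GB/day -> cost.
PAGE_KB = 500          # assumed average Amazon product page size
PRICE_PER_GB = 2.50    # GBP, ProxyLabs 100GB tier

def daily_cost(pages_per_day):
    gb_per_day = pages_per_day * PAGE_KB / 1_000_000  # KB -> GB (decimal)
    return gb_per_day * PRICE_PER_GB

print(daily_cost(10_000))        # 12.5  (GBP/day)
print(daily_cost(10_000) * 30)   # 375.0 (GBP/month)
```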
For more on building price monitoring systems, see our ecommerce price monitoring case study and general scraping guide.