
Web Scraping with Python Requests and Proxies: Complete Guide

James Liu
Lead Engineer @ ProxyLabs
March 14, 2026
7 min read

Python's requests library is still the most common HTTP client for web scraping in 2026. It's simple, well-documented, and fast enough for most use cases. But proxy configuration in requests has some sharp edges that catch people off guard — silent failures, authentication quirks, and session handling that doesn't work the way you'd expect.

This guide covers everything from basic setup to production patterns. If you already know how to set a proxy in requests, skip to the session management section.

Basic Proxy Configuration

import requests

proxies = {
    'http': 'http://your-username:[email protected]:8080',
    'https': 'http://your-username:[email protected]:8080',
}

response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=15)
print(response.json())
# {'origin': '74.125.xxx.xxx'}  — residential IP, not yours

Three things to note:

  1. Both http and https keys use http:// — This is the most common mistake. The key (https) specifies which requests use this proxy. The value (http://...) specifies the proxy protocol. Your proxy server accepts HTTP CONNECT for HTTPS tunneling, so the proxy URL itself is http://.

  2. Always set timeout — Without it, a dead proxy hangs your script forever. 15 seconds is reasonable for residential proxies; they occasionally take 2-5 seconds on first connection.

  3. requests doesn't verify the proxy is working — If authentication fails on a plain-HTTP request, you'll get an HTTP 407 back, but requests doesn't raise an exception for it by default; check response.status_code. For HTTPS requests, a failed proxy handshake surfaces as requests.exceptions.ProxyError instead.
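To make those failure modes visible instead of silent, you can wrap the request in a small helper. A minimal sketch (the function name and return convention are just illustrations, not a library API):

```python
import requests

def fetch_via_proxy(url, proxies, timeout=15):
    """Return (response, None) on success, (None, reason) on proxy failure."""
    try:
        response = requests.get(url, proxies=proxies, timeout=timeout)
    except requests.exceptions.ProxyError as exc:
        # HTTPS CONNECT failures (bad credentials, dead proxy) land here
        return None, f'proxy error: {exc}'
    except requests.exceptions.Timeout as exc:
        return None, f'timeout: {exc}'
    except requests.exceptions.ConnectionError as exc:
        return None, f'connection error: {exc}'
    if response.status_code == 407:
        # plain-HTTP auth failures come back as a normal 407 response
        return None, 'proxy authentication failed (407)'
    return response, None
```

Calling code then handles one (response, error) pair instead of four exception types and a status code.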

Authentication Methods

URL-embedded credentials (simplest)

proxies = {
    'http': 'http://your-username:[email protected]:8080',
    'https': 'http://your-username:[email protected]:8080',
}

HTTPProxyAuth (when passwords contain special characters)

If your password contains @, :, or other URL-special characters, URL-embedding breaks. Use HTTPProxyAuth instead:

from requests.auth import HTTPProxyAuth

proxies = {
    'http': 'http://gate.proxylabs.app:8080',
    'https': 'http://gate.proxylabs.app:8080',
}
auth = HTTPProxyAuth('your-username', 'your-password')

response = requests.get('https://httpbin.org/ip', proxies=proxies, auth=auth, timeout=15)

Gotcha: HTTPProxyAuth sets the Proxy-Authorization header while regular Basic auth sets Authorization, but both are passed through the same auth= parameter, so you can't use them together. If the target site also requires HTTP Basic auth, use URL-embedded proxy credentials and reserve auth= for the target site.

Session Management and IP Rotation

A bare requests.get() with proxies creates a new connection each time. For rotating proxies this is fine — you get a new IP per request. But for sticky sessions (same IP across multiple requests), you need to pair requests.Session() with session-tagged proxy credentials.

Rotating IPs (new IP per request)

import requests

def get_rotating_proxy():
    return {
        'http': 'http://your-username:[email protected]:8080',
        'https': 'http://your-username:[email protected]:8080',
    }

urls = ['https://example.com/page/1', 'https://example.com/page/2']

# Each request gets a different IP
for url in urls:
    response = requests.get(url, proxies=get_rotating_proxy(), timeout=15)

Sticky Sessions (same IP for multiple requests)

import requests
import uuid

def create_sticky_session(country=None):
    session_id = uuid.uuid4().hex[:8]
    username = f'your-username-session-{session_id}'
    if country:
        username += f'-country-{country}'

    session = requests.Session()
    session.proxies = {
        'http': f'http://{username}:[email protected]:8080',
        'https': f'http://{username}:[email protected]:8080',
    }
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    })
    return session

# All requests in this session use the same IP
session = create_sticky_session(country='US')
page1 = session.get('https://example.com/page/1', timeout=15)
page2 = session.get('https://example.com/page/2', timeout=15)
page3 = session.get('https://example.com/page/3', timeout=15)

The session ID in the username (-session-abc123) tells the proxy gateway to route all requests with that ID through the same residential IP. The IP stays allocated for up to 30 minutes of inactivity. For details on how sticky sessions work under the hood, see our sticky sessions guide.

Retry Logic That Doesn't Waste Bandwidth

Naive retry logic re-sends the request through the same broken proxy. With rotating proxies, each retry naturally gets a new IP — but you should still handle the common failure modes explicitly.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time
import random

class ProxyRetrySession:
    def __init__(self, proxy_username, proxy_password, max_retries=3):
        self.proxy_username = proxy_username
        self.proxy_password = proxy_password
        self.max_retries = max_retries

    def _make_proxy(self, country=None, session_id=None):
        username = self.proxy_username
        if session_id:
            username += f'-session-{session_id}'
        if country:
            username += f'-country-{country}'
        proxy_url = f'http://{username}:{self.proxy_password}@gate.proxylabs.app:8080'
        return {'http': proxy_url, 'https': proxy_url}

    def get(self, url, country=None, **kwargs):
        kwargs.setdefault('timeout', 15)

        for attempt in range(self.max_retries):
            try:
                proxy = self._make_proxy(country=country)
                response = requests.get(url, proxies=proxy, **kwargs)

                if response.status_code == 407:
                    # plain-HTTP auth failure; HTTPS requests raise ProxyError instead
                    raise RuntimeError("Proxy authentication failed — check credentials")
                if response.status_code == 502:
                    # Bad gateway — proxy peer disconnected. Retry with new IP.
                    time.sleep(random.uniform(1, 3))
                    continue
                if response.status_code == 429:
                    # Rate limited by target. Back off significantly.
                    wait = min(2 ** attempt * 5 + random.uniform(0, 5), 60)
                    time.sleep(wait)
                    continue

                return response

            except requests.exceptions.ProxyError:
                # Proxy connection failed. New IP on retry.
                time.sleep(random.uniform(1, 3))
                continue
            except requests.exceptions.ConnectTimeout:
                # Proxy or target too slow.
                continue
            except requests.exceptions.ReadTimeout:
                # Connected but response took too long.
                continue

        return None  # All retries exhausted
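The 429 branch above uses exponential backoff with jitter, capped at 60 seconds. Pulled out on its own (the function name and defaults are illustrative):

```python
import random

def backoff_seconds(attempt, base=5, cap=60):
    # 5s, 10s, 20s, ... doubling per attempt, plus up to 5s of random
    # jitter so concurrent workers don't all retry in lockstep
    return min(2 ** attempt * base + random.uniform(0, 5), cap)
```

The jitter matters more than the exact curve: without it, every worker that got rate-limited at the same moment retries at the same moment too.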

Using urllib3 Retry with Proxies

The built-in Retry adapter works, but be aware: it retries through the same proxy connection. With rotating proxies this still gets you a new IP (the gateway handles rotation), but with sticky sessions you'll retry on the same IP that already failed.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503])
adapter = HTTPAdapter(max_retries=retries)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.proxies = {
    'http': 'http://your-username:[email protected]:8080',
    'https': 'http://your-username:[email protected]:8080',
}

Concurrent Scraping with Proxies

requests is synchronous. For concurrent scraping, use concurrent.futures or switch to aiohttp. Here's the requests approach:

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fetch(url, proxy):
    try:
        response = requests.get(url, proxies=proxy, timeout=15, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
        return url, response.status_code, len(response.content)
    except Exception as e:
        return url, 0, str(e)

proxy = {
    'http': 'http://your-username:[email protected]:8080',
    'https': 'http://your-username:[email protected]:8080',
}

urls = ['https://example.com/page/{}'.format(i) for i in range(100)]

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = {executor.submit(fetch, url, proxy): url for url in urls}
    for future in as_completed(futures):
        url, status, size = future.result()
        print(f'{url}: {status} ({size})')

With 10 concurrent workers and rotating proxies, each request goes through a different IP. At the network level your traffic resembles 10 different people browsing the same site, though the TLS fingerprint discussed in the next section can still give you away.
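If each worker should instead keep one IP for its whole batch of URLs, you can generate one sticky-session proxy dict per worker, reusing the session-tagged username scheme from earlier (same placeholder gateway; sticky_proxy is an illustrative helper):

```python
import uuid

def sticky_proxy(base_username, password, host='gate.proxylabs.app', port=8080):
    # a fresh session id pins every request made with this dict to one IP
    session_id = uuid.uuid4().hex[:8]
    url = f'http://{base_username}-session-{session_id}:{password}@{host}:{port}'
    return {'http': url, 'https': url}

# e.g. executor.submit(fetch_batch, batch, sticky_proxy('your-username', 'your-password'))
```

Each call yields a new session id, so submitting one dict per worker gives you ten stable, distinct IPs instead of per-request rotation.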

The TLS Fingerprint Problem

Here's something most proxy guides skip: requests has a detectable TLS fingerprint.

Every HTTP client establishes a TLS connection with a specific cipher suite order, extension list, and protocol version. Python's requests (via urllib3) produces a TLS fingerprint that doesn't match any browser. Anti-bot systems like Cloudflare, DataDome, and PerimeterX check this.

You can verify this yourself:

response = requests.get('https://tls.peet.ws/api/all', proxies=proxy, timeout=15)
print(response.json()['tls']['ja3_hash'])
# Will show a hash that matches Python, not Chrome

Fixes

Option 1: curl_cffi — Drop-in replacement for requests that mimics Chrome's TLS fingerprint.

from curl_cffi import requests as curl_requests

response = curl_requests.get(
    'https://example.com',
    proxies={
        'http': 'http://your-username:[email protected]:8080',
        'https': 'http://your-username:[email protected]:8080',
    },
    impersonate='chrome131',
    timeout=15,
)

Option 2: tls_client — More control over the fingerprint.

import tls_client

session = tls_client.Session(client_identifier='chrome_131')
session.proxies = {
    'http': 'http://your-username:[email protected]:8080',
    'https': 'http://your-username:[email protected]:8080',
}
response = session.get('https://example.com')

For targets without anti-bot protection (internal APIs, public data feeds, basic sites), requests is fine. For anything protected by Cloudflare or similar, use curl_cffi or a browser-based approach.

Environment Variable Configuration

For production deployments, don't hardcode proxy credentials:

import os

PROXY_USER = os.environ['PROXY_USERNAME']
PROXY_PASS = os.environ['PROXY_PASSWORD']
PROXY_HOST = os.environ.get('PROXY_HOST', 'gate.proxylabs.app')
PROXY_PORT = os.environ.get('PROXY_PORT', '8080')

def get_proxy(country=None, session_id=None):
    username = PROXY_USER
    if session_id:
        username += f'-session-{session_id}'
    if country:
        username += f'-country-{country}'
    url = f'http://{username}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}'
    return {'http': url, 'https': url}

Common Mistakes

Setting only http or only https proxy — If you only set https, HTTP requests bypass the proxy entirely. Always set both.

Using requests.Session() for rotating proxies — A session object reuses TCP connections, and a kept-alive connection to the gateway can pin you to the same peer IP within the connection pool. Setting session.proxies usually works because the gateway rotates per request, but for guaranteed rotation use individual requests.get() calls or close the session between requests.

Ignoring response.encoding — When scraping through proxies, some responses come back with incorrect encoding headers. Always check response.apparent_encoding if you're getting garbled text.

Not handling connection resets — Residential proxies occasionally drop connections mid-transfer (the residential peer went offline). Always wrap requests in try/except and implement retry logic.
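The encoding mistake above can be handled with a small guard that falls back to content-based detection when the declared charset is missing or is the ISO-8859-1 default requests applies to text/* responses without a charset header (a sketch; detection can still guess wrong on very short bodies):

```python
import requests

def decoded_text(response):
    # requests defaults to ISO-8859-1 when no charset is declared,
    # which garbles UTF-8 bodies; trust content-based detection instead
    if response.encoding is None or response.encoding.lower() == 'iso-8859-1':
        response.encoding = response.apparent_encoding
    return response.text

# use decoded_text(response) wherever you would read response.text
```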

To test your proxy configuration before deploying, use our proxy tester tool or verify your IP with the IP lookup tool. For other language integrations, see the Scrapy proxy guide or Selenium setup.

James Liu
Lead Engineer @ ProxyLabs

Building proxy infrastructure since 2019. Previously failed at many things, now failing slightly less.
