Browser Pool
The BrowserPool is VoidCrawl’s primary interface for concurrent browser automation. It pre-opens tabs across one or more Chrome processes and recycles them instead of closing and reopening. After the initial warmup, acquiring a tab is near-instant.
Cold Start vs. Tab Reuse
The cold start problem
Launching a new Chrome instance and opening a tab takes 1-3 seconds. If you pay that cost for every URL while scraping thousands of them, startup time dominates your runtime.
How the pool solves it
- Warmup: Pre-open `tabs_per_browser x browsers` blank tabs and push them onto the ready queue.
- Acquire: Decrement the semaphore (this blocks if all tabs are busy), then pop a tab from the queue. If `use_count >= tab_max_uses`, hard-recycle the tab (close and reopen it).
- Release: Increment `use_count`, push the tab back onto the queue, and release the semaphore permit. No CDP call is made; the next caller's `navigate()` overwrites the previous page content.
- Eviction (background): Tabs idle longer than `tab_max_idle_secs` are closed and replaced with fresh ones.
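The warmup, acquire, and release steps above can be modeled in a few lines of plain `asyncio`. The sketch below is a hypothetical toy, not VoidCrawl's implementation: `FakeTab`, `ToyTabPool`, and `demo` are illustrative names, tabs are plain objects with no CDP connection, and background eviction is left out. It shows how a semaphore sized to `browsers * tabs_per_browser` plus a ready queue gives blocking acquires and use-count recycling.

```python
from __future__ import annotations

import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass
class FakeTab:
    """Hypothetical stand-in for a browser tab; no real CDP connection."""
    tab_id: int
    use_count: int = 0


class ToyTabPool:
    """Toy model of warmup / acquire / release (eviction omitted)."""

    def __init__(self, browsers: int, tabs_per_browser: int, tab_max_uses: int) -> None:
        self._size = browsers * tabs_per_browser
        self._max_uses = tab_max_uses
        self._ready: deque[FakeTab] = deque()
        self._sem = asyncio.Semaphore(self._size)  # one permit per pooled tab
        self._next_id = 0
        self.recycled = 0

    def _open_tab(self) -> FakeTab:
        # A real pool would open a blank tab over CDP here (the slow part).
        tab = FakeTab(self._next_id)
        self._next_id += 1
        return tab

    def warmup(self) -> None:
        # Pre-open browsers * tabs_per_browser tabs and queue them as ready.
        for _ in range(self._size):
            self._ready.append(self._open_tab())

    async def acquire(self) -> FakeTab:
        await self._sem.acquire()  # blocks while every tab is checked out
        tab = self._ready.popleft()  # holding a permit means the queue is non-empty
        if tab.use_count >= self._max_uses:
            # Hard recycle: drop the worn-out tab and open a fresh one.
            tab = self._open_tab()
            self.recycled += 1
        return tab

    def release(self, tab: FakeTab) -> None:
        # No page reset: the next user's navigation overwrites the old content.
        tab.use_count += 1
        self._ready.append(tab)
        self._sem.release()


async def demo() -> int:
    # Create the pool inside the running loop so the semaphore binds to it.
    pool = ToyTabPool(browsers=1, tabs_per_browser=2, tab_max_uses=3)
    pool.warmup()
    for _ in range(10):
        tab = await pool.acquire()
        pool.release(tab)
    return pool.recycled


recycled = asyncio.run(demo())
print(f"tabs hard-recycled: {recycled}")
```

Because the semaphore has exactly as many permits as there are queued tabs, a caller that gets past `acquire()` on the semaphore is guaranteed a tab, and a caller that cannot get a permit simply waits instead of opening a new one.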
Configuration
Via constructor
```python
import asyncio

from voidcrawl import BrowserConfig, BrowserPool, PoolConfig

config = PoolConfig(
    browsers=2,
    tabs_per_browser=4,
    tab_max_uses=50,
    tab_max_idle_secs=60,
    browser=BrowserConfig(headless=True, stealth=True),
)


async def main() -> None:
    async with BrowserPool(config) as pool:
        async with pool.acquire() as tab:
            await tab.goto("https://qscrape.dev")
            ...


asyncio.run(main())
```

Via environment variables
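```python
import asyncio

from voidcrawl import BrowserPool, PoolConfig

# PoolConfig.from_env() reads all pool settings from the environment variables below.
config = PoolConfig.from_env()


async def main() -> None:
    async with BrowserPool(config) as pool:
        ...


asyncio.run(main())
```

| Variable | Default | Description |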
|---|---|---|
| `CHROME_WS_URLS` | unset | Comma-separated `ws://` or `http://` URLs. If set, the pool runs in connect mode and does not launch Chrome. |
| `BROWSER_COUNT` | `1` | Number of Chrome processes to launch. |
| `TABS_PER_BROWSER` | `4` | Idle tabs pre-opened per browser. |
| `TAB_MAX_USES` | `50` | Hard-recycle a tab after this many uses. |
| `TAB_MAX_IDLE_SECS` | `60` | Evict tabs that have been idle longer than this many seconds. |
| `ACQUIRE_TIMEOUT_SECS` | `30` | Maximum seconds `acquire()` waits when all tabs are busy. |
| `AUTO_EVICT` | `1` | Set to `0` to disable background idle eviction. |
| `CHROME_NO_SANDBOX` | unset | Set to `1` to pass `--no-sandbox` to Chrome. |
| `CHROME_HEADLESS` | `1` | Set to `0` for headful mode. |
| `SCALE_PROFILE` | unset | `minimal`, `balanced`, or `advanced`. Overrides all other sizing variables. |
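To make the defaults in the table concrete, here is a minimal, hypothetical parser that mirrors them. It is not `PoolConfig.from_env()`: `EnvPoolSettings` and `settings_from_env` are illustrative names, and `SCALE_PROFILE` is ignored because the sizes each profile selects are not documented on this page.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass
class EnvPoolSettings:
    """Hypothetical mirror of the documented variables (not VoidCrawl's class)."""
    chrome_ws_urls: list[str] = field(default_factory=list)
    browser_count: int = 1
    tabs_per_browser: int = 4
    tab_max_uses: int = 50
    tab_max_idle_secs: float = 60.0
    acquire_timeout_secs: float = 30.0
    auto_evict: bool = True
    no_sandbox: bool = False
    headless: bool = True

    @property
    def connect_mode(self) -> bool:
        # Any CHROME_WS_URLS entry means "attach to running Chrome, don't launch".
        return bool(self.chrome_ws_urls)


def settings_from_env(env: Mapping[str, str] = os.environ) -> EnvPoolSettings:
    def get(name: str, default: str) -> str:
        # Treat unset and empty values the same: fall back to the table default.
        return env.get(name, "").strip() or default

    urls = [u.strip() for u in env.get("CHROME_WS_URLS", "").split(",") if u.strip()]
    return EnvPoolSettings(
        chrome_ws_urls=urls,
        browser_count=int(get("BROWSER_COUNT", "1")),
        tabs_per_browser=int(get("TABS_PER_BROWSER", "4")),
        tab_max_uses=int(get("TAB_MAX_USES", "50")),
        tab_max_idle_secs=float(get("TAB_MAX_IDLE_SECS", "60")),
        acquire_timeout_secs=float(get("ACQUIRE_TIMEOUT_SECS", "30")),
        auto_evict=get("AUTO_EVICT", "1") != "0",
        no_sandbox=get("CHROME_NO_SANDBOX", "") == "1",
        headless=get("CHROME_HEADLESS", "1") != "0",
    )


docker_env = {
    "CHROME_WS_URLS": "http://localhost:9222, http://localhost:9223",
    "TABS_PER_BROWSER": "8",
}
settings = settings_from_env(docker_env)
print(settings.connect_mode, settings.chrome_ws_urls, settings.tabs_per_browser)
```

Parsing everything into one object up front means a typo such as `TABS_PER_BROWSER=four` fails immediately with a `ValueError` from `int()`, rather than surfacing later in the run.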
Parallel Fetching
The pool’s semaphore naturally limits concurrency. Use asyncio.gather to fetch multiple URLs in parallel:
```python
import asyncio

from voidcrawl import BrowserPool, PageResponse, PoolConfig


async def main() -> None:
    async with BrowserPool(PoolConfig(tabs_per_browser=4)) as pool:

        async def fetch(url: str) -> PageResponse:
            # Borrow a tab, navigate, and hand the tab back when the block exits.
            async with pool.acquire() as tab:
                return await tab.goto(url)

        urls = ["https://qscrape.dev"] * 4
        results = await asyncio.gather(*[fetch(u) for u in urls])
        for resp in results:
            print(f" {resp.status_code} -- {len(resp.html)} chars")


asyncio.run(main())
```

`goto()` returns a `PageResponse` with the HTML, final URL, HTTP status code, and redirect info; see the Cookbook for details.
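With a single browser and `tabs_per_browser=4`, the pool holds 4 tabs. If all 4 are busy, a 5th `acquire()` call waits until one is released, for up to the acquire timeout (`ACQUIRE_TIMEOUT_SECS`, 30 seconds by default), instead of opening extra tabs and overloading Chrome.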
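The blocking behavior comes from the semaphore, not from the browser. This sketch substitutes a bare `asyncio.Semaphore(4)` for the pool and a hypothetical `fake_fetch` coroutine for `tab.goto()`, then pushes five jobs through `asyncio.gather` and records how many run at once.

```python
import asyncio


async def bounded_gather_demo(tabs: int = 4, jobs: int = 5) -> int:
    """Run `jobs` fake fetches through `tabs` permits; return peak concurrency."""
    permits = asyncio.Semaphore(tabs)  # stands in for the pool's tab semaphore
    in_flight = 0
    peak = 0

    async def fake_fetch(i: int) -> int:
        nonlocal in_flight, peak
        async with permits:  # callers beyond `tabs` wait here for a free permit
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # pretend to load a page
            in_flight -= 1
        return i

    await asyncio.gather(*(fake_fetch(i) for i in range(jobs)))
    return peak


peak = asyncio.run(bounded_gather_demo())
print(f"peak concurrent fetches: {peak}")
```

Four jobs enter immediately; the fifth waits on the semaphore until one of them exits, so the peak stays at 4. `asyncio.gather` still returns results in the order the coroutines were passed, not in completion order.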
Docker Integration
In production, Chrome runs as a persistent daemon managed by supervisord:
```text
supervisord
+-- chrome-debug-1  (port 9222, --user-data-dir=/tmp/chrome-profile-1)
+-- chrome-debug-2  (port 9223, --user-data-dir=/tmp/chrome-profile-2)
```

The pool connects via `CHROME_WS_URLS` (or the equivalent `chrome_ws_urls` argument) instead of launching Chrome itself:
```python
from voidcrawl import PoolConfig

config = PoolConfig(
    chrome_ws_urls=["http://localhost:9222", "http://localhost:9223"],
    tabs_per_browser=4,
)
```

Separate `--user-data-dir` paths prevent `SingletonLock` conflicts between the Chrome instances.
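Keeping ports and profile directories in one place makes it harder for the supervisord programs and `chrome_ws_urls` to drift apart. The helper below is a hypothetical sketch: `chrome_daemon_commands` is not part of VoidCrawl, and the binary name `chromium` is an assumption (images may ship `google-chrome` or another name). Each generated line could serve as a supervisord `command=` value, and the returned URLs are what the pool would connect to.

```python
from __future__ import annotations

import shlex


def chrome_daemon_commands(count: int, base_port: int = 9222) -> tuple[list[str], list[str]]:
    """Hypothetical helper: Chrome command lines plus the matching pool URLs."""
    commands: list[str] = []
    urls: list[str] = []
    for i in range(count):
        port = base_port + i
        argv = [
            "chromium",  # assumption: the binary name depends on the image
            "--headless",
            f"--remote-debugging-port={port}",
            # A distinct profile dir per instance avoids SingletonLock clashes.
            f"--user-data-dir=/tmp/chrome-profile-{i + 1}",
        ]
        commands.append(shlex.join(argv))  # shell-safe string for a command= line
        urls.append(f"http://localhost:{port}")
    return commands, urls


cmds, urls = chrome_daemon_commands(2)
for cmd in cmds:
    print(cmd)
print("CHROME_WS_URLS=" + ",".join(urls))
```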
FAQs
How many tabs should I run per browser?
Start with 4. Each tab consumes ~100-200 MB of memory. More tabs mean more concurrency but higher memory usage. Monitor your system and adjust.
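The per-tab figure turns into a quick budget check. This is back-of-the-envelope arithmetic using the ~100-200 MB per tab quoted above; it ignores each browser process's own baseline memory, and the helper names are hypothetical.

```python
from __future__ import annotations


def estimate_tab_memory_mb(
    browsers: int, tabs_per_browser: int, per_tab_mb: tuple[int, int] = (100, 200)
) -> tuple[int, int]:
    """(low, high) MB for the pooled tabs alone, at ~100-200 MB per tab."""
    tabs = browsers * tabs_per_browser
    low, high = per_tab_mb
    return tabs * low, tabs * high


def max_tabs_for_budget(budget_mb: int, per_tab_mb: int = 200) -> int:
    """Conservative total tab count for a memory budget, sized on the high estimate."""
    return budget_mb // per_tab_mb


print(estimate_tab_memory_mb(browsers=2, tabs_per_browser=4))
print(max_tabs_for_budget(2048))
```

For example, 2 browsers with 4 tabs each come to roughly 800-1600 MB for tabs alone, and a 2 GB (2048 MB) budget fits about 10 tabs at the pessimistic 200 MB each.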
What happens when a tab crashes?
The pool detects the crashed tab on the next acquire attempt and replaces it with a fresh one. The acquire call may take slightly longer (cold start for the replacement tab), but subsequent acquires are instant again.
Should I use multiple browsers or multiple tabs?
Multiple tabs on one browser share a process and memory. Multiple browsers are fully isolated. Use multiple browsers when you need process-level isolation (e.g., different proxy configs) or when a single Chrome process is hitting memory limits.
Why not close and reopen tabs instead of recycling?
Closing a CDP tab and opening a new one takes ~200-500ms. Navigating an existing tab to a new URL is nearly instant because the tab’s resources (V8 isolate, render process) are already allocated.