Browser Pool

The BrowserPool is VoidCrawl’s primary interface for concurrent browser automation. It pre-opens tabs across one or more Chrome processes and recycles them instead of closing and reopening them. After the initial warmup, acquiring a tab is near-instant.

Cold Start vs. Tab Reuse

The cold start problem

Launching a new Chrome instance and opening a tab takes 1-3 seconds. If you’re scraping thousands of URLs, that startup cost dominates your runtime.
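
To put that in numbers, here is a back-of-the-envelope calculation. It assumes one fresh launch per URL, run serially, and counts launch overhead only; the URL count of 5,000 is an illustrative assumption, not a figure from VoidCrawl.

```python
# Rough estimate of time spent purely on Chrome startup when every URL
# gets a fresh browser. The 1-3 s range comes from the text above; the
# URL count used below is an illustrative assumption.
COLD_START_SECS = (1.0, 3.0)


def startup_overhead_minutes(url_count: int) -> tuple[float, float]:
    """Return (best, worst) minutes spent on launches alone, run serially."""
    low, high = COLD_START_SECS
    return (url_count * low / 60, url_count * high / 60)


best, worst = startup_overhead_minutes(5_000)
print(f"5,000 URLs: {best:.0f}-{worst:.0f} minutes of pure startup overhead")
```

Under these assumptions the launches alone cost between roughly one and four hours, before any page has loaded.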

How the pool solves it

[Diagram: a BrowserPool containing Session 0 (port 9222), Session 1 (port 9223), and further sessions, all feeding a shared Ready Queue (deque) of tabs: Tab0, Tab1, Tab2, Tab3, ...]
  1. Warmup: Pre-open tabs_per_browser x browsers blank tabs and push them to the ready queue.
  2. Acquire: Decrement the semaphore (blocks if all tabs are busy), pop a tab from the queue. If use_count >= tab_max_uses, hard-recycle (close and reopen).
  3. Release: Increment use_count, push the tab back to the queue, release the semaphore permit. No CDP call — the next acquire caller’s navigate() overwrites the previous page content.
  4. Eviction (background): Tabs idle longer than tab_max_idle_secs are closed and replaced with fresh ones.
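
The warmup, acquire, and release steps above can be sketched with a semaphore and a deque. This is a toy model, not VoidCrawl's implementation: FakeTab and ToyPool are hypothetical names, no browser is involved, and background idle eviction (step 4) is left out to keep the sketch short.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from itertools import count

_ids = count()


@dataclass
class FakeTab:
    """Hypothetical stand-in for a browser tab; tracks bookkeeping only."""
    tab_id: int = field(default_factory=lambda: next(_ids))
    use_count: int = 0


class ToyPool:
    """Toy model of the lifecycle described above (not VoidCrawl's code)."""

    def __init__(self, size: int, tab_max_uses: int) -> None:
        self.tab_max_uses = tab_max_uses
        # 1. Warmup: pre-open every tab and queue it as ready.
        self.ready = deque(FakeTab() for _ in range(size))
        self.permits = asyncio.Semaphore(size)

    async def acquire(self) -> FakeTab:
        # 2. Acquire: wait for a permit, then pop a ready tab.
        await self.permits.acquire()
        tab = self.ready.popleft()
        if tab.use_count >= self.tab_max_uses:
            tab = FakeTab()  # hard-recycle: drop the old tab, open a new one
        return tab

    def release(self, tab: FakeTab) -> None:
        # 3. Release: bump the counter, requeue, hand the permit back.
        tab.use_count += 1
        self.ready.append(tab)
        self.permits.release()


async def demo() -> list[int]:
    pool = ToyPool(size=1, tab_max_uses=2)
    seen = []
    for _ in range(3):
        tab = await pool.acquire()
        seen.append(tab.tab_id)
        pool.release(tab)
    return seen


ids = asyncio.run(demo())
print(ids)  # the first two entries match; the third is a recycled tab
```

With a single tab and tab_max_uses=2, the first two acquires return the same tab and the third returns a replacement, which is the hard-recycle path from step 2.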

Configuration

Via constructor

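import asyncio
from voidcrawl import BrowserConfig, BrowserPool, PoolConfig

config = PoolConfig(
    browsers=2,              # Chrome processes to launch
    tabs_per_browser=4,      # tabs pre-opened per browser
    tab_max_uses=50,         # hard-recycle a tab after this many uses
    tab_max_idle_secs=60,    # evict tabs idle longer than this
    browser=BrowserConfig(headless=True, stealth=True),
)

async def main():
    async with BrowserPool(config) as pool:
        async with pool.acquire() as tab:
            await tab.goto("https://qscrape.dev")
            ...

asyncio.run(main())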

Via environment variables

import asyncio
from voidcrawl import BrowserPool, PoolConfig

async def main():
    # PoolConfig.from_env() reads all config from env vars
    config = PoolConfig.from_env()
    async with BrowserPool(config) as pool:
        ...

asyncio.run(main())

| Variable | Default | Description |
| --- | --- | --- |
| CHROME_WS_URLS | (none) | Comma-separated ws:// or http:// URLs. If set, the pool runs in connect mode and skips launching Chrome. |
| BROWSER_COUNT | 1 | Number of Chrome processes to launch. |
| TABS_PER_BROWSER | 4 | Idle tabs pre-opened per browser. |
| TAB_MAX_USES | 50 | Hard-recycle a tab after this many uses. |
| TAB_MAX_IDLE_SECS | 60 | Evict idle tabs after this many seconds. |
| ACQUIRE_TIMEOUT_SECS | 30 | Max seconds acquire() waits when all tabs are busy. |
| AUTO_EVICT | 1 | Set to "0" to disable background idle eviction. |
| CHROME_NO_SANDBOX | (none) | Set to "1" to pass --no-sandbox. |
| CHROME_HEADLESS | 1 | Set to "0" for headful mode. |
| SCALE_PROFILE | (none) | One of "minimal", "balanced", or "advanced". Overrides all other sizing vars. |
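
To make the defaults concrete, here is a sketch of how these variables could be parsed. It is not VoidCrawl's PoolConfig.from_env(): PoolSettings and load_pool_settings are hypothetical names, the integer types are assumptions, and SCALE_PROFILE is omitted because this page does not say what each profile maps to.

```python
import os
from dataclasses import dataclass


@dataclass
class PoolSettings:
    """Hypothetical stand-in for illustration; not VoidCrawl's PoolConfig."""
    chrome_ws_urls: list[str]
    browsers: int
    tabs_per_browser: int
    tab_max_uses: int
    tab_max_idle_secs: int
    acquire_timeout_secs: int
    auto_evict: bool
    headless: bool
    no_sandbox: bool


def load_pool_settings(env=None) -> PoolSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_urls = env.get("CHROME_WS_URLS", "")
    return PoolSettings(
        # Comma-separated list; an empty value means "launch Chrome ourselves".
        chrome_ws_urls=[u.strip() for u in raw_urls.split(",") if u.strip()],
        browsers=int(env.get("BROWSER_COUNT", "1")),
        tabs_per_browser=int(env.get("TABS_PER_BROWSER", "4")),
        tab_max_uses=int(env.get("TAB_MAX_USES", "50")),
        tab_max_idle_secs=int(env.get("TAB_MAX_IDLE_SECS", "60")),
        acquire_timeout_secs=int(env.get("ACQUIRE_TIMEOUT_SECS", "30")),
        # On by default; only the literal "0" switches these off.
        auto_evict=env.get("AUTO_EVICT", "1") != "0",
        headless=env.get("CHROME_HEADLESS", "1") != "0",
        # Off by default; only the literal "1" switches it on.
        no_sandbox=env.get("CHROME_NO_SANDBOX", "") == "1",
    )


settings = load_pool_settings({"TABS_PER_BROWSER": "8", "CHROME_HEADLESS": "0"})
print(settings.tabs_per_browser, settings.headless, settings.tab_max_uses)
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.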

Parallel Fetching

The pool’s semaphore naturally limits concurrency. Use asyncio.gather to fetch multiple URLs in parallel:

import asyncio
from voidcrawl import BrowserPool, PageResponse, PoolConfig

async def main():
    async with BrowserPool(PoolConfig(tabs_per_browser=4)) as pool:
        async def fetch(url: str) -> PageResponse:
            async with pool.acquire() as tab:
                return await tab.goto(url)

        urls = ["https://qscrape.dev"] * 4
        results = await asyncio.gather(*[fetch(u) for u in urls])
        for resp in results:
            print(f" {resp.status_code} -- {len(resp.html)} chars")

asyncio.run(main())

goto() returns a PageResponse with HTML, final URL, HTTP status code, and redirect info — see the Cookbook for details.

If all 4 tabs are busy, a 5th acquire() call waits until one is released (for up to ACQUIRE_TIMEOUT_SECS, 30 seconds by default), so the pool never runs more tabs than it was configured with.
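
The blocking behaviour can be checked without a browser. The sketch below uses only asyncio: a plain semaphore stands in for pool.acquire() and a short sleep stands in for tab.goto(). measure_peak is a hypothetical helper written for this illustration.

```python
import asyncio


async def measure_peak(limit: int, jobs: int) -> int:
    """Run `jobs` fake fetches behind a semaphore; return peak concurrency."""
    permits = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def fake_fetch() -> None:
        nonlocal active, peak
        async with permits:            # stands in for pool.acquire()
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for tab.goto(url)
            active -= 1

    await asyncio.gather(*(fake_fetch() for _ in range(jobs)))
    return peak


peak = asyncio.run(measure_peak(limit=4, jobs=10))
print(f"peak concurrency: {peak}")
```

Ten jobs are submitted at once, but no more than four ever run at the same time; the rest wait for a permit.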

Docker Integration

In production, Chrome runs as a persistent daemon managed by supervisord:

supervisord
+-- chrome-debug-1 (port 9222, --user-data-dir=/tmp/chrome-profile-1)
+-- chrome-debug-2 (port 9223, --user-data-dir=/tmp/chrome-profile-2)

The pool connects to these instances via CHROME_WS_URLS (or the chrome_ws_urls constructor argument shown here) instead of launching Chrome itself:

config = PoolConfig(
    chrome_ws_urls=["http://localhost:9222", "http://localhost:9223"],
    tabs_per_browser=4,
)

Separate user-data-dirs prevent SingletonLock conflicts between Chrome instances.
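
As an illustration of that layout, the sketch below generates one debugging port and one profile directory per instance, then builds the matching CHROME_WS_URLS value. chrome_instances is a hypothetical helper; the naming scheme mirrors the supervisord tree above, and --remote-debugging-port and --user-data-dir are standard Chrome switches.

```python
def chrome_instances(count: int, base_port: int = 9222) -> list[dict]:
    """Describe `count` Chrome daemons, each with its own port and profile."""
    instances = []
    for i in range(count):
        port = base_port + i
        profile = f"/tmp/chrome-profile-{i + 1}"
        instances.append({
            "name": f"chrome-debug-{i + 1}",
            "endpoint": f"http://localhost:{port}",
            "args": [
                f"--remote-debugging-port={port}",
                # A distinct profile dir per instance avoids SingletonLock clashes.
                f"--user-data-dir={profile}",
            ],
        })
    return instances


instances = chrome_instances(2)
ws_urls = ",".join(inst["endpoint"] for inst in instances)
print(f"CHROME_WS_URLS={ws_urls}")
```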

FAQs

How many tabs should I run per browser?

Start with 4. Each tab consumes ~100-200 MB of memory. More tabs mean more concurrency but higher memory usage. Monitor your system and adjust.
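
A quick way to sanity-check a configuration against available memory, using the per-tab range quoted above. The helper names are hypothetical, the 4096 MB budget is an illustrative assumption, and the estimate ignores the browser process's own overhead.

```python
TAB_MB = (100, 200)  # per-tab memory range quoted above


def tab_memory_mb(browsers: int, tabs_per_browser: int) -> tuple[int, int]:
    """Return the (low, high) memory estimate in MB for all pooled tabs."""
    tabs = browsers * tabs_per_browser
    return (tabs * TAB_MB[0], tabs * TAB_MB[1])


def max_tabs_for_budget(budget_mb: int) -> int:
    """Tabs that fit if every tab reaches the upper end of the range."""
    return budget_mb // TAB_MB[1]


low, high = tab_memory_mb(browsers=2, tabs_per_browser=4)
print(f"2 browsers x 4 tabs: {low}-{high} MB")
print(f"tabs that fit in 4096 MB: {max_tabs_for_budget(4096)}")
```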

What happens when a tab crashes?

The pool detects the crashed tab on the next acquire attempt and replaces it with a fresh one. The acquire call may take slightly longer (cold start for the replacement tab), but subsequent acquires are instant again.
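
One way to picture replace-on-acquire is the sketch below. This page does not say how VoidCrawl detects a crashed tab, so the alive flag, FakeTab, pop_healthy, and the open_tab callback are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FakeTab:
    """Hypothetical stand-in for a tab; `alive` models a crash check."""
    tab_id: int
    alive: bool = True


def pop_healthy(ready: deque, open_tab) -> FakeTab:
    """Pop the next ready tab, swapping in a fresh one if it has crashed."""
    tab = ready.popleft()
    if not tab.alive:
        tab = open_tab()  # this one acquire pays the cold-start cost
    return tab


ready = deque([FakeTab(0, alive=False), FakeTab(1)])
first = pop_healthy(ready, open_tab=lambda: FakeTab(99))
second = pop_healthy(ready, open_tab=lambda: FakeTab(100))
print(first.tab_id, second.tab_id)
```

Only the acquire that hits the crashed tab pays for a replacement; the healthy tab behind it is returned unchanged.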

Should I use multiple browsers or multiple tabs?

Multiple tabs on one browser share a single Chrome instance and its memory, while multiple browsers are isolated from one another at the process level. Use multiple browsers when you need that isolation (e.g., different proxy configs) or when a single Chrome process is hitting memory limits.

Why not close and reopen tabs instead of recycling?

Closing a CDP tab and opening a new one takes ~200-500 ms. Navigating an existing tab to a new URL is nearly instant because the tab’s resources (V8 isolate, renderer process) are already allocated.
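
The same numbers show why a periodic hard-recycle stays cheap. Assuming a tab is closed and reopened once every tab_max_uses navigations, the close-and-reopen cost is spread across all of those uses; amortized_recycle_ms is a hypothetical helper for this arithmetic.

```python
RECYCLE_MS = (200, 500)  # close-and-reopen range quoted above


def amortized_recycle_ms(tab_max_uses: int) -> tuple[float, float]:
    """Spread one hard-recycle over the uses a tab serves before it."""
    return (RECYCLE_MS[0] / tab_max_uses, RECYCLE_MS[1] / tab_max_uses)


low, high = amortized_recycle_ms(50)
print(f"recycle overhead per navigation: {low:.0f}-{high:.0f} ms")
```

At the default tab_max_uses of 50, that works out to 4-10 ms per navigation, compared with 200-500 ms if every navigation reopened its tab.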