Browser Pool

The BrowserPool is VoidCrawl’s primary interface for concurrent browser automation. It pre-opens tabs across one or more Chrome processes and recycles them instead of closing and reopening them. After the initial warmup, acquiring a tab is near-instant.

Cold Start vs. Tab Reuse

The cold start problem

Launching a new Chrome instance and opening a tab takes 1-3 seconds. If you’re scraping thousands of URLs, that startup cost dominates your runtime.

How the pool solves it

BrowserPool
+-- Session 0 (port 9222)
+-- Session 1 (port 9223)
+-- ...
+-- Ready Queue (deque): [ Tab0, Tab1, Tab2, Tab3, ... ]

Warmup: Pre-open tabs_per_browser × browsers blank tabs and push them onto the ready queue.
Acquire: Take a semaphore permit (blocking while all tabs are busy), then pop a tab from the queue. If the tab’s use_count >= tab_max_uses, hard-recycle it (close and reopen).
Release: Increment use_count, push the tab back to the queue, release the semaphore permit. No CDP call — the next acquire caller’s navigate() overwrites the previous page content.
Eviction (background): Tabs idle longer than tab_max_idle_secs are closed and replaced with fresh ones.
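
The warmup, acquire, and release steps above can be modeled with a semaphore and a deque. The following is a minimal sketch for illustration only, not VoidCrawl's implementation: `ToyTab`, `ToyPool`, and `demo` are hypothetical names, the "tabs" are plain objects rather than real browser tabs, and background eviction is left out.

```python
import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyTab:
    """Stand-in for a browser tab; only tracks identity and reuse count."""
    tab_id: int
    use_count: int = 0


class ToyPool:
    """Illustrative model of the warmup/acquire/release cycle (hypothetical)."""

    def __init__(self, browsers: int, tabs_per_browser: int, tab_max_uses: int):
        self.tab_max_uses = tab_max_uses
        self.recycled = 0
        self._next_id = 0
        total = browsers * tabs_per_browser
        # Warmup: pre-open every tab and push it onto the ready queue.
        self._ready = deque(self._open_tab() for _ in range(total))
        # One semaphore permit per tab.
        self._permits = asyncio.Semaphore(total)

    def _open_tab(self) -> ToyTab:
        tab = ToyTab(self._next_id)
        self._next_id += 1
        return tab

    async def acquire(self) -> ToyTab:
        await self._permits.acquire()  # waits while all tabs are busy
        tab = self._ready.popleft()
        if tab.use_count >= self.tab_max_uses:
            # Hard-recycle: discard the worn-out tab and open a fresh one.
            self.recycled += 1
            tab = self._open_tab()
        return tab

    def release(self, tab: ToyTab) -> None:
        tab.use_count += 1
        self._ready.append(tab)  # no page reset; the next navigation overwrites it
        self._permits.release()


async def demo() -> tuple[int, int]:
    # One tab that may be used twice before it is recycled.
    pool = ToyPool(browsers=1, tabs_per_browser=1, tab_max_uses=2)
    last = None
    for _ in range(3):
        last = await pool.acquire()
        pool.release(last)
    return last.tab_id, pool.recycled
```

In `demo`, the third acquire finds a tab that has already been used twice, so the pool replaces it with a fresh tab before handing it out.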

Configuration

Via constructor

from voidcrawl import BrowserConfig, BrowserPool, PoolConfig

config = PoolConfig(
    browsers=2,
    tabs_per_browser=4,
    tab_max_uses=50,
    tab_max_idle_secs=60,
    browser=BrowserConfig(headless=True, stealth=True),
)

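# Run this inside an async function; a bare `async with` only works where top-level await is allowed.
async with BrowserPool(config) as pool:
    # Borrow a pre-warmed tab; leaving the block hands it back to the pool.
    async with pool.acquire() as tab:
        await tab.goto("https://qscrape.dev")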

...

Via environment variables

from voidcrawl import BrowserPool, PoolConfig

# PoolConfig.from_env() reads all config from env vars
config = PoolConfig.from_env()

async with BrowserPool(config) as pool:
    ...

| Variable | Default | Description |
| --- | --- | --- |
| `CHROME_WS_URLS` | — | Comma-separated `ws://` or `http://` URLs. If set, the pool runs in connect mode and skips launching Chrome. |
| `BROWSER_COUNT` | `1` | Number of Chrome processes to launch. |
| `TABS_PER_BROWSER` | `4` | Idle tabs pre-opened per browser. |
| `TAB_MAX_USES` | `50` | Hard-recycle a tab after this many uses. |
| `TAB_MAX_IDLE_SECS` | `60` | Evict idle tabs after this many seconds. |
| `ACQUIRE_TIMEOUT_SECS` | `30` | Max seconds `acquire()` waits when all tabs are busy. |
| `AUTO_EVICT` | `1` | Set to `"0"` to disable background idle eviction. |
| `CHROME_NO_SANDBOX` | — | Set to `"1"` to pass `--no-sandbox`. |
| `CHROME_HEADLESS` | `1` | Set to `"0"` for headful mode. |
| `SCALE_PROFILE` | — | `"minimal"`, `"balanced"`, or `"advanced"`. Overrides all other sizing vars. |
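
To make the defaults and the "0"/"1" flag conventions in the table concrete, here is a rough sketch of how such variables could be parsed. It is not `PoolConfig.from_env()` itself: `EnvSettings` and `read_env` are hypothetical names, and the sketch only records `SCALE_PROFILE` without applying it, because the sizes behind each profile are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class EnvSettings:
    """Hypothetical container mirroring the table; not VoidCrawl's PoolConfig."""
    chrome_ws_urls: list[str]
    browser_count: int
    tabs_per_browser: int
    tab_max_uses: int
    tab_max_idle_secs: int
    acquire_timeout_secs: int
    auto_evict: bool
    no_sandbox: bool
    headless: bool
    scale_profile: Optional[str]


def read_env(env: Mapping[str, str] = os.environ) -> EnvSettings:
    # Split the comma-separated URL list and drop empty entries.
    raw_urls = env.get("CHROME_WS_URLS", "")
    urls = [u.strip() for u in raw_urls.split(",") if u.strip()]
    return EnvSettings(
        chrome_ws_urls=urls,
        browser_count=int(env.get("BROWSER_COUNT", "1")),
        tabs_per_browser=int(env.get("TABS_PER_BROWSER", "4")),
        tab_max_uses=int(env.get("TAB_MAX_USES", "50")),
        tab_max_idle_secs=int(env.get("TAB_MAX_IDLE_SECS", "60")),
        acquire_timeout_secs=int(env.get("ACQUIRE_TIMEOUT_SECS", "30")),
        # On by default; only the literal "0" switches these off.
        auto_evict=env.get("AUTO_EVICT", "1") != "0",
        headless=env.get("CHROME_HEADLESS", "1") != "0",
        # Off by default; only the literal "1" switches this on.
        no_sandbox=env.get("CHROME_NO_SANDBOX") == "1",
        scale_profile=env.get("SCALE_PROFILE"),
    )
```

Passing a plain dict instead of `os.environ` (for example `read_env({"CHROME_HEADLESS": "0"})`) makes the parsing easy to check in isolation.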

Parallel Fetching

The pool’s semaphore naturally limits concurrency. Use asyncio.gather to fetch multiple URLs in parallel:

import asyncio
from voidcrawl import BrowserPool, PageResponse, PoolConfig

async def main():
    async with BrowserPool(PoolConfig(tabs_per_browser=4)) as pool:
        async def fetch(url: str) -> PageResponse:
            async with pool.acquire() as tab:
                return await tab.goto(url)

        urls = ["https://qscrape.dev"] * 4
        results = await asyncio.gather(*[fetch(u) for u in urls])
        for resp in results:
            print(f"  {resp.status_code} -- {len(resp.html)} chars")

asyncio.run(main())

goto() returns a PageResponse with HTML, final URL, HTTP status code, and redirect info — see the Cookbook for details.

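If all 4 tabs are busy, the 5th acquire() call waits until one is released, for up to ACQUIRE_TIMEOUT_SECS (30 seconds by default). Concurrency never exceeds the number of tabs in the pool, so extra requests queue up instead of overloading Chrome.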
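
The blocking behavior can be reproduced with a plain `asyncio.Semaphore`, independent of VoidCrawl. In this sketch, `run_jobs` and `acquire_with_timeout` are hypothetical helpers, `asyncio.sleep` stands in for page navigation, and returning `False` on timeout is an assumption made for the example; how the pool itself reports an acquire timeout is not covered here.

```python
import asyncio


async def run_jobs(jobs: int, permits: int, job_secs: float = 0.01) -> int:
    """Run `jobs` tasks behind a semaphore and return the peak number running at once."""
    sem = asyncio.Semaphore(permits)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with sem:  # waits here when all permits are taken
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(job_secs)  # stand-in for tab.goto(url)
            running -= 1

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak


async def acquire_with_timeout(sem: asyncio.Semaphore, timeout: float) -> bool:
    """Try to take a permit; return False if none frees up within `timeout` seconds."""
    try:
        await asyncio.wait_for(sem.acquire(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def timeout_demo() -> tuple[bool, bool]:
    sem = asyncio.Semaphore(1)
    first = await acquire_with_timeout(sem, 0.05)   # permit available
    second = await acquire_with_timeout(sem, 0.05)  # permit still held, so this times out
    return first, second
```

With 10 jobs and 4 permits, `run_jobs` never observes more than 4 jobs in flight, which is the same cap the pool applies to tabs.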

Docker Integration

In production, Chrome runs as a persistent daemon managed by supervisord:

supervisord
+-- chrome-debug-1 (port 9222, --user-data-dir=/tmp/chrome-profile-1)
+-- chrome-debug-2 (port 9223, --user-data-dir=/tmp/chrome-profile-2)

The pool connects via CHROME_WS_URLS instead of launching Chrome itself:

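from voidcrawl import PoolConfig

# Connect to the Chrome daemons that are already running instead of launching new ones.
config = PoolConfig(
    chrome_ws_urls=["http://localhost:9222", "http://localhost:9223"],
    tabs_per_browser=4,
)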

Separate user-data-dirs prevent SingletonLock conflicts between Chrome instances.
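
As a rough illustration of that layout, the sketch below derives one port and one profile directory per instance, plus the matching `CHROME_WS_URLS` value. `plan_instances` is a hypothetical helper, the `chrome` binary name is an assumption (it varies by image), and the code only builds the argument lists; it does not start any process.

```python
def plan_instances(count: int, base_port: int = 9222, binary: str = "chrome") -> list[dict]:
    """Build the command line and debug URL for each Chrome instance."""
    plans = []
    for i in range(count):
        port = base_port + i
        argv = [
            binary,
            f"--remote-debugging-port={port}",
            # A distinct profile directory per instance avoids SingletonLock clashes.
            f"--user-data-dir=/tmp/chrome-profile-{i + 1}",
        ]
        plans.append({"argv": argv, "url": f"http://localhost:{port}"})
    return plans


plans = plan_instances(2)
# Value that could be exported as CHROME_WS_URLS for the pool.
ws_urls = ",".join(p["url"] for p in plans)
```

Each `argv` list corresponds to one supervisord-managed program, and `ws_urls` is the comma-separated form the environment variable expects.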

FAQs

How many tabs should I run per browser?

Start with 4. Each tab consumes ~100-200 MB of memory. More tabs mean more concurrency but higher memory usage. Monitor your system and adjust.
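
For a quick back-of-the-envelope check before raising the tab count, the per-tab figure above can be multiplied out. `estimate_tab_memory_mb` is a hypothetical helper; it counts only the ~100-200 MB per tab quoted here and ignores the overhead of the Chrome processes themselves.

```python
def estimate_tab_memory_mb(
    browsers: int,
    tabs_per_browser: int,
    per_tab_mb: tuple[int, int] = (100, 200),
) -> tuple[int, int]:
    """Return the (low, high) memory estimate in MB for all pooled tabs."""
    total_tabs = browsers * tabs_per_browser
    low, high = per_tab_mb
    return total_tabs * low, total_tabs * high


# 2 browsers x 4 tabs = 8 tabs, so roughly 800 to 1600 MB for tabs alone.
low_mb, high_mb = estimate_tab_memory_mb(browsers=2, tabs_per_browser=4)
```

Treat the result as a starting budget and confirm it against real memory measurements on your workload.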

What happens when a tab crashes?

The pool detects the crashed tab on the next acquire attempt and replaces it with a fresh one. The acquire call may take slightly longer (cold start for the replacement tab), but subsequent acquires are instant again.

Should I use multiple browsers or multiple tabs?

Multiple tabs on one browser share a single Chrome browser process and its resources. Multiple browsers are fully isolated. Use multiple browsers when you need process-level isolation (e.g., different proxy configs) or when a single Chrome process is hitting memory limits.

Why not close and reopen tabs instead of recycling?

Closing a CDP tab and opening a new one takes ~200-500 ms. Navigating an existing tab to a new URL is nearly instant because the tab’s resources (V8 isolate, renderer process) are already allocated.
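
To put the timings quoted on this page side by side, the sketch below totals the setup overhead for a crawl that processes URLs one after another. `overhead_seconds` is a hypothetical helper and 10,000 URLs is an arbitrary example size; with several tabs working in parallel, the wall-clock cost would shrink accordingly.

```python
def overhead_seconds(urls: int, per_url_ms: tuple[int, int]) -> tuple[float, float]:
    """Total (low, high) setup overhead in seconds for `urls` sequential fetches."""
    low_ms, high_ms = per_url_ms
    return urls * low_ms / 1000, urls * high_ms / 1000


# New Chrome instance per URL: 1-3 seconds each.
cold_start = overhead_seconds(10_000, (1000, 3000))
# Close and reopen a tab per URL: ~200-500 ms each.
reopen_tab = overhead_seconds(10_000, (200, 500))

print(f"cold start per URL: {cold_start[0]:.0f}-{cold_start[1]:.0f} s")
print(f"reopen tab per URL: {reopen_tab[0]:.0f}-{reopen_tab[1]:.0f} s")
```

Reusing a warm tab avoids both costs after the initial warmup, which is why recycling is the default.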