
Quick Start

Install

uv add voidcrawl

Make sure Chrome or Chromium is installed on your system.
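
If you are unsure whether a suitable browser is on your PATH, you can check before running anything. A minimal sketch using only the standard library — the binary names listed are common defaults on Linux/macOS, not a list taken from VoidCrawl itself:

```python
import shutil

# Common Chrome/Chromium binary names (assumed list, not VoidCrawl's own).
CANDIDATES = [
    "google-chrome", "google-chrome-stable",
    "chromium", "chromium-browser", "chrome",
]

def find_browser(candidates=CANDIDATES):
    """Return the first Chrome-like binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

if __name__ == "__main__":
    print(find_browser() or "No Chrome/Chromium binary found on PATH")
```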

Option 1: BrowserPool (recommended)

The pool pre-opens tabs and recycles them, giving near-instant page loads after the first warmup:

import asyncio

from voidcrawl import BrowserPool, PoolConfig


async def main():
    async with BrowserPool(PoolConfig()) as pool:
        async with pool.acquire() as tab:
            await tab.goto("https://qscrape.dev")
            print(await tab.title())  # "qScrape"
            print(len(await tab.content()))

asyncio.run(main())

Key points:

  • PoolConfig() uses sensible defaults (1 browser, 4 tabs). Configure via constructor args or env vars.
  • pool.acquire() returns a PooledTab — use it like a Page. The context manager auto-releases it back to the pool.
  • Tabs are recycled (navigated to about:blank) rather than closed, making subsequent acquires near-instant.
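
The acquire/release cycle above can be sketched with plain asyncio: a toy pool that hands out pre-created tabs from a queue and resets them on release instead of closing them. The Tab class here is a stand-in for illustration, not VoidCrawl's implementation:

```python
import asyncio

class Tab:
    """Stand-in for a browser tab; tracks its current URL."""
    def __init__(self, tab_id):
        self.tab_id = tab_id
        self.url = "about:blank"

class ToyPool:
    """Hands out pre-created tabs and recycles them instead of closing."""
    def __init__(self, size=4):
        self._queue = asyncio.Queue()
        for i in range(size):  # "warmup": tabs exist before the first acquire
            self._queue.put_nowait(Tab(i))

    async def acquire(self):
        return await self._queue.get()  # waits if all tabs are busy

    async def release(self, tab):
        tab.url = "about:blank"  # recycle: reset state, don't destroy
        await self._queue.put(tab)

async def demo():
    pool = ToyPool(size=1)
    tab = await pool.acquire()
    tab.url = "https://qscrape.dev"
    await pool.release(tab)
    reused = await pool.acquire()
    # The same object comes back, reset to about:blank
    print(reused is tab, reused.url)

asyncio.run(demo())
```

VoidCrawl's PooledTab context manager wraps the release step for you, which is why the quick-start example never calls release explicitly.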

Option 2: BrowserSession (low-level)

For direct browser control without pooling:

import asyncio

from voidcrawl import BrowserConfig, BrowserSession


async def main():
    async with BrowserSession(BrowserConfig()) as session:
        page = await session.new_page("https://qscrape.dev")
        print(await page.title())  # "qScrape"
        print(len(await page.content()))
        await page.close()

asyncio.run(main())

Option 3: Docker

For production, Chrome runs as a persistent daemon in Docker with pre-warmed profiles:

cd docker
docker compose up -d

The pool connects to Chrome via CHROME_WS_URLS instead of launching it:

export CHROME_WS_URLS="http://localhost:9222,http://localhost:9223"
python your_script.py
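
A comma-separated value like this splits cleanly into individual endpoints. A minimal sketch of how a client might read it — the parsing below is an illustration, not VoidCrawl's actual loader:

```python
import os

def parse_ws_urls(env_value=None):
    """Split a comma-separated CHROME_WS_URLS value into endpoint URLs."""
    if env_value is None:
        env_value = os.environ.get("CHROME_WS_URLS", "")
    return [u.strip() for u in env_value.split(",") if u.strip()]

urls = parse_ws_urls("http://localhost:9222,http://localhost:9223")
print(urls)  # ['http://localhost:9222', 'http://localhost:9223']
```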

See Docker & VNC for the full guide.

Try it with QScrape

QScrape provides purpose-built fictional websites for testing scrapers. Try VoidCrawl against a QScrape target:

import asyncio

from voidcrawl import BrowserPool, PoolConfig


async def main():
    async with BrowserPool(PoolConfig()) as pool:
        async with pool.acquire() as tab:
            await tab.goto(
                "https://qscrape.dev/l1/eshop/catalog/"
                "?cat=Forge%20%26%20Smithing"
            )
            title = await tab.title()
            print(f"Page: {title}")
            # Query product names from the DOM
            products = await tab.query_selector_all(".product-name")
            for p in products[:5]:
                print(f" - {p}")

asyncio.run(main())
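
The query string above is the URL-encoded form of the category "Forge & Smithing": %20 is a space and %26 an ampersand. The standard library produces the same encoding, which is handy when building catalog URLs from category names:

```python
from urllib.parse import quote, unquote

category = "Forge & Smithing"
encoded = quote(category)          # space -> %20, & -> %26
print(encoded)                     # Forge%20%26%20Smithing
print(unquote(encoded))            # Forge & Smithing

url = f"https://qscrape.dev/l1/eshop/catalog/?cat={encoded}"
print(url)
```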

What Just Happened?

  1. VoidCrawl launched a headless Chrome instance (or connected to an existing one via Docker)

  2. A tab was acquired from the pool and navigated to the target URL

  3. The page rendered with full JavaScript execution — VoidCrawl sees the live DOM, not raw HTML

  4. DOM queries extracted content using CSS selectors, just like document.querySelectorAll in a browser console

  5. The tab was released back to the pool for reuse (not closed)

Important Concepts

  • Every method on Page, PooledTab, and BrowserSession is async — always await them.
  • Both BrowserPool and BrowserSession are async context managers that ensure clean shutdown.
  • Stealth mode is on by default. Pass stealth=False to BrowserConfig to disable it.
  • goto(url) and navigate(url) are both useful but not interchangeable:
    • goto(url, timeout=30.0) navigates and waits for network idle, returning a PageResponse (HTML, final URL, status, redirect flag). Use this when you want to read content immediately after — it’s the right default for most scraping.
    • navigate(url) fires the navigation and returns immediately with None. Use it when you want to control the wait yourself (e.g. await tab.wait_for_navigation(), wait on a specific selector, or race multiple conditions). Reading content() right after navigate() without waiting may return an empty or partial page.
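
The "race multiple conditions" pattern mentioned above can be sketched with plain asyncio: start several waiters after navigate() and act on whichever finishes first. The waiters here are dummies standing in for real VoidCrawl waits such as wait_for_navigation() or a selector wait:

```python
import asyncio

async def wait_for(label, delay):
    # Stand-in for a real wait (navigation event, selector, timeout, ...)
    await asyncio.sleep(delay)
    return label

async def race():
    tasks = {
        asyncio.create_task(wait_for("navigation", 0.05)),
        asyncio.create_task(wait_for("selector", 0.01)),
    }
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    for t in pending:  # cancel the losers so no tasks leak
        t.cancel()
    return done.pop().result()

print(asyncio.run(race()))  # selector
```

Cancelling the pending tasks matters: without it, the losing waiters keep running in the background and asyncio warns about tasks that were never awaited.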

Next Steps

References

QScrape. Cascading Labs. Purpose-built fictional scraping targets for benchmarking and testing. https://qscrape.dev