Example: Docker Headful

Connect VoidCrawl to Chrome instances running in the Docker headful container (Sway + wayvnc + GPU). Watch everything Chrome does via VNC.

Setup

Start the headful Docker container first:

```bash
./docker/run-headful.sh              # auto-detects your GPU
# or: ./docker/run-headful.sh --gpu amd
```

Once the container is up, open http://localhost:6080 in your browser and click Connect to watch Chrome in real time, then run the script.
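Chrome inside the container may need a few seconds before it accepts connections. A minimal readiness check, sketched here with the standard library only, polls each debugging port until Chrome's DevTools HTTP endpoint (/json/version, which reports a webSocketDebuggerUrl) responds. It assumes ports 19222 and 19223 from the example expose that endpoint directly; wait_for_devtools is a hypothetical helper, not part of VoidCrawl.

```python
import json
import time
import urllib.request


def wait_for_devtools(
    base_url: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    request_timeout: float = 2.0,
) -> str:
    """Poll Chrome's DevTools /json/version endpoint until it answers.

    Returns the webSocketDebuggerUrl that Chrome reports, or raises
    TimeoutError once `timeout` seconds pass. Hypothetical helper,
    not part of VoidCrawl.
    """
    version_url = base_url.rstrip("/") + "/json/version"
    # Bypass any HTTP(S)_PROXY settings so localhost is contacted directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(version_url, timeout=request_timeout) as resp:
                return json.load(resp)["webSocketDebuggerUrl"]
        except (OSError, ValueError, KeyError):
            # Refused connection, HTTP error, bad JSON or missing field: not ready yet
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"DevTools endpoint not ready: {version_url}")
        time.sleep(interval)
```

Calling wait_for_devtools("http://localhost:19222") and wait_for_devtools("http://localhost:19223") before creating the BrowserPool turns a startup race into a clear TimeoutError.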

Code

```python
import asyncio

from voidcrawl import BrowserConfig, BrowserPool, PoolConfig


async def main() -> None:
    config = PoolConfig(
        # DevTools endpoints of the Chrome instances in the headful container
        chrome_ws_urls=[
            "http://localhost:19222",
            "http://localhost:19223",
        ],
        tabs_per_browser=2,
        # Must match the already-running headful Chrome instances
        browser=BrowserConfig(headless=False),
    )

    async with BrowserPool(config) as pool:
        # -- Basic navigation --
        async with pool.acquire() as tab:
            resp = await tab.goto(
                "https://en.wikipedia.org/wiki/Web_scraping",
                timeout=30.0,
            )
            print(f"Status: {resp.status_code}, redirected: {resp.redirected}")

            title = await tab.title()
            print(f"Title: {title}")
            print(f"HTML: {len(resp.html):,} chars")

            # DOM queries
            headings = await tab.query_selector_all("#toc li a")
            print(f"Table of contents entries: {len(headings)}")
            for h in headings[:5]:
                print(f" - {h}")

            # Screenshot
            png_bytes = await tab.screenshot_png()
            print(f"Screenshot: {len(png_bytes):,} bytes")

            # JavaScript evaluation
            link_count = await tab.evaluate_js(
                'document.querySelectorAll("a").length'
            )
            print(f"Links on page: {link_count}")

        # -- Parallel fetch (watch both tabs in VNC!) --
        print("\nParallel fetch...")

        async def fetch(url: str) -> tuple[str, int]:
            # Each call checks out its own tab from the pool
            async with pool.acquire() as tab:
                await tab.goto(url)
                t = await tab.title()
                length = len(await tab.content())
                return t or "(no title)", length

        results = await asyncio.gather(
            fetch("https://en.wikipedia.org/wiki/Web_scraping"),
            fetch(
                "https://en.wikipedia.org/"
                "wiki/Rust_(programming_language)"
            ),
        )
        for t, length in results:
            print(f" {t}: {length:,} chars")

    print("\nDone! The Docker container is still running.")
    print("Connect VNC to localhost:5900 to see the Chrome windows.")


if __name__ == "__main__":
    asyncio.run(main())
```
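With two Chrome endpoints and tabs_per_browser=2, the pool presumably has at most four tabs to hand out, so gathering more fetches than that should make the extra ones wait for a free tab. The sketch below models that scheduling with a hypothetical FakePool built on asyncio.Semaphore, so it runs without Docker or VoidCrawl. The capacity formula (browsers × tabs per browser) is an assumption, and this is not VoidCrawl's implementation.

```python
import asyncio
from contextlib import asynccontextmanager


class FakePool:
    """Hypothetical stand-in for a tab pool: hands out at most `capacity` tabs."""

    def __init__(self, browsers: int, tabs_per_browser: int) -> None:
        self.capacity = browsers * tabs_per_browser  # assumed pool size
        self._slots = asyncio.Semaphore(self.capacity)
        self.active = 0
        self.peak = 0  # highest number of tabs in use at the same time

    @asynccontextmanager
    async def acquire(self):
        async with self._slots:  # waits here while every tab is busy
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                yield self
            finally:
                self.active -= 1


async def fetch(pool: FakePool, url: str) -> str:
    async with pool.acquire():
        await asyncio.sleep(0.05)  # stands in for tab.goto(url)
        return url


async def demo() -> tuple[list[str], int]:
    pool = FakePool(browsers=2, tabs_per_browser=2)
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    results = await asyncio.gather(*(fetch(pool, u) for u in urls))
    return list(results), pool.peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "pages fetched, peak concurrent tabs:", peak)
```

If VoidCrawl's pool.acquire() blocks in the same way when all tabs are checked out (worth confirming in its API reference), the fetch helper from the example can be gathered over a longer URL list without a separate throttle, and asyncio.gather still returns results in input order.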

Key Points

  • chrome_ws_urls tells the pool to connect to existing Chrome instances instead of launching new ones.
  • headless=False in BrowserConfig is required when connecting to headful Chrome.
  • goto() combines navigate() + wait_for_network_idle() in one call and returns a PageResponse with HTML, final URL, HTTP status code, and redirect info.
  • The parallel fetch runs two tabs at the same time, so in VNC you can watch two windows navigating at once.
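The exact rule behind wait_for_network_idle() is not spelled out here. A common convention (Puppeteer's networkidle0, for instance) treats a page as idle once no requests have been in flight for 500 ms. The pure-Python sketch below applies that rule to a list of (start, end) request times, in seconds from navigation start, to find the moment a page would count as idle. The 500 ms window and the function name idle_time are assumptions for illustration, not VoidCrawl's implementation.

```python
def idle_time(
    requests: list[tuple[float, float]], quiet: float = 0.5, start: float = 0.0
) -> float:
    """Return the earliest time at which the network has been quiet for `quiet` seconds.

    `requests` holds (start, end) times of each request, in seconds from
    navigation start. Hypothetical illustration of a network-idle rule.
    """
    # Sort intervals by start time and merge overlapping or touching ones
    busy: list[list[float]] = []
    for s, e in sorted(requests):
        if busy and s <= busy[-1][1]:
            busy[-1][1] = max(busy[-1][1], e)
        else:
            busy.append([s, e])

    idle_since = start  # nothing is in flight until the first request starts
    for s, e in busy:
        if s - idle_since >= quiet:
            # A long enough quiet gap happened before this burst of requests
            return idle_since + quiet
        idle_since = max(idle_since, e)
    return idle_since + quiet
```

For requests at (0.0, 1.0) and (2.0, 3.0), the one-second gap means the page counts as idle at 1.5 s and the second request is never waited for, a reminder that idle-based waits can fire before late-loading content arrives.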

See the Docker & VNC guide for the full setup.