When a server responds to an HTTP request, it sends back raw HTML, a string of tags and text. When a browser receives that string, it parses it into a live tree of objects called the DOM (Document Object Model). JavaScript can then modify that tree: adding nodes, removing them, rewriting text, injecting entire components.
What you see on screen may look nothing like the original HTML the server sent.
This distinction is why tools like curl or requests fall short when scraping many modern web apps. They fetch only the raw HTML, the string the server sent. If the content is injected by JavaScript after the page loads, that string is a mostly empty shell.
Server HTML:
<body>
<div id="root"></div>
<script src="/static/js/main.c8f2a1d.js"></script>
</body>
VoidCrawl gives you access to the live DOM, not the raw HTML. It controls a real Chrome instance that executes JavaScript, renders the page, and exposes the resulting DOM tree for you to query and interact with.
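To see why a static fetch of that shell is not enough, the sketch below parses the server HTML shown above with Python's standard-library `html.parser` and measures how much visible text it contains. The `looks_like_js_shell` helper and its 200-character threshold are hypothetical, illustrative heuristics, not part of VoidCrawl's API.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text outside <script>/<style> and counts <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_js_shell(raw_html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical heuristic: very little visible text plus at least one script."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    visible = " ".join(parser.chunks)
    return parser.script_tags > 0 and len(visible) < min_text_chars


SERVER_HTML = """<body>
<div id="root"></div>
<script src="/static/js/main.c8f2a1d.js"></script>
</body>"""

print(looks_like_js_shell(SERVER_HTML))  # True: no visible text, one script
```

Treat a True result as a hint rather than a verdict: some server-rendered pages are legitimately short, and some client-rendered apps ship placeholder text.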
Why You Need a Real Browser
Static HTML fetching (requests, httpx, curl) works when the content is in the server response. But modern websites increasingly rely on:
Client-side rendering — React, Vue, Svelte, and Solid apps that build the page in JavaScript
Lazy loading — content that loads as you scroll or interact
Authentication flows — login forms, OAuth redirects, session cookies
Anti-bot protections — WAFs that check for real browser behavior before serving content
Shadow DOM — encapsulated components that aren’t visible in the outer HTML
For all of these, you need a real browser that executes JavaScript, manages cookies, and renders the page. VoidCrawl provides that browser.
The Tool Landscape
VoidCrawl isn’t the only browser automation tool. Here’s how it fits in:
Playwright and Selenium are excellent general-purpose tools. But for high-volume scraping where you need to keep browsers alive, reuse tabs, and minimize overhead, each alternative has trade-offs:
Playwright launches a new browser context per session. There’s no built-in tab pooling or long-lived daemon mode.
Selenium uses the WebDriver protocol, which has higher overhead than CDP and doesn’t support tab-level recycling.
zendriver and nodriver are async and stealth-focused (VoidCrawl’s stealth approach is inspired by them), but they are pure Python with no compiled core. They are also released under copyleft licenses.
VoidCrawl’s Rust core handles CDP I/O on a Tokio runtime. The BrowserPool keeps Chrome alive as a daemon and recycles tabs via a semaphore-bounded queue. The result: near-instant tab acquisition after warmup, with the stealth properties of zendriver.
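The pooling idea can be modeled in a few lines of asyncio. This is a conceptual sketch only, not VoidCrawl’s Rust implementation or its Python API: `FakeTab`, `TabPool`, and `acquire` are hypothetical names, and opening a tab is simulated with a short sleep instead of a CDP call. A semaphore caps how many tabs are in use at once, and released tabs go back on a queue so later callers reuse them instead of opening new ones.

```python
import asyncio
from contextlib import asynccontextmanager

# Hypothetical illustration of a semaphore-bounded tab pool; not VoidCrawl's API.


class FakeTab:
    """Stand-in for a browser tab; a real pool would wrap a CDP target."""

    def __init__(self, tab_id: int) -> None:
        self.tab_id = tab_id


class TabPool:
    """Caps concurrent tabs with a semaphore and recycles idle tabs via a queue."""

    def __init__(self, max_tabs: int) -> None:
        self._slots = asyncio.Semaphore(max_tabs)  # at most max_tabs in use
        self._idle: asyncio.Queue = asyncio.Queue()  # tabs ready for reuse
        self.tabs_opened = 0

    async def _open_tab(self) -> FakeTab:
        # Simulates the expensive step: opening a new tab in Chrome.
        await asyncio.sleep(0.01)
        self.tabs_opened += 1
        return FakeTab(self.tabs_opened)

    @asynccontextmanager
    async def acquire(self):
        async with self._slots:  # wait here if max_tabs are already in use
            try:
                tab = self._idle.get_nowait()  # reuse an idle tab if one exists
            except asyncio.QueueEmpty:
                tab = await self._open_tab()  # otherwise open a new one
            try:
                yield tab
            finally:
                self._idle.put_nowait(tab)  # hand the tab back for reuse


async def main() -> int:
    pool = TabPool(max_tabs=3)

    async def job() -> None:
        async with pool.acquire() as _tab:
            await asyncio.sleep(0.01)  # pretend to scrape a page with this tab

    await asyncio.gather(*(job() for _ in range(20)))
    return pool.tabs_opened


print(asyncio.run(main()))  # 3: twenty jobs share three tabs
```

In this model, reuse is a queue pop while opening a tab is the slow step, which is the intuition behind fast acquisition after warm-up. A real pool would also need to reset or discard tabs between jobs (cookies, navigation state, crashed targets); the sketch leaves that out.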
When to use what
Just need to render a page once? Playwright is simpler to set up.
Need long-running concurrent scraping with tab reuse? VoidCrawl’s pool is purpose-built for this.
Need to bypass WAFs? VoidCrawl’s stealth mode handles most cases. See Stealth Mode.
Need cross-browser support (Firefox, Safari)? Use Playwright or Selenium. VoidCrawl is Chrome-only.
Page Source vs. Inspect
A quick way to check whether a site needs browser automation:
View Source (Ctrl+U, or Cmd+Option+U in Chrome on macOS) shows the raw HTML the server sent. If the content you want is there, static fetching works.
Inspect (Ctrl+Shift+I, or Cmd+Option+I in Chrome on macOS) shows the live DOM after JavaScript has run. If the content only appears there, you need a browser.
VoidCrawl’s tab.content() returns the live DOM — equivalent to what Inspect shows, not View Source.
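The View Source check can also be scripted. The sketch below assumes you have already fetched the raw HTML with a static client such as requests, httpx, or curl, and checks whether the text you want appears in it, ignoring markup and script bodies. `appears_in_source` is a hypothetical helper, and the sample strings stand in for real responses.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Gathers text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._in_code = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_code = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._in_code = False

    def handle_data(self, data):
        if not self._in_code:
            self.parts.append(data)


def appears_in_source(raw_html: str, wanted: str) -> bool:
    """True if `wanted` is in the raw HTML's visible text, so a static fetch suffices."""
    parser = _TextOnly()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())  # collapse whitespace runs
    return wanted in text


server_rendered = "<body><h1>Product</h1><p>Price: <b>$19.99</b></p></body>"
client_rendered = '<body><div id="root"></div><script src="/static/js/main.js"></script></body>'

print(appears_in_source(server_rendered, "Price: $19.99"))  # True
print(appears_in_source(client_rendered, "Price: $19.99"))  # False: needs a browser
```

A False result corresponds to the Inspect-only case above. Because the helper ignores script bodies, data shipped as embedded JSON inside a script tag is not detected, even though a static fetch could sometimes still extract it.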
FAQs
Can VoidCrawl handle single-page apps (SPAs)?
Yes. VoidCrawl runs a real Chrome instance that executes JavaScript and builds the full DOM, so SPAs render normally. Use wait_for_stable_dom() to wait for the page to finish rendering before extracting content.
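One simple way to approximate "finished rendering" is polling: take a snapshot of the page, and return once it has stayed unchanged for a quiet period, or give up after a timeout. The sketch below illustrates that general technique with asyncio; it is not VoidCrawl’s implementation of wait_for_stable_dom(), and `wait_until_stable`, `get_snapshot`, and the timing parameters are hypothetical. In a real scraper the snapshot could be the page’s serialized HTML.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def wait_until_stable(
    get_snapshot: Callable[[], Awaitable[str]],
    quiet_period: float = 0.5,
    poll_interval: float = 0.1,
    timeout: float = 10.0,
) -> str:
    """Hypothetical helper: return the snapshot once unchanged for quiet_period seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last = await get_snapshot()
    last_change = loop.time()
    while True:
        await asyncio.sleep(poll_interval)
        current = await get_snapshot()
        now = loop.time()
        if current != last:
            last, last_change = current, now  # still changing: restart the quiet timer
        elif now - last_change >= quiet_period:
            return current  # unchanged long enough: treat the page as stable
        if now >= deadline:
            raise TimeoutError("page never became stable")


async def demo() -> str:
    # Simulated page whose content changes for the first few snapshots.
    states = iter(["<div></div>", "<div>a</div>", "<div>ab</div>", "<div>abc</div>", "<div>abcd</div>"])

    async def fake_snapshot() -> str:
        return next(states, "<div>abcd</div>")  # settles on the final state

    return await wait_until_stable(fake_snapshot, quiet_period=0.05, poll_interval=0.01)


print(asyncio.run(demo()))  # <div>abcd</div>
```

The quiet period is a trade-off: too short and you may capture a half-rendered page between network responses; too long and every page pays the extra wait. Pages that update continuously (tickers, animations) may never settle, which is why the timeout matters.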
What about shadow DOM components?
VoidCrawl’s stealth layer forces every attachShadow call to use mode: 'open', so shadow roots that a page requests as closed remain accessible to automation. This also enables interaction with WAF challenges like Cloudflare Turnstile.
Do I need VoidCrawl if I’m already using Yosoi?
Yosoi fetches raw HTML by default. For sites that require JavaScript rendering (L2+ difficulty), you can use VoidCrawl to render the page and pass the DOM to Yosoi. A native integration is planned.