
QScrape

QScrape is a web-scraper evaluation suite: a collection of fictional test websites across multiple difficulty levels, designed for benchmarking and testing the capabilities of web scrapers.

Built as a side project for Yosoi, QScrape gives you realistic targets to validate your scraping pipelines against, without hitting real sites.

Difficulty Levels

L1: Static HTML/CSS/JS

Standard sites served as plain HTML, CSS, and JavaScript, with no dynamic data and no web frameworks.

L2: Modern Frameworks

Sites built with Svelte, React, Vue, and Solid. Each framework leaves a distinct fingerprint in the rendered HTML (compiled class hashing, virtual DOM diffing, reactive proxy rendering, and fine-grained signal-based updates, respectively), covering the DOM-level complexity typical of real-world SPAs.
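These fingerprints can be probed with simple heuristics. The sketch below is not part of QScrape; the specific markers (React's `data-reactroot`, Svelte's compiled `svelte-` class hashes, Vue's scoped-style `data-v-` attributes, Solid's SSR hydration keys) are assumptions that vary by framework version and build configuration, so treat matches as hints rather than proof.

```python
import re

# Heuristic markers only -- these vary across framework versions and builds.
FINGERPRINTS = {
    "react": re.compile(r'data-reactroot|__NEXT_DATA__'),
    "svelte": re.compile(r'class="[^"]*\bsvelte-[a-z0-9]+'),  # compiled class hashes
    "vue": re.compile(r'data-v-[0-9a-f]{8}'),                 # scoped-style attributes
    "solid": re.compile(r'data-hk='),                         # hydration keys (SSR builds)
}

def detect_frameworks(html: str) -> list[str]:
    """Return the names of frameworks whose markers appear in the HTML."""
    return [name for name, pattern in FINGERPRINTS.items() if pattern.search(html)]
```

A scraper can run a check like this on a fetched page to decide whether static parsing is enough or a headless browser is needed.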

L3: Anti-Bot

Sites that actively resist automated access. Techniques applied include rate limiting, request-header fingerprinting, session-cookie enforcement, and obfuscated markup. Captchas and challenge puzzles are excluded. Use these to test how your scraping pipeline holds up before hitting live targets.
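Rate limiting in particular rewards a scraper that backs off instead of hammering the endpoint. A minimal sketch of that pattern, assuming the site signals throttling with HTTP 429 and an optional `Retry-After` header (the function names and retry parameters here are illustrative, not part of QScrape):

```python
import random
import time
import urllib.error
import urllib.request

def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base * 2**attempt, capped at `cap` seconds."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]

def fetch_with_backoff(url: str, retries: int = 4) -> bytes:
    """Fetch a URL, backing off on HTTP 429 (rate limited) responses."""
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honor the server's Retry-After hint if present; add jitter either way.
            retry_after = err.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else delay
            time.sleep(wait + random.uniform(0, 0.5))
    raise RuntimeError(f"still rate limited after {retries} attempts: {url}")
```

The jitter spreads out retries so concurrent workers do not all re-fire at the same instant.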

Using QScrape with Yosoi

QScrape sites are ideal targets for testing Yosoi’s selector discovery. Point Yosoi at any QScrape site to validate that discovered selectors correctly extract the expected content from a known, stable source.

FAQs

Are QScrape sites suitable for production scraping?

No. They are fictional test targets designed for evaluation and benchmarking, not for collecting real data.

Do QScrape sites ever change their structure?

Occasionally, when new difficulty variants or layout scenarios are added. Check the GitHub changelog before relying on a specific selector set in CI.

Can I add QScrape to my automated test suite?

Yes. Because the sites are stable and purpose-built, they work well as regression targets. Run Yosoi against a QScrape URL in your tests and assert on the extracted field values.
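One way to shape such a regression test, sketched with the standard-library parser. A real test would fetch a QScrape URL (none is named here, so a canned HTML snippet stands in for the page) and assert on fields extracted by whatever selectors your pipeline discovered; the extractor below is a stand-in, not Yosoi's API.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Minimal extractor: captures the text of the first <h1> element."""
    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None:
            self.title = data.strip()

def extract_title(html: str):
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title

# Canned HTML stands in for a fetched QScrape page to keep the sketch self-contained.
page = "<html><body><h1>Example Product</h1><p>...</p></body></html>"
assert extract_title(page) == "Example Product"
```

In CI, the same assertion against a live QScrape URL would catch selector drift the moment a new layout variant lands.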

What is the difference between L2 and L3 sites?

L2 sites add framework complexity (React, Svelte, etc.) but make no attempt to block scrapers. L3 sites actively apply anti-bot techniques such as rate limiting or obfuscated markup.

References

Svelte. Rich Harris. Compiler-based frontend framework. https://svelte.dev/

React. Meta. JavaScript library for building user interfaces. https://react.dev/

Vue. Evan You. Progressive JavaScript framework for building UIs. https://vuejs.org/

Solid. Ryan Carniato. Fine-grained reactive UI library with no virtual DOM. https://www.solidjs.com/