Anti-Bot Vendors

VoidCrawl annotates responses with the WAF, CDN, or anti-bot vendor it sees. The detector answers two questions:

  1. Which vendor is in front of this response?
  2. Is that vendor actively challenging the browser?

That distinction is the whole feature. Cloudflare serving a normal page is not a problem. Cloudflare serving a Turnstile wall is.

Presence vs Challenge

| Signal | Meaning | Action |
|---|---|---|
| Presence | A vendor fronts the site. Example: server: cloudflare. | Record telemetry. Do not rotate by default. |
| Challenge | An active block or challenge fired. Example: Cloudflare challenge body, DataDome block, Akamai reference page. | Route, escalate, rotate, or abort. |

Blind retry wastes profiles and proxies. Vendor detection gives the pipeline a reason to change tactics.

Where It Appears

Python:

resp = await page.goto("https://example.com")
# antibot may be absent, so check it before reading challenged
if resp.antibot and resp.antibot.challenged:
    print(resp.antibot.challenge_vendor)

MCP:

{
  "url": "https://fortress.theplumber.dev/",
  "status_code": 200,
  "antibot": {
    "vendors": ["cloudflare"],
    "challenged": true,
    "challenge_vendor": "cloudflare",
    "corpus_version": "cl-2026.06.01",
    "evidence": "body"
  }
}

The MCP field appears on fetch, fetch_many, and session_navigate when a vendor is detected.
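A minimal sketch of reading that field on the client side, assuming the tool result arrives as a JSON string shaped like the example above. The helper name challenged_vendor is hypothetical and not part of VoidCrawl. Because the field only appears when a vendor is detected, the sketch treats a missing antibot key as "no verdict".

```python
import json
from typing import Optional


def challenged_vendor(raw_result: str) -> Optional[str]:
    """Return the challenging vendor, or None if nothing is challenging.

    Hypothetical helper: assumes raw_result is a JSON object shaped like
    the MCP example on this page.
    """
    result = json.loads(raw_result)
    antibot = result.get("antibot")  # key is absent when no vendor was detected
    if not antibot or not antibot.get("challenged"):
        return None  # no vendor, or presence only
    return antibot.get("challenge_vendor")


sample = """
{
  "url": "https://fortress.theplumber.dev/",
  "status_code": 200,
  "antibot": {
    "vendors": ["cloudflare"],
    "challenged": true,
    "challenge_vendor": "cloudflare",
    "corpus_version": "cl-2026.06.01",
    "evidence": "body"
  }
}
"""

print(challenged_vendor(sample))  # cloudflare
print(challenged_vendor('{"url": "https://example.com", "status_code": 200}'))  # None
```

Using .get() for every lookup keeps the caller from raising on responses that carry no verdict at all.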

Evidence Tiers

VoidCrawl scans in two tiers.

| Tier | What it reads | Why |
|---|---|---|
| Headers | Status and response headers. | Cheap and high signal for CDN and WAF presence. |
| Body prefix | First 64 KiB of HTML. | Catches 200 responses that cloak the challenge in the body. |

The verdict includes evidence: "headers" or evidence: "body" so callers know which tier matched.
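The two-tier order can be illustrated with a small sketch. This is not VoidCrawl's detector: the function name scan, the two rules, and the choice to report the body tier whenever a body rule fires are all assumptions made for illustration. The header rule reuses the server: cloudflare example from this page, the body marker is a placeholder that does not come from the real corpus, and the status code is ignored for brevity.

```python
from typing import Mapping, Optional

BODY_PREFIX_BYTES = 64 * 1024  # this page states the first 64 KiB of HTML is scanned

# Illustrative rules only; the real corpus is not reproduced here.
HEADER_PRESENCE = {("server", "cloudflare"): "cloudflare"}
BODY_CHALLENGE = {b"challenges.cloudflare.com": "cloudflare"}  # assumed marker


def scan(headers: Mapping[str, str], body: bytes) -> Optional[dict]:
    """Return a verdict dict, or None when no rule matches."""
    vendors: list[str] = []
    challenge_vendor = None
    evidence = None

    # Tier 1: headers. Cheap, and enough to establish presence.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for (name, needle), vendor in HEADER_PRESENCE.items():
        if needle in lowered.get(name, ""):
            vendors.append(vendor)
            evidence = "headers"

    # Tier 2: bounded body prefix. Catches challenges served with status 200.
    prefix = body[:BODY_PREFIX_BYTES]
    for marker, vendor in BODY_CHALLENGE.items():
        if marker in prefix:
            if vendor not in vendors:
                vendors.append(vendor)
            challenge_vendor = vendor
            evidence = "body"
            break

    if not vendors:
        return None
    return {
        "vendors": vendors,
        "challenged": challenge_vendor is not None,
        "challenge_vendor": challenge_vendor,
        "evidence": evidence,
    }


print(scan({"Server": "cloudflare"}, b"<html>ok</html>")["evidence"])  # headers
```

Slicing the body before matching is what keeps the second tier bounded: a marker that only appears past the prefix is never seen.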

Vendor Coverage

The corpus covers the vendors VoidCrawl has seen in real work:

  • Cloudflare and Turnstile;
  • DataDome;
  • Akamai;
  • Imperva and Incapsula;
  • PerimeterX / HUMAN;
  • Kasada;
  • AWS WAF;
  • F5 BIG-IP;
  • Sucuri;
  • CloudFront;
  • reCAPTCHA and hCaptcha markers.

The rules are small, reviewed, and versioned. Record corpus_version with captures. A verdict is a captured fact, not something to recompute later against a newer rule set.
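One way to follow that advice is to store the verdict, including corpus_version, next to each capture as an immutable record. The sketch below is an assumption about how a pipeline might do this: CapturedVerdict and to_capture_line are hypothetical names, and the fields simply mirror the antibot example shown earlier on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a captured verdict is not edited after the fact
class CapturedVerdict:
    """Hypothetical capture record; field names mirror the antibot example."""
    url: str
    vendors: tuple[str, ...]
    challenged: bool
    challenge_vendor: Optional[str]
    corpus_version: str
    evidence: str


def to_capture_line(verdict: CapturedVerdict) -> str:
    """Serialize the verdict as one JSON line for an append-only capture log."""
    return json.dumps(asdict(verdict), sort_keys=True)


record = CapturedVerdict(
    url="https://fortress.theplumber.dev/",
    vendors=("cloudflare",),
    challenged=True,
    challenge_vendor="cloudflare",
    corpus_version="cl-2026.06.01",
    evidence="body",
)
print(to_capture_line(record))
```

Keeping corpus_version inside the record means a later reader can tell which rule set produced the verdict without re-running detection.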

Routing Policy

Suggested routing:

| Verdict | First move |
|---|---|
| cloudflare challenged | Headful plus warm profile. |
| datadome challenged | Rotate to a cleaner proxy. |
| perimeterx or kasada challenged | Headful, slower actions, warm profile, then proxy rotation. |
| Presence only | Continue. Do not rotate. |

This page documents the detection layer. Your crawler owns the routing policy.
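As a rough illustration, a crawler could encode the suggested table as a lookup keyed on the challenging vendor. Everything here is an assumption on the crawler side: the action labels, the first_move function, and the fallback for vendors the table does not list are hypothetical and are not provided by VoidCrawl.

```python
from typing import Optional

# First moves from the suggested routing table; action labels are hypothetical.
FIRST_MOVE = {
    "cloudflare": ["headful", "warm_profile"],
    "datadome": ["rotate_proxy"],
    "perimeterx": ["headful", "slow_actions", "warm_profile", "rotate_proxy"],
    "kasada": ["headful", "slow_actions", "warm_profile", "rotate_proxy"],
}

# Placeholder: the table gives no first move for other vendors.
UNLISTED_VENDOR_MOVE = ["escalate"]


def first_move(challenged: bool, challenge_vendor: Optional[str]) -> list[str]:
    """Return the ordered actions to try first for a verdict."""
    if not challenged:
        return []  # presence only: continue, do not rotate
    return FIRST_MOVE.get(challenge_vendor or "", UNLISTED_VENDOR_MOVE)


print(first_move(True, "datadome"))     # ['rotate_proxy']
print(first_move(False, None))          # []
```

An empty list for presence-only verdicts makes "do nothing different" the default, so rotation only happens when a challenge was actually observed.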

FAQs

Is vendor presence the same as being blocked?

No. Presence means a vendor fronts the site. A challenge means the vendor is actively walling the response.

Where does the verdict appear?

In Python, read it from PageResponse.antibot. Over MCP, the antibot field is returned on fetch, fetch_many, and session_navigate when a vendor is detected.

Why scan the body if headers are available?

Some challenge pages return status 200 and hide the wall in the HTML. VoidCrawl scans a bounded body prefix to catch that case.

What should I do with Cloudflare challenged=true?

Try headful mode and a warm profile first. If that still fails, rotate IP or mark the URL uncrawlable.
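That ordering amounts to a small escalation ladder. The sketch below shows the control flow only, under stated assumptions: escalate is a hypothetical helper, and the lambdas are stubs standing in for real fetches that would report whether the response was still challenged.

```python
from typing import Callable, Optional, Sequence

# An attempt returns True when the response is still challenged.
Attempt = Callable[[], bool]


def escalate(steps: Sequence[tuple[str, Attempt]]) -> Optional[str]:
    """Try each step in order; return the first step name that clears the challenge.

    Returns None when every step is still challenged, which the caller can
    treat as "mark the URL uncrawlable".
    """
    for name, still_challenged in steps:
        if not still_challenged():
            return name
    return None


# Stub attempts stand in for real fetches (hypothetical).
steps = [
    ("headful_warm_profile", lambda: True),   # still challenged
    ("rotate_ip", lambda: False),             # challenge cleared
]
print(escalate(steps))  # rotate_ip
```

Stopping at the first step that clears the challenge avoids spending a proxy rotation when headful mode with a warm profile was enough.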

Does this solve captchas?

No. This is triage. It tells your pipeline which wall is in front of the browser.