Anti-Bot Vendors
VoidCrawl annotates responses with the WAF, CDN, or anti-bot vendor it sees. The detector answers two questions:
- Which vendor is in front of this response?
- Is that vendor actively challenging the browser?
That distinction is the whole feature. Cloudflare serving a normal page is not a problem. Cloudflare serving a Turnstile wall is.
Presence vs Challenge
| Signal | Meaning | Action |
|---|---|---|
| Presence | A vendor fronts the site. Example: server: cloudflare. | Record telemetry. Do not rotate by default. |
| Challenge | An active block or challenge fired. Example: Cloudflare challenge body, DataDome block, Akamai reference page. | Route, escalate, rotate, or abort. |
Blind retry wastes profiles and proxies. Vendor detection gives the pipeline a reason to change tactics.
Where It Appears
Python:

    resp = await page.goto("https://example.com")
    if resp.antibot and resp.antibot.challenged:
        print(resp.antibot.challenge_vendor)

MCP:

    {
      "url": "https://fortress.theplumber.dev/",
      "status_code": 200,
      "antibot": {
        "vendors": ["cloudflare"],
        "challenged": true,
        "challenge_vendor": "cloudflare",
        "corpus_version": "cl-2026.06.01",
        "evidence": "body"
      }
    }

The MCP field appears on fetch, fetch_many, and session_navigate when a vendor is detected.
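
Because the antibot field is present only when a vendor is detected, MCP consumers should treat it as optional. A minimal sketch, assuming the tool result reaches your code as a JSON string shaped like the example above; `read_antibot` is a hypothetical helper, not part of VoidCrawl:

```python
import json
from typing import Optional


def read_antibot(payload: str) -> Optional[dict]:
    """Return the antibot verdict, or None when no vendor was detected.

    Hypothetical helper: assumes the MCP result is a JSON object with the
    same field names as the documented example.
    """
    result = json.loads(payload)
    # .get() covers both an omitted field and an explicit null.
    return result.get("antibot")


raw = json.dumps({
    "url": "https://fortress.theplumber.dev/",
    "status_code": 200,
    "antibot": {
        "vendors": ["cloudflare"],
        "challenged": True,
        "challenge_vendor": "cloudflare",
        "corpus_version": "cl-2026.06.01",
        "evidence": "body",
    },
})

verdict = read_antibot(raw)
if verdict and verdict["challenged"]:
    print(verdict["challenge_vendor"], verdict["evidence"])
```

Checking `verdict` before `verdict["challenged"]` keeps a missing field from being read as an error.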
Evidence Tiers
VoidCrawl scans in two tiers.
| Tier | What it reads | Why |
|---|---|---|
| Headers | Status and response headers. | Cheap and high signal for CDN and WAF presence. |
| Body prefix | First 64 KiB of HTML. | Catches 200 responses that cloak the challenge in the body. |
The verdict includes evidence: "headers" or evidence: "body" so callers know which tier matched.
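
The two tiers can be pictured as a header pass followed by a bounded body pass. The sketch below is illustrative only: the rule tables and the `scan` function are hypothetical stand-ins for the real corpus, and it simplifies by treating header matches as presence and body matches as challenges.

```python
from typing import Mapping, Optional

BODY_PREFIX_BYTES = 64 * 1024  # the body tier reads only the first 64 KiB

# Hypothetical rules for illustration; the real corpus is not shown here.
HEADER_PRESENCE_RULES = {"cloudflare": ("server", "cloudflare")}
BODY_CHALLENGE_RULES = {"cloudflare": b"cf-turnstile"}


def scan(headers: Mapping[str, str], body: bytes) -> Optional[dict]:
    """Return a verdict dict, or None when no vendor matches either tier."""
    vendors = []
    challenge_vendor = None
    evidence = None

    # Tier 1: headers. Cheap, and enough to establish presence.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    for vendor, (name, needle) in HEADER_PRESENCE_RULES.items():
        if needle in lowered.get(name, ""):
            vendors.append(vendor)
            evidence = "headers"

    # Tier 2: bounded body prefix. Catches walls served with status 200.
    prefix = body[:BODY_PREFIX_BYTES].lower()
    for vendor, marker in BODY_CHALLENGE_RULES.items():
        if marker in prefix:
            if vendor not in vendors:
                vendors.append(vendor)
            challenge_vendor = vendor
            evidence = "body"
            break

    if not vendors:
        return None
    return {
        "vendors": vendors,
        "challenged": challenge_vendor is not None,
        "challenge_vendor": challenge_vendor,
        "evidence": evidence,
    }
```

Slicing the body before matching keeps the cost of the second tier fixed regardless of page size, at the price of missing markers that appear after the prefix.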
Vendor Coverage
The corpus covers the vendors VoidCrawl has seen in real work:
- Cloudflare and Turnstile;
- DataDome;
- Akamai;
- Imperva and Incapsula;
- PerimeterX / HUMAN;
- Kasada;
- AWS WAF;
- F5 BigIP;
- Sucuri;
- CloudFront;
- reCAPTCHA and hCaptcha markers.
The rules are small, reviewed, and versioned. Record corpus_version with captures. A verdict is a captured fact, not something to recompute later against a newer rule set.
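
One way to keep a verdict as a captured fact is to store it, together with its corpus_version, in an immutable record at capture time. A minimal sketch; `CaptureRecord` and `to_jsonl` are hypothetical names, and the fields mirror the verdict fields documented on this page:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen: the verdict is not edited after capture
class CaptureRecord:
    url: str
    status_code: int
    vendors: Tuple[str, ...]
    challenged: bool
    challenge_vendor: Optional[str]
    evidence: str
    corpus_version: str  # the rule set that produced this verdict


def to_jsonl(record: CaptureRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = CaptureRecord(
    url="https://fortress.theplumber.dev/",
    status_code=200,
    vendors=("cloudflare",),
    challenged=True,
    challenge_vendor="cloudflare",
    evidence="body",
    corpus_version="cl-2026.06.01",
)
print(to_jsonl(record))
```

Storing the version next to the verdict lets later analysis group captures by rule set instead of re-running detection.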
Routing Policy
Suggested routing:
| Verdict | First move |
|---|---|
| cloudflare challenged | Headful plus warm profile. |
| datadome challenged | Rotate to a cleaner proxy. |
| perimeterx or kasada challenged | Headful, slower actions, warm profile, then proxy rotation. |
| Presence only | Continue. Do not rotate. |
This page documents the detection layer. Your crawler owns the routing policy.
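
As a starting point, the suggested routing can be expressed as a lookup keyed on the challenging vendor. This is a sketch of crawler-side policy, not a VoidCrawl API: the action names are hypothetical labels, the vendor keys are assumed to match the strings in the verdict, and the `"unrouted"` fallback for vendors the table does not cover is an assumption.

```python
from typing import Optional

# Hypothetical action labels; map them to whatever your crawler implements.
FIRST_MOVES = {
    "cloudflare": "headful_warm_profile",
    "datadome": "rotate_proxy",
    "perimeterx": "headful_slow_warm_then_rotate",
    "kasada": "headful_slow_warm_then_rotate",
}


def first_move(antibot: Optional[dict]) -> str:
    """Pick the first tactic for a verdict dict (or None when absent)."""
    # No verdict, or presence only: keep going and do not rotate.
    if not antibot or not antibot.get("challenged"):
        return "continue"
    # Challenged: look up the vendor. Unknown vendors are left to the
    # caller (assumed fallback, not stated in the routing table).
    return FIRST_MOVES.get(antibot.get("challenge_vendor"), "unrouted")


print(first_move({"vendors": ["cloudflare"], "challenged": False}))
print(first_move({"vendors": ["datadome"], "challenged": True,
                  "challenge_vendor": "datadome"}))
```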
FAQs
Is vendor presence the same as being blocked?
No. Presence means a vendor fronts the site. A challenge means the vendor is actively walling the response.
Where does the verdict appear?
Python returns PageResponse.antibot. MCP returns antibot on fetch, fetch_many, and session_navigate when a vendor is detected.
Why scan the body if headers are available?
Some challenge pages return status 200 and hide the wall in the HTML. VoidCrawl scans a bounded body prefix to catch that case.
What should I do with Cloudflare challenged=true?
Try headful mode with a warm profile first. If the challenge still fires, rotate the IP or mark the URL uncrawlable.
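
That sequence amounts to an ordered list of attempts that stops at the first success. A minimal sketch, assuming each attempt is a callable you supply that returns True once the response is no longer challenged; `escalate` and the attempt names are hypothetical:

```python
from typing import Callable, Iterable, Tuple

Attempt = Tuple[str, Callable[[], bool]]


def escalate(attempts: Iterable[Attempt]) -> str:
    """Run attempts in order and return the name of the first that succeeds.

    Returns "uncrawlable" when every attempt fails or none are given.
    """
    for name, attempt in attempts:
        if attempt():
            return name
    return "uncrawlable"


# Stub attempts standing in for real fetches.
outcome = escalate([
    ("headful_warm_profile", lambda: False),  # still challenged
    ("rotate_ip", lambda: True),              # challenge cleared
])
print(outcome)
```

Keeping the ladder as data makes the order of tactics easy to change per vendor.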
Does this solve captchas?
No. This is triage. It tells your pipeline which wall is in front of the browser.