JS Fields
ys.js is for fields that are not present as stable text in the static HTML. It runs JavaScript in the live browser tab and merges the result into the extracted record.
There are two modes:

    # Hand-authored. No LLM.
    signals: dict = ys.js("(() => ({ hasChat: !!window.Intercom }))()")

    # Discovery-driven. Yosoi writes and verifies the script once.
    signals: dict = ys.js(description="Detect chat widgets and loaded vendor scripts")

What It Solves
Use ys.js when the value lives in runtime browser state:
- script URLs loaded after page render;
- window globals installed by widgets;
- performance resource entries;
- iframe URLs and widget IDs;
- feature flags or counters that are easier to read through JavaScript than CSS.
Do not use it for normal text extraction. CSS selectors are cheaper and easier to audit when the value is already in the HTML.
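
A few hand-written expressions for the runtime values listed above might look like the following. This is a sketch, not a set of scripts shipped with Yosoi: the constant names are hypothetical, and the commented usage line assumes that a list-typed field works the same way as the dict-typed one in the hand-authored mode. The browser APIs used (document.scripts, performance.getEntriesByType, querySelectorAll) are standard.

```python
# Hand-written JavaScript expressions for values that exist only at runtime.
# Each one is an immediately invoked arrow function that returns plain data.

# URLs of every <script> element currently in the document.
SCRIPT_URLS_JS = (
    "(() => [...document.scripts].map((s) => s.src).filter(Boolean))()"
)

# Names (URLs) of resources recorded by the Performance API.
RESOURCE_ENTRIES_JS = (
    '(() => performance.getEntriesByType("resource").map((e) => e.name))()'
)

# Source URLs of every iframe on the page.
IFRAME_URLS_JS = (
    '(() => [...document.querySelectorAll("iframe")]'
    ".map((f) => f.src).filter(Boolean))()"
)

# Assumed usage inside a contract, mirroring the hand-authored mode:
#   script_urls: list = ys.js(SCRIPT_URLS_JS)
```

Keeping each expression side-effect free and returning only arrays or plain objects makes the result easier to validate against the field type.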
Discovery Flow
For ys.js(description=...), Yosoi opens a browser tab and runs a short loop.
- Pre-probe the page for scripts, iframes, cookies, and relevant window keys.
- Ask the LLM for a JavaScript expression for the field.
- Evaluate the expression in the live tab.
- Validate the result through the contract field type.
- Retry with feedback when the result throws, returns nothing, or fails validation.
- Cache the verified script.
The cached script is reused on later scrapes. No LLM call is needed after the first successful discovery.
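
The propose, evaluate, validate, and retry steps can be sketched in plain Python. This is an illustration of the control flow only, not Yosoi's implementation: discover_script and its three callables are hypothetical stand-ins for the LLM call, the live-tab evaluation, and the contract validation, and the pre-probe and caching steps are left out.

```python
from typing import Any, Callable, List, Optional


def discover_script(
    propose: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Any],
    validate: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first script whose result validates, or None."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        script = propose(feedback)  # stand-in for the LLM call
        try:
            result = evaluate(script)  # stand-in for running in the live tab
        except Exception as exc:
            feedback = f"script threw: {exc}"
            continue
        # Treat None and empty strings/containers as "returned nothing".
        if result is None or (hasattr(result, "__len__") and len(result) == 0):
            feedback = "script returned nothing"
            continue
        try:
            validate(result)  # stand-in for the contract field type
        except (TypeError, ValueError) as exc:
            feedback = f"validation failed: {exc}"
            continue
        return script
    return None


# Stub collaborators so the loop can be exercised without a browser or LLM.
candidates = ["bad()", "empty()", "count()"]
feedback_log: List[Optional[str]] = []


def propose(feedback: Optional[str]) -> str:
    feedback_log.append(feedback)
    return candidates[len(feedback_log) - 1]


def evaluate(script: str) -> Any:
    if script == "bad()":
        raise RuntimeError("bad is not defined")
    return {"empty()": None, "count()": "1,234"}[script]


def validate(value: Any) -> int:
    return int(str(value).replace(",", ""))


winner = discover_script(propose, evaluate, validate)
print(winner)
print(feedback_log)
```

Each failed attempt produces a feedback string that is passed to the next proposal, which is what lets the second and third tries differ from the first.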
Typed Validation
The field’s Python annotation is the oracle. Yosoi does not accept a script just because it returns “something”.

    from typing import Annotated

    from pydantic import BeforeValidator

    import yosoi as ys


    def comma_int(value: object) -> int:
        return int(str(value).replace(",", ""))


    class Place(ys.Contract):
        review_count: Annotated[int, BeforeValidator(comma_int)] = ys.js(
            description="Google Maps review count"
        )

If the script returns "1,234", the validator can coerce it to 1234. If it returns a long blob of unrelated text, discovery rejects the script and tries again.
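
The accept-or-reject behavior can be checked without a browser. The sketch below reuses comma_int from the contract above and adds a hypothetical helper, passes_validation, that stands in for the Pydantic validation step; it is a plain-Python approximation, not the code path Yosoi runs.

```python
def comma_int(value: object) -> int:
    # Same coercion as in the contract: drop thousands separators, then parse.
    return int(str(value).replace(",", ""))


def passes_validation(value: object) -> bool:
    """Hypothetical stand-in for the field's validation step."""
    try:
        comma_int(value)
    except (TypeError, ValueError):
        return False
    return True


clean = comma_int("1,234")
accepted = passes_validation("1,234")
rejected = passes_validation("1,234 reviews, open now")

print(clean)     # 1234
print(accepted)  # True
print(rejected)  # False
```

The third value fails because int() cannot parse the leftover words, which is the kind of result that would send discovery back for another attempt.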
Cache Rules
Scripts live under:

    .yosoi/
      js_scripts/
        js_example_com.json

The cache is per domain and per field. A script is reused only when the field description still matches. That means unrelated contract edits do not strand working scripts, but a changed description triggers rediscovery for that field.
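
The reuse rule can be sketched as a small lookup. Everything here is an assumption made for illustration: the helper names, the JSON layout inside the file, and the file-naming scheme, which is inferred only from the js_example_com.json example. The real on-disk format may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def cache_file(root: Path, domain: str) -> Path:
    # Assumed naming, inferred from js_example_com.json for example.com.
    return root / "js_scripts" / f"js_{domain.replace('.', '_')}.json"


def save_script(root: Path, domain: str, field: str,
                description: str, script: str) -> None:
    path = cache_file(root, domain)
    path.parent.mkdir(parents=True, exist_ok=True)
    data = json.loads(path.read_text()) if path.exists() else {}
    # Assumed layout: one entry per field, storing the description it was built for.
    data[field] = {"description": description, "script": script}
    path.write_text(json.dumps(data, indent=2))


def load_script(root: Path, domain: str, field: str,
                description: str) -> Optional[str]:
    path = cache_file(root, domain)
    if not path.exists():
        return None
    entry = json.loads(path.read_text()).get(field)
    # Reuse only when the field exists and its description is unchanged.
    if entry is None or entry["description"] != description:
        return None
    return entry["script"]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".yosoi"
    save_script(root, "example.com", "signals", "Detect chat widgets", "(() => 1)()")
    hit = load_script(root, "example.com", "signals", "Detect chat widgets")
    miss = load_script(root, "example.com", "signals", "Detect analytics tags")
    other = load_script(root, "example.com", "review_count", "Review count")
    file_name = cache_file(root, "example.com").name

print(hit, miss, other, file_name)
```

Because entries are looked up by field name, adding or editing other fields leaves the "signals" entry untouched, while changing its description produces a miss and would trigger rediscovery.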
Fetcher Requirement
JS discovery needs a browser tab.

    records = await ys.scrape(
        "https://example.com",
        ContractWithJsFields,
        fetcher_type="headless",
    )

Use waterfall if you want Yosoi to stay on plain HTTP for static pages and escalate only when needed.
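
The escalation idea behind waterfall can be sketched generically: try the cheapest fetcher first and move to a browser only when the cheap result is not good enough. The function and stub fetchers below are hypothetical, and the is_sufficient predicate is a placeholder, since this page does not specify what Yosoi treats as "needed".

```python
from typing import Callable, Optional, Sequence, Tuple

Fetcher = Tuple[str, Callable[[str], Optional[str]]]


def waterfall_fetch(
    url: str,
    fetchers: Sequence[Fetcher],
    is_sufficient: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (fetcher_name, html) from the first fetcher whose output suffices."""
    for name, fetch in fetchers:
        html = fetch(url)
        if html is not None and is_sufficient(html):
            return name, html
    raise RuntimeError(f"no fetcher produced a usable page for {url}")


# Stub fetchers: plain HTTP sees only the static shell, the browser sees the widget.
def plain_http(url: str) -> Optional[str]:
    return "<html><body>static</body></html>"


def headless(url: str) -> Optional[str]:
    return "<html><body>static<div id='chat-widget'></div></body></html>"


used, page = waterfall_fetch(
    "https://example.com",
    [("plain_http", plain_http), ("headless", headless)],
    is_sufficient=lambda html: "chat-widget" in html,
)
print(used)
```

Ordering the fetchers from cheapest to most expensive means static pages never pay for a browser tab.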
FAQs
When should I use ys.js?
Use it when the value only exists in the rendered browser, such as runtime scripts, widget globals, iframe URLs, or performance entries.
Does the LLM run on every scrape?
No. Discovery-driven JS fields cache a verified script per domain and field. Later scrapes reuse it.
How is the script verified?
Yosoi evaluates it in the live tab and validates the result through the contract field type and Pydantic validators.
Where is the script cache stored?
In .yosoi/js_scripts/, one file per domain, keyed by field name.
Can SimpleFetcher run JS discovery?
No. Use headless, headful, or waterfall.