Functions

Generated from yosoi v0.0.1a11. Only symbols in __all__ are listed.

auto_config

auto_config(model: str | None = None, debug: bool = False) -> YosoiConfig

Auto-detect LLM provider and build config.

Resolution order:

  1. Explicit model argument (provider:model-name format)
  2. $YOSOI_MODEL environment variable
  3. First provider with an available API key
  4. Groq default fallback

Args:

  • model str | None — Model string in provider:model-name format, or None.
  • debug bool — Whether to enable debug HTML saving.

Returns: YosoiConfig — Validated YosoiConfig.

Raises:

  • ValueError — On configuration errors (bad model format, no API key, etc.).
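The resolution order above can be sketched in plain Python. This is a hypothetical illustration, not yosoi's code: resolve_model, the PROVIDERS table, the API-key variable names and the placeholder model names are all assumptions. The sketch also stops at choosing a provider:model-name string instead of building a YosoiConfig.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical provider table: prefix -> (API-key env var, default model).
# yosoi's real provider list, env-var names and default models may differ.
PROVIDERS: dict[str, tuple[str, str]] = {
    "groq": ("GROQ_API_KEY", "default-groq-model"),
    "openai": ("OPENAI_API_KEY", "default-openai-model"),
    "anthropic": ("ANTHROPIC_API_KEY", "default-anthropic-model"),
}
# Hypothetical placeholder for the documented "Groq default fallback".
FALLBACK = "groq:default-groq-model"


def resolve_model(model: str | None = None, env: Mapping[str, str] | None = None) -> str:
    """Pick a 'provider:model-name' string using the documented resolution order."""
    env = os.environ if env is None else env

    # Steps 1 and 2: an explicit argument wins, otherwise $YOSOI_MODEL.
    candidate = model if model is not None else env.get("YOSOI_MODEL")
    if candidate is not None:
        provider, sep, name = candidate.partition(":")
        if not (sep and provider and name):
            raise ValueError(f"expected 'provider:model-name', got {candidate!r}")
        return candidate

    # Step 3: the first provider (in table order) whose API key is set.
    for provider, (key_var, default_model) in PROVIDERS.items():
        if env.get(key_var):
            return f"{provider}:{default_model}"

    # Step 4: fall back to the Groq default.
    return FALLBACK
```

Passing env explicitly keeps the lookup testable without touching os.environ. The real auto_config additionally validates the result and raises ValueError on problems such as a missing API key, which this sketch does not attempt.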

css

css(value: str) -> SelectorEntry

Create a CSS SelectorEntry.

discover

discover() -> SelectorEntry

Return a sentinel SelectorEntry indicating that the AI will discover the root for this scoped nested contract.

jsonld

jsonld(value: str) -> SelectorEntry

Create a JSON-LD SelectorEntry.

load_urls_from_file

load_urls_from_file(filepath: str) -> list[str]

Load URLs from a file (JSON, plain text, CSV, Excel, Parquet, or Markdown).

Args:

  • filepath str — Path to file containing URLs.

Returns: list[str] — List of URL strings.

Raises:

  • FileNotFoundError — If file does not exist.
  • ValueError — If file format requires unavailable dependencies.
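A rough, stdlib-only sketch of extension-based URL loading follows, to show how one entry point can dispatch on file type. It is not yosoi's loader: load_urls_sketch is hypothetical, and the assumed layouts (a flat JSON array, a CSV "url" column, one URL per line with "#" comments in plain text) are illustrative guesses, not documented behaviour.

```python
import csv
import json
import re
from pathlib import Path

# Rough http(s) URL matcher for Markdown; stops at whitespace, ')' ']' '>' and quotes.
_URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def load_urls_sketch(filepath: str) -> list[str]:
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(filepath)

    suffix = path.suffix.lower()
    if suffix in {".xlsx", ".xls", ".parquet"}:
        # Binary formats need third-party readers; this sketch always rejects them,
        # whereas yosoi raises ValueError only when those dependencies are missing.
        raise ValueError(f"{suffix} files need optional dependencies")

    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        # Assumed shape: a flat JSON array of URL strings.
        return [str(url) for url in json.loads(text)]
    if suffix == ".csv":
        # Assumed layout: a header row with a column named "url".
        return [row["url"].strip() for row in csv.DictReader(text.splitlines()) if row.get("url")]
    if suffix in {".md", ".markdown"}:
        # Pull URLs out of prose and links, trimming trailing sentence punctuation.
        return [url.rstrip(".,;:!?") for url in _URL_RE.findall(text)]

    # Plain text (assumed): one URL per line, skipping blanks and '#' comments.
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```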

regex

regex(value: str) -> SelectorEntry

Create a regex SelectorEntry.

register_coercion

register_coercion(type_name: str, description: str = '', **config_defaults: Any) -> Callable[[Callable[..., CoercedValue]], Callable[..., Any]]

Decorator that registers a coercion function and returns a Field factory.

The decorator stores the decorated function's coercion logic in an internal registry and returns a Field factory under the same name, so that name is what you use in a Contract.

Decorator kwargs define the config schema:

  • description: default field description

  • all other kwargs: config keys that appear in json_schema_extra and are forwarded to the coerce function via config

Args:

  • type_name str — The yosoi_type identifier (e.g. 'price').

  • description str — Default field description shown in manifests and to the AI.

  • **config_defaults Any — Config keys with their default values. These become keyword arguments on the generated factory function.

Example::

    import re

    from yosoi import register_coercion

    @register_coercion('phone', description='A phone number', country_code='+1')
    def PhoneNumber(v, config, source_url=None):
        # Strip every non-digit character, then prepend the configured country code
        digits = re.sub(r'\D', '', str(v))
        return config.get('country_code', '+1') + digits

    # PhoneNumber is now a Field factory:
    # PhoneNumber(country_code='+44') -> Field(json_schema_extra={...})
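To make the decorator-to-factory hand-off concrete, here is a self-contained sketch of the same pattern. It is not yosoi's implementation: register_coercion_sketch, run_coercion and the _COERCIONS dict are hypothetical, and a plain dict stands in for the Field object the real factory returns.

```python
from __future__ import annotations

import re
from typing import Any, Callable

# Hypothetical stand-in for yosoi's internal coercion registry:
# type_name -> (coercion function, default config).
_COERCIONS: dict[str, tuple[Callable[..., Any], dict[str, Any]]] = {}


def register_coercion_sketch(type_name: str, description: str = "", **config_defaults: Any):
    def decorator(coerce_fn: Callable[..., Any]) -> Callable[..., dict[str, Any]]:
        # Keep the coercion logic in the registry under its type name.
        _COERCIONS[type_name] = (coerce_fn, dict(config_defaults))

        # The returned factory replaces the decorated name; its keyword
        # arguments are the config keys declared on the decorator.
        def factory(description: str = description, **overrides: Any) -> dict[str, Any]:
            unknown = set(overrides) - set(config_defaults)
            if unknown:
                raise TypeError(f"unknown config keys: {sorted(unknown)}")
            # Plain dict standing in for Field(json_schema_extra={...}).
            return {
                "description": description,
                "json_schema_extra": {"yosoi_type": type_name, **config_defaults, **overrides},
            }

        factory.__name__ = coerce_fn.__name__
        return factory

    return decorator


def run_coercion(value: Any, json_schema_extra: dict[str, Any]) -> Any:
    """Look up the coercion by yosoi_type and call it with the field's config."""
    coerce_fn, defaults = _COERCIONS[json_schema_extra["yosoi_type"]]
    return coerce_fn(value, {**defaults, **json_schema_extra})


@register_coercion_sketch("phone", description="A phone number", country_code="+1")
def PhoneNumber(v, config, source_url=None):
    digits = re.sub(r"\D", "", str(v))
    return config.get("country_code", "+1") + digits


uk_phone = PhoneNumber(country_code="+44")
```

Here uk_phone carries only declarative config, while the actual digit-stripping logic lives in the registry and runs later through run_coercion; that split is what lets one factory name serve both the schema and the coercion step.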

resolve_contract

resolve_contract(name: str) -> type[Contract]

Resolve a contract name to a Contract class (exact matching only).

This is the programmatic API, so it performs no fuzzy matching or file scanning; those are CLI-only developer-experience features of SchemaParamType.

Resolution order:

  1. Exact match in BUILTIN_SCHEMAS
  2. Case-insensitive match in BUILTIN_SCHEMAS
  3. Exact / case-insensitive match in _CONTRACT_REGISTRY (custom schemas)
  4. Dynamic import via path:ClassName

Args:

  • name str — Contract name or path:ClassName string.

Returns: type[Contract] — The resolved Contract subclass.

Raises:

  • ValueError — If no matching contract is found.
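The four-step resolution order can be sketched as a chain of lookups. Everything below is a hypothetical stand-in: the placeholder classes replace real Contract subclasses, the two dicts only mimic BUILTIN_SCHEMAS and _CONTRACT_REGISTRY, and step 4 assumes path is a dotted, importable module path (the real function may also accept other path forms).

```python
from __future__ import annotations

import importlib


# Hypothetical stand-ins for Contract subclasses and the two lookup tables.
class Product: ...
class Article: ...
class MyListing: ...


BUILTIN_SCHEMAS: dict[str, type] = {"Product": Product, "Article": Article}
_CONTRACT_REGISTRY: dict[str, type] = {"MyListing": MyListing}


def _lookup(table: dict[str, type], name: str) -> type | None:
    """Exact match first, then a case-insensitive match."""
    if name in table:
        return table[name]
    lowered = name.lower()
    for key, cls in table.items():
        if key.lower() == lowered:
            return cls
    return None


def resolve_contract_sketch(name: str) -> type:
    # Steps 1-2: built-in schemas, exact then case-insensitive.
    # Step 3: custom registry, exact then case-insensitive.
    for table in (BUILTIN_SCHEMAS, _CONTRACT_REGISTRY):
        found = _lookup(table, name)
        if found is not None:
            return found

    # Step 4: 'module.path:ClassName' dynamic import (dotted module path assumed).
    if ":" in name:
        module_path, _, class_name = name.rpartition(":")
        try:
            obj = getattr(importlib.import_module(module_path), class_name)
        except (ImportError, AttributeError, ValueError) as exc:
            raise ValueError(f"cannot import contract {name!r}") from exc
        if not isinstance(obj, type):
            raise ValueError(f"{name!r} is not a class")
        return obj

    raise ValueError(f"no contract matches {name!r}")
```

Because every failure path is funnelled into ValueError, callers only need to handle the one exception type the documentation lists.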

xpath

xpath(value: str) -> SelectorEntry

Create an XPath SelectorEntry.
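The selector helpers css, xpath, regex and jsonld, plus the discover sentinel, appear to share one factory shape: wrap a string in a SelectorEntry tagged with its selector type. A minimal sketch of that shape follows. This page does not document SelectorEntry's fields, so the dataclass, its kind/value fields and the example selector strings are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

SelectorKind = Literal["css", "xpath", "regex", "jsonld", "discover"]


@dataclass(frozen=True)
class SelectorEntry:
    """Hypothetical stand-in for yosoi's SelectorEntry: a tagged selector string."""

    kind: SelectorKind
    value: str | None = None


def css(value: str) -> SelectorEntry:
    return SelectorEntry("css", value)


def xpath(value: str) -> SelectorEntry:
    return SelectorEntry("xpath", value)


def regex(value: str) -> SelectorEntry:
    return SelectorEntry("regex", value)


def jsonld(value: str) -> SelectorEntry:
    return SelectorEntry("jsonld", value)


def discover() -> SelectorEntry:
    # Sentinel entry: no selector value, the root is left for the AI to find.
    return SelectorEntry("discover")


# Hypothetical usage: one entry per field, each tagged with how to extract it.
selectors = {
    "title": css("h1.product-title"),
    "sku": xpath("//span[@itemprop='sku']/text()"),
    "price": regex(r"\$\d+(?:\.\d{2})?"),
    "brand": jsonld("brand.name"),
    "reviews": discover(),
}
```

Making the entry frozen keeps selectors hashable and comparable by value, so two css("a") calls produce equal entries.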