# Configuration

## Environment Variables
| Variable | Required | Description |
|---|---|---|
| GROQ_KEY | One of these | Groq△ API key |
| GEMINI_KEY | One of these | Google Gemini○ API key |
| OPENAI_KEY | One of these | OpenAI◑ API key |
| CEREBRAS_KEY | One of these | Cerebras◇ API key |
| OPENROUTER_KEY | One of these | OpenRouter★ API key |
| YOSOI_MODEL | Optional | Default model in provider:model format (e.g. groq:llama-3.3-70b-versatile) |
| YOSOI_LOG_LEVEL | Optional | Logging level: DEBUG, INFO, WARNING, ERROR, ALL (default: DEBUG) |
| LOGFIRE_TOKEN | Optional | Enables Logfire⬡ tracing |
These are the most commonly used provider keys. Yosoi supports 25+ providers — each with its own environment variable. You only need one.
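A minimal setup only needs one export. The key values below are placeholders, not real credentials:

```shell
# One provider key is enough; the value below is a placeholder.
export GROQ_KEY="gsk_your_key_here"

# Optional: pin the default model in provider:model format.
export YOSOI_MODEL="groq:llama-3.3-70b-versatile"

# Optional: quieter logging than the DEBUG default.
export YOSOI_LOG_LEVEL="INFO"
```

Put these in your shell profile or a `.env` file so they persist across sessions.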
## Local Storage
Yosoi stores all state in .yosoi/ in your project root (gitignored by default):
```
.yosoi/
├── selectors/      # Cached selector JSON per domain
├── logs/           # Run logs (run_YYYYMMDD_HHMMSS.log)
├── debug_html/     # Extracted HTML snapshots (--debug only)
├── content/        # Extracted output files (JSON, CSV, etc.)
└── stats.json      # Cumulative LLM call and usage statistics
```

## Observability
Set LOGFIRE_TOKEN to send traces to Logfire for cloud-based observability. Without it, logs are written locally only.
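Enabling cloud tracing is a single export; the token value below is a placeholder (obtain a real one from your Logfire project):

```shell
# Placeholder token — replace with a real write token from Logfire.
export LOGFIRE_TOKEN="pylf_placeholder_token"
```

Unsetting the variable reverts to local-only logging.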
## FAQs

### What happens if I set multiple provider keys?
Yosoi picks one based on a built-in fallback order (Groq, Gemini, Cerebras, OpenAI, OpenRouter). To control which provider and model are used, set YOSOI_MODEL to a provider:model string (e.g. groq:llama-3.3-70b-versatile).
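The interaction can be sketched in the shell. The key values are placeholders, and the Gemini model name is illustrative, not a guaranteed model ID:

```shell
# Both keys set: the built-in fallback order would pick Groq first.
export GROQ_KEY="gsk_placeholder"
export GEMINI_KEY="AIza_placeholder"

# Override the fallback explicitly with a provider:model string
# (model name here is an illustrative assumption):
export YOSOI_MODEL="gemini:gemini-2.0-flash"
```

With YOSOI_MODEL set, the fallback order is ignored entirely.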
### Can I change the .yosoi/ storage location?
Not currently. The directory is always created in the working directory where Yosoi is run.
### Is .yosoi/ safe to commit to version control?
The selector cache is safe to commit if you want to share discovered selectors across a team. The logs/, debug_html/, and content/ subdirectories are noisy and should stay gitignored.
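One way to express that split in a `.gitignore` (a sketch; adjust if your layout differs):

```gitignore
# Ignore everything under .yosoi/ except the shareable selector cache
.yosoi/*
!.yosoi/selectors/
```

The `!` negation re-includes selectors/ because only the directory's contents, not .yosoi/ itself, are excluded by the first pattern.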
### How do I enable debug HTML snapshots?
Pass --debug when running the CLI. Snapshots are saved to .yosoi/debug_html/ and are useful for diagnosing extraction failures.
## References
△ Groq API. Groq, Inc. Low-latency LLM inference. https://console.groq.com/docs/
○ Gemini API. Google. Gemini language model API. https://ai.google.dev/gemini-api/docs
◑ OpenAI API. OpenAI. GPT model API. https://platform.openai.com/docs/
◇ Cerebras API. Cerebras Systems. High-speed LLM inference on wafer-scale hardware. https://inference-docs.cerebras.ai/
★ OpenRouter. OpenRouter. Unified API for LLM providers. https://openrouter.ai/docs
⬡ Logfire. Pydantic. Cloud observability and tracing. https://logfire.pydantic.dev/docs/