JSON Output

Set output_format on the Pipeline (or --output on the CLI) to persist extracted data automatically.

CLI

uv run yosoi --url https://qscrape.dev/l1/news --contract NewsArticle --output json

Combine multiple formats in one run:

uv run yosoi --url https://qscrape.dev/l1/news --contract NewsArticle --output json,csv

Python

# output.py
import asyncio

import yosoi as ys


class Article(ys.Contract):
    title: str = ys.Title()
    author: str = ys.Author()


async def main():
    pipeline = ys.Pipeline(
        ys.auto_config(),
        contract=Article,
        output_format='json',
    )
    async for item in pipeline.scrape('https://qscrape.dev/l1/news'):
        print(item.get('title'))


asyncio.run(main())

Run it:

uv run python output.py

Results are written to .yosoi/content/<domain>/results.json. Multi-item pages are saved as {"items": [...]}.
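To consume the saved file from another script, a minimal sketch using only the standard library might look like the following. The helper name load_results is hypothetical and not part of Yosoi. Only the {"items": [...]} wrapper is documented above; treating any other payload as a single record is an assumption.

```python
import json
from pathlib import Path


def load_results(path):
    """Return the records stored in a results.json file as a list.

    Hypothetical helper for illustration; not part of Yosoi.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Multi-item pages are wrapped as {"items": [...]}.
    if isinstance(data, dict) and isinstance(data.get("items"), list):
        return data["items"]
    # Assumption: anything else is a single record (or already a list).
    return data if isinstance(data, list) else [data]
```

Pass it the path to a results.json under .yosoi/content/<domain>/, with <domain> replaced by the directory Yosoi actually created; the caller gets a list in either case.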

Multiple Formats at Once

Pass a list to output_format to write every listed format in the same run:

pipeline = ys.Pipeline(ys.auto_config(), contract=Article, output_format=['json', 'csv'])

Supported Formats

Format    Extension   Notes
json      .json       One file per domain
jsonl     .jsonl      One JSON object per line (append-friendly)
ndjson    .jsonl      Alias for jsonl
csv       .csv        Flat tabular output
md        .md         Markdown table
xlsx      .xlsx       Requires openpyxl (uv add yosoi[tabular])
parquet   .parquet    Requires pyarrow (uv add yosoi[tabular])
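Because jsonl stores one JSON object per line, a file in that format can be read record by record without loading it whole. A rough sketch, assuming a file already produced with output_format='jsonl'; the function name iter_jsonl is illustrative and not part of Yosoi, and the file's location is left to the caller:

```python
import json


def iter_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file.

    Illustrative helper; not part of Yosoi.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Since the function is a generator, wrapping it in list(...) materializes every record, while a plain for loop keeps memory use flat on large files.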