Files
FacereDataset/docs
Knowit d89a7cdf9c oshwhub: dump full listing index (33,695 projects) for batch sizing
Probed listing API and learned: total field is exposed (Pro=21,202 / Std=12,493),
pageSize accepts >=1000 (full corpus = 35 requests / 71s), sort param is silently
ignored. Dump all listings via scripts/dump_listing_index.py to local jsonl so
downstream batch-selection no longer hits the API.

Why: needed quantitative anchors before scaling Pro batch beyond top-5. License
is detail-page only (~19h serial scan), so we want to filter on grade/like
*locally* first to shortlist before paying that cost. Quality-tier counts now
known: A-tier (grade>=3 & like>=10) = 2,806 across both origins.

- scripts/dump_listing_index.py: one-shot scraper, polite QPS, streams to jsonl
- docs/sources/oshwhub_listing_full.md: human-readable report with growth
  trends, quality tiers, owner concentration, and storage-budget anchors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:30:56 +08:00
..
2026-04-23 18:58:10 +08:00