Topic-targeted pull from the local listing index (`name OR introduction`
contains 飞控, "flight controller"). 79 std hits in
oshwhub_listing_full.jsonl: 2 already crawled, 77 newly fetched.
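A minimal sketch of the topic filter and the crawled/new split,
assuming one JSON object per line with `name` and `introduction`
fields. The `uuid` id key and the helper names are hypothetical, and the
std-vs-pro edition filter is left out because its field name is not
stated here.

```python
import json

KEYWORD = "飞控"  # topic keyword from the listing query


def topic_hits(lines, keyword=KEYWORD):
    """Yield records whose name OR introduction contains keyword.

    `lines` is any iterable of JSONL text lines, e.g. an open file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the index
        rec = json.loads(line)
        # Missing or null fields are treated as empty strings.
        name = rec.get("name") or ""
        intro = rec.get("introduction") or ""
        if keyword in name or keyword in intro:
            yield rec


def split_new(hits, crawled_ids, id_key="uuid"):
    """Partition hits into (already crawled, still to fetch).

    id_key="uuid" is a hypothetical field name.
    """
    done, todo = [], []
    for rec in hits:
        (done if rec.get(id_key) in crawled_ids else todo).append(rec)
    return done, todo
```

Typical use would be `with open("oshwhub_listing_full.jsonl",
encoding="utf-8") as fh: done, todo = split_new(topic_hits(fh), crawled)`.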
dev1 (Guangzhou) walltime: Step 1 detail scrape ~12 s, Step 4
std-source backfill ~80 s (concurrency=5).
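One common way to cap a backfill at 5 in-flight requests is an
`asyncio.Semaphore`. This is a sketch of that pattern only, not the
crawler's actual code; `fetch_one` stands in for a hypothetical
per-item coroutine.

```python
import asyncio


async def backfill_all(items, fetch_one, concurrency=5):
    """Await fetch_one(item) for every item, at most `concurrency` at once."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(item):
        async with sem:  # blocks while `concurrency` calls are in flight
            return await fetch_one(item)

    # gather returns results in the same order as `items`.
    return await asyncio.gather(*(bounded(i) for i in items))
```

With a hypothetical `fetch_source(item)` coroutine this would run as
`asyncio.run(backfill_all(todo, fetch_source))`.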
Source completeness: 73/77 have editor source; the other 4 are
attachments-only upstream. No editor session was ever attached, so
source_documents=[] is genuine: the SSR page has no editor_version either.
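The distinction above (empty source_documents is only trustworthy when
the SSR page also lacks editor_version) can be sketched as a small
classifier. The substring check on raw HTML and the returned labels are
hypothetical; the real crawler may parse the SSR payload differently.

```python
def classify_source(source_documents, ssr_page_html):
    """Label one project's source status (labels are illustrative)."""
    if source_documents:
        return "editor_source"
    if "editor_version" in ssr_page_html:
        # Page advertises an editor session but we got nothing: refetch.
        return "missing_editor_source"
    # No editor session ever attached: [] is the genuine upstream state.
    return "attachments_only"
```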
Crawler hardening (crawlers/oshwhub/crawler.py):
- count.{like,star,fork,views} are now read defensively with
  `.get(..., 0)`. The listing API omits zero-valued fields for some
  low-activity entries (3/77 hit this on the first pass and hard-failed
  with KeyError: 'like'). Affects rank_score, pick_top, and the
  metadata.json metrics block.
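The `.get(..., 0)` change amounts to something like the following
sketch. Treating a missing `count` object itself as empty is an extra
assumption beyond what is described above.

```python
COUNT_FIELDS = ("like", "star", "fork", "views")


def read_counts(entry):
    """Return all four counters, defaulting omitted ones to 0.

    entry["count"]["like"] would raise KeyError: 'like' when the
    listing API drops a zero-valued field.
    """
    count = entry.get("count") or {}  # assumption: whole block may be absent
    return {k: count.get(k, 0) for k in COUNT_FIELDS}
```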
License mix: 65% GPL 3.0, 11% Public Domain, 11% MIT, ~6% CC variants.
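A percentage breakdown like the license mix can be computed with
`collections.Counter`. This sketch assumes the license strings have
already been pulled out of metadata.json; the field they come from is
not stated here.

```python
from collections import Counter


def license_mix(licenses):
    """Map each license string to its share of the total, in percent."""
    counts = Counter(licenses)
    total = sum(counts.values())
    if not total:
        return {}
    # Most common first; shares rounded to one decimal place.
    return {lic: round(100 * n / total, 1) for lic, n in counts.most_common()}
```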
Transport: dev1 → SG via tar+scp (33 MB, ~3 min over the lossy
cross-region link). Skipped the gitea push from dev1 because the same
link, at 6.5% packet loss, tanks single-stream throughput.
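For scale, 33 MB in ~3 min is roughly 1.5 Mbit/s. The sketch below
computes that and, for comparison, the Mathis et al. steady-state
estimate for one Reno-style TCP stream, rate ≈ (MSS/RTT)·√(3/2)/√p. The
MSS of 1460 bytes and the 50 ms RTT are placeholder assumptions (the RTT
is not measured here), MB is taken as 10^6 bytes, and the model is only
a rough guide at a loss rate as high as 6.5%, where timeouts dominate.

```python
import math


def observed_mbps(megabytes, seconds):
    """Average goodput in Mbit/s, taking 1 MB = 10**6 bytes."""
    return megabytes * 1e6 * 8 / seconds / 1e6


def mathis_bps(mss_bytes, rtt_s, loss):
    """Mathis et al. single-stream TCP estimate, in bit/s."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)


print(f"observed: {observed_mbps(33, 180):.2f} Mbit/s")
# RTT of 50 ms is a placeholder, not a measurement of the dev1 -> SG link.
print(f"mathis @ 6.5% loss: {mathis_bps(1460, 0.050, 0.065) / 1e6:.2f} Mbit/s")
```

Because the estimate scales with 1/√p, loss at this level caps any
single TCP stream well below the link's nominal capacity.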