飞控-77: 77 std flight-controller projects ingested
Topic-targeted pull from the local listing index (`name OR introduction`
contains 飞控, i.e. "flight controller"). 79 std hits in
oshwhub_listing_full.jsonl; 2 were already crawled, 77 newly fetched.
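A minimal sketch of the keyword match described above, assuming each line of the listing file is one JSON object with `name` and `introduction` keys. The helper names `matches_topic` and `iter_hits` are hypothetical, and the std filter and the already-crawled check are not shown because the fields they rely on are not named here.

```python
import json
from typing import Iterable, Iterator

KEYWORD = "飞控"  # "flight controller"


def matches_topic(entry: dict, keyword: str = KEYWORD) -> bool:
    """Return True if the keyword appears in the entry's name or introduction."""
    # Treat missing or null fields as empty strings (assumed field names).
    name = entry.get("name") or ""
    intro = entry.get("introduction") or ""
    return keyword in name or keyword in intro


def iter_hits(lines: Iterable[str], keyword: str = KEYWORD) -> Iterator[dict]:
    """Yield every JSONL record whose name or introduction contains the keyword."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if matches_topic(entry, keyword):
            yield entry
```

`iter_hits` accepts any iterable of lines, so an open handle on oshwhub_listing_full.jsonl (opened with `encoding="utf-8"`) can be passed directly.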
dev1 (Guangzhou) walltime:
Step 1 detail scrape ~12s, Step 4 std-source backfill ~80s
(concurrency=5)
Source completeness: 73/77 have editor source. The other 4 are upstream
attachments-only: no editor session was ever attached, so
source_documents=[] is genuine (no editor_version on the SSR page either).
Crawler hardening (crawlers/oshwhub/crawler.py):
- count.{like,star,fork,views} are now read defensively via `.get(..., 0)`.
  The listing API omits zero-valued fields for some low-activity entries
  (3/77 hit this on the first pass and hard-failed with KeyError 'like').
  Affects rank_score, pick_top, and the metadata.json metrics block.
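The defensive read can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in crawler.py: the helper name `read_counts` is hypothetical, and the listing entry is assumed to be a dict with a nested `count` object.

```python
COUNT_FIELDS = ("like", "star", "fork", "views")


def read_counts(entry: dict) -> dict:
    """Return all four counters, defaulting any omitted field to 0."""
    # The listing API may omit zero-valued fields, so never index directly.
    count = entry.get("count") or {}
    return {field: count.get(field, 0) for field in COUNT_FIELDS}
```

With direct indexing (`count["like"]`), an entry that carries only `star` raises KeyError 'like'. With the defaults, the same entry yields zeros for the missing counters, so downstream consumers always see all four keys.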
License mix: 65% GPL 3.0, 11% Public Domain, 11% MIT, ~6% CC variants.
Transport: dev1 → SG via tar+scp (33 MB, ~3 min over lossy
cross-region link). Bypassed gitea push from dev1 because the same
6.5%-loss link tanks single-stream throughput.
@@ -0,0 +1,5 @@
+{
+  "detail_url": "https://oshwhub.com/JumperShao/m1_mh743_ada_v4",
+  "cover_url": "https://image.lceda.cn/pullimage/haHqGHCALFdesiNBAPqK4v0aqj32O5SyRmxiDvtC.jpeg",
+  "attachments": []
+}