crawler: --no-cover, --concurrency, drop cross-host sleep + batch-50 Step 1 done

Three crawler ergonomics improvements for batch operations:

--no-cover  Skip cover image download. For scan-only modes (license/meta
            scrape) this drops ~1.3s/project and avoids slow-CDN hangs.

--concurrency N  Wraps the per-project loop in a ThreadPoolExecutor. Default
                 1 = serial (current behavior). Anonymous endpoints tolerate
                 5+ comfortably; output uses a print lock so interleaved
                 progress stays readable. fetch_cover is plumbed through
                 crawl_one.
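
A minimal sketch of the concurrency pattern described above, not the crawler's actual code. The crawl_one body, the crawl_all and log helpers, and the project IDs are hypothetical stand-ins; only the names crawl_one and fetch_cover come from this commit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One shared lock so each progress line is printed whole.
_print_lock = threading.Lock()


def log(msg: str) -> None:
    """Print a full line under the lock so worker output never mixes mid-line."""
    with _print_lock:
        print(msg, flush=True)


def crawl_one(project_id: str, fetch_cover: bool = True) -> dict:
    # Hypothetical placeholder for the real per-project work
    # (detail page fetch, optional cover download).
    log(f"[{project_id}] done (cover={'yes' if fetch_cover else 'skipped'})")
    return {"id": project_id, "cover": fetch_cover}


def crawl_all(project_ids, concurrency: int = 1, fetch_cover: bool = True) -> list:
    if concurrency <= 1:
        # Serial path: identical to a plain loop.
        return [crawl_one(p, fetch_cover) for p in project_ids]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in input order,
        # even though the workers may finish out of order.
        return list(pool.map(lambda p: crawl_one(p, fetch_cover), project_ids))


results = crawl_all(["a", "b", "c"], concurrency=5, fetch_cover=False)
```

Progress lines may appear in any order, but the returned list follows the input order, which keeps downstream bookkeeping deterministic.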

Drop cross-host sleep #1: in crawl_one between detail HTML (oshwhub.com)
and cover image (image.lceda.cn). Different hosts — sleep was unnecessary.
Saves ~1s/project. Sleep #2 (post-cover, before next iteration) stays — it
gates the next oshwhub.com hit.

download_to gains a max_seconds wall-clock budget (default 60s, cover uses
15s). Defends against pathologically slow CDN connections: observed 10 KB/s
on image.lceda.cn for one project, which would have hung 6+ min on a 3.6 MB
cover otherwise. httpx's default timeout resets per chunk, so streaming
downloads need an external wall-clock guard.
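
A sketch of the wall-clock guard idea, assuming the download is consumed as an iterable of byte chunks (with httpx this would typically be the response's iter_bytes()). copy_with_budget, DownloadTooSlow, and the clock parameter are hypothetical names for illustration; the real download_to signature is not shown in this commit.

```python
import io
import time


class DownloadTooSlow(Exception):
    """Raised when the total transfer time exceeds the wall-clock budget."""


def copy_with_budget(chunks, out, max_seconds: float = 60.0, clock=time.monotonic) -> int:
    """Write chunks to out, aborting once max_seconds of wall time has elapsed.

    The deadline is fixed once, up front, so a connection that keeps
    trickling small chunks cannot extend it the way a per-chunk timeout can.
    """
    deadline = clock() + max_seconds
    written = 0
    for chunk in chunks:
        # Checked after each chunk arrives; a single stalled read still
        # relies on the HTTP client's own per-read timeout.
        if clock() > deadline:
            raise DownloadTooSlow(f"exceeded {max_seconds}s after {written} bytes")
        out.write(chunk)
        written += len(chunk)
    return written


buf = io.BytesIO()
n = copy_with_budget([b"abc", b"de"], buf, max_seconds=15)
```

The caller is expected to delete any partial file when DownloadTooSlow is raised, so aborted downloads do not leave orphans behind.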

batch-50 Step 1 (license/meta scrape) shipped:
  50/50 candidates have metadata.json + license recorded
  License distribution: GPL 3.0 32, Public Domain 6, NC variants 8,
                        CERN-OHL 1, MIT 1, CC BY 3.0 1
  Forge-friendly (non-NC): 41/50 (82%)
  Declared attachments: 180 files / 2.36 GB (median 18 MB/proj, max 304 MB)
  Walltime: 3min 26s for 28 projects at concurrency=5 (server-side
            HTML render bound, not sleep-bound)

One orphan partial cover (a670e60a...) cleaned up — leftover from the
first aborted run before the timeout fix landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:35:11 +08:00
parent fe6971f3f9
commit c6fd111d6d
171 changed files with 5410 additions and 17 deletions


@@ -0,0 +1,91 @@
{
"detail_url": "https://oshwhub.com/bryan_he/usb-ps-v30",
"cover_url": "https://image.lceda.cn/oshwhub/02c786878782461daaea1f2679e947c3.jpg",
"attachments": [
{
"name": "UP_SHELL.stp",
"url": "https://image.lceda.cn/oshwhub/project/attachments/fe24cf66b1ec44ea90ee4f84d8f13579.stp",
"original_id": "e0ab43ef325b407f807f420f4872c500"
},
{
"name": "DW_SHELL.stp",
"url": "https://image.lceda.cn/oshwhub/project/attachments/96c42de256974c499d0f6190a29db656.stp",
"original_id": "7cdef1490b6744cba57dabb5d501b5b6"
},
{
"name": "3D_PRINT_SHELL_2mm.stp",
"url": "https://image.lceda.cn/oshwhub/project/attachments/8526ff7d003d4125beaa279062807f8b.stp",
"original_id": "0fb510d9b7184e47882804eca5bc648f"
},
{
"name": "USB-PS-V3-演示视频.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/f448fff414d44a2081944ccb0f9adaf1.mp4",
"original_id": "a408733f2cbd48c6802d141b3a5272f8"
},
{
"name": "uart2ble_tools .7z",
"url": "https://image.lceda.cn/oshwhub/project/attachments/b10801221297404297f85bd768fd8ebb.7z",
"original_id": "0ff78fffd0b14cd4a05c3286c7d0f546"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V1.1.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/ac463ba72da645f7b902cf85beed3a27.hex",
"original_id": "b1e7bfed2e3447f3ab169d8ccb93c72f"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V1.2.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/64597dd50ce849f3b9956de52f2fa43e.hex",
"original_id": "7085bb6d72cd4269ae8ce57438f95592"
},
{
"name": "USB-PS-V3-煮开水.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/60ab89b45eea4dacba513ccd88c5e3af.mp4",
"original_id": "cac2193764cc44d3b2cd19388cbaa33a"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V2.1-for-HW4.4.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/2ff04fb300ee418b8ea5a25f043f8fdc.hex",
"original_id": "ce45a16319064b70becb4372fd32a116"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V2.3-for-HWV4.4.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/e7514010034442c08cbb10be0755c1e9.hex",
"original_id": "ee1520adb12d4a14868b6e1590dc6e43"
},
{
"name": "小程序演示.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/7be3c8e2229b435084ade0a96b49892d.mp4",
"original_id": "1b957a1a8b1040fe8f93623c340d31d5"
},
{
"name": "基本功能演示.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/af520259599a4665a8fddaf1c494003a.mp4",
"original_id": "fa327881fca94388915e23481596ac12"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V2.4-for-HWV4.4.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/e584a5d2a2034a3199a849d687d5c31d.hex",
"original_id": "d21d07c0335d44719b8edd265bd8e2b0"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V2.6-for-HW4.4.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/7806bc484e394f3c9c6ad4298fdc8460.hex",
"original_id": "c31ea38996f848058d0a3e2cb8b95290"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V2.8-for-HW4.7.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/cfcb951a1aac4c64b3f2062f082adbad.hex",
"original_id": "f19cfe79e680466d8af197d3aaa617a2"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V3.0-for-HW4.7.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/29696e9994f948bbac9972252676e81b.hex",
"original_id": "b56ef1745c074fd7bd9d4dfcf98651df"
},
{
"name": "AT32F421F8P7-USB-PS-V3-V3.1-for-HWV4.7.hex",
"url": "https://image.lceda.cn/oshwhub/project/attachments/3ba25297e30546b98e0b8802f999adea.hex",
"original_id": "027ac804a8bd4c2483bc4bb8ff2f89aa"
}
]
}
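
For readers who want to summarize a metadata file with the shape shown above, here is a small sketch that counts declared attachments by file extension. The attachments_by_extension helper is hypothetical, and the sample is trimmed to three entries copied from the diff.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def attachments_by_extension(metadata: dict) -> Counter:
    """Count attachments per extension, taken from the URL path."""
    return Counter(
        PurePosixPath(item["url"]).suffix.lower()
        for item in metadata.get("attachments", [])
    )


# Trimmed sample using three entries from the metadata above.
sample = json.loads("""
{
  "detail_url": "https://oshwhub.com/bryan_he/usb-ps-v30",
  "attachments": [
    {"name": "UP_SHELL.stp",
     "url": "https://image.lceda.cn/oshwhub/project/attachments/fe24cf66b1ec44ea90ee4f84d8f13579.stp"},
    {"name": "DW_SHELL.stp",
     "url": "https://image.lceda.cn/oshwhub/project/attachments/96c42de256974c499d0f6190a29db656.stp"},
    {"name": "uart2ble_tools .7z",
     "url": "https://image.lceda.cn/oshwhub/project/attachments/b10801221297404297f85bd768fd8ebb.7z"}
  ]
}
""")

counts = attachments_by_extension(sample)
```

Reading the extension from the URL rather than the display name sidesteps oddities in user-supplied names, such as the stray space in "uart2ble_tools .7z".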