crawler: split sleep policy by host — chain blob fetches drop 5s -> 0.2s

The Pro modern fetcher (fetch_pro_modern) walks a per-history blob loop on modules.lceda.cn, a CDN-flavored host serving AES-encrypted EPRO2 streams. We were sleeping 5s between every blob, the same rate we use for the rate-sensitive pro.lceda.cn API host. HAR analysis (proexportNew2.har) shows the editor fires these blobs back-to-back without throttling, so 0.2s is plenty.

The walltime saved grows linearly with chain length:

  ESP-VoCat  (chain=12):               80s sleep  -> 22s sleep (-72%)
  220V power (chain=28):               160s sleep -> 26s sleep (-84%)
  X86 board  (chain~700, projection):  ~1h        -> ~3min

Verified by re-fetching ESP-VoCat and 220V power: output is byte-identical across all per-doc .epro2 files (sha256 match); only the fetched_at timestamp differs in manifest.json. Two manifest files are re-stamped as proof of the validation runs.

API host sleeps (4x 5s in the modern fetcher, 7x 5s in the legacy fetcher) are unchanged. Those go to pro.lceda.cn /api/, which still wants a polite QPS<=0.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
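
For illustration, the host-split sleep policy and the sleep totals quoted above can be sketched as follows. This is a minimal sketch, not the code in this commit: the names HOST_SLEEP_S, DEFAULT_SLEEP_S, sleep_for and projected_sleep are hypothetical, the example URL path is made up, and the model assumes exactly one sleep per request with the 4 API sleeps of the modern fetcher.

```python
from urllib.parse import urlsplit

# Hypothetical per-host delays in seconds; the two values come from the
# commit message, the table itself is an assumption about how to store them.
HOST_SLEEP_S = {
    "pro.lceda.cn": 5.0,      # rate-sensitive API host (QPS <= 0.2)
    "modules.lceda.cn": 0.2,  # CDN-flavored blob host
}
DEFAULT_SLEEP_S = 5.0  # assumption: unknown hosts get the conservative delay


def sleep_for(url: str) -> float:
    """Return the delay to apply after a request to this URL's host."""
    host = urlsplit(url).hostname or ""
    return HOST_SLEEP_S.get(host, DEFAULT_SLEEP_S)


def projected_sleep(chain_length: int, blob_sleep: float = 0.2,
                    api_calls: int = 4, api_sleep: float = 5.0) -> float:
    """Total sleep for one modern fetch: fixed API sleeps plus one per blob."""
    return api_calls * api_sleep + chain_length * blob_sleep


if __name__ == "__main__":
    for name, chain in [("ESP-VoCat", 12), ("220V power", 28)]:
        before = projected_sleep(chain, blob_sleep=5.0)
        after = projected_sleep(chain)
        print(f"{name}: {before:.0f}s -> {after:.0f}s")
```

Under these assumptions the fixed API cost is 20s, so the per-blob term dominates only for long chains, which is why the relative saving rises from 72% at chain=12 to 84% at chain=28.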
2026-04-29 00:09:19 +08:00
parent ff5553fb06
commit 1e06ba6582
3 changed files with 11 additions and 4 deletions
--- a/data/raw/oshwhub/ba64bd6f1c9c467ba3b674a54943557d/source/manifest.json
+++ b/data/raw/oshwhub/ba64bd6f1c9c467ba3b674a54943557d/source/manifest.json
@@ -2,7 +2,7 @@
  "project_uuid": "ba64bd6f1c9c467ba3b674a54943557d",
  "branch_uuid": "ef5f58bd0f1245b0a808c07e541a1b5c",
  "head_uuid": "764dd8b722a44914a915493277e204c9",
-  "fetched_at": "2026-04-28T13:22:26.550372+00:00",
+  "fetched_at": "2026-04-28T16:06:55.434479+00:00",
  "editor_version": "3.2.91",
  "chain_length": 12,
  "blob_bytes_total": 1195716,