FacereDataset/crawlers/oshwhub
Knowit cb868988b9 crawler: drop sleep rates 10x for Pro API, 2x for oshwhub detail
Calibrated against ladder probes on 2026-04-29. Findings in
docs/sources/probe_rate_limit_results.md.

  SLEEP_PRO     5.0 -> 0.5  (pro.lceda.cn API)
  SLEEP_BETWEEN 2.0 -> 1.0  (oshwhub detail/listing)
  SLEEP_SOURCE  5.0 unchanged (lceda.cn Std endpoints — not yet probed)
  SLEEP_PRO_CDN 0.2 unchanged (modules.lceda.cn — already optimized)
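
For illustration, the four constants could be applied per host roughly as below. This is a sketch, not the crawler's actual code: the constant names and values come from this commit, but the SLEEP_BY_HOST table, the sleep_for helper, and the "oshwhub.com" hostname are assumptions.

```python
from urllib.parse import urlsplit

# Values taken from this commit message.
SLEEP_PRO = 0.5      # pro.lceda.cn API
SLEEP_BETWEEN = 1.0  # oshwhub detail/listing
SLEEP_SOURCE = 5.0   # lceda.cn Std endpoints (not yet probed)
SLEEP_PRO_CDN = 0.2  # modules.lceda.cn

# Hypothetical lookup table; the real crawler may wire these differently.
SLEEP_BY_HOST = {
    "pro.lceda.cn": SLEEP_PRO,
    "modules.lceda.cn": SLEEP_PRO_CDN,
    "lceda.cn": SLEEP_SOURCE,
    "oshwhub.com": SLEEP_BETWEEN,  # hostname is an assumption
}


def sleep_for(url: str) -> float:
    """Return the sleep (seconds) to apply after a request to this URL."""
    host = urlsplit(url).hostname or ""
    # Unknown hosts get the most conservative value until they are probed.
    return SLEEP_BY_HOST.get(host, max(SLEEP_BY_HOST.values()))
```

Falling back to the slowest rate for unknown hosts keeps an unprobed endpoint from inheriting a rate that was only calibrated for another server.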

The original 5s rate for the Pro API was set out of caution because Pro
requires a logged-in cookie. An empirical sustained-burst probe (25
distinct UUIDs at 0.5s sleep, no recovery pause) returned 0/25 errors,
median latency 410ms, p90 932ms. The "Pro is rate-sensitive" assumption
was wrong: the server tolerates a 0.5s sleep (at most 2 QPS) cleanly.
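
A minimal sketch of how median and p90 can be derived from raw per-request latencies. It assumes a nearest-rank percentile; the commit does not say which method the probe uses, and the function names here are illustrative.

```python
import statistics


def p90_nearest_rank(latencies_ms):
    """Nearest-rank 90th percentile: the smallest value with >= 90% at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.9 * n) in integer arithmetic, as a 1-based rank.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def summarize(latencies_ms, errors):
    """Collapse one probe run into the figures quoted in the commit message."""
    return {
        "requests": len(latencies_ms) + errors,
        "errors": errors,
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": p90_nearest_rank(latencies_ms),
    }
```

With 25 samples, nearest-rank p90 is the 23rd smallest latency, so two or three slow requests are enough to move it while the median stays put.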

oshwhub detail HTML pages: p90 latency rose from 6.4s at 1.0s sleep to
15s at 0.5s sleep, which suggests the server queue backs up. 1.0s is the
lowest setting that still leaves headroom.

Net effect on batch-50 estimate: ~1.5h -> ~30min.

scripts/probe_rate_limit.py: rate-limit ladder probe tool. Reusable
for new endpoints (the Std source endpoints still need a probe). Designed
for safety: 30s recovery between tiers, low rep counts on auth hosts,
bail on first non-200.
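
The safety rules above can be sketched as follows. This is not the contents of scripts/probe_rate_limit.py; the ladder_probe name, its signature, and the injected fetch callable (taking a URL and returning an HTTP status code) are assumptions made to keep the example self-contained.

```python
import time


def ladder_probe(fetch, urls, tiers, reps=5, recovery=30.0, sleep=time.sleep):
    """Probe each sleep tier in order, stopping at the first non-200.

    fetch(url) must return an HTTP status code. tiers is a sequence of
    sleep values in seconds, typically from slowest to fastest.
    """
    results = []
    pending = iter(urls)
    for index, tier_sleep in enumerate(tiers):
        if index > 0:
            sleep(recovery)  # let the server recover between tiers
        tier = {"sleep": tier_sleep, "latencies": [], "bailed_status": None}
        results.append(tier)
        for _ in range(reps):
            url = next(pending, None)
            if url is None:
                return results  # out of distinct URLs
            started = time.monotonic()
            status = fetch(url)
            tier["latencies"].append(time.monotonic() - started)
            if status != 200:
                tier["bailed_status"] = status
                return results  # bail on first non-200
            sleep(tier_sleep)
    return results
```

Passing fetch and sleep as parameters lets the ladder logic be dry-run without touching a real host, which matters most for the auth-gated endpoints.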

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:45:34 +08:00