FacereDataset/docs/sources/probe_rate_limit_results.md
Knowit cb868988b9 crawler: drop sleep rates 10x for Pro API, 2x for oshwhub detail
Calibrated against ladder probes on 2026-04-29. Findings in
docs/sources/probe_rate_limit_results.md.

  SLEEP_PRO     5.0 -> 0.5  (pro.lceda.cn API)
  SLEEP_BETWEEN 2.0 -> 1.0  (oshwhub detail/listing)
  SLEEP_SOURCE  5.0 unchanged (lceda.cn Std endpoints — not yet probed)
  SLEEP_PRO_CDN 0.2 unchanged (modules.lceda.cn — already optimized)

The original 5s rate for Pro API was set out of caution because Pro
requires a logged-in cookie. Empirical sustained-burst probe (25
distinct UUIDs at 0.5s sleep, no recovery): 0/25 errors, median
latency 410ms, p90 932ms. The "Pro is rate-sensitive" assumption was
wrong — server tolerates QPS=2 cleanly.

oshwhub detail HTML pages slowed from p90 6.4s at 1.0s sleep to
p90 15s at 0.5s — server queue backs up. 1.0s is the headroom-safe
water mark.

Net effect on batch-50 estimate: ~1.5h -> ~30min.

scripts/probe_rate_limit.py: rate-limit ladder probe tool. Reusable
for new endpoints (Std source still owes a probe). Designed for safety:
30s tier recovery, low rep counts on auth hosts, bail on first non-200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:45:34 +08:00


Rate-limit probe results

Probe date: 2026-04-29

Script: scripts/probe_rate_limit.py

Method: Ladder test. N requests per tier at decreasing inter-request sleep, 30s recovery between tiers, watching for status != 200, body shrinkage, or latency degradation.
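The ladder method can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/probe_rate_limit.py: the function name `run_ladder`, the `fetch` callback, and the result layout are all hypothetical, and only the behaviour described in this document (descending tiers, recovery between tiers, bail on first non-200) is modelled.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of one request result: (HTTP status, body size in bytes).
FetchResult = Tuple[int, int]


def run_ladder(
    fetch: Callable[[], FetchResult],
    tiers: List[float],
    reps: int,
    recovery: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Dict]:
    """Probe each sleep tier in order and stop at the first non-200."""
    results = []
    for i, tier in enumerate(tiers):
        latencies_ms, sizes, bad = [], [], 0
        for _ in range(reps):
            start = clock()
            status, size = fetch()
            latencies_ms.append((clock() - start) * 1000.0)
            sizes.append(size)  # kept so body shrinkage can be inspected later
            if status != 200:
                bad += 1
                break  # bail on the first non-200
            sleep(tier)
        results.append(
            {"sleep": tier, "bad": bad, "latencies_ms": latencies_ms, "sizes": sizes}
        )
        if bad:
            break  # do not descend to faster tiers after a failure
        if i < len(tiers) - 1:
            sleep(recovery)  # let the server recover between tiers
    return results
```

Passing `sleep` and `clock` as parameters is a choice made here so the loop can be exercised without real waiting; a real probe would supply a `fetch` that issues the HTTP request.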

oshwhub.com listing API (/api/project)

No auth. 6 tiers × 10 reps = 60 reqs total.

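| sleep | status | bad | latency p90 | note |
|-------|---------|-----|-------------|------|
| 2.0s | all 200 | 0 | 1187ms | |
| 1.0s | all 200 | 0 | 1237ms | |
| 0.5s | all 200 | 0 | 567ms | |
| 0.25s | all 200 | 0 | 1180ms | |
| 0.1s | all 200 | 0 | 2194ms | |
| 0.0s | all 200 | 0 | 5362ms | server soft-limits via latency |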

Verdict: 0.5s safe water mark. Faster tiers don't fail, but the server adds queueing latency (p90 rises to 2194ms at 0.1s and 5362ms at 0.0s), so the extra speed buys nothing.

oshwhub.com detail HTML (/<owner>/<path>)

No auth. 6 tiers × 10 distinct paths from batch-50 candidates.

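| sleep | status | bad | latency p90 | note |
|-------|---------|-----|-------------|------|
| 2.0s | all 200 | 0 | 4767ms | |
| 1.0s | all 200 | 0 | 6350ms | |
| 0.5s | all 200 | 0 | 15364ms | queue building |
| 0.25s | all 200 | 0 | 3755ms | |
| 0.1s | all 200 | 0 | 8179ms | |
| 0.0s | all 200 | 0 | 3856ms | |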

Verdict: 1.0s safe water mark. Detail pages are about 0.5 MB of server-side-rendered (SSR) HTML, so the server slows down earlier than it does for the listing API. Dropping to 0.5s already triggers server-side queueing (one outlier 15s response), which risks timeout cascades on real bulk runs.

pro.lceda.cn API (/api/v4/projects/<P>)

Auth required (logged-in cookie). Conservative ladder, reps capped at 8 to limit fingerprint exposure. 5 tiers × 8 reqs.

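| sleep | status | bad | latency p90 |
|-------|---------|-----|-------------|
| 5.0s | all 200 | 0 | 7299ms |
| 2.0s | all 200 | 0 | 5518ms |
| 1.0s | all 200 | 0 | 1409ms |
| 0.5s | all 200 | 0 | 2995ms |
| 0.25s | all 200 | 0 | 1552ms |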

Then sustained burst test at the chosen water mark: 25 distinct Pro UUIDs at 0.5s sleep, no recovery.

  • 25/25 success (all status 200, all success: true)
  • median latency 410ms, p90 932ms, max 1853ms (first call only — TLS handshake)
  • effective QPS 1.0
  • wall time 24.9s (vs ~140s at the old 5s/req — 5.6× speedup)

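Verdict: 0.5s safe water mark. Empirically, the Pro API tolerates a sustained 0.5s inter-request sleep cleanly; that is a nominal ceiling of 2 QPS, and about 1.0 QPS effective once request latency is included. The original 5s setting was chosen out of caution because Pro requires a logged-in account, and that caution proved unjustified.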
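The burst figures in this section are related by simple arithmetic, worked through below. The inputs are the numbers reported in this section; the variable names are illustrative only.

```python
n_requests = 25
sleep_s = 0.5             # inter-request sleep used in the burst test
wall_time_s = 24.9        # measured wall time of the burst
old_wall_time_s = 140.0   # approximate wall time at the old 5s/req setting

# Ceiling implied by the sleep alone, ignoring request latency.
nominal_qps_ceiling = 1.0 / sleep_s

# What was actually achieved once each request's latency is included.
effective_qps = n_requests / wall_time_s

# Wall-clock speedup of the burst relative to the old setting.
speedup = old_wall_time_s / wall_time_s

print(f"nominal ceiling: {nominal_qps_ceiling:.1f} QPS")
print(f"effective:       {effective_qps:.1f} QPS")
print(f"speedup:         {speedup:.1f}x")
```

The gap between the 2 QPS ceiling and the roughly 1.0 QPS achieved comes from request latency: each cycle costs the 0.5s sleep plus the time the request itself takes.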

lceda.cn Std source endpoints — NOT YET PROBED

SLEEP_SOURCE is currently 5.0 and should be probed before it is lowered. The Std crawler isn't on the critical path for batch-50 (~12 min vs Pro's ~10 min savings), so this can wait.

modules.lceda.cn CDN — already at 0.2s

CDN host serving AES-encrypted EPRO2 history blobs. Pre-existing SLEEP_PRO_CDN = 0.2, validated against editor HAR which fires blobs back-to-back without throttling. No further probing needed.

Settings applied

SLEEP_BETWEEN = 1.0   # was 2.0  (oshwhub detail/listing)
SLEEP_SOURCE  = 5.0   # unchanged (Std source — not yet probed)
SLEEP_PRO     = 0.5   # was 5.0  (Pro API host, 10× lower sleep; 5.6× measured burst speedup)
SLEEP_PRO_CDN = 0.2   # unchanged (CDN, already optimized)
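One way these constants could be applied per request is a host-to-sleep lookup, sketched below. The mapping `HOST_SLEEP` and the helper `sleep_for` are hypothetical and are not taken from the crawler; the host names are the ones named in this document, and falling back to the most conservative value for unknown hosts is an assumption made here.

```python
from urllib.parse import urlsplit

SLEEP_BETWEEN = 1.0   # oshwhub detail/listing
SLEEP_SOURCE = 5.0    # Std source, not yet probed
SLEEP_PRO = 0.5       # Pro API host
SLEEP_PRO_CDN = 0.2   # CDN

# Hypothetical mapping from host to inter-request sleep (seconds).
HOST_SLEEP = {
    "oshwhub.com": SLEEP_BETWEEN,
    "lceda.cn": SLEEP_SOURCE,
    "pro.lceda.cn": SLEEP_PRO,
    "modules.lceda.cn": SLEEP_PRO_CDN,
}


def sleep_for(url: str, default: float = SLEEP_SOURCE) -> float:
    """Return the sleep for a URL's host, or the conservative default."""
    host = urlsplit(url).hostname or ""
    return HOST_SLEEP.get(host, default)


print(sleep_for("https://pro.lceda.cn/api/v4/projects/example"))
print(sleep_for("https://unprobed.example/"))
```

Keying on the exact host name keeps pro.lceda.cn and modules.lceda.cn from inheriting the slower lceda.cn value.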

Net impact on batch-50 plan

  • Pro, 25 items × ~5 API calls each: 5 × 5s = 25s/proj × 25 = ~10min → 5 × 0.5s = 2.5s/proj × 25 = ~1min
  • Detail page scan, 50 items: 50 × 2s = 100s → 50 × 1s = 50s
  • Combined batch-50 walltime estimate: ~1.5h → ~30 min
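The two itemized sleep budgets above can be reproduced with a few lines of arithmetic. The helper `sleep_budget_s` is illustrative only, and the calculation covers sleep time alone, not request latency or any other stage of the run.

```python
def sleep_budget_s(items: int, calls_per_item: int, sleep_s: float) -> float:
    """Total time spent sleeping, in seconds."""
    return items * calls_per_item * sleep_s


pro_before = sleep_budget_s(25, 5, 5.0)      # Pro at the old 5.0s sleep
pro_after = sleep_budget_s(25, 5, 0.5)       # Pro at the new 0.5s sleep
detail_before = sleep_budget_s(50, 1, 2.0)   # detail scan at the old 2.0s sleep
detail_after = sleep_budget_s(50, 1, 1.0)    # detail scan at the new 1.0s sleep

print(f"Pro:    {pro_before / 60:.1f} min -> {pro_after / 60:.1f} min")
print(f"Detail: {detail_before:.0f} s -> {detail_after:.0f} s")
```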