crawler: drop SLEEP_SOURCE 5.0 -> 0.5 (Std doc endpoint probe)

Ladder probe of lceda.cn/api/documents/<uuid>: 5 tiers
(5/2/1/0.5/0.25s) × 9 distinct Std doc UUIDs = 45 reqs total, all
200/success. Latency variance is dominated by payload size (Std docs
span 4 KB to 4.5 MB), not server backpressure. Same posture as Pro API.
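
For reference, the ladder shape described above can be sketched roughly as follows. This is not the code in scripts/probe_rate_limit.py; `run_ladder`, the injected `fetch` callable, and the result layout are hypothetical names chosen for illustration. The HTTP call is injected so the sketch stays runnable without touching the network.

```python
import time
from statistics import median


def run_ladder(fetch, uuids, tiers=(5.0, 2.0, 1.0, 0.5, 0.25), sleep=time.sleep):
    """Request every UUID once per sleep tier and summarise each tier.

    `fetch` is a caller-supplied function (hypothetical) that takes a UUID
    and returns True when the response counts as a success.
    """
    uuids = list(uuids)  # iterated once per tier, so materialise it
    results = {}
    for delay in tiers:
        ok = 0
        latencies = []
        for uuid in uuids:
            start = time.perf_counter()
            success = fetch(uuid)
            latencies.append(time.perf_counter() - start)
            ok += bool(success)
            sleep(delay)  # the inter-request gap under test
        results[delay] = {
            "requests": len(latencies),
            "ok": ok,
            "median_latency_s": median(latencies) if latencies else None,
        }
    return results
```

With 9 UUIDs and the 5 default tiers this issues 45 requests. Comparing per-tier latency against payload size is one way to separate size effects from backpressure.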

Net effect on batch-50 estimate: 25 Std items × 10 doc calls = 250
calls, saving ~19 min wall time (21min sleep -> 2min sleep). Combined
plan projection drops from ~2h to ~10min wall time, exclusive of
download transfer time.
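
The sleep-time figures above follow from simple arithmetic, sketched here as a check. `sleep_minutes` is a hypothetical helper, not part of the crawler. It counts only the inter-request sleep, not request latency or download time.

```python
def sleep_minutes(items: int, calls_per_item: int, sleep_s: float) -> float:
    """Total time spent sleeping between requests, in minutes."""
    return items * calls_per_item * sleep_s / 60.0


before = sleep_minutes(25, 10, 5.0)  # old SLEEP_SOURCE
after = sleep_minutes(25, 10, 0.5)   # new SLEEP_SOURCE
print(f"before={before:.2f} min, after={after:.2f} min, saved={before - after:.2f} min")
```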

scripts/probe_rate_limit.py: added a --host std-doc target. Reads doc
UUIDs from /tmp/std_doc_uuids.json (assembled by the caller from the
upstream_version_documents lists of any source/manifest.json). Reusable.
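
One possible way for a caller to assemble that UUID file is sketched below. It assumes each manifest is a JSON object whose `upstream_version_documents` value is a list of plain UUID strings, and that the probe expects a flat JSON array. Neither assumption is confirmed by this commit, and `collect_doc_uuids` and `write_uuid_file` are hypothetical helpers.

```python
import json
from pathlib import Path


def collect_doc_uuids(manifests):
    """Merge upstream_version_documents lists, dropping duplicates.

    `manifests` is an iterable of already-parsed manifest dicts.
    Assumption: list entries are UUID strings; anything else is skipped.
    """
    seen = set()
    ordered = []
    for manifest in manifests:
        for entry in manifest.get("upstream_version_documents", []):
            if isinstance(entry, str) and entry not in seen:
                seen.add(entry)
                ordered.append(entry)
    return ordered


def write_uuid_file(uuids, path="/tmp/std_doc_uuids.json"):
    """Write the UUID list as a flat JSON array (assumed input format)."""
    Path(path).write_text(json.dumps(uuids, indent=2), encoding="utf-8")
```

Keeping first-seen order makes repeated probe runs hit the same documents in the same sequence, which makes per-tier latencies easier to compare.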

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:54:46 +08:00
parent 8b857428e3
commit 183f82a3be
3 changed files with 86 additions and 9 deletions

@@ -49,8 +49,10 @@ BROWSER_UA = (
 SLEEP_BETWEEN = 1.0 # oshwhub.com detail/listing — ladder probe: 0.5s clean,
 # 1.0s leaves headroom (detail HTML p90 hits 6s at 1.0s,
 # 15s at 0.5s due to server-queue softlimit).
-SLEEP_SOURCE = 5.0 # lceda.cn Std source endpoints — NOT yet probed; keep
-# conservative. Drop only after a dedicated ladder run.
+SLEEP_SOURCE = 0.5 # lceda.cn Std source endpoints — ladder probe 5/2/1/0.5/0.25s
+# all clean (45/45 200/success). Latency is dominated by
+# payload size (Std docs span 4 KB to 4.5 MB) not server
+# backpressure. Same posture as Pro API. 10x speedup.
 SLEEP_PRO = 0.5 # pro.lceda.cn API host — sustained burst probe (25
 # distinct UUIDs at 0.5s) showed 0/25 errors, median
 # latency 410ms. 10x faster than the original 5.0s.