crawler: drop SLEEP_SOURCE 5.0 -> 0.5 (Std doc endpoint probe)
Ladder probe of lceda.cn/api/documents/<uuid>: 5 tiers (5/2/1/0.5/0.25s) × 9 distinct Std doc UUIDs = 45 requests total, all 200/success. Latency variance is dominated by payload size (Std docs span 4 KB to 4.5 MB), not by server backpressure. Same posture as the Pro API.

Net effect on the batch-50 estimate: 25 Std items × 10 doc calls each saves ~19 min of wall time (~21 min of sleep -> ~2 min). The combined plan now projects ~2h -> ~10min wall time, excluding download bytes.

scripts/probe_rate_limit.py: added a --host std-doc tier. It reads doc UUIDs from /tmp/std_doc_uuids.json, which the caller assembles from the upstream_version_documents lists in any source/manifest.json. Reusable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
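The commit does not include scripts/probe_rate_limit.py, so what follows is only a minimal sketch of a ladder probe with this shape. It assumes the UUID file holds a flat JSON list of strings and that the caller supplies its own `fetch`. `TierResult`, `load_uuids`, `ladder_probe` and `fetch` are hypothetical names and are not the script's API. The sketch walks the tiers from slowest to fastest and tallies successes, errors and latency for each tier. With 5 tiers × 9 UUIDs it issues the 45 requests described above.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Sleep tiers from the commit message, slowest first (seconds between requests).
TIERS = (5.0, 2.0, 1.0, 0.5, 0.25)


@dataclass
class TierResult:
    sleep_s: float
    ok: int = 0
    errors: int = 0
    latencies_s: list[float] = field(default_factory=list)


def load_uuids(path: str) -> list[str]:
    """Read doc UUIDs. ASSUMPTION: the file holds a flat JSON list of strings."""
    with open(path, encoding="utf-8") as fh:
        return [str(u) for u in json.load(fh)]


def ladder_probe(
    uuids: Iterable[str],
    fetch: Callable[[str], bool],
    tiers: Iterable[float] = TIERS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[TierResult]:
    """Request every UUID once per tier, waiting `tier` seconds before each request.

    `fetch` is a hypothetical callable: True means HTTP 200 with a success body.
    Any exception it raises is counted as an error for that tier.
    """
    uuids = list(uuids)
    results: list[TierResult] = []
    first = True
    for tier in tiers:
        res = TierResult(sleep_s=tier)
        for uuid in uuids:
            if not first:  # no wait before the very first request of the run
                sleep(tier)
            first = False
            start = clock()
            try:
                ok = bool(fetch(uuid))
            except Exception:  # timeouts, connection resets, etc.
                ok = False
            res.latencies_s.append(clock() - start)
            if ok:
                res.ok += 1
            else:
                res.errors += 1
        results.append(res)
    return results
```

A real run would pass `load_uuids("/tmp/std_doc_uuids.json")` and an HTTP-backed `fetch`. Because `sleep` and `clock` are injected, the loop can be exercised without waiting through more than a minute of tier sleeps.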
@@ -49,8 +49,10 @@ BROWSER_UA = (
 SLEEP_BETWEEN = 1.0 # oshwhub.com detail/listing — ladder probe: 0.5s clean,
 # 1.0s leaves headroom (detail HTML p90 hits 6s at 1.0s,
 # 15s at 0.5s due to server-queue softlimit).
-SLEEP_SOURCE = 5.0 # lceda.cn Std source endpoints — NOT yet probed; keep
-# conservative. Drop only after a dedicated ladder run.
+SLEEP_SOURCE = 0.5 # lceda.cn Std source endpoints — ladder probe 5/2/1/0.5/0.25s
+# all clean (45/45 200/success). Latency is dominated by
+# payload size (Std docs span 4 KB to 4.5 MB) not server
+# backpressure. Same posture as Pro API. 10x speedup.
 SLEEP_PRO = 0.5 # pro.lceda.cn API host — sustained burst probe (25
 # distinct UUIDs at 0.5s) showed 0/25 errors, median
 # latency 410ms. 10x faster than the original 5.0s.
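The ~21 min -> ~2 min figure in the commit message is call count multiplied by per-call sleep. The small sketch below reproduces it from the SLEEP_SOURCE values in the diff. `sleep_minutes` is a hypothetical helper. It deliberately ignores request latency, download bytes and the missing wait before the first call.

```python
# Values copied from the diff (seconds between requests).
SLEEP_SOURCE_OLD = 5.0
SLEEP_SOURCE_NEW = 0.5


def sleep_minutes(items: int, calls_per_item: int, sleep_s: float) -> float:
    """Sleep-only wall time in minutes; excludes latency and download time."""
    return items * calls_per_item * sleep_s / 60


# Batch-50 estimate from the commit message: 25 Std items x 10 doc calls each.
before = sleep_minutes(25, 10, SLEEP_SOURCE_OLD)  # 250 calls x 5.0 s = 1250 s
after = sleep_minutes(25, 10, SLEEP_SOURCE_NEW)   # 250 calls x 0.5 s = 125 s
print(f"{before:.1f} min -> {after:.1f} min (saves {before - after:.1f} min)")
```

That works out to about 20.8 min -> 2.1 min, or 18.75 min saved. This rounds to the ~21 -> ~2 min and ~19 min figures in the commit message.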