Ladder probe of lceda.cn/api/documents/<uuid>: 5 tiers (5/2/1/0.5/0.25s) × 9 distinct Std doc UUIDs = 45 reqs total, all 200/success. Latency variance is dominated by payload size (Std docs span 4 KB to 4.5 MB), not server backpressure. Same posture as Pro API. Net effect on batch-50 estimate: 25 Std projects × 10 doc calls saves ~19 min wall time (21 min sleep -> 2 min sleep). Combined plan now projects ~2h -> ~10 min walltime, exclusive of download bytes. scripts/probe_rate_limit.py: --host std-doc tier added. Reads doc UUIDs from /tmp/std_doc_uuids.json (assembled by caller from any source/manifest.json upstream_version_documents lists). Reusable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Rate-limit probe results

**Probe date**: 2026-04-29
**Script**: `scripts/probe_rate_limit.py`
**Method**: Ladder test: N requests at decreasing inter-request sleep,
30s recovery between tiers, watch for status != 200, body shrinkage,
or latency degradation.

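
The ladder method described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/probe_rate_limit.py`: the names `run_ladder`, `p90`, and the injectable `fetch`, `sleep`, and `clock` parameters are assumptions made so the loop can be exercised without touching the network. `fetch` is assumed to return `(status_code, body_length)`.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple


def p90(values: Sequence[float]) -> float:
    """Nearest-rank 90th percentile; other percentile definitions exist."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # integer ceil(0.9 * n)
    return ordered[rank - 1]


def run_ladder(
    fetch: Callable[[str], Tuple[int, int]],  # hypothetical: returns (status, body_len)
    targets: Sequence[str],
    tiers: Sequence[float],
    recovery: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Dict[str, float]]:
    """Run every target once per tier, from the slowest sleep to the fastest."""
    if not targets:
        raise ValueError("need at least one target per tier")
    results = []
    for index, tier in enumerate(tiers):
        latencies, sizes, bad = [], [], 0
        for target in targets:
            start = clock()
            status, body_len = fetch(target)
            latencies.append(clock() - start)
            sizes.append(body_len)
            if status != 200:
                bad += 1
            sleep(tier)  # inter-request sleep for this tier
        results.append({
            "sleep": tier,
            "bad": bad,
            "p90_ms": round(p90(latencies) * 1000),
            "body_median": statistics.median(sizes),  # watch for body shrinkage
        })
        if index < len(tiers) - 1:
            sleep(recovery)  # let the server recover before the next tier
    return results
```

Passing `sleep` and `clock` as parameters keeps the sketch testable with fakes; a real run would leave the defaults and supply a `fetch` that performs the HTTP request.
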
## oshwhub.com listing API (`/api/project`)

No auth. 6 tiers × 10 reps = 60 reqs total.

| sleep | status | bad | latency p90 |
|---|---|---:|---:|
| 2.0s | all 200 | 0 | 1187ms |
| 1.0s | all 200 | 0 | 1237ms |
| 0.5s | all 200 | 0 | 567ms |
| 0.25s | all 200 | 0 | 1180ms |
| 0.1s | all 200 | 0 | 2194ms |
| 0.0s | all 200 | 0 | 5362ms ← server soft-limits via latency |

**Verdict**: 0.5s is the safe water mark. Going faster does not fail, but the
server adds queueing latency, so the speed-up yields no net gain.

## oshwhub.com detail HTML (`/<owner>/<path>`)

No auth. 6 tiers × 10 distinct paths from batch-50 candidates.

| sleep | status | bad | latency p90 |
|---|---|---:|---:|
| 2.0s | all 200 | 0 | 4767ms |
| 1.0s | all 200 | 0 | 6350ms |
| 0.5s | all 200 | 0 | **15364ms** ← queue building |
| 0.25s | all 200 | 0 | 3755ms |
| 0.1s | all 200 | 0 | 8179ms |
| 0.0s | all 200 | 0 | 3856ms |

**Verdict**: 1.0s is the safe water mark. Detail HTML is a 0.5 MB SSR page, so
the server slows down earlier than it does for the listing API. Dropping to
0.5s already triggers server-side queueing (one 15s outlier response), which
risks timeout cascades on real bulk runs.

## pro.lceda.cn API (`/api/v4/projects/<P>`)

**Auth required** (logged-in cookie). Conservative ladder, reps capped at 8
to limit fingerprint exposure. 5 tiers × 8 reqs = 40 reqs total.

| sleep | status | bad | latency p90 |
|---|---|---:|---:|
| 5.0s | all 200 | 0 | 7299ms |
| 2.0s | all 200 | 0 | 5518ms |
| 1.0s | all 200 | 0 | 1409ms |
| 0.5s | all 200 | 0 | 2995ms |
| 0.25s | all 200 | 0 | 1552ms |

Then a **sustained burst test** at the chosen water mark:
**25 distinct Pro UUIDs at 0.5s sleep, no recovery**.

- 25/25 success (all status 200, all `success: true`)
- median latency 410ms, p90 932ms, max 1853ms (first call only, TLS handshake)
- effective QPS 1.0
- wall time 24.9s (vs ~140s at the old 5s/req, a 5.6× speedup)

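
The burst-test figures above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the list; the variable names are illustrative, and the split between sleep time and request time assumes the script sleeps once after every request, which the notes do not state.

```python
requests = 25
sleep_s = 0.5            # inter-request sleep used in the burst test
wall_time_s = 24.9       # measured wall time of the burst
old_wall_time_s = 140.0  # approximate wall time quoted for the old 5s/req setting

effective_qps = requests / wall_time_s    # what the server actually saw
nominal_qps_ceiling = 1 / sleep_s         # upper bound if requests took zero time
speedup = old_wall_time_s / wall_time_s

# Assumption: one sleep per request, so the rest of the wall time is request time.
sleep_budget_s = requests * sleep_s
request_time_s = wall_time_s - sleep_budget_s

print(f"effective QPS   {effective_qps:.1f}")
print(f"nominal ceiling {nominal_qps_ceiling:.1f}")
print(f"speedup         {speedup:.1f}x")
print(f"sleep {sleep_budget_s:.1f}s, requests {request_time_s:.1f}s")
```

The gap between the nominal ceiling and the effective rate comes from per-request latency: at a 0.5s sleep, roughly half of the wall time is spent waiting on responses rather than sleeping.
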
**Verdict**: 0.5s is the safe water mark. Empirically, the Pro API tolerates a
0.5s sleep cleanly, even sustained (nominal ceiling QPS=2, about 1.0 effective
in the burst test). The sleep was originally set high (5s) out of caution
because Pro requires a logged-in account; that caution turned out to be
unnecessary.

## lceda.cn Std doc endpoint (`/api/documents/<uuid>`)

No auth (Std is anonymous-readable; it needs only a browser UA and Referer).
5 tiers × 9 distinct doc UUIDs from already-crawled Std projects = 45 reqs total.

| sleep | status | bad | latency med | latency p90 | body median |
|---|---|---:|---:|---:|---:|
| 5.0s | all 200 | 0 | 1124ms | 3846ms | 31 KB |
| 2.0s | all 200 | 0 | 2634ms | 7626ms | 495 KB |
| 1.0s | all 200 | 0 | 1781ms | **19834ms** (one 4.5 MB doc) | 918 KB |
| 0.5s | all 200 | 0 | 666ms | 891ms | 748 KB |
| 0.25s | all 200 | 0 | 416ms | 1384ms | 251 KB |

**Verdict**: 0.5s is the safe water mark. Latency variance is dominated by
**payload size** (Std docs span 4 KB to 4.5 MB), not server backpressure.
The 19s p90 at the 1.0s tier was one giant doc, not a throttle. Same
posture as Pro API.

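
One way to separate payload-driven latency from throttling is to judge large responses by throughput rather than by raw latency. The sketch below is an illustration, not something the probe script is known to do: the helper names and all thresholds (`min_kb_per_s`, `small_body_bytes`, `small_body_max_ms`) are hypothetical. It takes the 19834ms p90 above as the latency of the single 4.5 MB document.

```python
def throughput_kb_per_s(body_bytes: int, latency_ms: float) -> float:
    """Transfer rate implied by one response, in KB/s."""
    return (body_bytes / 1024) / (latency_ms / 1000)


def looks_throttled(
    body_bytes: int,
    latency_ms: float,
    min_kb_per_s: float = 50.0,         # hypothetical floor for large bodies
    small_body_bytes: int = 64 * 1024,  # hypothetical cut-off for "small"
    small_body_max_ms: float = 5000.0,  # hypothetical latency cap for small bodies
) -> bool:
    """Flag a response as suspicious only if its size cannot explain the latency."""
    if body_bytes <= small_body_bytes:
        # Small bodies: transfer time is negligible, so latency alone is the signal.
        return latency_ms > small_body_max_ms
    return throughput_kb_per_s(body_bytes, latency_ms) < min_kb_per_s


big_doc = int(4.5 * 1024 * 1024)  # the 4.5 MB document from the 1.0s tier
print(round(throughput_kb_per_s(big_doc, 19834)))  # KB/s for the slow large doc
print(looks_throttled(big_doc, 19834))             # large and slow: explained by size
print(looks_throttled(4 * 1024, 19834))            # 4 KB and equally slow: suspicious
```

Under these assumed thresholds the 4.5 MB response is not flagged, while the same latency on a 4 KB document would be.
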
## modules.lceda.cn CDN (already at 0.2s)

CDN host serving AES-encrypted EPRO2 history blobs. The pre-existing
`SLEEP_PRO_CDN = 0.2` was validated against an editor HAR capture, in which
the editor fires blob requests back-to-back without being throttled. No
further probing needed.

## Settings applied

```python
SLEEP_BETWEEN = 1.0  # was 2.0 (oshwhub detail/listing)
SLEEP_SOURCE = 0.5  # was 5.0 (Std doc endpoint, 10× shorter sleep)
SLEEP_PRO = 0.5  # was 5.0 (Pro API host, 10× shorter sleep)
SLEEP_PRO_CDN = 0.2  # unchanged (CDN, already optimized)
```

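
As a rough illustration of how these constants might be selected per request, the sketch below maps each probed host to its sleep value. The `HOST_SLEEP` table and the `sleep_for` helper are hypothetical and not taken from the crawler; the host-to-constant pairing follows the comments in the settings block, and the example URLs use placeholder IDs.

```python
from urllib.parse import urlsplit

SLEEP_BETWEEN = 1.0
SLEEP_SOURCE = 0.5
SLEEP_PRO = 0.5
SLEEP_PRO_CDN = 0.2

# Hypothetical lookup table: exact hostnames, so pro.lceda.cn and
# modules.lceda.cn are never confused with the bare lceda.cn host.
HOST_SLEEP = {
    "oshwhub.com": SLEEP_BETWEEN,
    "lceda.cn": SLEEP_SOURCE,
    "pro.lceda.cn": SLEEP_PRO,
    "modules.lceda.cn": SLEEP_PRO_CDN,
}


def sleep_for(url: str, default: float = SLEEP_BETWEEN) -> float:
    """Return the inter-request sleep for a URL; unknown hosts get the slowest value."""
    host = urlsplit(url).hostname or ""
    return HOST_SLEEP.get(host, default)


print(sleep_for("https://pro.lceda.cn/api/v4/projects/example-id"))
print(sleep_for("https://lceda.cn/api/documents/example-uuid"))
print(sleep_for("https://modules.lceda.cn/example-blob"))
```

Falling back to the most conservative value for unrecognised hosts is a design choice of this sketch: a host that has not been probed should not inherit a fast setting.
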
## Net impact on batch-50 plan

- 25 Pro projects × ~5 API calls each: 5s × 5 = 25s/project × 25 = ~10 min → 0.5s × 5 = 2.5s/project × 25 = ~1 min
- 25 Std projects × ~10 doc calls each: 5s × 10 = 50s/project × 25 = ~21 min → 0.5s × 10 = 5s/project × 25 = ~2 min
- Detail page scan, 50 projects: 50 × 2s = 100s → 50 × 1s = 50s
- Combined batch-50 walltime estimate: **~2h → ~10 min** (excluding the time to download the actual bytes)

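
The itemized sleep budgets above can be reproduced with a short calculation. This is a sketch of the arithmetic only; `sleep_budget_s` is an illustrative helper, and the call counts are the approximate per-project figures from the list. It covers sleep time alone, so the combined ~2h → ~10 min estimate, which is larger than the sum of these three items, presumably also counts time that is not itemized here (that reading is an assumption).

```python
def sleep_budget_s(projects: int, calls_per_project: int, sleep_s: float) -> float:
    """Total time spent sleeping between requests, in seconds."""
    return projects * calls_per_project * sleep_s


old = {
    "pro": sleep_budget_s(25, 5, 5.0),      # 25 Pro projects, ~5 API calls each
    "std": sleep_budget_s(25, 10, 5.0),     # 25 Std projects, ~10 doc calls each
    "detail": sleep_budget_s(50, 1, 2.0),   # 50 detail pages
}
new = {
    "pro": sleep_budget_s(25, 5, 0.5),
    "std": sleep_budget_s(25, 10, 0.5),
    "detail": sleep_budget_s(50, 1, 1.0),
}

for name in old:
    print(f"{name:6s} {old[name] / 60:5.1f} min -> {new[name] / 60:4.1f} min")
print(f"total  {sum(old.values()) / 60:5.1f} min -> {sum(new.values()) / 60:4.1f} min")
```
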