The downstream colleague's "encrypted_external" / "string old format"
projects were Pro 2.x, not Pro 3.x EPRO2. Pro 2.x ships each doc as a
JSON file whose `dataStr` is a plaintext op-stream — one JSON array per
line, e.g. `["COMPONENT","e1","",0,0,0,0,{},0]`. Different wire format
from EPRO2's binary tilde/pipe streams; same Std envelope works for
output.
- tools/epro2/std/pro2_writer.py: parses dataStr line-by-line, keys
objects by id (position 1 for most ops, OPTYPE for singletons),
extracts BBox by walking known coord positions per OPTYPE, derives
layers from LAYER ops directly (Pro 2.x almost matches Std layer
string format already). PCB blobs that are encrypted-external
(`dataStrId` URL + `iv` + `key`, no inline dataStr — Taishan PCB)
return None so the CLI skips with a message instead of stubbing.
- tools/epro2/std/__main__.py: auto-detect via manifest's
editor_version. "2.x" → Pro 2.x writer; otherwise the existing
EPRO2 replay path. CLI surface and output layout unchanged.
- docs/sources/epro2_to_std_mapping.md: adds a Pro 2.x section.
Adapter dispatches on `head.epro_format`: absent / "epro2" gets
dict-shaped objects values, "pro2" gets array-shaped values
(`[OPTYPE, arg1, ...]`). Lists the Pro 2.x-specific OPTYPEs
(FONTSTYLE / LINESTYLE / CONNECT / OBJ / REGION / DIMENSION /
STRING / TEARDROP) the EPRO2 vocabulary doesn't have.
Smoke (re-running --all on all 5 Pro projects): 191 → 222 JSON files.
Liangshan adds 3 (2 SCH + inline 5357-object PCB). Taishan adds 28
(SCH only — PCB skipped, encrypted-external; source/<uuid>.json still
keeps the dataStrId/iv/key for a later fetch+decrypt pass).
84 → 86 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Downstream came back with concrete requirements: don't pre-compute Std
shape[] tilde strings, just dump the raw EPRO2 `objects: {id: payload}`
dict and they'll write a ~100-LoC adapter on their side. Pulling the
tilde-mapping work back saves us from second-guessing positional fields
without their parser to verify against, and shortens our pcb_writer
from ~500 lines to ~40.
Output shape (Std envelope intact, just no `shape[]`):
{
"success": true, "code": 0,
"result": {
"uuid", "puuid", "title",
"docType": 3 | 1,
"components": {},
"dataStr": {
"head": {
"docType": "3" | "1",
"editorVersion": "facere-epro2/0.1 (epro2 <X.Y.Z>)",
"units": "mil",
"epro2_doc_uuid": ...,
"epro2_editor_version": ...,
},
"BBox": {x, y, width, height}, # mil
"layers": [...], # Std layer-string array
"objects": dict(doc.objects), # raw EPRO2, 1:1
"preference": {}, "netColors": [], "DRCRULE": {},
}
}
}
Per-doc spec downstream gave us:
- shape[] dropped (empty placeholder misleads adapter)
- all units mil (no mm conversion — Std canvas already declares mil)
- head.units="mil" so adapter doesn't have to guess
- BBox min/max across known x/y/startX/endX/centerX fields; adapter
can refine by walking path arrays itself
- layers[] keeps Std's 17-line default + inner SIGNAL layers actually
used (21~Inner1.., 22~Inner2..)
- empty stubs preference/netColors/DRCRULE for grep-based triage
New: docs/sources/epro2_to_std_mapping.md with the full EPRO2 OPTYPE →
Std verb table that downstream's adapter authors will copy from. Tables
include the layer-id remapping (the 5↔7 paste/mask flip, 11→10 outline,
12→11 multi, SIGNAL 15+→21+), PCB op mappings, SCH op mappings (marked
best-effort: no Std SCH samples in our corpus), and the 5-Voltage
placeholder COMPONENT → extra net flag trick. Extracted from the
previous Option-3 writer (commit fe6971f) so adapter writers don't
have to reverse-engineer it from source.
ESP-VoCat smoke: 6 PCB + 9 SCH = 15 JSON files, head.units=mil
preserved, no shape[] field present. 82 → 84 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The doc had been growing incrementally as each host got probed; reshape
it as a polished benchmark with TL;DR top, methodology section
(including safety constraints + caveats), per-host detailed tables,
final crawler settings, batch-50 walltime breakdown, and a reproduce
recipe.
Five hosts fully covered:
pro.lceda.cn API 5.0s -> 0.5s (10×)
lceda.cn doc 5.0s -> 0.5s (10×)
oshwhub detail 2.0s -> 1.0s ( 2×)
oshwhub listing 2.0s -> 1.0s ( 2×)
modules.lceda CDN 0.2s (already optimized)
Net effect on batch-50 plan: sleep total ~32min -> ~3min, walltime
~2h -> ~10-15min.
Key finding: the original 5s/req on Pro was set out of "logged-in
account is precious" caution with zero empirical evidence. Sustained
burst probe (25 distinct UUIDs at 0.5s, no recovery) showed 0/25 errors
and median latency 410ms — the caution was unjustified.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ladder probe lceda.cn/api/documents/<uuid>: 5 tiers (5/2/1/0.5/0.25s)
× 9 distinct Std doc UUIDs = 45 reqs total, all 200/success. Latency
variance is dominated by payload size (Std docs span 4 KB to 4.5 MB)
not server backpressure. Same posture as Pro API.
Net effect on batch-50 estimate: Std 25 项 × 10 doc calls saved ~19
min wall time (21min sleep -> 2min sleep). Combined plan now projects
~2h -> ~10min walltime exclusive of download bytes.
scripts/probe_rate_limit.py: --host std-doc tier added. Reads doc UUIDs
from /tmp/std_doc_uuids.json (assembled by caller from any source/manifest.json
upstream_version_documents lists). Reusable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Calibrated against ladder probes on 2026-04-29. Findings in
docs/sources/probe_rate_limit_results.md.
SLEEP_PRO 5.0 -> 0.5 (pro.lceda.cn API)
SLEEP_BETWEEN 2.0 -> 1.0 (oshwhub detail/listing)
SLEEP_SOURCE 5.0 unchanged (lceda.cn Std endpoints — not yet probed)
SLEEP_PRO_CDN 0.2 unchanged (modules.lceda.cn — already optimized)
The original 5s rate for Pro API was set out of caution because Pro
requires a logged-in cookie. Empirical sustained-burst probe (25
distinct UUIDs at 0.5s sleep, no recovery): 0/25 errors, median
latency 410ms, p90 932ms. The "Pro is rate-sensitive" assumption was
wrong — server tolerates QPS=2 cleanly.
oshwhub detail HTML pages slowed from p90 6.4s at 1.0s sleep to
p90 15s at 0.5s — server queue backs up. 1.0s is the headroom-safe
water mark.
Net effect on batch-50 estimate: ~1.5h -> ~30min.
scripts/probe_rate_limit.py: rate-limit ladder probe tool. Reusable
for new endpoints (Std source still owes a probe). Designed for safety:
30s tier recovery, low rep counts on auth hosts, bail on first non-200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colleague-facing explainer at docs/sources/pro_crawl_vs_export.md.
Addresses the "I see 278 .epro2 files but my browser only downloaded
one" confusion: web download is a ZIP container (extension is a UX
choice, not a format), our crawl produces per-doc message streams.
Both carry equivalent EPRO2 data; only real gap is IMAGE/ binary
previews which we don't fetch yet.
Why per-doc and not ZIP: the ZIP path has no public endpoint —
three HARs confirm the export button fires zero HTTP requests, it's
pure client-side JSZip on data already loaded by the editor. Our
crawler hits the same chain endpoints the editor uses internally,
which delivers per-doc streams.
Log entry references the 278 vs 266 doc-count delta for ESP-VoCat
(we walk full history chain, web export is a current snapshot).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probed listing API and learned: total field is exposed (Pro=21,202 / Std=12,493),
pageSize accepts >=1000 (full corpus = 35 requests / 71s), sort param is silently
ignored. Dump all listings via scripts/dump_listing_index.py to local jsonl so
downstream batch-selection no longer hits the API.
Why: needed quantitative anchors before scaling Pro batch beyond top-5. License
is detail-page only (~19h serial scan), so we want to filter on grade/like
*locally* first to shortlist before paying that cost. Quality-tier counts now
known: A-tier (grade>=3 & like>=10) = 2,806 across both origins.
- scripts/dump_listing_index.py: one-shot scraper, polite QPS, streams to jsonl
- docs/sources/oshwhub_listing_full.md: human-readable report with growth
trends, quality tiers, owner concentration, and storage-budget anchors
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>