tools/epro2: add std/ writer — EPRO2 → EasyEDA Std-format JSON for downstream

The downstream colleague consumes oshwhub Std (lceda) dict-format JSON,
not KiCad. The EPRO2 decryption part (per-doc plaintext .epro2 streams
in data/raw/<uuid>/source/) is what we already provide; the missing
piece is converting EPRO2 op-streams into the same `dataStr.shape`
tilde-delimited format their parser already speaks.

New tools/epro2/std/ module, peer of tools/epro2/kicad/, kept
deliberately separate so the KiCad path stays untouched:

  - pcb_writer.write_pcb_std() — high-fidelity, validated against a Std
    PCB sample at data/raw/oshwhub/3e2f893d.../25931ddab8.json. Maps
    LINE→TRACK, VIA→VIA, POUR→COPPERAREA (with SVG `M..L..Z` path),
    POLY→CIRCLE/SOLIDREGION, COMPONENT+FOOTPRINT→LIB nested with
    #@$-separated PADs (placement rotation + translate applied so pad
    coords land at PCB-absolute positions). Layer-id mapping (EPRO2 5↔7
    flipped vs Std solder/paste, 11→10 outline, 12→11 multi, SIGNAL
    inner 15+ → Std 21+) noted inline.

  - sch_writer.write_sch_std() — best-effort. Our corpus has zero Std
    schematic samples (docType=1) so verb field orders follow the
    EasyEDA Std public spec, not direct observation. Emits W (wire),
    N (net flag, including the 5-Voltage Global Net Name power-port
    pattern), T (text), LIB (placement with #@$-nested PIN/T). If
    downstream's parser bails, the fix is almost certainly a positional
    field tweak, not a re-architecture.

  - __main__.py — flat output `<doc_uuid>.json` per doc directly under
    --out (mirrors Std's own data layout); --all-pcb / --all-sch / --all.

Smoke test on ESP-VoCat: 6 PCB + 9 SCH = 15 JSON files, libs_unresolved=0
across the board. Compact JSON (separators=(",",":")) matches Std's
single-line format. Numbers use _num() — integers without trailing .0,
floats trimmed.

71 → 82 unit tests pass.

Open questions for downstream: (1) confirm SCH verb field orders, (2)
do they want any of the upstream metadata fields we drop (master,
owner, created_at, etc — those live on the crawler side, not the
schematic itself)?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:16:39 +08:00
parent ed713fa557
commit fe6971f3f9
6 changed files with 1155 additions and 0 deletions

84
log.md

@@ -4,6 +4,90 @@
---
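## 2026-04-29 04:00 Std-format JSON converter (EPRO2 → input format for the downstream colleague's Wokwi pipeline)
**Claude session**
The downstream colleague does not need the KiCad path: their Wokwi pipeline consumes oshwhub Std (lceda) dict-format JSON. EPRO2 decryption is already done (the per-doc streams are in `data/raw/<uuid>/source/`); what is missing is translating the EPRO2 op-stream into Std's `dataStr.shape` string array.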
### New: `tools/epro2/std/` (peer of `kicad/`; the existing code is untouched)
The protocol was reverse-engineered from one real Std PCB sample, `data/raw/oshwhub/3e2f893d.../25931ddab8.json`:
- Envelope: `{success, code, result: {uuid, puuid, title, docType, components, dataStr: {head, canvas, shape, layers, ...}}}`
- Shape strings: `VERB~field1~field2~...`, `~`-delimited
- LIB (footprint placement): PAD/TEXT children are nested beneath it, separated by `#@$`
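
A minimal sketch of the shape-string layout described above. The helper names `shape` and `lib_with_children` and the sample field values are hypothetical and are not taken from `pcb_writer`; real records carry many more positional fields. It only illustrates that fields are joined with `~` and that a LIB's children follow its own record with `#@$` as the separator.

```python
FIELD_SEP = "~"    # separates positional fields inside one record
NEST_SEP = "#@$"   # separates a LIB record from each nested child record


def shape(verb, *fields):
    """Join a verb and its positional fields into one tilde-delimited record."""
    return FIELD_SEP.join([verb, *(str(f) for f in fields)])


def lib_with_children(lib_record, children):
    """Append child records (PAD, TEXT, ...) to a LIB record using `#@$`."""
    return NEST_SEP.join([lib_record, *children])


# Illustrative values only (hypothetical field layout).
lib = shape("LIB", 4000, 3000)
pads = [shape("PAD", "RECT", 3990, 3000), shape("PAD", "RECT", 4010, 3000)]
record = lib_with_children(lib, pads)
```

Splitting `record` on `#@$` gives back the LIB record followed by one entry per nested child, which is presumably how a downstream parser would walk it.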
#### Implemented verb mappings
**PCB (docType=3; high-fidelity, checked against the real sample)**
| EPRO2 op | Std verb | Notes |
|---|---|---|
| LINE | TRACK | layer mapped separately |
| VIA | VIA | field order `x~y~outerD~net~innerD~uuid~lock` |
| POUR | COPPERAREA | path converted to SVG `M..L..Z` |
| FILL | SOLIDREGION | same SVG path |
| POLY (CIRCLE) | CIRCLE | |
| COMPONENT + FOOTPRINT.PADs | LIB...#@$PAD...#@$PAD... | nested PAD coordinates have the placement rotation + translation applied |
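
Two of the rows above involve a little geometry, sketched here under stated assumptions. `place_pad` and `svg_path` are hypothetical names, not the writer's actual functions. The rotation is assumed to be in degrees, counter-clockwise in a y-up frame, and the exact spacing inside the `M..L..Z` path is also an assumption; the real writer may differ on both.

```python
import math


def place_pad(local_xy, origin_xy, rotation_deg):
    """Rotate a footprint-local pad coordinate, then translate it to the placement origin.

    Assumption: rotation is in degrees, counter-clockwise in a y-up frame;
    a y-down canvas would flip the sign of the angle.
    """
    x, y = local_xy
    ox, oy = origin_xy
    t = math.radians(rotation_deg)
    rx = x * math.cos(t) - y * math.sin(t)
    ry = x * math.sin(t) + y * math.cos(t)
    return (rx + ox, ry + oy)


def svg_path(points):
    """Render a closed polygon as an SVG path: M for the first point, L for the rest, Z to close."""
    if not points:
        return ""
    (x0, y0), rest = points[0], points[1:]
    parts = [f"M {x0} {y0}"] + [f"L {x} {y}" for x, y in rest] + ["Z"]
    return " ".join(parts)
```

Applying the rotation before the translation is what puts nested pad coordinates at PCB-absolute positions; doing it in the other order would rotate the pad around the board origin instead of the placement origin.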
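**SCH (docType=1; best-effort, no real sample)**
- LINE → W (wire segment)
- LINE.lineGroup → WIRE.NET → an N (net flag) placed at the endpoint
- COMPONENT → LIB...#@$P... (nested PIN/TEXT, including the Global Net Name of the 5-Voltage power placeholder we identified earlier)
- TEXT → T
**Important caveat**: every Std project in our corpus contains only PCB (docType=3) documents; there are no SCH (docType=1) samples. The SCH verb field orders follow the public EasyEDA Std spec and **may differ from the field order the downstream parser actually expects**. Once the downstream colleague has reviewed the output and sent feedback, shifting any misplaced fields should be enough.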
### Layer mapping (important: differs from KiCad)
EPRO2 and Std layer ids do not fully line up:
- EPRO2 layer 5 (TOP_SOLDER_MASK) → Std 7
- EPRO2 layer 7 (TOP_PASTE_MASK) → Std 5 ← swapped with 5!
- EPRO2 layer 11 (OUTLINE) → Std 10 (BoardOutLine)
- EPRO2 layer 12 (MULTI) → Std 11 (Multi-Layer)
Inner SIGNAL layers: EPRO2 15+ → Std 21+ (starting at Inner1).
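
The mapping above can be sketched as a lookup table plus an offset for inner layers. `map_layer` and the constant names are hypothetical. Two things are assumptions rather than facts from this log: that layer ids not listed here pass through unchanged, and that every EPRO2 id of 15 or above is an inner SIGNAL layer.

```python
# Explicit remaps listed in the log entry.
EPRO2_TO_STD_LAYER = {
    5: 7,    # TOP_SOLDER_MASK
    7: 5,    # TOP_PASTE_MASK (swapped with 5)
    11: 10,  # OUTLINE -> BoardOutLine
    12: 11,  # MULTI -> Multi-Layer
}

INNER_EPRO2_START = 15  # first inner SIGNAL layer in EPRO2
INNER_STD_START = 21    # Inner1 in Std


def map_layer(epro2_layer):
    """Map an EPRO2 layer id to a Std layer id.

    Assumption: unlisted ids pass through unchanged and every id >= 15 is an
    inner SIGNAL layer. The real writer may handle these cases differently.
    """
    if epro2_layer in EPRO2_TO_STD_LAYER:
        return EPRO2_TO_STD_LAYER[epro2_layer]
    if epro2_layer >= INNER_EPRO2_START:
        return epro2_layer - INNER_EPRO2_START + INNER_STD_START
    return epro2_layer
```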
### CLI: flat output
```
uv run python -m tools.epro2.std <project_dir> --all --out <dst>
```
Following the Std convention, output is **flat**: `<dst>/<doc_uuid>.json`, with no per-board subdirectories. There are three mutually exclusive modes: `--all-pcb` / `--all-sch` / `--all`.
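
For illustration, three mutually exclusive mode flags and a flat output path could be wired up with `argparse` roughly as below. This is a sketch, not the contents of `__main__.py`: `build_parser` and `output_path` are hypothetical names, and making `--out` and the mode group required is an assumption.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with a positional project dir, --out, and one required mode flag."""
    parser = argparse.ArgumentParser(prog="python -m tools.epro2.std")
    parser.add_argument("project_dir", help="decrypted EPRO2 project directory")
    parser.add_argument("--out", required=True, help="flat output directory")
    # Assumption: exactly one of the three modes must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--all-pcb", action="store_true", help="convert PCB docs only")
    mode.add_argument("--all-sch", action="store_true", help="convert SCH docs only")
    mode.add_argument("--all", action="store_true", help="convert both kinds")
    return parser


def output_path(out_dir, doc_uuid):
    """Flat layout: one <doc_uuid>.json directly under the output directory."""
    return Path(out_dir) / f"{doc_uuid}.json"
```

With this layout, passing two mode flags at once makes `argparse` exit with a usage error instead of silently picking one.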
### ESP-VoCat run
15 docs → 15 JSON files
| Type | Count | Measured output |
|---|---:|---|
| PCB (docType=3) | 6 | tracks 2K+, vias 700+, copperareas 19, libs 206, pads 807 |
| SCH (docType=1) | 9 | wires 814, libs 477, netflags 838 (including power-port), texts 71 |
`libs_unresolved=0` across the board: every cross-document FOOTPRINT/SYMBOL doc lookup resolved.
JSON envelope compared with the real Std sample: the top-level keys match (`success/code/result`); `result` omits `master/owner/created_at/...`, which are **crawl-layer metadata** (not part of the data itself, so downstream probably does not need them); `dataStr.shape/layers/canvas/head` are all present.
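### Decisions (why)
- **Do not replace the KiCad toolchain**: the user said "don't replace the original setup either", so `tools/epro2/kicad/` stays and the new `tools/epro2/std/` sits beside it as a peer. The command lines are independent too: `python -m tools.epro2.std` vs `python -m tools.epro2.kicad`.
- **`json.dumps` with `separators=(",",":")` and no indentation**: the real Std sample files are single-line compact JSON with no newlines or indentation, which saves space and also makes diffing easier.
- **Number formatting via `_num()`**: the real Std sample emits integers without `.0` (`4303`, not `4303.0`). `_num()` checks `math.isclose(f, int(f))` and, when that holds, uses the int repr to match the Std style.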
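### Tests
Unit tests 71 → 82, all passing. The 11 new ones are in std_writers (envelope / TRACK field order / VIA field order / COPPERAREA SVG path / LIB nesting PADs via `#@$` / docType=1 / W+N pairing / power-port netflag / json.dumps round-trip).
### Downstream delivery
The 15 ESP-VoCat JSON files are already in `/tmp/std_json/`. The minimal deliverable for the downstream colleague: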
```
data/processed/std_json/<project_uuid>/<doc_uuid>.json
```
Next: run the remaining 4 Pro projects (X86 motherboard / 220V power supply / 泰山派 / 梁山派). The two Pro 2.x boards still do not work and need a Pro 2.x JSON parser.
---
## 2026-04-29 03:30 rate-limit benchmark written up as a formal report
**Claude session**