tools/epro2: add std/ writer — EPRO2 → EasyEDA Std-format JSON for downstream

The downstream colleague consumes oshwhub Std (lceda) dict-format JSON, not KiCad. The EPRO2 decryption part (per-doc plaintext .epro2 streams in data/raw/<uuid>/source/) is what we already provide; the missing piece is converting EPRO2 op-streams into the same `dataStr.shape` tilde-delimited format their parser already speaks.

New tools/epro2/std/ module, peer of tools/epro2/kicad/, kept deliberately separate so the KiCad path stays untouched:

- pcb_writer.write_pcb_std(): high-fidelity, validated against a Std PCB sample at data/raw/oshwhub/3e2f893d.../25931ddab8.json. Maps LINE→TRACK, VIA→VIA, POUR→COPPERAREA (with SVG `M..L..Z` path), POLY→CIRCLE/SOLIDREGION, COMPONENT+FOOTPRINT→LIB nested with #@$-separated PADs (placement rotation + translate applied so pad coords land at PCB-absolute positions). Layer-id mapping (EPRO2 5↔7 flipped vs Std solder/paste, 11→10 outline, 12→11 multi, SIGNAL inner 15+ → Std 21+) noted inline.
- sch_writer.write_sch_std(): best-effort. Our corpus has zero Std schematic samples (docType=1), so verb field orders follow the EasyEDA Std public spec, not direct observation. Emits W (wire), N (net flag, including the 5-Voltage Global Net Name power-port pattern), T (text), LIB (placement with #@$-nested PIN/T). If downstream's parser bails, the fix is almost certainly a positional field tweak, not a re-architecture.
- __main__.py: flat output `<doc_uuid>.json` per doc directly under --out (mirrors Std's own data layout); --all-pcb / --all-sch / --all.

Smoke test on ESP-VoCat: 6 PCB + 9 SCH = 15 JSON files, libs_unresolved=0 across the board.

Compact JSON (separators=(",",":")) matches Std's single-line format. Numbers use _num(): integers without trailing .0, floats trimmed.

71 → 82 unit tests pass.

Open questions for downstream:
(1) confirm SCH verb field orders;
(2) do they want any of the upstream metadata fields we drop (master, owner, created_at, etc.; those live on the crawler side, not the schematic itself)?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:16:39 +08:00
parent ed713fa557
commit fe6971f3f9
6 changed files with 1155 additions and 0 deletions
--- a/log.md
+++ b/log.md
@@ -4,6 +4,90 @@

 ---

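+## 2026-04-29 04:00  Std-format JSON converter: EPRO2 → input format for the downstream colleague's Wokwi pipeline
+
+**Claude session**
+
+The downstream colleague does not need the KiCad path: their Wokwi pipeline consumes oshwhub Std (lceda) dict-format JSON. EPRO2 decryption is already done (the per-doc streams are in `data/raw/<uuid>/source/`); what is missing is translating the EPRO2 op-stream into Std's `dataStr.shape` string array.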
+
+### New `tools/epro2/std/` (peer of `kicad/`; the old code is untouched)
+
+Protocol reverse-engineered from one real Std PCB sample, `data/raw/oshwhub/3e2f893d.../25931ddab8.json`:
+- Envelope: `{success, code, result: {uuid, puuid, title, docType, components, dataStr: {head, canvas, shape, layers, ...}}}`
+- shape strings: `VERB~field1~field2~...`, delimited by `~`
+- LIB (footprint placement) nests its PAD/TEXT children using the `#@$` separator
+
+#### Implemented verb mappings
+
+**PCB (docType=3, high-fidelity, checked against the real sample)**:
+| EPRO2 op | Std verb | Notes |
+|---|---|---|
+| LINE | TRACK | layer mapped separately |
+| VIA | VIA | field order `x~y~outerD~net~innerD~uuid~lock` |
+| POUR | COPPERAREA | path converted to SVG `M..L..Z` |
+| FILL | SOLIDREGION | same SVG path |
+| POLY (CIRCLE) | CIRCLE | |
+| COMPONENT + FOOTPRINT.PADs | LIB...#@$PAD...#@$PAD... | inner PAD coordinates have the placement rotate + translate applied |
+
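+**SCH (docType=1, best-effort, no real sample)**:
+- LINE → W (wire segment)
+- LINE.lineGroup → WIRE.NET → place an N (net flag) at the endpoint
+- COMPONENT → LIB...#@$P... (nests PIN/TEXT, including the Global Net Name of the 5-Voltage power placeholder we found earlier)
+- TEXT → T
+
+**Important caveat**: every Std project in our corpus has only a PCB (docType=3); there is no real SCH (docType=1) sample. The SCH verb field orders were written from the public EasyEDA Std spec and **may differ from the field order the downstream parser actually expects**. Once the downstream colleague reviews and gives feedback, correcting the misplaced field positions should be enough.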
+
+### Layer mapping (important: differs from KiCad)
+
+EPRO2 and Std layer ids are not fully aligned:
+- EPRO2 layer 5 (TOP_SOLDER_MASK) → Std 7
+- EPRO2 layer 7 (TOP_PASTE_MASK) → Std 5  ← swapped with 5!
+- EPRO2 layer 11 (OUTLINE) → Std 10 (BoardOutLine)
+- EPRO2 layer 12 (MULTI) → Std 11 (Multi-Layer)
+
+Inner SIGNAL layers: EPRO2 15+ → Std 21+ (starting at Inner1).
+
+### CLI flat output
+
+```
+uv run python -m tools.epro2.std <project_dir> --all --out <dst>
+```
+
+Output is **flat**, following Std convention: `<dst>/<doc_uuid>.json`, with no per-board subdirectories. Three mutually exclusive modes: `--all-pcb` / `--all-sch` / `--all`.
+
+### ESP-VoCat measured results
+
+15 docs → 15 JSON files:
+| Type | Count | Measured output |
+|---|---:|---|
+| PCB (docType=3) | 6 | tracks 2K+, vias 700+, copperareas 19, libs 206, pads 807 |
+| SCH (docType=1) | 9 | wires 814, libs 477, netflags 838 (incl. power-port), texts 71 |
+
+`libs_unresolved=0` across the board: every cross-document FOOTPRINT/SYMBOL doc lookup resolved.
+
+JSON envelope compared against the real Std sample: top-level keys match (`success/code/result`); `result` lacks `master/owner/created_at/...`, which are **crawl-layer metadata** (not the data itself, so downstream probably does not need them); `dataStr.shape/layers/canvas/head` are all present.
+
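+### Decisions (Why)
+
+- **Not replacing the KiCad path**: the user said "don't replace the existing one either", so `tools/epro2/kicad/` is kept and `tools/epro2/std/` is written as its peer, with separate command lines as well: `python -m tools.epro2.std` vs `python -m tools.epro2.kicad`.
+- **`json.dumps` uses `separators=(",",":")` with no indentation**: the real Std files are single-line compact JSON with no newlines or indentation, which saves space and is convenient for diffing.
+- **Number format `_num()`**: the real Std output writes integers without `.0` (`4303`, not `4303.0`); `math.isclose(f, int(f))` decides whether to use the int repr, matching Std style.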
+
+### Tests
+
+71 → 82 unit tests, all passing: 11 new in std_writers (envelope / TRACK field order / VIA field order / COPPERAREA SVG path / LIB nesting PADs via `#@$` / docType=1 / W+N pairing / power-port netflag / json.dumps round-trip).
+
+### Downstream delivery
+
+The 15 ESP-VoCat JSON files are already in `/tmp/std_json/`. Minimal deliverable for the downstream colleague:
+```
+data/processed/std_json/<project_uuid>/<doc_uuid>.json
+```
+
+Next step: run the remaining 4 Pro projects (X86 motherboard (X86主板) / 220V power supply (220V电源) / 泰山派 / 梁山派). The two Pro 2.x ones still do not work and need a Pro 2.x JSON parser.
+
+---
+
 ## 2026-04-29 03:30  rate-limit benchmark written up as a formal report

 **Claude session**