FacereDataset/tools/epro2/parser.py
Knowit 3c57e75d51 Add tools/epro2 — EPRO2 parser + replay prototype
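Adds a parsing skeleton for Pro 3.x .epro2 project source data, the
groundwork for a downstream EPRO2→KiCad converter. Runs end to end on
ESP-VoCat (278 docs / 7.5 MB) and the 220V desktop power supply
(771 docs / 26 MB) with 0 parse errors.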

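Module structure:
  tools/epro2/parser.py    one line → Op: rstrip("|") + split("||") + json.loads
  tools/epro2/replay.py    state machine: DOCHEAD sets the header; every other
                           op is an upsert keyed by id (payload=None means
                           delete); EDIT_HEAD/META/CANVAS/PREFERENCE/PANELIZE
                           are stored as doc-level singletons
  tools/epro2/__main__.py  CLI: given a project directory, replays every doc
                           listed in manifest.json and prints output aggregated
                           by docType; optional --dump-doc shows the details of
                           a single document
  tools/epro2/tests/       6 unit tests pinning down the pitfalls: trailing
                           pipe, three-segment messages, id-only-no-payload,
                           embedded pipe characters, etc.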
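
The replay rules listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of replay.py: the names `DocState` and `apply_op`, the choice to replace an object wholesale on upsert, the way deletes are counted, and the sample `docType` field inside the DOCHEAD payload are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Op types kept once per document instead of keyed by id (from the commit message).
SINGLETON_TYPES = {"EDIT_HEAD", "META", "CANVAS", "PREFERENCE", "PANELIZE"}


@dataclass
class DocState:
    """Hypothetical container; the real replay.py may be shaped differently."""

    head: dict | None = None
    singletons: dict[str, dict] = field(default_factory=dict)
    objects: dict[str, dict] = field(default_factory=dict)
    deletes: int = 0


def apply_op(state: DocState, op_type: str, op_id: str | None, payload: dict | None) -> None:
    """Apply one parsed op to the document state."""
    if op_type == "DOCHEAD":
        state.head = payload
    elif op_type in SINGLETON_TYPES:
        state.singletons[op_type] = payload or {}
    elif op_id is None:
        return  # nothing to key on; ignored in this sketch
    elif payload is None:
        # payload=None is treated as a delete of the object with this id.
        if state.objects.pop(op_id, None) is not None:
            state.deletes += 1
    else:
        # Assumption: an upsert replaces the stored object rather than merging.
        state.objects[op_id] = {"type": op_type, "payload": payload}


state = DocState()
for op in [
    ("DOCHEAD", None, {"docType": "PCB"}),
    ("CANVAS", None, {"unit": "mm"}),
    ("LINE", "e1", {"x": 0}),
    ("LINE", "e1", {"x": 5}),  # same id: overwrites the first LINE
    ("LINE", "e2", {"x": 9}),
    ("LINE", "e2", None),      # payload=None: deletes e2
]:
    apply_op(state, *op)
```

After the loop, `state.objects` holds only `e1` with its latest payload, which is the flat-dict result the commit message describes.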

Sample output for ESP-VoCat:
  Documents: 278  (parse_errors=0)
   count  docType         objects        ops  deletes  untyped_ops
     105  SYMBOL             4124       4439        0            0
      88  DEVICE               88        264        0            0
      55  FOOTPRINT          4641       4855        0            0
       9  SCH_PAGE           7982       8167       42            0
       6  PCB                8428       8547       38            0
       6  BOARD                 9         18        0            0
       6  SCH                   9         26        0            0
       1  BLOB                  4          8        0            0
       1  FONT                 16         28        0            0
       1  CONFIG                2          3        0            0
  Top ops: ATTR 7035 / ELE_PLACEHOLDER 4225 / LINE 3005 / LAYER 2318 ...

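Dumping a single PCB document confirms the semantics are correct: META
contains the title (PCB-EchoEar-CoreBoard-V1_0) plus the board reference;
CANVAS contains origin/grid/unit (mm); LAYER 1/2/3 = TOP/BOTTOM/TOP_SILK,
each with its color settings.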

How to run:
  uv run python -m tools.epro2 data/raw/oshwhub/<project_uuid>
  uv run python -m tools.epro2 data/raw/oshwhub/<uuid> --dump-doc <doc_uuid>

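Next steps (not in this commit):
1. Build the relations between objects (COMPONENT.partId → PART;
   LINE.lineGroup → WIRE; PAD_NET id → three-way PAD + NET association);
   replay currently produces only a flat dict
2. EPRO2 → KiCad serialization layer (a hard requirement for the Forge
   projection)
3. Run a full regression across three Pro 3.x projects (the X86 motherboard,
   7374 docs, can serve as a stress test)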

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:10:27 +08:00

90 lines
2.6 KiB
Python

"""EPRO2 line parser.

EPRO2 is EasyEDA Pro 3.x's event-sourced project source format. After AES-GCM
decryption + gunzip (handled by the crawler), each newline-separated line has
the shape:

    {"type":"X","ticket":N,"id":"..."}||{payload JSON}||{optional extra}|

Field separator is ``||``; line terminator is a single trailing ``|`` (NOT a
field separator; easy to mis-parse, see docs/sources/easyeda_pro_source.md §3.1).

This module only does line-level parsing (raw → ``Op``). State semantics
(create / update / delete) live in ``replay.py``.
"""

from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(slots=True)
class Op:
    """A single EPRO2 message after raw parsing."""

    type: str
    ticket: int | None
    id: str | None
    payload: dict | None
    extra: dict | None
    raw: bytes  # original line, for debugging / round-trip


class Epro2ParseError(ValueError):
    """Raised when a line cannot be parsed."""


def parse_line(ln: bytes) -> Op:
    """Parse one EPRO2 line. Raises ``Epro2ParseError`` on a malformed head."""
    # Drop surrounding whitespace and the trailing "|" terminator before
    # splitting on the "||" field separator.
    stripped = ln.strip().rstrip(b"|")
    if not stripped:
        raise Epro2ParseError("empty line")
    parts = stripped.split(b"||")
    try:
        head = json.loads(parts[0])
    except json.JSONDecodeError as e:
        raise Epro2ParseError(
            f"bad head JSON at byte {e.pos}: {parts[0][:160]!r}"
        ) from e
    if not isinstance(head, dict):
        # Valid JSON that is not an object carries no type/ticket/id.
        raise Epro2ParseError(f"head is not a JSON object: {parts[0][:160]!r}")
    payload = _maybe_json(parts[1]) if len(parts) >= 2 else None
    extra = _maybe_json(parts[2]) if len(parts) >= 3 else None
    return Op(
        type=str(head.get("type", "?")),
        ticket=head.get("ticket"),
        id=head.get("id"),
        payload=payload if isinstance(payload, dict) else None,
        extra=extra if isinstance(extra, dict) else None,
        raw=ln,
    )


def _maybe_json(b: bytes) -> object | None:
    """JSON-decode if non-empty; tolerate malformed payloads (return None)."""
    if not b:
        return None
    try:
        return json.loads(b)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # json.loads on bytes raises UnicodeDecodeError for undecodable input.
        return None


def iter_ops(path: Path | str) -> Iterator[Op]:
    """Yield ``Op`` records from a ``.epro2`` file.

    Lines that fail to parse are skipped; structural failures (file not found,
    encoding error) propagate.
    """
    p = Path(path)
    with p.open("rb") as f:
        for ln in f:
            ln = ln.rstrip(b"\n")
            if not ln.strip():
                continue
            try:
                yield parse_line(ln)
            except Epro2ParseError:
                continue
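
The trailing-pipe pitfall mentioned in the module docstring can be seen in isolation with a short standalone sketch. The sample line below is made up for illustration (it is not taken from a real project), and the snippet repeats only the strip/rstrip/split steps of `parse_line` rather than importing the module.

```python
import json

# Made-up three-segment line: head || payload || extra, then the "|" terminator.
line = b'{"type":"LINE","ticket":3,"id":"e1"}||{"x":1}||{"note":"a|b"}|\n'

# Naive approach: split without removing the terminator first.
# The last field keeps the trailing "|" and is no longer valid JSON.
naive_last = line.strip().split(b"||")[-1]

# Same order as parse_line: strip whitespace, drop the trailing "|", then split.
parts = line.strip().rstrip(b"|").split(b"||")
head, payload, extra = (json.loads(p) for p in parts)
```

A single `|` inside a JSON string value, as in the `note` field here, survives because only `||` is treated as a separator. A literal `||` inside a string value would still be split by this approach.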