tools/epro2: add Relations layer for cross-object navigation

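Add a Relations layer on top of replay's flat objects[id] -> payload map.
It builds indices and reverse references so that isolated objects become a
traversable graph. This is the intermediate representation the upcoming
EPRO2 → KiCad converter depends on.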

Relations.build(doc) scans all objects (one pass to bucket primitives, a
second pass to build cross-references) and produces:

Primary collections (bucketed by type):
  parts / components / pins / pads / wires / nets / layers / rules

Composite ID parsing (the key part):
  '["LAYER",1]'                          → layers[1]
  '["NET","GND"]'                        → nets["GND"]
  '["PAD_NET","e0","1","e7"]'            → pad_nets_by_pad/by_net
  '["RULE","SAFE","copperThickness1oz"]' → rules[("RULE","SAFE",...)]
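A minimal sketch of the composite-id mapping listed above. parse_composite_id
follows the function added in this commit; bucket_key is a hypothetical helper
used only for illustration and is not part of tools/epro2.

```python
import json


def parse_composite_id(s):
    """Return the decoded list if ``s`` is a JSON array string, else None."""
    if not isinstance(s, str) or not s.startswith("["):
        return None
    try:
        v = json.loads(s)
    except json.JSONDecodeError:
        return None
    return v if isinstance(v, list) else None


def bucket_key(obj_id):
    """Hypothetical helper: map a composite id to (bucket name, key)."""
    parts = parse_composite_id(obj_id)
    if not parts:
        return None
    kind = parts[0]
    if kind == "LAYER" and len(parts) >= 2:
        try:
            return ("layers", int(parts[1]))
        except (TypeError, ValueError):
            return None
    if kind == "NET" and len(parts) >= 2:
        return ("nets", str(parts[1]))
    if kind == "RULE":
        # Rules keep the whole parsed array as a tuple key.
        return ("rules", tuple(parts))
    return None
```

Plain ids such as "e0" are not JSON arrays, so they return None and stay
keyed by their raw string.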

Reverse references:
  obj_ids_by_part         partId            → ids of referencing objects (RECT/TEXT/PIN inside a lib all carry partId)
  components_by_part      partId            → component ids
  attrs_by_parent         parentId          → ATTR ids
  lines_by_wire           WIRE.id           → LINE ids (a wire consists of several LINE segments)
  pad_nets_by_pad         PAD.id            → PAD_NET records
  pad_nets_by_net         net name          → PAD_NET records
  objects_on_layer / objects_in_net  reverse lookup by field
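As an illustration of how one of these reverse indices can be built, here is
a small self-contained sketch for lines_by_wire. The object ids and payloads
are made up for the example; only the _type and lineGroup field names come
from this commit.

```python
from collections import defaultdict

# Hypothetical replayed objects: one WIRE made of two LINE segments,
# plus one LINE whose lineGroup points at a wire that does not exist.
objects = {
    "w1": {"_type": "WIRE"},
    "l1": {"_type": "LINE", "lineGroup": "w1"},
    "l2": {"_type": "LINE", "lineGroup": "w1"},
    "l3": {"_type": "LINE", "lineGroup": "w9"},
}

# First collect the targets, then index the references against them.
wires = {oid for oid, p in objects.items() if p.get("_type") == "WIRE"}

lines_by_wire = defaultdict(list)
unresolved_wires = 0
for oid, p in objects.items():
    if p.get("_type") == "LINE" and (ref := p.get("lineGroup")):
        lines_by_wire[str(ref)].append(oid)
        if str(ref) not in wires:
            unresolved_wires += 1
```

The dangling reference is still indexed, so it can be inspected later, and it
is counted separately as a diagnostic.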

Convenience accessor:
  attrs_dict(parent_id)   folds all ATTR ops into a {key: value} dict (last
                          write wins); the usual entry point for fetching
                          Designator/Value/Footprint per component during
                          KiCad conversion
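The folding rule can be sketched on its own as follows. fold_attrs is a
hypothetical standalone function that takes ATTR payloads directly, and the
sample values are invented; the key/value field names match the payloads
read by attrs_dict.

```python
def fold_attrs(attr_payloads):
    """Collapse ATTR payloads into {key: value}; later entries overwrite earlier ones."""
    out = {}
    for payload in attr_payloads:
        key = payload.get("key")
        if key is not None:  # payloads without a key are skipped
            out[str(key)] = payload.get("value")
    return out


# Hypothetical ATTR payloads attached to one component, in replay order.
attrs = [
    {"key": "Designator", "value": "R1"},
    {"key": "Value", "value": "10k"},
    {"key": "Value", "value": "4.7k"},
    {"value": "no key, ignored"},
]
folded = fold_attrs(attrs)
```

Here folded["Value"] ends up as "4.7k" because the later ATTR wins.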

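ATTR.parentId resolution (two pitfalls found in real data):
1. It points not only at COMPONENT/PART but also, very often, at WIRE (net
   labels / net attributes on the schematic). The original resolution check
   missed these and reported 636 false-positive unresolved parents; it now
   counts any hit in doc.objects[parentId] as resolved.
2. The compound form `<comp_id>-<pin_id>` attaches an ATTR to a specific pin
   of a component (e.g. PinName). `_resolve_parent()` falls back to
   split("-",1) to handle this form.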
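Both cases can be exercised with a few lines. This sketch follows the logic
of _resolve_parent but takes a plain dict instead of a Document; the ids
"e12", "e40" and "e3" are invented for the example.

```python
def resolve_parent(parent_id, objects):
    """True if parent_id, or the part before its first '-', is a known object id."""
    if parent_id in objects:
        return True
    if "-" in parent_id:
        head = parent_id.split("-", 1)[0]
        return head in objects
    return False


# Hypothetical document contents: one component and one wire.
objects = {"e12": {"_type": "COMPONENT"}, "e40": {"_type": "WIRE"}}

direct_wire = resolve_parent("e40", objects)      # pitfall 1: parent is a WIRE
compound_pin = resolve_parent("e12-e3", objects)  # pitfall 2: <comp_id>-<pin_id>
unknown = resolve_parent("e99-e3", objects)       # head is not in the document
```

Note that only the head of the compound form is checked; the pin part is not
validated here.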

The CLI gains --relations, which aggregates stats by docType:
  uv run python -m tools.epro2 data/raw/oshwhub/<uuid> --relations

Verified on ESP-VoCat:
  SCH_PAGE 9 docs : 572 components, 563 wires, 934 lines_grouped,
                    4111 attrs_attached, 0 unresolved_parents
  PCB      6 docs : 206 components, 807 pad_nets, 173 nets, 544 layers
  SYMBOL 105 docs : 106 parts, 560 pins, 1680 attrs_attached
  FOOTPRINT 55 docs: 496 pads, 9 nets, 1771 layers, 140 rules

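Note: pads=6 vs pad_nets=807 inside PCB is not a contradiction. PAD instances
live in FOOTPRINT documents, and the PCB stream references them across
documents via the composite id ["PAD_NET",comp,pin,pad]. Resolving "which pad
of which footprint a given pin of a comp goes through" requires project-level
Relations aggregation (next task).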

Tests: tools/epro2/tests/test_relations.py adds 9 unit tests covering
composite id parsing, lineGroup linking, direct/compound parentId resolution,
partId reverse lookup, and attrs folding.
parser + relations: 15/15 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:17:28 +08:00
parent 3c57e75d51
commit 7f9e2fad73
4 changed files with 448 additions and 0 deletions

tools/epro2/relations.py Normal file

@@ -0,0 +1,271 @@
"""Build cross-object relationship indices from a replayed Document.

After ``replay.Document`` flattens the EPRO2 stream into ``objects[id] -> payload``,
this module walks those payloads to build the secondary indices needed for
downstream translation (KiCad export, graph extraction, etc).

Relationships modeled (empirically; see docs/sources/easyeda_pro_source.md §3
+ probe results 2026-04-28 on ESP-VoCat):

    PART      --(id, dotted name)--> primitives via primitive.partId (lib/parts)
    COMPONENT --(.partId)-->         PART (sch) or footprint via ATTR (pcb)
    ATTR      --(.parentId)-->       any doc object (COMPONENT/PART/WIRE/PAD/PIN),
                                     or compound ``<comp_id>-<pin_id>``
    LINE      --(.lineGroup)-->      WIRE (sch wire segments)
    PAD_NET[id=["PAD_NET",comp,pin,pad]] --(.padNet)--> NET[id=["NET",name]]
    any obj   --(.layerId)-->        LAYER[id=["LAYER",N]] (pcb)
    any obj   --(.netName)-->        NET (pcb)

Composite IDs (e.g. ``'["LAYER",1]'``) are emitted by the editor as JSON
serialized arrays. We parse them lazily; see ``parse_composite_id``.
"""
from __future__ import annotations

import json
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any

from .replay import Document


def parse_composite_id(s: str) -> list | None:
    """Best-effort decode an id field that's a serialized JSON array.

    Returns the list if the string looks like JSON array, else None.
    """
    if not isinstance(s, str) or not s.startswith("["):
        return None
    try:
        v = json.loads(s)
    except json.JSONDecodeError:
        return None
    return v if isinstance(v, list) else None


def _resolve_parent(parent_id: str, doc: Document) -> bool:
    """Check whether ``parent_id`` references something we know about.

    Accepts:
      - direct hit on ``doc.objects`` (any _type: COMPONENT/WIRE/PART/PAD/PIN/...)
      - compound ``<a>-<b>`` where ``<a>`` resolves to a doc object
        (used for "component+pin" addressing in schematic ATTR ops)
    """
    if parent_id in doc.objects:
        return True
    if "-" in parent_id:
        head = parent_id.split("-", 1)[0]
        if head in doc.objects:
            return True
    return False


@dataclass
class Relations:
    """Indices built from one ``Document``. Cheap to (re)build.

    Lookup conventions:
      - "by_id" maps a primitive's id to its payload.
      - "by_<key>" maps the value at <key> to a list of object ids referencing it.
      - composite-keyed maps use the parsed tuple as key (e.g. layer int).
    """

    doc: Document

    # Primitive collections by type ----------------------------------------
    parts: dict[str, dict] = field(default_factory=dict)       # PART.id (dotted) → payload
    components: dict[str, dict] = field(default_factory=dict)  # COMPONENT.id → payload
    pins: dict[str, dict] = field(default_factory=dict)        # PIN.id → payload
    pads: dict[str, dict] = field(default_factory=dict)        # PAD.id → payload
    wires: dict[str, dict] = field(default_factory=dict)       # WIRE.id → payload
    nets: dict[str, dict] = field(default_factory=dict)        # NET name → payload
    layers: dict[int, dict] = field(default_factory=dict)      # LAYER int → payload
    rules: dict[tuple, dict] = field(default_factory=dict)     # ("RULE", ...) tuple → payload

    # Cross-references -----------------------------------------------------
    obj_ids_by_part: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    """partId (dotted name OR `pid...` prefix) → object ids referencing it."""

    components_by_part: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    """partId → component ids whose COMPONENT.partId == this."""

    attrs_by_parent: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    """parentId → ATTR ids attached."""

    lines_by_wire: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    """WIRE.id → LINE ids whose lineGroup == this."""

    pad_nets_by_pad: dict[str, list[dict]] = field(default_factory=lambda: defaultdict(list))
    """PAD.id → [{comp, pin, pad, net_name, payload}, ...]."""

    pad_nets_by_net: dict[str, list[dict]] = field(default_factory=lambda: defaultdict(list))
    """net_name (from PAD_NET.padNet) → [{comp, pin, pad, net_name, payload}, ...]."""

    objects_on_layer: dict[int, list[str]] = field(default_factory=lambda: defaultdict(list))
    """layer int → object ids whose payload.layerId == this."""

    objects_in_net: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    """net name (payload.netName) → object ids."""

    # Diagnostics ----------------------------------------------------------
    unresolved_parents: int = 0  # ATTR.parentId not in doc.objects (directly or via <a>-<b> head)
    unresolved_wires: int = 0    # LINE.lineGroup points to nothing in wires
    unresolved_layers: int = 0   # payload.layerId points to nothing in layers (pcb only)
    bad_composite_ids: int = 0   # composite id failed to parse or had an unexpected shape

    # ----------------------------------------------------------------------
    @classmethod
    def build(cls, doc: Document) -> "Relations":
        rel = cls(doc=doc)

        # First pass: bucket primitives by type, parse composite ids.
        for obj_id, payload in doc.objects.items():
            t = payload.get("_type")
            if t == "PART":
                # PART payload uses head.id as its key (e.g. "0.96_inch_lcd.1").
                # In our replay, doc.objects[obj_id] has _type=PART; obj_id IS the part id.
                rel.parts[obj_id] = payload
            elif t == "COMPONENT":
                rel.components[obj_id] = payload
                if part_ref := payload.get("partId"):
                    rel.components_by_part[str(part_ref)].append(obj_id)
            elif t == "PIN":
                rel.pins[obj_id] = payload
            elif t == "PAD":
                rel.pads[obj_id] = payload
            elif t == "WIRE":
                rel.wires[obj_id] = payload
            elif t == "NET":
                # NET id is `["NET", "<name>"]`
                comp = parse_composite_id(obj_id)
                if comp and len(comp) >= 2 and comp[0] == "NET":
                    rel.nets[str(comp[1])] = payload
                else:
                    rel.bad_composite_ids += 1
            elif t == "LAYER":
                # LAYER id is `["LAYER", <int>]`
                comp = parse_composite_id(obj_id)
                if comp and len(comp) >= 2 and comp[0] == "LAYER":
                    try:
                        rel.layers[int(comp[1])] = payload
                    except (TypeError, ValueError):
                        rel.bad_composite_ids += 1
                else:
                    rel.bad_composite_ids += 1
            elif t == "RULE":
                comp = parse_composite_id(obj_id)
                if comp and comp[0] == "RULE":
                    rel.rules[tuple(comp)] = payload
                else:
                    rel.bad_composite_ids += 1
            elif t == "PAD_NET":
                # id is `["PAD_NET", <comp_id>, <pin_num>, <pad_id>]`
                # payload.padNet = "<net name>"
                comp = parse_composite_id(obj_id)
                if comp and len(comp) >= 4 and comp[0] == "PAD_NET":
                    _, c_id, pin_num, pad_id = comp[0], str(comp[1]), str(comp[2]), str(comp[3])
                    net_name = payload.get("padNet")
                    record = {
                        "comp": c_id,
                        "pin": pin_num,
                        "pad": pad_id,
                        "net_name": net_name,
                        "payload": payload,
                    }
                    rel.pad_nets_by_pad[pad_id].append(record)
                    if net_name:
                        rel.pad_nets_by_net[str(net_name)].append(record)
                else:
                    rel.bad_composite_ids += 1

        # Second pass: cross-references that need full primitive maps available.
        for obj_id, payload in doc.objects.items():
            t = payload.get("_type")

            # partId fan-in (not just COMPONENTs: RECT/TEXT/PIN inside SYMBOL/FOOTPRINT
            # all carry partId pointing at their containing PART)
            if (part_ref := payload.get("partId")) and t != "COMPONENT":
                rel.obj_ids_by_part[str(part_ref)].append(obj_id)

            # ATTR → parent. parentId may target any addressable object in the doc
            # (COMPONENT / WIRE / PART / PAD / PIN), or a compound `<a>-<b>` form
            # where <a> is a component and <b> is its pin/sub-ref.
            if t == "ATTR":
                if parent := payload.get("parentId"):
                    parent_str = str(parent)
                    rel.attrs_by_parent[parent_str].append(obj_id)
                    if not _resolve_parent(parent_str, doc):
                        rel.unresolved_parents += 1

            # LINE → wire
            if t == "LINE":
                if wire_ref := payload.get("lineGroup"):
                    # Normalize to str once so the index key and the lookup agree.
                    wire_key = str(wire_ref)
                    rel.lines_by_wire[wire_key].append(obj_id)
                    if wire_key not in rel.wires:
                        rel.unresolved_wires += 1

            # any obj on layer
            if (lid := payload.get("layerId")) is not None:
                try:
                    lid_int = int(lid)
                    rel.objects_on_layer[lid_int].append(obj_id)
                    if lid_int not in rel.layers:
                        rel.unresolved_layers += 1
                except (TypeError, ValueError):
                    pass

            # any obj in net
            if net_name := payload.get("netName"):
                rel.objects_in_net[str(net_name)].append(obj_id)

        return rel

    # Accessor helpers -----------------------------------------------------
    def part_for_component(self, comp_id: str) -> dict | None:
        """Return the PART payload for a COMPONENT, if resolvable.

        In schematic context, COMPONENT.partId is a `pid...` prefix string that
        does NOT match PART.id directly; the editor resolves it via library
        cache. We try a best-effort match on the raw partId; callers handle None.
        """
        comp = self.components.get(comp_id)
        if not comp:
            return None
        return self.parts.get(str(comp.get("partId", "")))

    def attrs_dict(self, parent_id: str) -> dict[str, Any]:
        """Convenience: collapse all ATTR ops with parentId == ``parent_id`` into a
        flat ``{key: value}`` dict. Last write wins on duplicate keys.
        """
        out: dict[str, Any] = {}
        for attr_id in self.attrs_by_parent.get(parent_id, []):
            payload = self.doc.objects.get(attr_id) or {}
            k = payload.get("key")
            if k is not None:
                out[str(k)] = payload.get("value")
        return out

    def summary(self) -> dict[str, int]:
        """Stats for CLI / tests / sanity checks."""
        return {
            "parts": len(self.parts),
            "components": len(self.components),
            "pins": len(self.pins),
            "pads": len(self.pads),
            "wires": len(self.wires),
            "nets": len(self.nets),
            "layers": len(self.layers),
            "rules": len(self.rules),
            "lines_grouped": sum(len(v) for v in self.lines_by_wire.values()),
            "attrs_attached": sum(len(v) for v in self.attrs_by_parent.values()),
            "pad_nets": sum(len(v) for v in self.pad_nets_by_pad.values()),
            "objects_on_layer": sum(len(v) for v in self.objects_on_layer.values()),
            "objects_in_net": sum(len(v) for v in self.objects_in_net.values()),
            "unresolved_parents": self.unresolved_parents,
            "unresolved_wires": self.unresolved_wires,
            "unresolved_layers": self.unresolved_layers,
            "bad_composite_ids": self.bad_composite_ids,
        }