tools/epro2/std: add Pro 2.x JSON path — Liangshan + Taishan SCH now exportable

The downstream colleague's "encrypted_external" / "string old format"
projects were Pro 2.x, not Pro 3.x EPRO2. Pro 2.x ships each doc as a
JSON file whose `dataStr` is a plaintext op-stream — one JSON array per
line, e.g. `["COMPONENT","e1","",0,0,0,0,{},0]`. Different wire format
from EPRO2's binary tilde/pipe streams; same Std envelope works for
output.

  - tools/epro2/std/pro2_writer.py: parses dataStr line-by-line, keys
    objects by id (position 1 for most ops, OPTYPE for singletons),
    extracts BBox by walking known coord positions per OPTYPE, derives
    layers from LAYER ops directly (Pro 2.x almost matches Std layer
    string format already). PCB blobs that are encrypted-external
    (`dataStrId` URL + `iv` + `key`, no inline dataStr — Taishan PCB)
    return None so the CLI skips with a message instead of stubbing.

  - tools/epro2/std/__main__.py: auto-detect via manifest's
    editor_version. "2.x" → Pro 2.x writer; otherwise the existing
    EPRO2 replay path. CLI surface and output layout unchanged.

  - docs/sources/epro2_to_std_mapping.md: adds a Pro 2.x section.
    Adapter dispatches on `head.epro_format`: absent / "epro2" gets
    dict-shaped objects values, "pro2" gets array-shaped values
    (`[OPTYPE, arg1, ...]`). Lists the Pro 2.x-specific OPTYPEs
    (FONTSTYLE / LINESTYLE / CONNECT / OBJ / REGION / DIMENSION /
    STRING / TEARDROP) the EPRO2 vocabulary doesn't have.

Smoke (re-running --all on all 5 Pro projects): 191 → 222 JSON files.
Liangshan adds 3 (2 SCH + inline 5357-object PCB). Taishan adds 28
(SCH only — PCB skipped, encrypted-external; source/<uuid>.json still
keeps the dataStrId/iv/key for a later fetch+decrypt pass).

84 → 86 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-29 02:00:37 +08:00
parent 3866e24189
commit 3720cd176a
4 changed files with 440 additions and 3 deletions

View File

@@ -0,0 +1,243 @@
"""Convert one EasyEDA Pro 2.x JSON document → an Option-2 Std-shaped JSON.
Pro 2.x stores each document as a JSON file whose ``dataStr`` field is a
**plaintext op-stream** — one JSON array per line, e.g.::
["DOCTYPE","SCH","1.1"]
["HEAD",{"originX":0,"originY":0,"version":"2.1.39","maxId":4639}]
["COMPONENT","e1","",0,0,0,0,{},0]
["ATTR","e18","e1","Symbol","6d31...",0,0,2506,-116,0,"st1",0]
...
This is **not** the same wire format as Pro 3.x EPRO2 (which is a binary
op-stream with tilde/pipe delimiters, decrypted from an AES-GCM blob);
the on-disk structure is closer to lceda Std but the op vocabulary
includes Pro 2.x extras (FONTSTYLE / LINESTYLE / CONNECT / OBJ /
REGION / DIMENSION / STRING / TEARDROP) the downstream adapter has to
dispatch on.
We don't translate the ops to a normalised dict like ``replay.Document``
does for EPRO2 — the field positions per OPTYPE are spec-defined for
Pro 2.x and the adapter already has to walk them by index. Instead we
emit the raw op arrays into ``dataStr.objects``, keyed by id (position 1
for most ops, OPTYPE for singletons), and tag the head with
``head.epro_format = "pro2"`` so the adapter can branch on it.
PCB docs whose payload is encrypted-external (``dataStrId`` / ``iv`` /
``key`` instead of inline ``dataStr``) are skipped — fetching from
modules.lceda.cn + AES-decrypt is out of this writer's scope.
"""
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
# Pro 2.x ops that carry no addressable id (one per doc) — keyed by their
# OPTYPE in our objects dict. The rest of the op list shows up as
# id-keyed entries (id = position 1 of the op array).
_SINGLETON_OPTYPES: set[str] = {
"DOCTYPE", "HEAD", "CANVAS", "ACTIVE_LAYER", "PREFERENCE",
"SILK_OPTS", "PANELIZE", "PANELIZE_STAMP", "PANELIZE_SIDE",
"PRIMITIVE",
}
@dataclass
class WriteStats:
objects: int = 0
bbox_x: float = 0.0
bbox_y: float = 0.0
bbox_w: float = 0.0
bbox_h: float = 0.0
skipped_encrypted: bool = False
def _parse_datastr(s: str) -> dict[str, list]:
"""Parse a Pro 2.x op-stream into ``{id: [OPTYPE, args...]}``.
Lines that fail to parse are skipped silently — we mirror the EPRO2
replay's tolerance for partial / malformed sources, since some
real-world docs have stray whitespace or trailing commas.
"""
objects: dict[str, list] = {}
auto_id = 0
for line in s.splitlines():
line = line.strip()
if not line:
continue
try:
arr = json.loads(line)
except json.JSONDecodeError:
continue
if not isinstance(arr, list) or not arr:
continue
optype = arr[0]
if not isinstance(optype, str):
continue
if optype in _SINGLETON_OPTYPES:
obj_id = optype
elif len(arr) > 1 and isinstance(arr[1], (str, int)) and arr[1] != "":
# Most id-keyed ops have a string id ("e<n>") at position 1;
# a handful use ints (LAYER, RULE_SELECTOR). Convert ints to
# qualified strings so they don't collide with string ids.
raw = arr[1]
obj_id = raw if isinstance(raw, str) else f"{optype}_{raw}"
else:
# Defensive fallback — shouldn't fire on healthy docs but
# keeps unknown shapes addressable instead of dropped.
auto_id += 1
obj_id = f"_anon_{optype}_{auto_id}"
# Last write wins on duplicate id — matches EPRO2 replay semantics.
objects[obj_id] = arr
return objects
def _gather_bbox_from_objects(objects: dict[str, list]) -> tuple[float, float, float, float]:
"""Best-effort BBox by scanning known coord positions across Pro 2.x op arrays.
Different OPTYPEs put coords at different positions; we cover the
common ones (LINE / WIRE / VIA / RECT / TEXT / COMPONENT) — anything
we miss just makes the BBox loose, not wrong.
"""
xs: list[float] = []
ys: list[float] = []
# OPTYPE → list of (x_idx, y_idx) pairs in the op array.
coord_positions: dict[str, list[tuple[int, int]]] = {
# ["LINE", id, layer, x1, y1, x2, y2, ...]
"LINE": [(3, 4), (5, 6)],
# ["WIRE", id, x1, y1, x2, y2, ...] (sch wire)
"WIRE": [(2, 3), (4, 5)],
# ["VIA", id, layer, x, y, outerD, innerD, ...]
"VIA": [(3, 4)],
# ["RECT", id, x, y, w, h, ...]
"RECT": [(2, 3)],
# ["TEXT", id, ?, x, y, ...] — best-effort
"TEXT": [(3, 4)],
# ["COMPONENT", id, ..., x, y, rot, ...] — position varies; field
# order has stayed consistent through 2.x but not bullet-proof
"COMPONENT": [(3, 4)],
# ["ARC", id, layer, cx, cy, ...]
"ARC": [(3, 4)],
}
for arr in objects.values():
if not isinstance(arr, list) or len(arr) < 3:
continue
optype = arr[0]
for xi, yi in coord_positions.get(optype, []):
if xi >= len(arr) or yi >= len(arr):
continue
try:
xs.append(float(arr[xi]))
ys.append(float(arr[yi]))
except (TypeError, ValueError):
pass
if not xs:
return (0.0, 0.0, 0.0, 0.0)
return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
def _layers_from_objects(objects: dict[str, list]) -> list[str]:
"""Pro 2.x emits LAYER ops that already match the Std layer-string
format almost 1:1; we just join the array elements with ``~``."""
layers: list[str] = []
for arr in objects.values():
if not isinstance(arr, list) or len(arr) < 2 or arr[0] != "LAYER":
continue
# ["LAYER", id, type, name, attr1, color1, alpha1, color2, alpha2]
# → "id~name~color~visible~active~locked~" (we coerce to Std style)
try:
lid = arr[1]
ltype = arr[2] if len(arr) > 2 else ""
lname = arr[3] if len(arr) > 3 else f"Layer{lid}"
color = arr[5] if len(arr) > 5 else "#000000"
# use/show/locked aren't in the same positions as Std's; we
# keep stable defaults that downstream's adapter can override.
layers.append(f"{lid}~{lname}~{color}~true~false~true~")
except (TypeError, IndexError):
continue
return layers
def write_pro2_doc(
json_path: Path,
*,
project_uuid: str = "",
editor_version_hint: str = "",
) -> dict | None:
"""Read a Pro 2.x JSON file → Option-2 Std envelope.
Returns ``None`` for docs we can't decode (e.g. encrypted-external
PCBs that store a ``dataStrId`` URL and AES iv/key instead of inline
``dataStr``); caller treats that as "skip with stats bump".
"""
raw = json.loads(json_path.read_text(encoding="utf-8"))
doc_uuid = raw.get("uuid") or json_path.stem
title = raw.get("title") or raw.get("display_title") or doc_uuid[:12]
doc_type_int = raw.get("docType")
# Encrypted-external: PCB blob lives at modules.lceda.cn keyed by
# dataStrId, AES-decrypt with the iv+key fields. We don't fetch.
if "dataStr" not in raw and ("dataStrId" in raw or "iv" in raw):
write_pro2_doc.last_stats = WriteStats(skipped_encrypted=True) # type: ignore[attr-defined]
return None
s = raw.get("dataStr")
if not isinstance(s, str):
return None
objects = _parse_datastr(s)
bbox_x, bbox_y, bbox_w, bbox_h = _gather_bbox_from_objects(objects)
layers = _layers_from_objects(objects)
# Pull the Pro 2.x editor version out of the HEAD op if present —
# finer-grained than the manifest's top-level editor_version (which
# is the project's, not the doc's).
head_op = objects.get("HEAD")
pro2_version = ""
if isinstance(head_op, list) and len(head_op) > 1 and isinstance(head_op[1], dict):
pro2_version = head_op[1].get("version", "")
result = {
"uuid": doc_uuid,
"puuid": project_uuid,
"title": title,
"description": raw.get("description", ""),
"docType": doc_type_int,
"components": {},
"dataStr": {
"head": {
"docType": str(doc_type_int) if doc_type_int is not None else "",
"editorVersion": (
f"facere-pro2/0.1 (lceda {pro2_version or editor_version_hint})"
),
"units": "mil",
"epro_format": "pro2",
"pro2_doc_uuid": doc_uuid,
"pro2_editor_version": pro2_version,
},
"BBox": {
"x": bbox_x,
"y": bbox_y,
"width": bbox_w,
"height": bbox_h,
},
"layers": layers,
"objects": objects,
"preference": {},
"netColors": [],
"DRCRULE": {},
},
}
write_pro2_doc.last_stats = WriteStats( # type: ignore[attr-defined]
objects=len(objects),
bbox_x=bbox_x, bbox_y=bbox_y, bbox_w=bbox_w, bbox_h=bbox_h,
)
return {"success": True, "code": 0, "result": result}