The downstream colleague consumes oshwhub Std (lceda) dict-format JSON,
not KiCad. The EPRO2 decryption part (per-doc plaintext .epro2 streams
in data/raw/<uuid>/source/) is what we already provide; the missing
piece is converting EPRO2 op-streams into the same `dataStr.shape`
tilde-delimited format their parser already speaks.
New tools/epro2/std/ module, peer of tools/epro2/kicad/, kept
deliberately separate so the KiCad path stays untouched:
- pcb_writer.write_pcb_std() — high-fidelity, validated against a Std
PCB sample at data/raw/oshwhub/3e2f893d.../25931ddab8.json. Maps
LINE→TRACK, VIA→VIA, POUR→COPPERAREA (with SVG `M..L..Z` path),
POLY→CIRCLE/SOLIDREGION, COMPONENT+FOOTPRINT→LIB nested with
#@$-separated PADs (placement rotation + translate applied so pad
coords land at PCB-absolute positions). Layer-id mapping (EPRO2 5↔7
flipped vs Std solder/paste, 11→10 outline, 12→11 multi, SIGNAL
inner 15+ → Std 21+) noted inline.
- sch_writer.write_sch_std() — best-effort. Our corpus has zero Std
schematic samples (docType=1) so verb field orders follow the
EasyEDA Std public spec, not direct observation. Emits W (wire),
N (net flag, including the 5-Voltage Global Net Name power-port
pattern), T (text), LIB (placement with #@$-nested PIN/T). If the
downstream parser rejects the output, the fix is almost certainly a
positional field tweak, not a re-architecture.
- __main__.py — flat output `<doc_uuid>.json` per doc directly under
--out (mirrors Std's own data layout); --all-pcb / --all-sch / --all.
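The "placement rotation + translate" step for LIB pads can be sketched as below. This is an illustration only, not the code in pcb_writer: the name `place_pad`, the argument order, and the counter-clockwise rotation sense are assumptions, and the real writer may rotate the other way depending on the Y-axis direction of the source.

```python
import math

def place_pad(pad_x, pad_y, comp_x, comp_y, rotation_deg):
    """Rotate a footprint-local pad about the footprint origin, then
    translate it to the component position (hypothetical helper).
    Counter-clockwise rotation is assumed here."""
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    abs_x = comp_x + pad_x * cos_t - pad_y * sin_t
    abs_y = comp_y + pad_x * sin_t + pad_y * cos_t
    return abs_x, abs_y
```

Rotating before translating is what makes the pad coordinates PCB-absolute; swapping the order would rotate the pad about the board origin instead of the footprint origin.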
Smoke test on ESP-VoCat: 6 PCB + 9 SCH = 15 JSON files, libs_unresolved=0
across the board. Compact JSON (separators=(",",":")) matches Std's
single-line format. Numbers use _num() — integers without trailing .0,
floats trimmed.
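A minimal sketch of the number and JSON formatting described above. The real _num() may differ: the name `num`, the 6-decimal precision, and the `dump_compact` wrapper are assumptions made for this example.

```python
import json

def num(value, places=6):
    """Format a number as a string: whole values lose the trailing .0,
    other floats are rounded to `places` (assumed) and zero-trimmed."""
    rounded = round(float(value), places)
    if rounded == int(rounded):
        return str(int(rounded))
    return f"{rounded:.{places}f}".rstrip("0").rstrip(".")

def dump_compact(doc):
    """Serialize on a single line with no spaces after separators."""
    return json.dumps(doc, separators=(",", ":"))
```

For instance, `num(10.0)` gives `"10"` and `num(1.50)` gives `"1.5"`, which keeps the tilde-delimited fields free of float noise.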
71 → 82 unit tests pass.
Open questions for downstream: (1) confirm SCH verb field orders; (2)
do they want any of the upstream metadata fields we drop (master,
owner, created_at, etc.)? Those live on the crawler side, not in the
schematic itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The doc had been growing incrementally as each host got probed; reshape
it into a polished benchmark with a TL;DR at the top, a methodology
section (including safety constraints and caveats), per-host detailed
tables, final crawler settings, a batch-50 walltime breakdown, and a
reproduce recipe.
Five hosts fully covered:
  pro.lceda.cn API     5.0s -> 0.5s  (10×)
  lceda.cn doc         5.0s -> 0.5s  (10×)
  oshwhub detail       2.0s -> 1.0s  ( 2×)
  oshwhub listing      2.0s -> 1.0s  ( 2×)
  modules.lceda CDN    0.2s          (already optimized)
Net effect on batch-50 plan: sleep total ~32min -> ~3min, walltime
~2h -> ~10-15min.
Key finding: the original 5s/req on Pro was set out of "logged-in
account is precious" caution with zero empirical evidence. Sustained
burst probe (25 distinct UUIDs at 0.5s, no recovery) showed 0/25 errors
and median latency 410ms — the caution was unjustified.
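The burst probe can be approximated with a loop like the one below. This is a sketch, not the script that produced the numbers above: `run_burst_probe`, its parameters, and the injectable `fetch`/`sleep`/`clock` callables are hypothetical, chosen so the loop can be exercised without touching the network.

```python
import statistics
import time

def run_burst_probe(fetch, doc_ids, interval_s=0.5,
                    sleep=time.sleep, clock=time.perf_counter):
    """Call fetch(doc_id) once per id with a fixed pause in between and
    no back-off; report the error count and median latency in ms."""
    latencies_ms = []
    errors = 0
    for index, doc_id in enumerate(doc_ids):
        if index:
            sleep(interval_s)  # fixed spacing, no recovery pause
        started = clock()
        try:
            fetch(doc_id)
        except Exception:
            errors += 1
            continue
        latencies_ms.append((clock() - started) * 1000.0)
    median = statistics.median(latencies_ms) if latencies_ms else None
    return {"requests": len(doc_ids), "errors": errors, "median_ms": median}
```

Using distinct ids per request, as the probe did, avoids measuring a cache hit instead of the backend.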
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Killed at the 14-min mark — VmRSS 1.96 GB + VmSwap 1.41 GB on a 3.3 GB
RAM box with 4 GB swap (3.6 GB used), read_bytes 24 GB (pure swap thrash),
process state D (uninterruptible disk sleep). The CPU board PCB doc
(8K+ objects, 35+ child schematic pages) overflowed our current
all-in-memory build pattern: pcb_writer builds the full output list
before to_sexpr serializes once at the end, plus the 35 write_sch_page
calls each build their own Relations + lib_symbols dedup state.
Saved what finished: 4/5 X86 boards complete (Sch-CAM-IMX415,
Schematic1, SCHEMATIC1, Sch-VTX-SSC338Q), the CPU board SCHEMATIC1_1
has all its 35 child .kicad_sch but no .kicad_pcb. Final downstream
delivery: 17 board projects across the 3 supported Pro projects, 32/32
files pass kicad-cli (sch erc + pcb export svg).
Streaming-write fix is the next logical follow-up but out of scope
for this turn.
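One possible shape for that follow-up, sketched under assumptions: blocks arrive as already-serialized s-expression strings from a generator and are written as they are produced, so the full output list never sits in memory. The name `stream_blocks` and the header/footer arguments are hypothetical and not part of the current pcb_writer.

```python
import io

def stream_blocks(blocks, handle, header="(kicad_pcb", footer=")"):
    """Write each serialized block as soon as it is produced instead of
    collecting the whole output list first. Returns the block count."""
    handle.write(header + "\n")
    count = 0
    for block in blocks:
        handle.write("  " + block + "\n")
        count += 1
    handle.write(footer + "\n")
    return count

# Usage with an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
written = stream_blocks((f"(segment {i})" for i in range(3)), buffer)
```

This only helps if the upstream producers are generators too; state such as net-id maps would still have to be built before the first block is written.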
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the two new --all crash paths fixed in 61fd3ff (odd inner
copper layers, duplicate BOARD titles) plus the Pro 2.x scope gap
(Taishan + Liangshan are JSON-format, not EPRO2 streams, so our
replay_project reads the bytes but doc_type stays None and
_group_by_board returns no SCH/PCB groupings — needs a separate
Pro 2.x writer).
Status as of this commit: ESP-VoCat 6 boards + 220V power 7 boards =
13 project dirs ready for downstream corpus. X86 motherboard is the
largest of the five (7374 docs, 1.9 GB RAM in flight) and still
running.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KiCad pairs project files purely by basename + same directory: a folder
holding `Foo.kicad_pro`, `Foo.kicad_sch`, `Foo.kicad_pcb` opens as one
project on double-click of the .kicad_pro, with cross-tool navigation
(open footprint from schematic, etc.) wired up automatically.
- pro_writer.write_kicad_pro() renders the minimal KiCad 8 JSON we
need: meta.filename pinning the basename, sheets=[[<root_uuid>,
""]] binding the schematic root, and stub blocks for board /
schematic / net_settings / erc that KiCad expects to find on the
first GUI load.
- root_sch_writer.write_root_sheet() now accepts an optional
root_uuid so the caller can pass the same uuid into the .kicad_pro
and .kicad_sch (the binding fails silently with mismatched ids).
- CLI gains `--all`: groups SCH/PCB docs by their META.board uuid
(1:1 in EPRO2), strips SCH-/PCB- editor prefixes from titles to
derive a shared project basename, and emits one directory per
BOARD with paired files. BOARDs whose SCH is DELETE_DOC (LCD-BD on
ESP-VoCat) still get a .kicad_pro with sheets:[] + .kicad_pcb so
pcbnew opens cleanly.
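The shape of the minimal project file can be sketched as below. The keys come from the description above; the empty stub contents, the `version` value, and the name `build_kicad_pro` are assumptions for illustration, and the real stubs written by pro_writer are likely richer.

```python
import json

def build_kicad_pro(basename, root_uuid=None):
    """Return a minimal .kicad_pro document as a dict (hypothetical
    helper). With no root schematic, sheets stays empty."""
    sheets = [[root_uuid, ""]] if root_uuid else []
    return {
        "meta": {"filename": f"{basename}.kicad_pro", "version": 1},  # version assumed
        "sheets": sheets,
        "board": {},         # stub, contents assumed
        "schematic": {},     # stub, contents assumed
        "net_settings": {},  # stub, contents assumed
        "erc": {},           # stub, contents assumed
    }

# The same root_uuid must also be written into the root .kicad_sch.
text = json.dumps(build_kicad_pro("Foo", "00000000-0000-0000-0000-000000000001"), indent=2)
```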
ESP-VoCat smoke: 6 boards → 6 project dirs, all pairs validated by
kicad-cli sch erc / pcb export svg. The CoreBoard pro/sch/pcb trio
shares root uuid 366d3e53...c2fccbe4330b end-to-end.
68 → 71 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-1 left 75-358 unconnected_items per board (DRC), dominated by
GND/AGND/POWER nets that EPRO2 routes through copper pour, not discrete
traces. Phase-2 lands those:
- pcb_writer._decode_zone_path handles the three POUR.path encodings
seen in ESP-VoCat: rectangle (['R', x, y, w, h, ...]), circle
(['CIRCLE', cx, cy, r]) approximated as a 36-segment polygon, and
polyline (numeric pairs with 'L'/'ARC' verb tokens).
- Each POUR on a copper layer turns into a (zone (polygon ...) ...)
block plus a (filled_polygon ...) that mirrors the boundary.
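A sketch of the rectangle and circle branches of the decoder. It is not the actual _decode_zone_path: the function names are hypothetical, the rectangle is assumed to be anchored at (x, y) with w and h extending in the positive direction, and the polyline/ARC branch is left out.

```python
import math

def rect_to_polygon(x, y, w, h):
    """Four corners of a rectangle; the corner convention is assumed."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]

def circle_to_polygon(cx, cy, r, segments=36):
    """Approximate a circle by a regular polygon with `segments` vertices."""
    step = 2.0 * math.pi / segments
    return [(cx + r * math.cos(i * step), cy + r * math.sin(i * step))
            for i in range(segments)]

def decode_zone_path(path):
    """Dispatch on the leading verb token of a POUR.path list."""
    verb = path[0]
    if verb == "R":
        return rect_to_polygon(*path[1:5])  # trailing fields ignored
    if verb == "CIRCLE":
        return circle_to_polygon(*path[1:4])
    raise ValueError(f"unsupported path verb in this sketch: {verb!r}")
```

With 36 segments each vertex sits exactly on the circle, so the polygon is inscribed and slightly under-covers the true copper area.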
Why mirror, not auto-fill: kicad-cli pcb drc does NOT run the zone
filler before checking — only the KiCad GUI does. Without a
pre-computed (filled_polygon ...), DRC sees zones as empty regions and
reports the entire net as unconnected. Mirroring the boundary as the
fill is "connectivity-correct, clearance-imprecise" — KiCad users can
still hit Edit > Fill Zones to refine thermals and pad clearances. We
chose this over reading EPRO2's POURED.pourFill (the editor's own
post-fill polygons) because POURED paths use ARC tokens we'd need to
fully decode, and the user-drawn POUR boundary is already the
authoritative "intended copper" region.
ESP-VoCat DRC totals: 883 → 730 unconnected_items (-17% project-wide).
CoreBoard, the 4-layer board with the most pour coverage, drops 358 →
205 (-43%). Other boards see no movement because their unconnected
items are non-pour issues: pads outside the user-drawn POUR
rectangle, or internal $1N nets whose vias sit on the wrong net
(separate problem, separate fix).
65 → 68 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-1 scope: produce a .kicad_pcb that kicad-cli loads cleanly and
that has the right geometry (nets, footprints, tracks, vias, board
outline) — not a 1:1 EDA round-trip. Skipped on purpose for Phase 2:
copper pours (POUR/POURED), manual FILL, teardrops, board-level
strings/images, ARC circle-center recovery.
What lands:
- pcb_writer.write_pcb(): header/general, data-driven layer table
(F.Cu = ord 0; B.Cu = ord 31; SIGNAL inner ids 15+ allocated to
In1.Cu/In2.Cu/... in EPRO2-id sorted order so used inner layers
stay contiguous), net-name → integer id map (id 0 reserved for the
empty net per KiCad convention), LINE→segment / LINE→gr_line on
Edge.Cuts, layer-11 POLY paths walked into Edge.Cuts gr_line chains
(the actual board outline lives on POLY here, not LINE — without
this stats showed edge=0), VIA→via.
- footprint_writer.write_footprint_placement(): inline (footprint ...)
blocks per PCB COMPONENT. EPRO2 RECT/ELLIPSE/OVAL/POLYGON pad
shapes mapped to KiCad rect/circle/oval/custom; SMD vs THT detected
by PAD.hole presence; SLOT holes use (drill oval w h). Pad nets
resolved cross-doc via the existing PCB.PAD_NET → footprint.pad
chain in ProjectRelations. layerId=2 component → (layer B.Cu) +
text on B.SilkS so bottom-side parts render correctly.
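The net-id map and inner-layer allocation described above can be sketched as follows. Both helpers are hypothetical; in particular, treating every layer id of 15 or more as a SIGNAL inner layer is a simplification for this example.

```python
def build_net_ids(net_names):
    """Map net names to integer ids in first-seen order.
    Id 0 is reserved for the empty net."""
    ids = {"": 0}
    for name in net_names:
        if name not in ids:
            ids[name] = len(ids)
    return ids

def allocate_inner_layers(used_layer_ids, first_inner_id=15):
    """Assign In1.Cu, In2.Cu, ... to the used inner layer ids in sorted
    order, so the KiCad inner layers stay contiguous even when the
    source ids have gaps."""
    inner = sorted({lid for lid in used_layer_ids if lid >= first_inner_id})
    return {lid: f"In{pos}.Cu" for pos, lid in enumerate(inner, start=1)}
```

For example, a board that uses source inner ids 15 and 17 but not 16 still gets In1.Cu and In2.Cu with no hole between them.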
Smoke test on ESP-VoCat (6 PCBs): all 6 pass `kicad-cli pcb export svg`
and render. DRC on smallest (MicBoard) reports 145 violations + 75
unconnected — most of the unconnected are GND nets that the EPRO2
source resolves through POUR copper, which Phase 2 will export.
CLI: `python -m tools.epro2.kicad <project> --all-pcb --out <dir>`
emits one .kicad_pcb per PCB doc.
52 → 65 unit tests pass. Float comparisons in tests use math.isclose
because the s-expr 6-decimal trim doesn't preserve strict equality
through `value * MIL_TO_MM` round-trips.
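A small illustration of why strict equality breaks. `trim6` stands in for the s-expr 6-decimal trim and is a hypothetical helper; the tolerance of 1e-6 is an assumption sized to the rounding step.

```python
import math

MIL_TO_MM = 0.0254  # 1 mil = 0.0254 mm

def trim6(value):
    """Round-trip a float through a 6-decimal string, as a serializer
    that trims to 6 decimals would."""
    return float(f"{value:.6f}")

exact = 123.456 * MIL_TO_MM   # about 3.1357824
round_tripped = trim6(exact)  # 3.135782

# Strict equality fails after the trim; a tolerance-based check passes.
strict_equal = round_tripped == exact
close_enough = math.isclose(round_tripped, exact, abs_tol=1e-6)
```

The trim can shift a value by up to 5e-7, so any absolute tolerance at or above that bound is enough for the comparison.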
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colleague-facing explainer at docs/sources/pro_crawl_vs_export.md.
Addresses the "I see 278 .epro2 files but my browser only downloaded
one" confusion: web download is a ZIP container (extension is a UX
choice, not a format), our crawl produces per-doc message streams.
Both carry equivalent EPRO2 data; the only real gap is the IMAGE/
binary previews, which we don't fetch yet.
Why per-doc and not ZIP: the ZIP path has no public endpoint —
three HARs confirm the export button fires zero HTTP requests, it's
pure client-side JSZip on data already loaded by the editor. Our
crawler hits the same chain endpoints the editor uses internally,
which delivers per-doc streams.
Log entry references the 278 vs 266 doc-count delta for ESP-VoCat
(we walk full history chain, web export is a current snapshot).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled changes so kicad-cli sch erc runs at the project level
(across all sheets of one schematic) instead of single-sheet:
1. (label) → (global_label (shape passive)). EPRO2 nets are
project-global by construction (named rails span every page in the
SCH and physically wire across PCBs); KiCad's local label is sheet-
scoped and triggers `label_dangling` for any name not duplicated on
the same page.
2. New root_sch_writer that groups SCH_PAGE docs by their parent SCH
(META.schematic), emits one root .kicad_sch per group with one
(sheet ...) entry per child, and threads the root-assigned uuid back
into each child's (sheet_instances) so KiCad can bind them.
--all-sch now defaults to this; --flat falls back to one-file-per-page.
3. EPRO2's "5-Voltage" placeholder COMPONENT (partId
pid8a0e77bacb214e, 365 instances on ESP-VoCat) is the editor's power
port. The rail name lives in the placement's `Global Net Name` ATTR,
not in the PART. We now emit a (global_label "<rail>") at the
placement coords whenever that attr is set (101/365 of them on
ESP-VoCat — the rest are unconfigured drafts).
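The grouping step in change 2 can be sketched as below. The dict layout (`doc["meta"]["schematic"]`, `doc["uuid"]`) and the function name are assumptions for this example; the actual doc objects in root_sch_writer may be shaped differently.

```python
from collections import defaultdict

def group_pages_by_schematic(page_docs):
    """Group SCH_PAGE docs by the uuid of their parent SCH, keeping the
    input order of pages within each group (hypothetical helper)."""
    groups = defaultdict(list)
    for doc in page_docs:
        groups[doc["meta"]["schematic"]].append(doc["uuid"])
    return dict(groups)

pages = [
    {"uuid": "page-1", "meta": {"schematic": "sch-A"}},
    {"uuid": "page-2", "meta": {"schematic": "sch-B"}},
    {"uuid": "page-3", "meta": {"schematic": "sch-A"}},
]
roots = group_pages_by_schematic(pages)
```

Each key then becomes one root .kicad_sch, and each value one (sheet ...) entry in it.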
ESP-VoCat 5 hierarchical roots: 2325 → 2265 violations. Modest because
5 of 6 SCHs are single-page (no cross-sheet nets to resolve), and the
one 4-page schematic (CoreBoard) shares only a handful of names across
sheets — most net names are de-facto sheet-local. The remaining ~190
pin_not_connected are dominated by 0402-style passives whose pin tip
lies on a wire's interior, not at an endpoint; KiCad needs an explicit
(junction) at those points and we don't yet emit one. Marked as the
next follow-up in log.md.
47 → 52 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bisect found two semantics mismatches between EPRO2 and KiCad that cause
the 850 real-connectivity ERC violations on the ESP-VoCat ref project:
1. sym_writer was emitting lib coords without negating Y, but KiCad lib
uses Y-up and re-flips Y on placement (Y-down schematic). So vertically
arranged pins ended up at Y-mirrored absolute positions and wires that
reach the geometric pin tip in EPRO2 missed the rendered pin tip in
KiCad. Fix: lib_y = -epro2_y, lib_rot = (360 - rot) % 360 for pin/text.
2. sch_writer was treating each LINE as an isolated wire — but EPRO2
binds segments into nets by NAME (WIRE.NET attr), not just geometry.
Multi-segment nets like GND/VBUS show up as N disconnected stubs to
KiCad. Fix: per-LINE, look up lineGroup → WIRE → NET attr and emit a
`(label "<NET>")` at the LINE's start. Same-named labels on distinct
physical wires are how KiCad's ERC recognizes a multi-segment net.
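Fix 1 amounts to the following transform, shown here as a standalone sketch. The function name `to_lib_coords` is hypothetical; the two formulas are the ones stated above.

```python
def to_lib_coords(epro2_x, epro2_y, rot_deg):
    """Convert an EPRO2 (Y-down) pin or text position to KiCad library
    space (Y-up): negate Y and mirror the rotation."""
    lib_y = -epro2_y
    lib_rot = (360 - rot_deg) % 360
    return epro2_x, lib_y, lib_rot
```

The modulo keeps a rotation of 0 at 0 rather than 360, and KiCad's own flip on placement then restores the original on-sheet position.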
ESP-VoCat 9 sheets:
wire_dangling 444 → 52 (-88%)
pin_not_connected 406 → 196 (-52%)
real connectivity total 850 → 248 (-71%)
Why we did NOT round to grid (the obvious-looking fix): EPRO2 places
some pins on a 10-mil pitch (e.g. magnetic socket); rounding to KiCad's
default 50-mil ERC grid would collapse those pins. The 248 residual is
fundamentally cross-sheet — single-sheet ERC can't see a net's other
endpoints on sibling sheets — and is a Phase-3 (hierarchical sheet)
problem, not a per-sheet one.
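The collapse is easy to reproduce with a generic snap-to-grid helper. `snap` and the sample pin positions are made up for illustration and are not taken from the corpus.

```python
def snap(value_mil, grid_mil=50):
    """Round a coordinate to the nearest multiple of the grid."""
    return round(value_mil / grid_mil) * grid_mil

# Three pins on a 10-mil pitch (illustrative values).
pins = [100, 110, 120]
snapped = [snap(p) for p in pins]  # [100, 100, 100]
```

All three pins land on the same point, so distinct nets would be shorted together on paper.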
41 → 46 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probed the listing API and learned: the total field is exposed (Pro=21,202 / Std=12,493),
pageSize accepts >=1000 (full corpus = 35 requests / 71s), and the sort param is silently
ignored. Dump all listings via scripts/dump_listing_index.py to a local jsonl so
downstream batch-selection no longer hits the API.
Why: needed quantitative anchors before scaling the Pro batch beyond top-5. License
info appears only on the detail page (~19h serial scan), so we want to filter on
grade/like *locally* first and shortlist before paying that cost. Quality-tier counts
are now known: A-tier (grade>=3 & like>=10) = 2,806 across both origins.
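The request count and the local A-tier filter can be sketched as below. The record keys `grade` and `like` are assumed from the tier definition, and the helper names are hypothetical; the real jsonl schema may use different field names.

```python
import json
import math

def pages_needed(total, page_size=1000):
    """Number of listing requests needed to cover `total` items."""
    return math.ceil(total / page_size)

def is_a_tier(record):
    """A-tier as defined above: grade >= 3 and like >= 10."""
    return record.get("grade", 0) >= 3 and record.get("like", 0) >= 10

def shortlist(jsonl_lines):
    """Parse jsonl lines and keep only A-tier records, with no API calls."""
    records = (json.loads(line) for line in jsonl_lines if line.strip())
    return [rec for rec in records if is_a_tier(rec)]

total_requests = pages_needed(21202) + pages_needed(12493)  # 22 + 13 = 35
```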
- scripts/dump_listing_index.py: one-shot scraper, polite QPS, streams to jsonl
- docs/sources/oshwhub_listing_full.md: human-readable report with growth
trends, quality tiers, owner concentration, and storage-budget anchors
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>