docs: explain per-doc .epro2 crawl vs web-export .epro2 ZIP
Colleague-facing explainer at docs/sources/pro_crawl_vs_export.md. Addresses the "I see 278 .epro2 files but my browser only downloaded one" confusion: web download is a ZIP container (extension is a UX choice, not a format), our crawl produces per-doc message streams. Both carry equivalent EPRO2 data; only real gap is IMAGE/ binary previews which we don't fetch yet.

Why per-doc and not ZIP: the ZIP path has no public endpoint. Three HARs confirm the export button fires zero HTTP requests; it's pure client-side JSZip on data already loaded by the editor. Our crawler hits the same chain endpoints the editor uses internally, which delivers per-doc streams.

Log entry references the 278 vs 266 doc-count delta for ESP-VoCat (we walk full history chain, web export is a current snapshot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
16
log.md
@@ -4,6 +4,22 @@
---
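## 2026-04-29 01:00 Explainer doc: crawled per-doc .epro2 vs web-export .epro2 ZIP bundle

**Claude session**

Follow-up to the chain replay sleep optimization (commit `1e06ba6`). Colleagues who see 278 `.epro2` files under `data/raw/oshwhub/<uuid>/source/` instead of one instinctively assume the crawl went wrong: the `.epro2` they know is the single 1.4 MB file produced by the web editor's "download project package" action.

In fact:

- **Web `.epro2`** = a ZIP container (the extension is misleading) holding a `.epru` (the EPRO2 streams concatenated into one blob) + `project2.json` + 6 component preview images under `IMAGE/`
- **Crawled `.epro2`** = the EPRO2 message stream of each individual document in the project (SYMBOL / FOOTPRINT / DEVICE / SCH_PAGE / PCB ...), one file per doc
- The two are **essentially equivalent in information content** (for ESP-VoCat we have 278 docs vs 266 in the web export; the extra 12 are old container-layer versions that were superseded along the history chain); the only real difference is the binary images under `IMAGE/` (we crawled the blob references but did not fetch the blobs themselves, a known gap)
- The hard constraint behind going per-doc rather than ZIP: **the server exposes no public endpoint for the ZIP path**; the bundle is assembled and compressed on the fly by front-end JS (verified with three HARs: the export button generates zero HTTP traffic)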
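Since both kinds of file share the `.epro2` extension, the distinction above can be checked from content rather than from the name. The following is a minimal sketch, not part of the crawler: the function names `classify_epro2` and `summarize_export` are hypothetical, and it assumes the web-export bundle keeps `IMAGE/` at the ZIP root and names its members as described in the list above.

```python
import zipfile
from pathlib import Path


def classify_epro2(path: Path) -> str:
    """Tell a web-export bundle apart from a crawled per-doc stream."""
    # Both share the .epro2 extension, so inspect the content:
    # only the web export is a real ZIP archive.
    return "zip-export" if zipfile.is_zipfile(path) else "per-doc-stream"


def summarize_export(path: Path) -> dict[str, int]:
    """Count member kinds inside a web-export ZIP (assumed layout, see lead-in)."""
    counts = {"epru": 0, "project_json": 0, "images": 0, "other": 0}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            base = name.rsplit("/", 1)[-1]
            if name.startswith("IMAGE/"):
                counts["images"] += 1
            elif base.endswith(".epru"):
                counts["epru"] += 1
            elif base == "project2.json":
                counts["project_json"] += 1
            else:
                counts["other"] += 1
    return counts
```

Running `classify_epro2` over every file in a crawl's `source/` directory should report only per-doc streams, while a browser download should come back as `zip-export`; `summarize_export` then shows what the bundle carries beyond the streams.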

Written up in `docs/sources/pro_crawl_vs_export.md` for colleagues. Structure: TL;DR → ESP-VoCat worked example → docType distribution comparison table → explanation of the count difference → size comparison → decision table for choosing between the two.

---

## 2026-04-29 00:30 KiCad export Phase 3 hierarchical: root + global_label + 5-Voltage power ports

**Claude session**