Add EasyEDA Std project source ingestion (10 boards backfilled)
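
Wire up source ingestion (schematic + PCB dataStr) for oshwhub projects with
origin=std. plan.md §1.6 assumed a login was required; in practice,
lceda.cn/api/documents/<doc>?uuid=<doc>&path=<doc> is anonymously accessible
for public projects: no cookie needed, no risk of account bans.

Investigation: 4 probe rounds are recorded under data/state/std_probe[1-5]/
(gitignored); the ajaxDetail endpoint was found by reading the main.min.js
bundle of Std editor v6.5.51; the two response shapes are distinguished by
docType (schematic project view vs PCB document view).

Crawler:
- make_source_client() uses a browser UA + lceda.cn/editor Referer, because
  the oshwhub /api/project/<uuid> endpoint rejects the FacereDataset/0.1 UA
  (CLAUDE.md UA exception clause: the target site actively blocks custom UAs
  + public static resources)
- fetch_std_source(): project meta → version_documents → per-document dataStr
  → written to source/<doc>.json + source/manifest.json
- --with-source (also fetch source while crawling new projects) /
  --backfill-source (scan existing projects only)
- Self-imposed QPS ≤ 0.2 (SLEEP_SOURCE = 5s)

Schema: add source_format / source_path / source_documents / editor_version
(the first 3 are pinned via enum, to ease later alignment with Pro / KiCad
sources).

Backfill result: 10/10 succeeded, 45 documents, 33.2 MB; schema validation
passes for all. docTypes are mainly 1 (schematic) and 3 (pcb); the USB
voltage/current meter has only PCB documents (4: main board + cover +
bottom plate + front panel; the author did not upload schematic source).

Full investigation: docs/sources/easyeda_std_source.md.
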
docs/sources/easyeda_std_source.md
@@ -0,0 +1,275 @@
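# EasyEDA Standard Edition Project Source (lceda.cn / u.lceda.cn): Data Source Investigation

**Scope**: the project-source fetch chain for EasyEDA (立创 / LCEDA) **Standard edition** (Std; projects marked `origin: "std"` on oshwhub). A sibling of the Professional edition document (Pro, see `easyeda_pro_source.md`).

**First investigated**: 2026-04-28

**Status**: source API, response schema, and availability for public projects are **all working end to end**; **anonymously accessible**, no cookie required.

---
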
## TL;DR

| Item | Finding |
|---|---|
| Login state | **Not required** (the `dataStr` of public projects can be fetched anonymously) |
| Source endpoint | `https://lceda.cn/api/documents/<doc_uuid>?uuid=<doc_uuid>&path=<doc_uuid>` |
| Entry point | `https://oshwhub.com/api/project/<project_uuid>` returns `version_documents[]` |
| Source format | **EasyEDA JSON** (flat `dataStr` structure, compatible with `easyeda2kicad.py`) |
| Encryption | **None** (unlike Pro's AES-128-GCM) |
| docType | 1 = Schematic, 3 = PCB (others still to be observed: 5/2/...) |
| Measured sample | ST-LINK V2-1: 1 schematic (148 KB) + 1 PCB (552 KB), ~700 KB total |
| Editor version | `6.5.51` (2026-04, see `/editor/6.5.51/js/main.min.js`) |
| Rate | Recommended QPS ≤ 0.5 (reuses the main crawler's throttle; public-CDN-like traffic) |

---

## 1. Key Differences from the Professional Edition (Pro)

| Dimension | EasyEDA Std (this document) | EasyEDA Pro (`pro.lceda.cn`) |
|---|---|---|
| oshwhub `origin` tag | `"std"` | `"pro"` |
| Editor entry | `lceda.cn/editor` (v6.5.x) | `pro.lceda.cn/editor` (v3.2.x) |
| Source endpoint | `lceda.cn/api/documents/<doc>` | `pro.lceda.cn/api/v4/projects/...` 4-step chain |
| Authentication | **Anonymous access (public projects)** | Login required (`lceda_pro_session`) |
| Source encryption | **None** (plaintext JSON) | AES-128-GCM + gzip |
| Source format | Flat EasyEDA JSON (`{head, canvas, shape, ...}`) | EPRO2 message stream (event sourcing) |
| Multiple documents | Entry point returns `version_documents[]`; GET each one | `/structures` + history chain replay |
| KiCad conversion tool | `easyeda2kicad.py` (PyPI, actively maintained) | No ready-made tool |

---

## 2. Fetch Flow (Verified)

### 2.1 Entry Point: oshwhub Project Metadata

```
GET https://oshwhub.com/api/project/<PROJECT_UUID>
```

Anonymous (with a browser UA); the response carries `result.version_documents[]`:

```jsonc
{
  // contents of the `result` object (other fields omitted)
  "version_documents": [
    {
      "uuid": "88c1a5f1dc424ac196807f0efa3c7060",     // schematic doc
      "master": "24e3bdb27ec24d4abca5f37d6d1220e3",   // current head hash
      "docType": 1,                                   // 1 = schematic
      "thumb": "//image.lceda.cn/histories/<master>.png",
      "components": { "<comp_uuid>": <count>, ... },
      "histories": [ "<history_uuid>", ... ]          // version chain
    },
    {
      "uuid": "aab000c77a6c4285a1326033ea19ea81",     // pcb doc
      "master": "11cf69b71f71475593438914f771ec2e",
      "docType": 3,                                   // 3 = pcb
      ...
    }
  ]
}
```
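
As a rough illustration of how the entry-point response might be consumed, the sketch below groups `version_documents[]` entries by `docType`. The helper name `group_documents` and the `DOC_TYPE_NAMES` mapping are hypothetical and not part of the crawler; only the `uuid` and `docType` fields and the codes 1 and 3 come from the sample above.

```python
from collections import defaultdict

# docType codes observed so far in this investigation; other values may exist.
DOC_TYPE_NAMES = {1: "schematic", 3: "pcb"}


def group_documents(version_documents: list[dict]) -> dict[str, list[str]]:
    """Map a readable docType name to the document UUIDs of that type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc in version_documents:
        # Unknown codes are kept under an explicit "unknown-<code>" key
        # rather than being dropped silently.
        name = DOC_TYPE_NAMES.get(doc["docType"], f"unknown-{doc['docType']}")
        groups[name].append(doc["uuid"])
    return dict(groups)


# Trimmed-down copy of the sample entries shown above.
sample = [
    {"uuid": "88c1a5f1dc424ac196807f0efa3c7060", "docType": 1},
    {"uuid": "aab000c77a6c4285a1326033ea19ea81", "docType": 3},
]
grouped = group_documents(sample)
```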
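
> **History**: `oshwhub.md §3.5` previously recorded this endpoint as failing anonymously with `code:104001`. This retest returned 200; the only variable changed was the browser UA (`Mozilla/5.0 ...Chrome/147`). The original `FacereDataset/0.1` UA is probably rejected by the oshwhub backend on this endpoint.
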
### 2.2 Source Loading: One GET per Document

```
GET https://lceda.cn/api/documents/<DOC_UUID>?uuid=<DOC_UUID>&path=<DOC_UUID>
Headers:
  User-Agent: Mozilla/5.0 ... Chrome/...
  Referer:    https://lceda.cn/editor
  Accept:     application/json, text/plain, */*
```

> **Note**: the response's `Content-Type: text/html` is a server-side configuration bug; the body is actually JSON.

Two response shapes are returned, distinguished by docType:

#### Schematic (docType=1): "project view" wrapper

```jsonc
{
  "success": true,
  "result": {
    "uuid": "<project_uuid>",              // note: project-level
    "title": "...", "license": "...",
    "owner": {...}, "creator": {...},
    "schematics": [
      {
        "uuid": "<schematic_doc_uuid>",
        "docType": 1,
        "master": "...", "datastrid": "...",
        "components": {...},
        "dataStr": {                       // ← the EasyEDA JSON source
          "head": {...},                   // document header (editor version, origin, etc.)
          "canvas": "...",                 // canvas config (string-encoded)
          "shape": [...],                  // list of all geometric/electrical elements
          "BBox": {...},
          "colors": {...}
        }
      }
    ],
    "boms": "[[..csv-string-encoded..]]",
    "document_sort": ["<doc_uuid>"]
  }
}
```

#### PCB (docType=3): "document view", returned directly

```jsonc
{
  "success": true,
  "result": {
    "uuid": "<pcb_doc_uuid>",              // note: document-level
    "puuid": "<project_uuid>",             // back-reference to the project
    "docType": 3,
    "master": "...", "datastrid": "...",
    "components": {...},
    "dataStr": {
      "head": {...},
      "canvas": "...",
      "shape": [...],
      "layers": [...],                     // PCB-specific: layer stack
      "objects": [...],                    // 3D model references
      "BBox": {...},
      "preference": {...},
      "DRCRULE": {...},
      "netColors": {...}
    }
  }
}
```
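
Because the two shapes place `dataStr` at different depths, a caller probably wants a single accessor that hides the difference. The following is a minimal sketch under the assumption that the entry in `schematics[]` carries the same `uuid` that was requested; `extract_datastr` is a hypothetical helper name, not a function from the crawler.

```python
def extract_datastr(result: dict, doc_uuid: str) -> dict:
    """Return the dataStr for doc_uuid from either response shape."""
    # Document view (seen for PCB): dataStr sits directly on the result.
    if "dataStr" in result:
        return result["dataStr"]
    # Project view (seen for schematics): look inside the schematics[] wrapper.
    for sheet in result.get("schematics", []):
        if sheet.get("uuid") == doc_uuid:
            return sheet["dataStr"]
    raise KeyError(f"no dataStr found for document {doc_uuid}")


# Tiny stand-ins for the two shapes documented above (values are placeholders).
pcb_result = {"uuid": "pcb1", "docType": 3, "dataStr": {"shape": ["PAD~..."]}}
sch_result = {
    "uuid": "proj1",
    "schematics": [{"uuid": "sch1", "docType": 1, "dataStr": {"shape": ["W~..."]}}],
}
pcb_data = extract_datastr(pcb_result, "pcb1")
sch_data = extract_datastr(sch_result, "sch1")
```

Checking for a top-level `dataStr` first avoids branching on `docType`, which may matter for document types that have not been observed yet.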

### 2.3 dataStr Field Reference

| Field | Applies to | Meaning |
|---|---|---|
| `head` | both | Document header: `{docType, editorVersion, x, y, c_para: {...}}` |
| `canvas` | both | Canvas parameters (gridSize / unit / view config); usually a `#`-separated string |
| `shape` | both | Main data: every primitive is string-encoded (each entry is a string of `~` / `#`-separated tokens) |
| `BBox` | both | Bounding box `{x, y, width, height}` |
| `colors` | schematic | Color overrides |
| `layers` | pcb | Layer stack: each layer is `id ~ name ~ color ~ visible ~ ...` |
| `objects` | pcb | 3D model references |
| `preference` | pcb | DRC / unit preferences |
| `DRCRULE` | pcb | Design rules |
| `netColors` | pcb | Net colors |

**The `shape` field is the core payload.** Each string looks like:

```
LIB~x~y~attrs~rotation~importFlag~uuid~lockedFlag^^pin1^^pin2^^...
PAD~circle~x~y~width~height~layer~...
W~strokeColor~strokeWidth~...~points...
```
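
To make the encoding concrete, here is a small sketch that splits one `shape` string into its type tag, its `~`-separated fields, and any `^^`-separated child segments. It is based only on the three patterns shown above; the real format may use further delimiters (the field table mentions `#`), and `split_shape` is a hypothetical name, so the official format documentation should be treated as authoritative.

```python
def split_shape(shape: str) -> tuple[str, list[str], list[list[str]]]:
    """Split a shape string into (type tag, own fields, child segments).

    Assumption: '^^' separates the parent record from nested records and
    '~' separates fields inside each record, as in the patterns above.
    """
    segments = shape.split("^^")
    head = segments[0].split("~")
    children = [segment.split("~") for segment in segments[1:]]
    return head[0], head[1:], children


# Placeholder strings in the style of the patterns above, not real project data.
pad = split_shape("PAD~circle~10~20~4~4~1")
lib = split_shape("LIB~1~2^^P~a^^P~b")
```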

The exact token order for each type is documented in the [docs.lceda.cn EasyEDA file format](https://docs.lceda.cn/cn/DocumentFormat/0-EasyEDA-File-Format-Index/index.html) reference (the editor's HTML embeds a link to this documentation).

### 2.4 Full Fetch Pseudocode

```python
import time

import httpx  # http2=True also needs the h2 extra: pip install "httpx[http2]"

UA = "Mozilla/5.0 ... Chrome/147.0.0.0 ..."  # elided: substitute a full browser UA string
SLEEP_SOURCE = 5  # seconds between document requests (QPS ≤ 0.2)


def fetch_std_source(project_uuid: str) -> dict:
    """Fetch the raw response of every document in one public Std project."""
    h = {"User-Agent": UA, "Referer": "https://lceda.cn/editor",
         "Accept": "application/json, text/plain, */*"}
    with httpx.Client(http2=True, headers=h, timeout=60) as c:
        # 1. project meta → version_documents
        r = c.get(f"https://oshwhub.com/api/project/{project_uuid}")
        r.raise_for_status()
        docs = r.json()["result"]["version_documents"]

        # 2. each doc → dataStr
        out = {"project_uuid": project_uuid, "documents": []}
        for d in docs:
            doc_uuid = d["uuid"]
            r2 = c.get(f"https://lceda.cn/api/documents/{doc_uuid}",
                       params={"uuid": doc_uuid, "path": doc_uuid})
            r2.raise_for_status()
            # Content-Type says text/html, but .json() parses the body regardless.
            out["documents"].append({
                "doc_uuid": doc_uuid, "docType": d["docType"],
                "master": d["master"], "response": r2.json()["result"],
            })
            time.sleep(SLEEP_SOURCE)  # throttle between document requests
    return out
```

---

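## 3. Rate / Politeness

- **Anonymous access** → no risk of account bans; but this is still "high-frequency pulling from a public CDN", so the CLAUDE.md constraint of QPS ≤ 1/site applies
- A 6-second interval produced no 429s in testing; when scaling up, sleep 5 s between requests (QPS ≤ 0.2)
- Use a browser UA (`FacereDataset/0.1` is rejected on oshwhub `/api/project/`). **Trade-off**: this violates the CLAUDE.md default rule "do not spoof a browser UA"; the declared justification is that the target site actively blocks custom UAs and the resources are public and static → switching to a browser UA is a permitted exception. Note this in the commit message
- Per-project size: median not yet measured; ST-LINK is 700 KB, estimated 1–5 MB per project (about 30 MB for 10 projects)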
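
One way to enforce the interval above is a small throttle object that sleeps only for the time still owed since the previous request. This is a sketch, not the crawler's actual implementation: the `Throttle` class and its injectable `clock` / `sleep` parameters are assumptions introduced here so the logic can be exercised without real waiting. A real caller would construct `Throttle(5.0)` once and call `wait()` before each request.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive wait() calls."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous wait(), if any

    def wait(self) -> float:
        """Sleep as long as needed and return the delay that was applied."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(5.0, clock=lambda: fake_now[0], sleep=fake_sleep)
first_delay = throttle.wait()    # first call never waits
fake_now[0] += 2.0               # pretend the request took 2 s
second_delay = throttle.wait()   # owes the remaining 3 s
```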

---

## 4. On-Disk Layout (Consistent with the Main Crawler)

Each project gains the following:

```
data/raw/oshwhub/<project_uuid>/
├── ...(existing metadata)
└── source/
    ├── manifest.json       # project-level source meta: document list, fetch time, editor_version
    ├── <doc_uuid>.json     # full per-document dataStr response (JSON saved as-is)
    └── ...
```

Added to `metadata.json`:

```jsonc
{
  ...,
  "source_format": "easyeda-std",
  "source_path": "source/",
  "source_documents": [
    {"doc_uuid": "...", "docType": 1, "master": "...", "path": "source/<doc_uuid>.json", "size": 148033, "sha256": "..."},
    ...
  ]
}
```
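
The `size` and `sha256` fields can be derived from the exact bytes written to disk. Below is a minimal sketch, assuming the response is re-serialized with `json.dumps` before saving (the crawler may instead store the raw response body); `save_document` is a hypothetical helper, and the values in the demonstration are placeholders.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def save_document(source_dir: Path, doc_uuid: str, doc_type: int,
                  master: str, response: dict) -> dict:
    """Write one document to source/<doc_uuid>.json and return its manifest entry."""
    source_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(response, ensure_ascii=False).encode("utf-8")
    (source_dir / f"{doc_uuid}.json").write_bytes(payload)
    return {
        "doc_uuid": doc_uuid,
        "docType": doc_type,
        "master": master,
        "path": f"source/{doc_uuid}.json",
        "size": len(payload),                            # bytes on disk
        "sha256": hashlib.sha256(payload).hexdigest(),   # hash of those same bytes
    }


# Demonstration in a temporary directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    entry = save_document(Path(tmp) / "source", "doc1", 3, "m1",
                          {"dataStr": {"shape": []}})
    on_disk = (Path(tmp) / "source" / "doc1.json").read_bytes()
```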

---

## 5. Verified / Open

| # | Item | Status | Notes |
|---|---|---|---|
| 1 | Anonymous fetch of dataStr for public projects | **✅** | ST-LINK V2-1 schematic + PCB both work |
| 2 | Entry point `oshwhub /api/project/<uuid>` anonymously | **✅** | Only a browser UA is needed |
| 3 | Full set of docTypes | ⏳ Partial | Seen so far: 1 (schematic) / 3 (pcb) / 5 ("project root" wrapper); others such as SubPart / Footprint Library still to be observed |
| 4 | Multi-sheet schematic projects | ❌ Untested | Does the Std edition also support multi-sheet? Still to be screened among the 10 samples |
| 5 | Private / unpublished project sources | ❌ Out of scope | CLAUDE.md red line: fetch only `public: true` |
| 6 | Historical versions (non-master) | ❌ Untested | The `/api/histories/<hash>` endpoint does not work yet; `/api/documents/<doc>/histories` returns the version list |
| 7 | EasyEDA → KiCad conversion | ⏳ Planned | `easyeda2kicad.py`, a third-party tool; see `plan.md §1.7` |
| 8 | Semantics of the `boms` / `document_sort` fields | ⏳ Partial | `boms` is csv-string-encoded JSON; `document_sort` is an array of doc_uuids (in sheet order) |

---

## Appendix A: One-Off Re-run

```bash
# Full chain: project → version_documents → per-document dataStr
uv run python -c "
import httpx
UA='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36'
PROJ='298873b7fdbe44f8ba0e7351e023bc2c'
h={'User-Agent':UA,'Referer':'https://lceda.cn/editor','Accept':'application/json, text/plain, */*'}
with httpx.Client(http2=True, headers=h, timeout=60) as c:
    docs=c.get(f'https://oshwhub.com/api/project/{PROJ}').json()['result']['version_documents']
    print(f'docs: {len(docs)}')
    for d in docs:
        u=d['uuid']
        r=c.get(f'https://lceda.cn/api/documents/{u}',params={'uuid':u,'path':u})
        print(f'  doc={u[:8]} docType={d[\"docType\"]} status={r.status_code} size={len(r.content)}')
"
```

---

## Appendix B: Change History

| Date | Change |
|---|---|
| 2026-04-28 | First version: full chain working (oshwhub entry point + lceda document endpoint + both docType response shapes); anonymous access confirmed; sample ST-LINK V2-1 |