Phase 1 MVP: crawl 10 high-quality oshwhub projects into LFS

Why:
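- Charles's instruction: first crawl 10 high-quality projects into Gitea LFS,
  one folder per project, keeping the original files and URLs. Validate the
  schema + LFS pipeline on a small batch first, and decide on storage scale
  before scaling up.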

What:
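- crawlers/oshwhub: list API (`/api/project?sort=hot`) + SSR HTML parsing,
  producing metadata / description / cover / files / _urls in a single pass
- schemas/project.schema.json: unified cross-source schema
- docs/sources/oshwhub.md: notes on API entry points / field mapping / pitfalls
- pyproject.toml: httpx[http2] as the only dependency
- .gitattributes: everything under data/raw/**/files/** goes through LFS
  (rule kept narrow so it does not catch schemas/*.json and the like)
- .gitignore: drop the data/raw/* exclusion (the data is now committed via LFS)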

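The 10 projects cover: debugger / hot plate / Geiger counter /
digitally controlled power supply / soldering station / smartwatch /
USB current meter / ZVS induction heater / AI dev board / infrared thermal imager.
52 attachments in total ≈ 524 MB stored in LFS. Selection criteria:
grade=4 & likes>=100 & diversity.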
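
A minimal sketch of how the selection criteria might be applied. The commit does not say how diversity is measured, so the `category` field and the one-project-per-category rule below are assumptions for illustration only. `Candidate` and `select_projects` are hypothetical names, not part of crawlers/oshwhub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str      # e.g. "XACT/rt300-mkv"
    grade: int     # grade reported by the list API
    likes: int
    category: str  # hypothetical label, used only to illustrate "diversity"


def select_projects(candidates, limit=10, grade=4, min_likes=100):
    """Keep projects with the given grade and enough likes, one per category."""
    qualified = [c for c in candidates if c.grade == grade and c.likes >= min_likes]
    # Most-liked first, so each category is represented by its top project.
    qualified.sort(key=lambda c: c.likes, reverse=True)

    picked, seen = [], set()
    for c in qualified:
        if c.category in seen:
            continue  # assumed diversity rule: at most one project per category
        seen.add(c.category)
        picked.append(c)
        if len(picked) == limit:
            break
    return picked
```

Sorting before deduplicating keeps the sketch deterministic for a fixed input. A real selection may well have involved manual review.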

Known gaps (see plan.md § Phase 1.4):
- EasyEDA source JSON requires login (u.lceda.cn); skipped in v0.1
- Project source downloads from fs-web-stream.jlc.com are untested
- Automatic schema validation in scripts/validate.py is not implemented
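
As a rough idea of what the missing validation step could start from, the sketch below checks a metadata record for required keys using only the standard library. The key lists are assumptions taken from the sample metadata in this commit; the authoritative definition is schemas/project.schema.json, and `find_problems` is a hypothetical helper name.

```python
# Assumed required keys, read off the sample metadata in this commit.
# The real list lives in schemas/project.schema.json.
REQUIRED_KEYS = ("source", "source_url", "project_id", "title", "files")
REQUIRED_FILE_KEYS = ("name", "url", "path", "size", "sha256")


def find_problems(metadata: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in metadata]
    for index, entry in enumerate(metadata.get("files", [])):
        for key in REQUIRED_FILE_KEYS:
            if key not in entry:
                problems.append(f"files[{index}] missing key: {key}")
    return problems
```

This only checks key presence. Type and format checks would need a full JSON Schema validator, which would add a dependency beyond httpx.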

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Zhang Jiahao
2026-04-23 19:34:09 +08:00
parent bf2370f83b
commit 5ffa10f256
103 changed files with 2279 additions and 28 deletions


@@ -0,0 +1,16 @@
{
"detail_url": "https://oshwhub.com/XACT/rt300-mkv",
"cover_url": "https://image.lceda.cn/pullimage/G0Yvb3LsUUqWspquWfqe3ray1cucrfmoPvXxT7H2.jpeg",
"attachments": [
{
"name": "video_20240608_002422.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/1fdcbde12c2f4a6faf8c519be4e72984.mp4",
"original_id": "d76ad6faae0240518eb7640ba8fc02cc"
},
{
"name": "RT300-MKV UserManual_24JUL25.pdf",
"url": "https://image.lceda.cn/oshwhub/project/attachments/a57ca3d5e7b54417aae014e2f12f17fa.pdf",
"original_id": "5dc0b16067344ed6bfc15471f2a3bd2a"
}
]
}

Binary file not shown (image, 273 KiB).


@@ -0,0 +1,9 @@
# RT300-MKV 250W 数控升降压桌面可调电源
RT300-MKV-EXTREME 同步混合功率级自动范围升降压型数控可调电源
---
- Source: https://oshwhub.com/XACT/rt300-mkv
- Author: XACT (XACT)
- License: CC BY-NC-SA 4.0
- Published: 2024-08-01T01:03:58.000Z


@@ -0,0 +1,63 @@
{
"source": "oshwhub",
"source_url": "https://oshwhub.com/XACT/rt300-mkv",
"project_id": "91206ca73e96455f946bfcdd73e814fd",
"title": "RT300-MKV 250W 数控升降压桌面可调电源",
"description_short": "RT300-MKV-EXTREME 同步混合功率级自动范围升降压型数控可调电源",
"description_path": "description.md",
"author": {
"username": "XACT",
"display_name": "XACT",
"user_id": "e622ba5d959740ed90f4845a433def3a"
},
"license": "CC BY-NC-SA 4.0",
"tags": [],
"created_at": "2020-12-25T06:25:11.000Z",
"updated_at": "2025-12-19T01:07:55.000Z",
"published_at": "2024-08-01T01:03:58.000Z",
"crawled_at": "2026-04-23T11:26:39.736414+00:00",
"metrics": {
"likes": 867,
"stars": 1735,
"forks": 782,
"views": 185523,
"watch": 0,
"comments": 231
},
"cover": {
"url": "https://image.lceda.cn/pullimage/G0Yvb3LsUUqWspquWfqe3ray1cucrfmoPvXxT7H2.jpeg",
"path": "cover.jpeg"
},
"files": [
{
"name": "video_20240608_002422.mp4",
"url": "https://image.lceda.cn/oshwhub/project/attachments/1fdcbde12c2f4a6faf8c519be4e72984.mp4",
"original_id": "d76ad6faae0240518eb7640ba8fc02cc",
"ext": "mp4",
"mime": "video/mp4",
"size": 81553791,
"md5": "becffa3b615f415101f85d18b9bbfcf9",
"path": "files/video_20240608_002422.mp4",
"sha256": "06864530d9ef794927c8826b7e2e5bd7349587f9b6035e76606331bcba25fbd3"
},
{
"name": "RT300-MKV UserManual_24JUL25.pdf",
"url": "https://image.lceda.cn/oshwhub/project/attachments/a57ca3d5e7b54417aae014e2f12f17fa.pdf",
"original_id": "5dc0b16067344ed6bfc15471f2a3bd2a",
"ext": "pdf",
"mime": "application/pdf",
"size": 3320936,
"md5": "7f97a35fda72a0462ec8401d739ffd41",
"path": "files/RT300-MKV UserManual_24JUL25.pdf",
"sha256": "19eacef139ed672806a7bf492668b3f09e729d8c6eac6d2e5ed79aca8704685e"
}
],
"raw_fields": {
"path": "XACT/rt300-mkv",
"grade": 4,
"origin": "std",
"public": true,
"publish": true,
"skipped_files": []
}
}
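
Each `files` entry above records `size`, `md5` and `sha256`, which makes it possible to re-check a downloaded attachment against its metadata. The sketch below assumes those digests are computed over the raw downloaded bytes and that `path` is relative to the project folder. `verify_file` is a hypothetical helper, not something this commit ships.

```python
import hashlib
from pathlib import Path


def verify_file(project_dir: Path, entry: dict, chunk_size: int = 1 << 20) -> list[str]:
    """Compare one metadata "files" entry with the bytes on disk."""
    target = project_dir / entry["path"]
    if not target.is_file():
        return [f"missing: {entry['path']}"]

    sha256, md5, size = hashlib.sha256(), hashlib.md5(), 0
    with target.open("rb") as handle:
        # Read in chunks so large attachments (e.g. an 80 MB video) stay out of memory.
        while chunk := handle.read(chunk_size):
            sha256.update(chunk)
            md5.update(chunk)
            size += len(chunk)

    mismatches = []
    if size != entry["size"]:
        mismatches.append(f"size mismatch: {entry['path']}")
    if sha256.hexdigest() != entry["sha256"]:
        mismatches.append(f"sha256 mismatch: {entry['path']}")
    if md5.hexdigest() != entry["md5"]:
        mismatches.append(f"md5 mismatch: {entry['path']}")
    return mismatches
```

In a checkout where the LFS objects have not been fetched, the files on disk are small pointer files, so every entry would report a mismatch. Run `git lfs pull` before a check like this.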