FacereDataset/data/raw/oshwhub/201d355452dc48d988f0ddbd981a8a4d/metadata.json
Knowit c6fd111d6d crawler: --no-cover, --concurrency, drop cross-host sleep + batch-50 Step 1 done
Three crawler ergonomics improvements for batch operations, plus a
download wall-clock guard:

--no-cover  Skip cover image download. For scan-only modes (license/meta
            scrape) this drops ~1.3s/project and avoids slow-CDN hangs.

--concurrency N  ThreadPoolExecutor wrapping the per-project loop. Default
                 1 = serial (current behavior). Anonymous endpoints tolerate
                 a concurrency of 5+ comfortably; output uses a print lock
                 for readable interleaved progress. fetch_cover is plumbed
                 through crawl_one.
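
A minimal sketch of the --concurrency pattern described above. The
crawl_one body and the crawl_all / log helpers are hypothetical stand-ins;
the real signatures are not shown in this commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

print_lock = threading.Lock()  # serializes progress lines across workers


def log(msg):
    # Hold the lock so output from different threads never mixes mid-line.
    with print_lock:
        print(msg, flush=True)


def crawl_one(project_id, fetch_cover=True):
    # Hypothetical stand-in for the real per-project work.
    log(f"[{project_id}] crawled (fetch_cover={fetch_cover})")
    return project_id


def crawl_all(project_ids, concurrency=1, fetch_cover=True):
    # concurrency=1 keeps the serial behavior: a single worker thread.
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(crawl_one, pid, fetch_cover) for pid in project_ids]
        for fut in as_completed(futures):
            results.append(fut.result())  # re-raises any worker exception
    return results
```

Results arrive in completion order, not submission order, so callers that
need a stable order should sort or key the results by project id.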

Drop cross-host sleep #1: in crawl_one, between the detail HTML fetch
(oshwhub.com) and the cover image fetch (image.lceda.cn). These are
different hosts, so the sleep was unnecessary. Saves ~1s/project. Sleep #2
(post-cover, before the next iteration) stays, since it gates the next
oshwhub.com hit.
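
The reasoning above (only back-to-back hits on the same host need spacing)
can be illustrated with a per-host throttle. This is a swapped-in technique
for illustration, not what the commit does: the commit simply removes one
sleep call. HostThrottle and its parameters are hypothetical names.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Sleep only when the same host was hit less than min_interval ago."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # hostname -> time of the most recent request

    def wait(self, url):
        host = urlsplit(url).hostname
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            # Remaining time until min_interval has passed for this host.
            waited = max(0.0, self.min_interval - (self._clock() - last))
            if waited > 0:
                self._sleep(waited)
        self._last_hit[host] = self._clock()
        return waited
```

With this shape, a request to image.lceda.cn right after one to oshwhub.com
does not wait, while the next oshwhub.com request does. The sketch is not
thread-safe; combining it with --concurrency would need a lock around wait.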

download_to gains a max_seconds wall-clock budget (default 60s; cover
uses 15s). Defends against pathologically slow CDN connections: observed
10 KB/s on image.lceda.cn for one project, which would have hung 6+ min
on a 3.6 MB cover otherwise. httpx's default timeout applies per read, so
it resets with every chunk; streaming downloads need an external
wall-clock guard.
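
A minimal sketch of such a wall-clock guard, written against a plain
iterable of byte chunks so it runs without a network. The names
copy_with_budget and DownloadTimeout are hypothetical; the real download_to
is not shown here.

```python
import time


class DownloadTimeout(Exception):
    """Raised when a streaming download exceeds its wall-clock budget."""


def copy_with_budget(chunks, out, max_seconds=60.0, clock=time.monotonic):
    # The deadline is fixed once, so it does not reset as chunks arrive.
    deadline = clock() + max_seconds
    written = 0
    for chunk in chunks:
        if clock() > deadline:
            raise DownloadTimeout(
                f"exceeded {max_seconds}s after {written} bytes"
            )
        out.write(chunk)
        written += len(chunk)
    return written
```

With httpx, chunks could be response.iter_bytes() from inside a
httpx.stream("GET", url) context. The check runs only between chunks, so a
single stalled read is still bounded by the per-read timeout rather than by
this guard. On DownloadTimeout the caller should delete the partial file.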

batch-50 Step 1 (license/meta scrape) shipped:
  50/50 candidates have metadata.json + license recorded
  License distribution: GPL 3.0 32, Public Domain 6, NC variants 8,
                        CERN-OHL 1, MIT 1, CC BY 3.0 1
  Forge-friendly (non-NC): 41/50 (82%)
  Declared attachments: 180 files / 2.36 GB (median 18 MB/proj, max 304 MB)
  Walltime: 3min 26s for 28 projects at concurrency=5 (server-side
            HTML render bound, not sleep-bound)

One orphan partial cover (a670e60a...) cleaned up — leftover from the
first aborted run before the timeout fix landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:35:11 +08:00

{
  "source": "oshwhub",
  "source_url": "https://oshwhub.com/45coll/zi-ping-heng-di-lai-luo-san-jiao_10-10-ban-ben",
  "project_id": "201d355452dc48d988f0ddbd981a8a4d",
  "title": "自平衡的莱洛三角_esp32_可充电_10*10版本",
  "description_short": "esp32作为主控5V充3串锂电池可实现摇摆自平衡。",
  "description_path": "description.md",
  "author": {
    "username": "45coll",
    "display_name": "45coll",
    "user_id": "580e8f5f47ef40be8cf8b3b09b3a14f5"
  },
  "license": "GPL 3.0",
  "tags": [],
  "created_at": "2021-09-28T01:21:48.000Z",
  "updated_at": "2026-03-11T09:15:59.000Z",
  "published_at": "2022-07-21T05:23:04.000Z",
  "crawled_at": "2026-04-28T17:27:41.943262+00:00",
  "metrics": {
    "likes": 653,
    "stars": 1351,
    "forks": 535,
    "views": 171592,
    "watch": 0,
    "comments": 178
  },
  "cover": {
    "url": "https://image.lceda.cn/avatars/2021/11/SkYmC62QJJtGJDvetpAcBmMfeBoVNrbMLB0I8Ocn.png",
    "path": null
  },
  "files": [
    {
      "name": "演示视频.mp4",
      "url": "https://image.lceda.cn/attachments/2021/11/dyjLDpRavK2NoVUmmmP7deRG6CJnSCnaoJ3wcp24.mp4",
      "original_id": "5ea95e588a43449096ac329c4586c439",
      "ext": "mp4",
      "mime": "video/mp4",
      "size": 31444585,
      "md5": "bdc5a31f194291453a383cd388961b88"
    }
  ],
  "raw_fields": {
    "path": "45coll/zi-ping-heng-di-lai-luo-san-jiao_10-10-ban-ben",
    "grade": 4,
    "origin": "std",
    "public": true,
    "publish": true,
    "skipped_files": []
  }
}
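
Each entry in "files" declares a size and an md5, which suggests a later
download step could check what it fetched. A minimal sketch, assuming the
md5 field is the hex MD5 digest of the file contents and size is in bytes
(neither is confirmed by the record itself); verify_attachment is a
hypothetical helper name.

```python
import hashlib


def verify_attachment(entry, fileobj, chunk_size=1 << 20):
    """Return True if fileobj matches the entry's declared size and md5."""
    digest = hashlib.md5()
    size = 0
    # Read in chunks so large attachments are never held fully in memory.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size == entry["size"] and digest.hexdigest() == entry["md5"]
```

The entry argument would be one element of the "files" list loaded with
json.load, and fileobj a downloaded attachment opened in binary mode.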