FacereDataset/data/raw/oshwhub/c19a4fc2119f4a44860ccb55ed171e8e/metadata.json
Knowit c6fd111d6d crawler: --no-cover, --concurrency, drop cross-host sleep + batch-50 Step 1 done
Crawler ergonomics improvements for batch operations:

--no-cover  Skip cover image download. For scan-only modes (license/meta
            scrape) this drops ~1.3s/project and avoids slow-CDN hangs.

--concurrency N  ThreadPoolExecutor wrapping the per-project loop. Default
                 1 = serial (current behavior). Anonymous endpoints tolerate
                 5+ comfortably; output uses a print lock for readable
                 interleaved progress. fetch_cover is plumbed through crawl_one.
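
A minimal sketch of how the two flags and the print lock could fit together. Only the flag names, the default of 1, ThreadPoolExecutor, crawl_one and fetch_cover come from this message; parse_args, log, crawl_all and the body of crawl_one are hypothetical stand-ins, not the crawler's actual code.

```python
import argparse
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

print_lock = threading.Lock()


def log(msg: str) -> None:
    # Serialize prints so lines from different workers do not interleave mid-line.
    with print_lock:
        print(msg, flush=True)


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="crawler flags (sketch)")
    parser.add_argument("--no-cover", action="store_true",
                        help="skip cover image download")
    parser.add_argument("--concurrency", type=int, default=1,
                        help="worker threads; 1 = serial")
    return parser.parse_args(argv)


def crawl_one(project_id: str, fetch_cover: bool = True) -> dict:
    # Hypothetical stand-in for the real per-project crawl.
    log(f"[{project_id}] crawling (cover={'yes' if fetch_cover else 'no'})")
    return {"project_id": project_id, "cover_fetched": fetch_cover}


def crawl_all(project_ids: list[str], concurrency: int = 1,
              fetch_cover: bool = True) -> list[dict]:
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(crawl_one, pid, fetch_cover): pid
                   for pid in project_ids}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker thread.
            results[futures[fut]] = fut.result()
    # Return results in input order, regardless of completion order.
    return [results[pid] for pid in project_ids]


if __name__ == "__main__":
    args = parse_args(["--no-cover", "--concurrency", "5"])
    crawl_all(["a", "b", "c"], concurrency=args.concurrency,
              fetch_cover=not args.no_cover)
```

With max_workers=1 the pool runs one project at a time, which matches the serial default.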

Drop cross-host sleep #1: in crawl_one between detail HTML (oshwhub.com)
and cover image (image.lceda.cn). Different hosts — sleep was unnecessary.
Saves ~1s/project. Sleep #2 (post-cover, before next iteration) stays — it
gates the next oshwhub.com hit.
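
The sleep placement described above can be sketched as follows. This is an illustration only: get_detail, get_cover, the "cover_url" key and the 1.0 s delay are assumptions, and the real crawl_one signature may differ.

```python
import time
from typing import Callable, Optional


def crawl_one(project_url: str,
              get_detail: Callable[[str], dict],
              get_cover: Callable[[str], bytes],
              fetch_cover: bool = True,
              delay: float = 1.0,
              sleep: Callable[[float], None] = time.sleep):
    # Request 1: detail HTML, served by oshwhub.com.
    detail = get_detail(project_url)

    # No sleep here any more: the cover is served by image.lceda.cn,
    # a different host, so this request does not add load on oshwhub.com.
    cover: Optional[bytes] = None
    if fetch_cover:
        cover = get_cover(detail["cover_url"])

    # Sleep #2 stays: it spaces out this worker's next oshwhub.com request.
    sleep(delay)
    return detail, cover
```

Passing sleep as a parameter is only there so the behavior can be checked without actually waiting.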

download_to gains a max_seconds wall-clock budget (default 60s, cover uses
15s). Defends against pathologically slow CDN connections: observed 10 KB/s
on image.lceda.cn for one project, which would have hung 6+ min on a 3.6 MB
cover otherwise. httpx's default timeout applies per read and so resets with
every chunk, so streaming downloads need an external wall-clock guard.
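
A minimal sketch of such a wall-clock guard, kept independent of httpx so it runs offline. write_with_budget and DownloadBudgetExceeded are hypothetical names, not the actual download_to implementation; with httpx the chunks argument would presumably be response.iter_bytes() from a streamed request.

```python
import time
from typing import Callable, Iterable


class DownloadBudgetExceeded(Exception):
    """Raised when a streamed download runs past its wall-clock budget."""


def write_with_budget(chunks: Iterable[bytes],
                      write: Callable[[bytes], object],
                      max_seconds: float = 60.0,
                      clock: Callable[[], float] = time.monotonic) -> int:
    # 3.6 MB at 10 KB/s is roughly 360 s, and each chunk arrives well within
    # a per-read timeout, so only a total deadline stops a transfer like that.
    deadline = clock() + max_seconds
    written = 0
    for chunk in chunks:
        # Checked once per received chunk: a slow but steady stream is cut off
        # here, while a fully stalled read is still left to the per-read timeout.
        if clock() > deadline:
            raise DownloadBudgetExceeded(
                f"exceeded {max_seconds:.0f}s after {written} bytes")
        write(chunk)
        written += len(chunk)
    return written
```

A caller that hits the budget would also want to delete the partial file, so aborted runs do not leave orphan covers behind.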

batch-50 Step 1 (license/meta scrape) shipped:
  50/50 candidates have metadata.json + license recorded
  License distribution: GPL 3.0 32, Public Domain 6, NC variants 8,
                        CERN-OHL 1, MIT 1, CC BY 3.0 1
  Forge-friendly (non-NC): 41/50 (82%)
  Declared attachments: 180 files / 2.36 GB (median 18 MB/proj, max 304 MB)
  Walltime: 3min 26s for 28 projects at concurrency=5 (server-side
            HTML render bound, not sleep-bound)
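
A tally like the one above could be produced along these lines. This is a sketch, not the script that generated these numbers: summarize_licenses is a hypothetical helper, and treating any license whose name contains an "NC" token as non-commercial is an assumption.

```python
from collections import Counter


def summarize_licenses(licenses: list[str]) -> dict:
    counts = Counter(licenses)
    total = len(licenses)

    def is_nc(name: str) -> bool:
        # Assumption: NC variants carry a standalone "NC" token,
        # e.g. "CC BY-NC-SA 3.0" -> ["CC", "BY", "NC", "SA", "3.0"].
        return "NC" in name.upper().replace("-", " ").split()

    non_nc = sum(1 for name in licenses if not is_nc(name))
    return {
        "counts": dict(counts),
        "total": total,
        "non_nc": non_nc,
        "non_nc_pct": round(100 * non_nc / total) if total else 0,
    }


if __name__ == "__main__":
    # Illustrative input only, not the batch-50 data.
    print(summarize_licenses(["GPL 3.0", "GPL 3.0", "CC BY-NC-SA 3.0", "MIT"]))
```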

One orphan partial cover (a670e60a...) cleaned up — leftover from the
first aborted run before the timeout fix landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:35:11 +08:00

59 lines
1.9 KiB
JSON
{
  "source": "oshwhub",
  "source_url": "https://oshwhub.com/youi/P150C-Pro-D-shu-kong-dian-zi-fu-zai",
  "project_id": "c19a4fc2119f4a44860ccb55ed171e8e",
  "title": "P150C Pro-D数控电子负载 电池容量测试仪",
  "description_short": "P150C Pro-D系列直流恒流型数控电子负载是基于STC8H系列单片机为主控芯片开发而成该产品支持±150V电压测量/15A电流测量/150W功率耗散测量精度高达0.5%,并且支持多种测量模式",
  "description_path": "description.md",
  "author": {
    "username": "youi",
    "display_name": "cnwans",
    "user_id": "3487f85b25e8466a98c86f32e6f9315d"
  },
  "license": "GPL 3.0",
  "tags": [],
  "created_at": "2023-03-20T13:28:17.000Z",
  "updated_at": "2026-04-01T16:37:22.000Z",
  "published_at": "2025-06-26T01:41:28.000Z",
  "crawled_at": "2026-04-28T17:30:54.506725+00:00",
  "metrics": {
    "likes": 246,
    "stars": 644,
    "forks": 476,
    "views": 78354,
    "watch": 0,
    "comments": 268
  },
  "cover": {
    "url": "https://image.lceda.cn/pullimage/DiPddiHrzbLYEMB3Twl5qM0szhEeGOvojkc1X79a.jpeg",
    "path": null
  },
  "files": [
    {
      "name": "操作演示视频.mp4",
      "url": "https://image.lceda.cn/attachments/2023/3/Tz9DaCWWXcqfm8smLrYCTdFXBPsUO3znIdrQ0R38.mp4",
      "original_id": "3788f7ca68de45a8829b33700df32bc1",
      "ext": "mp4",
      "mime": "video/mp4",
      "size": 39322777,
      "md5": "870f5f22a39d39ba1548fa153c9bbc8e"
    },
    {
      "name": "挑战940W功率.mp4",
      "url": "https://image.lceda.cn/oshwhub/project/attachments/05da80981c4742088892edae49d4ec86.mp4",
      "original_id": "f44810fb07cb4974b2cdf13dd0db4a1a",
      "ext": "mp4",
      "mime": "video/mp4",
      "size": 54888698,
      "md5": "71924f16d633448ebd92846e81d9b90e"
    }
  ],
  "raw_fields": {
    "path": "youi/P150C-Pro-D-shu-kong-dian-zi-fu-zai",
    "grade": 3,
    "origin": "std",
    "public": true,
    "publish": true,
    "skipped_files": []
  }
}
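
Each entry under "files" declares a size and an md5, so a later download step could check what it fetched against the record. The sketch below assumes that "size" is the byte length and "md5" is the hex MD5 digest of the file content; verify_attachment and declared_total_bytes are hypothetical helpers, not part of the crawler.

```python
import hashlib


def verify_attachment(data: bytes, entry: dict) -> bool:
    # Assumption: "size" is the byte length and "md5" the hex digest of the content.
    if len(data) != entry["size"]:
        return False
    return hashlib.md5(data).hexdigest() == entry["md5"].lower()


def declared_total_bytes(metadata: dict) -> int:
    # Sum of declared attachment sizes for one project's metadata.json.
    return sum(f["size"] for f in metadata.get("files", []))
```

For the record above, declared_total_bytes would add 39322777 and 54888698.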