plan: batch-200 expansion (100 Pro + 100 Std)
Doubles down on what worked in batch-50:
- dev1 (Guangzhou) is primary execution host
- Owner cap=2 for diversity
- --max-source-mb 200 to defend against X86-class outliers
- Pro 2.x deprecated-board fix is already in (commit c3cac97)
- SSH transport for dev1 -> gitea (commit 8220c99)

Candidate pool: 200 picks from A-tier (grade>=3 & like>=10) minus already-crawled 65
Remaining A-tier corpus is 2,741 (Pro 1326 + Std 1415)
173 unique authors, like median 258, grade dist 4:118 / 3:82

Estimated walltime ~25-35 min on dev1 for Step 1-4 (no attachments).
LFS increment ~2.5 GB (source only) or +10 GB if Step 5 attachments included.
Either way well within Gitea's 200 GB migration threshold.

Step 5 (attachment download) deferred; not on the critical path for
EPRO2/Std → KiCad work, can revisit when license-filtered Forge projection
demands it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.gitignore: 1 line changed (vendored)
@@ -7,6 +7,7 @@ data/state/*
 !data/state/oshwhub_listing_full.jsonl
 # Exception: the "frozen candidate list" for each expansion-crawl batch; the plan document treats this file as authoritative, and it can be regenerated
 !data/state/oshwhub_batch50_candidates.jsonl
+!data/state/oshwhub_batch200_candidates.jsonl

 # data/raw is committed to the repo (project binaries go through LFS, see .gitattributes)