oshwhub: pin listing index snapshot (33,695 rows, 29 MB) into git
The previous commit added the dump script and report, but the JSONL file itself was excluded by the data/state/* rule in .gitignore. Add a targeted exception so the snapshot travels with the repo: anyone who clones can filter locally without re-hitting the API.

The data is regenerable (scripts/dump_listing_index.py is a one-shot script that takes ~1 min), but pinning a dated snapshot lets us reason reproducibly about "the state of the corpus on 2026-04-28". Future re-dumps overwrite the same path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
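As a rough illustration of the "local filtering" this snapshot is meant to enable, here is a minimal Python sketch that streams the JSONL file and keeps rows matching a predicate. The record schema is not shown in this commit, so the `title` key in the usage example is purely hypothetical, as are the helper names `iter_rows` and `filter_rows`.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List

# Path pinned by this commit; adjust if the repo layout differs.
SNAPSHOT = Path("data/state/oshwhub_listing_full.jsonl")


def iter_rows(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def filter_rows(path: Path, predicate: Callable[[dict], bool]) -> List[dict]:
    """Return every row for which predicate(row) is true."""
    return [row for row in iter_rows(path) if predicate(row)]


if __name__ == "__main__":
    # Hypothetical field name: the real keys depend on what the dump script writes.
    hits = filter_rows(SNAPSHOT, lambda r: "esp32" in str(r.get("title", "")).lower())
    print(len(hits))
```

Reading line by line keeps memory use flat regardless of snapshot size; only the matching rows are held in the returned list.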
2
.gitignore
vendored
@@ -3,6 +3,8 @@ data/processed/*
 data/state/*
 !data/processed/.gitkeep
 !data/state/.gitkeep
+# Exception: commit the full oshwhub listing index snapshot (28 MB jsonl; it can be re-fetched, but we want a pinned version)
+!data/state/oshwhub_listing_full.jsonl
 
 # data/raw is committed (project binaries go through LFS, see .gitattributes)
 
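The exception works because Git applies the last matching pattern in a .gitignore, and because data/state/* ignores the directory's contents rather than the directory itself (a file cannot be re-included once its parent directory is excluded). The Python sketch below illustrates only the last-match-wins rule. It uses `fnmatch` as a stand-in for Git's real matcher, which treats `/`, `**`, and directory rules differently, and `is_ignored` is a hypothetical helper, not part of this repo.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Patterns visible in this diff, in file order.
PATTERNS = [
    "data/processed/*",
    "data/state/*",
    "!data/processed/.gitkeep",
    "!data/state/.gitkeep",
    "!data/state/oshwhub_listing_full.jsonl",
]


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Simplified model: the last pattern that matches decides the outcome."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # A "!" pattern re-includes the path; a plain pattern ignores it.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for p in ("data/state/oshwhub_listing_full.jsonl", "data/state/other.jsonl"):
        print(p, "ignored" if is_ignored(p, PATTERNS) else "tracked")
```

On a real checkout, `git check-ignore -v <path>` reports which rule actually decided a given path.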
33695
data/state/oshwhub_listing_full.jsonl
Normal file
File diff suppressed because it is too large