oshwhub: pin listing index snapshot (33,695 rows, 29 MB) into git

The previous commit added the dump script and report, but the JSONL itself was
caught by the data/state/* rule in .gitignore. Add a targeted exception so the
snapshot travels with the repo: anyone who clones can filter locally without
re-hitting the API.

The data is regenerable (scripts/dump_listing_index.py is one-shot, ~1 min), but
pinning a dated snapshot lets us reason about "the state of the corpus on
2026-04-28" reproducibly. Future re-dumps overwrite the same path.
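
A minimal sketch of what "local filtering" against the pinned snapshot could
look like. Only the file path comes from this commit; the row schema is not
shown here, so the "title" field and the "esp32" keyword are hypothetical
placeholders, and the helper names are made up for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator

# Path taken from the .gitignore exception added in this commit.
SNAPSHOT = Path("data/state/oshwhub_listing_full.jsonl")


def iter_rows(path: Path = SNAPSHOT) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def filter_rows(rows: Iterable[dict], keep: Callable[[dict], bool]) -> list:
    """Return the rows for which the predicate is true."""
    return [row for row in rows if keep(row)]


if __name__ == "__main__" and SNAPSHOT.exists():
    rows = list(iter_rows())
    print(len(rows), "rows in snapshot")
    # "title" is a hypothetical field name; adjust to the real schema.
    hits = filter_rows(
        rows, lambda r: "esp32" in str(r.get("title", "")).lower()
    )
    print(len(hits), "rows matched")
```

Streaming line by line keeps memory use bounded if the snapshot grows; the
row count printed here can be compared against the figure in the subject line.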

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:32:58 +08:00
parent d89a7cdf9c
commit 5e63924474
2 changed files with 33697 additions and 0 deletions

.gitignore (2 additions)

@@ -3,6 +3,8 @@ data/processed/*
 data/state/*
 !data/processed/.gitkeep
 !data/state/.gitkeep
+# Exception: commit the full oshwhub listing index snapshot (28 MB jsonl; regenerable, but we want a pinned version)
+!data/state/oshwhub_listing_full.jsonl
 # data/raw is committed (project binaries go through LFS; see .gitattributes)
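
Why the exception works: in .gitignore the last matching pattern decides, and
a leading "!" re-includes a path. Because the rule is data/state/* (the
directory's contents, not the directory itself), a later negation can still
take effect. The sketch below is a simplified model of that rule order, not
git's actual matcher: it uses fnmatch, whose "*" also matches "/", and it
ignores anchoring, "**", and directory-level exclusion. The function name is
hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list) -> bool:
    """Simplified model: the last matching pattern wins; '!' re-includes."""
    ignored = False
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith("#"):
            continue  # skip blank lines and comments
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        if fnmatchcase(path, pat):
            ignored = not negate
    return ignored


# The data/state rules as they appear in this diff.
RULES = [
    "data/state/*",
    "!data/state/.gitkeep",
    "!data/state/oshwhub_listing_full.jsonl",
]

print(is_ignored("data/state/oshwhub_listing_full.jsonl", RULES))
print(is_ignored("data/state/some_other_file.jsonl", RULES))
```

For an authoritative answer in a real checkout, `git check-ignore -v <path>`
reports which pattern, if any, matches a given path.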

File diff suppressed because it is too large