Both _run_backfill_source and _run_backfill_pro_source now honor
--concurrency N (the default of 1 keeps the current sequential behavior).
Factored out the shared helpers _run_backfill_concurrent (dispatch) and
_discover_backfill_targets; the two paths had drifted apart but were
structurally the same.
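A minimal sketch of how a shared dispatch helper with a sequential default could look. It assumes the per-project work is a single callable and that the flag is parsed with argparse. run_backfill_concurrent and build_parser are hypothetical stand-ins, not the actual _run_backfill_concurrent or the tool's real CLI.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_backfill_concurrent(
    targets: Iterable[T], worker: Callable[[T], R], concurrency: int = 1
) -> List[R]:
    """Hypothetical sketch of a shared dispatch helper (not the real
    _run_backfill_concurrent). Runs worker over targets and returns the
    results in target order."""
    items = list(targets)
    if concurrency <= 1:
        # Default path: a plain loop, identical to the old sequential behavior.
        return [worker(t) for t in items]
    # The work is I/O-bound (HTTP + file writes), so threads overlap the waits.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order; a worker's exception is
        # re-raised when its result is reached.
        return list(pool.map(worker, items))


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI wiring; the real tool's parser may differ.
    parser = argparse.ArgumentParser(prog="backfill")
    parser.add_argument(
        "--concurrency",
        type=int,
        default=1,
        help="number of worker threads (default 1 = sequential)",
    )
    return parser
```

Keeping concurrency <= 1 on a plain loop means the default path never creates a thread pool, so the sequential behavior stays exactly as before.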
Thread safety:
- httpx.Client is thread-safe for synchronous use (per the httpx docs),
  so sharing one client across threads is correct
- Per-project file writes (metadata.json + source/*) don't conflict,
  since each project dir is written by only one thread
- The oversize state file is shared; writes are serialized by a Lock
  around _record_oversize
- print is wrapped in a Lock so progress output stays readable
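The locking scheme above could look roughly like this, assuming the oversize state is a single JSON file updated by read-modify-write. record_oversize, log, and demo are hypothetical names; the real _record_oversize may store its state differently. Per-project writes get no lock here because no two threads share a project dir.

```python
import json
import tempfile
import threading
from pathlib import Path

_print_lock = threading.Lock()
_oversize_lock = threading.Lock()


def log(msg: str) -> None:
    # One lock around print keeps lines from different threads from interleaving.
    with _print_lock:
        print(msg, flush=True)


def record_oversize(state_path: Path, project_id: str, size_bytes: int) -> None:
    """Hypothetical analogue of _record_oversize: a read-modify-write of one
    shared JSON file, which would lose updates if two threads raced on it."""
    with _oversize_lock:
        state = json.loads(state_path.read_text()) if state_path.exists() else {}
        state[project_id] = size_bytes
        state_path.write_text(json.dumps(state, sort_keys=True))


def demo(n_threads: int = 20) -> int:
    """Records one entry per thread and returns how many entries survived."""
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "oversize.json"
        threads = [
            threading.Thread(target=record_oversize, args=(state_path, f"proj-{i}", i))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not state_path.exists():
            return 0
        count = len(json.loads(state_path.read_text()))
        log(f"recorded {count} oversize entries")
        return count
```

Without _oversize_lock, two threads could both read the old file and the later write would drop the earlier thread's entry.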
Expected speedup on dev1 (Guangzhou): batch-200 Pro, 100 items, goes
from ~14 min sequential to ~3-4 min at concurrency 5. Std: similar, 2-3x.
The server-side rate limit is unlikely to bite at this scale (a probe
showed Pro sustaining QPS=2 cleanly; concurrency 5 puts the effective
rate at around 4-5 req/s).
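As a sanity check on the Pro estimate: assuming the work is purely I/O-bound and the server never throttles, five workers cannot finish faster than 14 / 5 = 2.8 min, so a 3-4 min estimate (3.5x to ~4.7x) leaves room for per-request overhead. A small sketch of that arithmetic (the function names are illustrative only):

```python
def ideal_minutes(sequential_min: float, concurrency: int) -> float:
    """Best case for I/O-bound work: perfect overlap and no throttling,
    so wall time divides evenly by the worker count."""
    return sequential_min / concurrency


def speedup(before_min: float, after_min: float) -> float:
    return before_min / after_min


# Pro batch from this commit: ~14 min sequential, 5 workers.
floor = ideal_minutes(14, 5)                 # 2.8 min lower bound
low, high = speedup(14, 4), speedup(14, 3)   # 3.5x .. ~4.67x for 3-4 min
```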
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>