Initial commit: PastPaper Master full stack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 12:15:35 +07:00
commit 7a09167261
105 changed files with 24799 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,12 @@
+.env
+.env.*
+node_modules/
+__pycache__/
+*.pyc
+.DS_Store
+dist/
+.claude/
+.venv/
+pastpaper-scraper/
+pastpaper/
+*.pdf
--- a/080c1b16be5aa2e1ea87d5175894fb3c.jpg
+++ b/080c1b16be5aa2e1ea87d5175894fb3c.jpg
--- a/HANDOFF_COMP2211.md
+++ b/HANDOFF_COMP2211.md
@@ -0,0 +1,328 @@
+# COMP2211 Handoff
+
+## Current Status
+
+`COMP2211` course-library papers are now fully loaded into Supabase and normalized to subquestion-level granularity.
+
+Canonical papers currently in DB:
+
+- `COMP2211-2022-fall-midterm`
+- `COMP2211-2022-spring-midterm`
+- `COMP2211-2022-spring-final-part-a`
+- `COMP2211-2022-spring-final-part-b`
+- `COMP2211-2023-spring-midterm`
+- `COMP2211-2024-spring-midterm`
+- `COMP2211-2024-spring-final`
+
+All seven papers are:
+
+- `status = ready`
+- split to subquestion level
+- tagged with `analytics_topic`, `topic_primary`, `topic_tags`, `skill_tags`
+
+Question counts:
+
+- 2022 fall midterm: `43`
+- 2022 spring midterm: `38`
+- 2022 spring final part A: `24`
+- 2022 spring final part B: `19`
+- 2023 spring midterm: `36`
+- 2024 spring midterm: `42`
+- 2024 spring final: `48`
+
+## Key Files
+
+Schema / SQL:
+
+- [001_init_schema.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/001_init_schema.sql)
+- [002_course_library_fields.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/002_course_library_fields.sql)
+- [003_question_taxonomy_fields.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/003_question_taxonomy_fields.sql)
+- [004_decouple_course_library_from_auth.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/004_decouple_course_library_from_auth.sql)
+- [005_allow_long_question_format_alias.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/005_allow_long_question_format_alias.sql)
+- [006_make_scores_numeric.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/006_make_scores_numeric.sql)
+
+Course-library seeds:
+
+- [comp2211_course_library_papers.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_course_library_papers.sql)
+- [comp2211_problem_taxonomy_backfill.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_problem_taxonomy_backfill.sql)
+- [comp2211_problem_level_questions.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_problem_level_questions.sql)
+
+Manual splitters used for final subquestion rebuild:
+
+- [split_comp2211_2022_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_midterm.py)
+- [split_comp2211_2022_spring_final_part_a.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_final_part_a.py)
+- [split_comp2211_2022_spring_final_part_b.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_final_part_b.py)
+- [split_comp2211_2023_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2023_spring_midterm.py)
+- [split_comp2211_2024_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2024_spring_midterm.py)
+- [split_comp2211_2024_spring_final.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2024_spring_final.py)
+
+Deprecated filler script:
+
+- [fill_manual_study_aids.py](/Users/soda/Desktop/PastPaper%20Master/backend/fill_manual_study_aids.py)
+
+Audit / taxonomy references:
+
+- [COMP2211.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/manifests/COMP2211.json)
+- [COMP2211_taxonomy.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/manifests/COMP2211_taxonomy.json)
+- [summary.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/summary.json)
+- [problem_topics.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/problem_topics.json)
+- [problem_seed.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/problem_seed.json)
+
+Frontend / backend areas already adapted to real taxonomy:
+
+- [frontend/src/pages/HomePage.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/pages/HomePage.tsx)
+- [frontend/src/pages/AnalyticsPage.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/pages/ErrorBookPage.tsx)
+- [frontend/src/components/workbench/SimilarHistoryPanel.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/components/workbench/SimilarHistoryPanel.tsx)
+- [backend/app/routers/analytics.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/analytics.py)
+- [backend/app/routers/questions.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/questions.py)
+- [backend/app/routers/attempts.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/attempts.py)
+
+## Important Product / Data Decisions Already Made
+
+### Course library vs user upload
+
+This is now separated semantically inside `papers`:
+
+- `source_kind = 'course_library'` for platform-owned papers
+- `source_kind = 'user_upload'` for user-contributed papers
+
+Course-library papers no longer require `user_id`.
+
+### Taxonomy model
+
+`question_type` is not the main analytics dimension.
+
+Current intended usage:
+
+- `question_type` / `question_format`: rendering and answer interaction
+- `analytics_topic`: normalized analytics bucket
+- `topic_tags`: multi-tag topical indexing
+- `skill_tags`: finer-grained retrieval / grading / similarity support
+
+### Score field
+
+Scores are `NUMERIC`, not integer, because many subquestions use fractional marks like `1.5`.
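
A small illustration of why integer scores would lose information, assuming the backend handles `NUMERIC` values as `Decimal`. The marks below are made up for the example and are not taken from any paper.

```python
from decimal import Decimal

# Hypothetical subquestion marks for one problem; values are illustrative only.
sub_scores = ["1.5", "1.5", "2", "0.5"]

# Exact sum of the fractional marks.
exact_total = sum(Decimal(s) for s in sub_scores)

# What the total would be if each mark were truncated to an integer first.
truncated_total = sum(int(Decimal(s)) for s in sub_scores)

print(exact_total, truncated_total)
```

Keeping the values as `Decimal` (rather than `float`) also avoids binary rounding noise when marks are summed per paper.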
+
+## Known Issues
+
+### 1. Similar question retrieval is still not truly production-ready
+
+Current state:
+
+- backend route exists
+- frontend panel exists
+- demo fallback still exists in the UI when retrieval returns empty / fails
+
+What needs to be done:
+
+- remove demo fallback behavior once real retrieval is stable
+- improve ranking beyond current basic topic/type matching
+- ideally add indexed text retrieval, then embeddings if needed
+
+Recommended order:
+
+1. build deterministic same-course retrieval first
+2. rank by `analytics_topic`, `topic_tags`, `skill_tags`, `question_format`, text similarity
+3. only then consider vector search
+
+### 2. Analytics is real, but still not the final version
+
+Current state:
+
+- analytics already reads real DB data
+- taxonomy fields are being used
+
+Still missing:
+
+- better topic normalization for edge cases
+- per-paper and per-subtopic drill-down
+- cleaner stats for mixed-format questions
+- verified aggregate counts across all courses, not only `COMP2211`
+
+### 3. LaTeX / math rendering is still fragile
+
+Known symptoms:
+
+- OCR / extracted math strings are noisy
+- some generated HTML contains malformed or hard-to-read math fragments
+- not all backend feedback is rendered with the same quality
+
+What needs work:
+
+- normalize math strings before rendering
+- improve KaTeX preprocessing
+- avoid dumping broken extracted formulas directly into UI
+- ensure solution / feedback content is consistently rendered through the same component path
+
+### 4. Presentation quality is still uneven
+
+Data is now real, but UI still needs polish:
+
+- question nav is still too weak for long real papers
+- status / difficulty / topic chips can be clearer
+- workbench hierarchy is inconsistent across question types
+- some pages still read like an internal demo rather than a finished study product
+
+### 5. User upload flow still lacks dedup / library filtering
+
+This is the next big backend product task.
+
+Desired logic:
+
+- when user uploads a paper, compare against existing course-library papers
+- if it is already covered, do not create a duplicate paper
+- if it is new, ingest it as `user_upload`
+- if high quality and non-duplicate, optionally promote into library workflow later
+
+### 6. Most non-Spring-2024 study aids are contaminated by template filler content
+
+Current state:
+
+- `COMP2211-2022-fall-midterm` has question-level LLM-authored study aids
+- `COMP2211-2024-spring-midterm` is the intended quality bar
+- the remaining papers were backfilled with a deprecated template script and should not be treated as production-quality AI content
+
+Impact:
+
+- `knowledge_reminder` is often generic topic boilerplate
+- `ai_hint` often points to a parent problem header instead of the actual subquestion
+- `solution` is often just wrapped reference text, not a true worked solution
+
+Required action:
+
+1. detect and clear templated study aids from affected papers
+2. regenerate them through the real LLM path in [paper_processor.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/services/paper_processor.py)
+3. review output quality before marking the papers as complete
+
+## Next Major Workstreams
+
+### A. Real similar-question retrieval
+
+Goal:
+
+- no demo fallback
+- same-course retrieval that feels trustworthy
+
+Suggested implementation:
+
+1. add a richer retrieval score in [questions.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/questions.py)
+2. use:
+   - same `course_code`
+   - same `analytics_topic`
+   - overlapping `topic_tags`
+   - overlapping `skill_tags`
+   - same or compatible `question_format`
+   - lexical similarity on `question_text`
+3. expose match reasons in response if useful
+4. update UI to show why a question was retrieved
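
The ranking signals listed above could be combined roughly as follows. This is a minimal sketch, not the code in `questions.py`: the `Question` class, the weights, and the use of `difflib` for lexical similarity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Question:
    # Field names mirror the columns named above; the class itself is hypothetical.
    course_code: str
    analytics_topic: str
    question_format: str
    question_text: str
    topic_tags: set[str] = field(default_factory=set)
    skill_tags: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two tag sets in [0, 1]; two empty sets count as no overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative weights only; they would need tuning on real data.
WEIGHTS = {"topic": 3.0, "topic_tags": 2.0, "skill_tags": 2.0, "format": 1.0, "text": 2.0}


def score(source: Question, cand: Question) -> tuple[float, list[str]]:
    """Return (score, match reasons); candidates from another course score 0."""
    if source.course_code != cand.course_code:
        return 0.0, []
    total, reasons = 0.0, []
    if source.analytics_topic == cand.analytics_topic:
        total += WEIGHTS["topic"]
        reasons.append("same analytics_topic")
    for name in ("topic_tags", "skill_tags"):
        overlap = jaccard(getattr(source, name), getattr(cand, name))
        if overlap > 0:
            total += WEIGHTS[name] * overlap
            reasons.append(f"overlapping {name}")
    if source.question_format == cand.question_format:
        total += WEIGHTS["format"]
        reasons.append("same question_format")
    # difflib's ratio is a cheap stand-in for lexical similarity on question_text.
    total += WEIGHTS["text"] * SequenceMatcher(
        None, source.question_text.lower(), cand.question_text.lower()
    ).ratio()
    return total, reasons
```

Returning the reasons alongside the score keeps steps 3 and 4 cheap: the API can pass the list through and the UI can render it as-is.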
+
+Potential DB improvement:
+
+- add `search_text` / `tsvector` on `paper_questions`
+- later optionally add `embedding`
+
+### B. Real paper / topic statistics
+
+Goal:
+
+- analytics should be fully trustworthy at subquestion level
+
+Suggested improvements:
+
+- topic frequency by `analytics_topic`
+- question-format distribution by subquestion, not by top-level problem
+- per-paper breakdown
+- high-yield topic trend across years
+- topic-to-question index page for drill mode
+
+### C. LaTeX and content rendering cleanup
+
+Goal:
+
+- all math-heavy content should render legibly
+
+Suggested work:
+
+- centralize HTML + KaTeX normalization
+- strip broken OCR artifacts before render
+- make study-aid content generation avoid malformed formula formatting
+- ensure grading feedback and solutions share the same renderer pipeline
+
+### D. User upload deduplication and library filtering
+
+Goal:
+
+- new uploads should not pollute the DB with duplicates
+
+Suggested logic:
+
+1. normalize upload metadata
+2. compare against existing papers in same course:
+   - year / term / exam_type / part_label
+   - title similarity
+   - extracted first-page markers
+   - optional text fingerprint
+3. if duplicate:
+   - attach to existing paper or reject with explanation
+4. if not duplicate:
+   - create `user_upload`
+   - process normally
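
A first pass at the duplicate check above might look like the sketch below. It covers only the metadata comparison and an exact text fingerprint; title similarity and first-page markers are left out. `PaperMeta`, `text_fingerprint`, and `is_duplicate` are hypothetical names, and treating a metadata match alone as a duplicate is a simplifying assumption.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperMeta:
    # Mirrors the metadata fields listed above; the class is hypothetical.
    course_code: str
    year: int
    term: str
    exam_type: str
    part_label: str = ""

    def key(self) -> tuple:
        """Normalized identity used for the cheap first-pass comparison."""
        return (
            self.course_code.strip().upper(),
            self.year,
            self.term.strip().lower(),
            self.exam_type.strip().lower(),
            self.part_label.strip().lower(),
        )


def text_fingerprint(text: str) -> str:
    """SHA-256 over lowercased text with every whitespace run collapsed."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(upload_meta, upload_text, existing) -> bool:
    """existing: iterable of (PaperMeta, fingerprint) for papers already stored."""
    fp = text_fingerprint(upload_text)
    return any(
        meta.key() == upload_meta.key() or known_fp == fp
        for meta, known_fp in existing
    )
```

An exact hash only catches near-identical extractions. Noisy OCR output would need a fuzzier fingerprint, which is one reason the text fingerprint is listed as optional.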
+
+Likely schema additions later:
+
+- content fingerprint field on `papers`
+- upload provenance fields
+- moderation / promotion state for community uploads
+
+### E. UI / UX pass
+
+Priority items:
+
+- stronger question navigation for real papers
+- clearer ready / processing / failed states
+- better paper list and filtering UX
+- richer workbench metadata:
+  - topic
+  - difficulty
+  - format
+  - score
+  - answered / wrong / mastered state
+- unify visual style across analytics, error book, workbench
+
+## Suggested Development Order
+
+1. Remove similar-question demo fallback and ship real retrieval
+2. Improve analytics and topic drill views using subquestion-level data
+3. Fix LaTeX / rendering quality
+4. Build upload dedup / filtering against existing library papers
+5. Do a focused UI / UX pass after the real data flows are stable
+
+## Operational Notes
+
+### Frontend entry issue that was fixed
+
+Homepage was previously still using mock papers and an old hardcoded `COMP2211` id.
+It now reads real papers from `listPapers()`.
+
+### Manual content generation
+
+The current `COMP2211` three-piece study aids were mostly filled through local scripts and deterministic templates, not through external LLM batch processing. This was a deliberate choice to keep the dataset stable, but the templated content is not production quality and still needs regeneration (see Known Issue 6).
+
+### If rebuilding papers again
+
+For `COMP2211`, use the manual splitters rather than rerunning generic extraction blindly. `2024-spring-midterm` especially required reconstruction from PDF page spans because the earlier top-level extraction had already truncated `Problem 5` and `Problem 7`.
+
+## Ready-to-Verify Checklist
+
+If you want to sanity-check the current product quickly:
+
+1. Open home page and filter `COMP2211`
+2. Open each paper and confirm `status = ready`
+3. Check question count matches:
+   - `43 / 38 / 24 / 19 / 36 / 42 / 48`
+4. Open analytics page for `COMP2211`
+5. Open several papers and verify:
+   - question nav loads
+   - AI trio exists
+   - topics render
+   - similar-question panel does not block the page
--- a/TECHNICAL.md
+++ b/TECHNICAL.md
@@ -0,0 +1,516 @@
+# PastPaper Master: Technical Documentation
+
+## System Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      Frontend (React 19 + Vite 7)               │
+│  Pages: Home / Upload / Workbench / ErrorBook                   │
+│  PDF: react-pdf v10 | Math: KaTeX 0.16 | Style: Tailwind v4    │
+└────────────────────────────┬────────────────────────────────────┘
+                             │ /api (Vite proxy → :8000)
+┌────────────────────────────▼────────────────────────────────────┐
+│                      Backend (FastAPI + Python)                  │
+│  Routers: papers / attempts / questions                         │
+│  Services: paper_processor / grader / llm_clients / text_extractor│
+└────────┬───────────────────┬──────────────────┬─────────────────┘
+         │                   │                  │
+   ┌─────▼──────┐   ┌────────▼───────┐  ┌───────▼──────┐
+   │ Supabase   │   │ GPT-4o         │  │ Qwen-plus    │
+   │ PostgreSQL │   │ (laozhang API) │  │ (DashScope)  │
+   │ + Storage  │   │ structuring,   │  │ AI trio,     │
+   │            │   │ OCR, variants  │  │ grading      │
+   └────────────┘   └────────────────┘  └──────────────┘
+```
+
+**Tech stack at a glance:**
+- **Frontend**: React 19, TypeScript, Vite 7, Tailwind CSS v4, react-pdf v10, KaTeX 0.16
+- **Backend**: FastAPI, Python 3.12, uv (package management)
+- **Database**: Supabase (PostgreSQL + Row Level Security)
+- **Storage**: Supabase Storage (buckets: `papers`, `attempt-photos`)
+- **LLM**: GPT-4o (via the laozhang API proxy), Qwen-plus (Alibaba DashScope)
+
+---
+
+## Database Schema
+
+> File: `supabase/migrations/001_init_schema.sql`
+
+### Table: `papers` (exam papers)
+
+| Field | Type | Description |
+|------|------|------|
+| id | UUID PK | Auto-generated |
+| user_id | UUID FK → auth.users | Uploader |
+| course_code | TEXT | Course code, e.g. "COMP2011" |
+| year / term / exam_type | INT/TEXT/TEXT | Metadata |
+| paper_file_url | TEXT | Paper PDF (Supabase Storage) |
+| answer_file_url | TEXT? | Answer PDF (optional) |
+| status | TEXT | `uploaded` → `processing` → `ready` / `error` |
+| paper_extracted_text | TEXT | Raw text extracted by PyMuPDF (cached) |
+| total_score / question_count | INT | Whole-paper overview extracted by AI |
+| topics_summary | JSONB | `{"Linked List": 40, "Recursion": 30}` |
+| difficulty_level | TEXT | easy / medium / hard |
+
+### Table: `paper_questions` (per-question data)
+
+| Field | Type | Description |
+|------|------|------|
+| id | UUID PK | |
+| paper_id | UUID FK → papers | |
+| question_number | TEXT | "1", "1a", "2b" |
+| parent_question | TEXT? | Parent question number of a subquestion: "1a" → "1" |
+| display_order | INT | Sort order |
+| question_type | TEXT | `mc` / `true_false` / `fill_blank` / `long_question` |
+| question_text | TEXT | Original question text |
+| score / page_number | INT | Marks; PDF page number (used for PDF-question sync) |
+| options | JSONB | MC options: `[{"label":"A","text":"..."}]` |
+| correct_option | TEXT | Correct MC option |
+| correct_answer | TEXT | Correct fill-in-the-blank answer |
+| raw_answer_text | TEXT | Raw solution text from the answer PDF |
+| topics | TEXT[] | Topic tags |
+| difficulty | TEXT | easy / medium / hard |
+| knowledge_reminder | TEXT | AI knowledge reminder (HTML+KaTeX) |
+| ai_hint | TEXT | AI hint on approach (HTML+KaTeX) |
+| solution | TEXT | AI full worked solution (HTML+KaTeX) |
+
+### Table: `user_attempts` (user answer records)
+
+| Field | Type | Description |
+|------|------|------|
+| id | UUID PK | |
+| user_id / question_id | UUID FK | |
+| attempt_type | TEXT | `select` / `input` / `photo` |
+| user_answer | TEXT | The user's selected option or typed input |
+| photo_url / photo_ocr_text | TEXT | Uploaded photo and its OCR result |
+| is_correct | BOOL | Judged by AI |
+| feedback | TEXT | Step-by-step error analysis (HTML) |
+| error_at_step | INT | Step at which the error occurred |
+| in_error_book / mastered | BOOL | Error-book state |
+
+---
+
+## Core Feature 1: Paper Analysis Pipeline
+
+### Flow Overview
+
+```
+User uploads PDF → background BackgroundTask → 5-step pipeline → status becomes ready
+```
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `backend/app/routers/papers.py` | Upload endpoint; triggers background processing |
+| `backend/app/services/paper_processor.py` | **Core pipeline**: the 5-step processing logic |
+| `backend/app/services/text_extractor.py` | PDF → text extraction (PyMuPDF) |
+| `backend/app/services/llm_clients.py` | GPT-4o / Qwen client singletons |
+
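+### The 5 Pipeline Steps (`paper_processor.py: process_paper()`)
+
+**Step 1: PDF text extraction**
+- Uses PyMuPDF (`fitz`) to extract text page by page
+- If a page yields fewer than 50 characters of text (likely a scan), that page is also saved as a base64 image as a fallback
+- The extracted text is cached in `papers.paper_extracted_text`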
+
+```python
+# text_extractor.py
+extract_pdf(file_bytes) → ExtractedContent(pages_text, page_images, total_pages, has_images)
+get_full_text(extracted) → "--- Page 1 ---\n{text}\n\n--- Page 2 ---\n..."
+```
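
The short-page check and the page-marker join from Step 1 can be sketched without PyMuPDF. This is an illustration under stated assumptions, not the contents of `text_extractor.py`: `PageSummary`, `summarize_pages`, and `join_pages` are hypothetical names, and stripping whitespace before counting characters is an assumption.

```python
from dataclasses import dataclass

MIN_TEXT_CHARS = 50  # threshold taken from the Step 1 description


@dataclass
class PageSummary:
    # Hypothetical container; the real ExtractedContent also carries page images.
    pages_text: list[str]
    scanned_pages: list[int]  # 1-based page numbers that need an image fallback


def summarize_pages(pages_text: list[str]) -> PageSummary:
    """Flag pages whose extracted text is too short to trust."""
    scanned = [
        i for i, text in enumerate(pages_text, start=1)
        if len(text.strip()) < MIN_TEXT_CHARS
    ]
    return PageSummary(pages_text=pages_text, scanned_pages=scanned)


def join_pages(pages_text: list[str]) -> str:
    """Join pages with '--- Page N ---' markers, one marker per page."""
    return "\n\n".join(
        f"--- Page {i} ---\n{text}" for i, text in enumerate(pages_text, start=1)
    )
```

Keeping the page markers in the joined text is what lets the later structuring step infer a `page_number` for each question.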
+
+**Step 2: GPT-4o structured question splitting**
+- Model: `gpt-4o`, temperature=0, response_format=json_object
+- Input: full paper text
+- Output: JSON containing total_score, difficulty_level, topics_summary, questions[]
+- Extracted per question: question_number, parent_question, question_type, question_text, score, page_number, options, topics, difficulty
+- Updates the overview fields on `papers` (total_score, question_count, topics_summary, difficulty_level)
+
+**Step 3: Answer matching (if an answer PDF exists)**
+- Model: `gpt-4o`, temperature=0
+- Input: question-structure JSON + answer text
+- Output: per-question match: correct_option / correct_answer / raw_answer_text
+- MC → correct_option, fill-in-the-blank → correct_answer, long question → raw_answer_text
+
+**Step 4: Qwen generates the AI trio (per question)**
+- Model: `qwen-plus`, temperature=0.3
+- Called once per question; input is the question info + reference answer
+- Outputs a JSON trio:
+  - `knowledge_reminder`: prerequisite knowledge points (HTML+KaTeX)
+  - `ai_hint`: guidance on approach that does not reveal the answer (HTML+KaTeX)
+  - `solution`: full step-by-step worked solution (HTML+KaTeX)
+- Written to the `paper_questions` table
+
+**Step 5: Mark complete**
+- `papers.status` is updated to `ready`
+- If any step raises an exception, status is set to `error` and the error message is written to `error_message`
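
The status handling in Step 5 amounts to wrapping the steps in one try/except. The sketch below shows that shape only; `run_pipeline` and `update_paper` are hypothetical stand-ins for the real `process_paper()` and the Supabase write, and writing the `processing_step` column on each step is an assumption about how it is used.

```python
async def run_pipeline(paper_id, steps, update_paper):
    """Run steps in order; mark the paper ready, or error on the first failure.

    steps: list of (name, async callable taking paper_id).
    update_paper: async callable (paper_id, fields dict); stands in for the DB write.
    """
    await update_paper(paper_id, {"status": "processing"})
    try:
        for name, step in steps:
            # Record which step is running so the UI can show progress.
            await update_paper(paper_id, {"processing_step": name})
            await step(paper_id)
    except Exception as exc:
        # Any failing step ends the run and leaves the message for debugging.
        await update_paper(paper_id, {"status": "error", "error_message": str(exc)})
        return False
    await update_paper(paper_id, {"status": "ready"})
    return True
```

Because the `ready` update sits outside the `try`, a failure while writing the final status is not misreported as a processing error.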
+
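+### Key Prompt Design
+
+**STRUCTURE_PROMPT**: structured question splitting
+- Restricts question_type to mc / true_false / fill_blank / long_question
+- True/False questions use the `true_false` type, with options `[{label:"True",text:"True"},{label:"False",text:"False"}]`
+- MC questions must have their options array extracted
+- Subquestions are linked through parent_question (e.g. the parent of "1a" is "1")
+- Requires inferring page_number, topics, difficulty
+
+**ANSWER_MATCH_PROMPT**: answer matching
+- Input contains questions_json (question number + type) and answer_text
+- Outputs different fields by question type: MC → correct_option, fill → correct_answer, long question → raw_answer_text
+
+**ANALYSIS_PROMPT**: AI trio
+- Solution must include the full process (Step 1, 2, 3...), not just the answer
+- MC questions must explain why the correct option is right and why the others are wrong
+- Common mistakes are marked with: `<div class="common-error">...</div>`
+- KaTeX rules: block `$$...$$`, inline `$...$`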
+
+---
+
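+## Core Feature 2: PDF Scrolling + Question Sync
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `frontend/src/components/workbench/PdfViewer.tsx` | Continuous-scroll PDF rendering + visible-page detection |
+| `frontend/src/components/workbench/QuestionNav.tsx` | Horizontal question navigation bar |
+| `frontend/src/pages/WorkbenchPage.tsx` | Coordinator for the two-way sync |
+
+### Implementation
+
+**Layout**: PDF on the left (60%), question panel on the right (40%)
+
+**Continuous PDF scrolling (`PdfViewer.tsx`)**
+- Uses the `<Document>` + `<Page>` components from `react-pdf`
+- All pages are stacked vertically in a scrollable container (no single-page switching)
+- A `ResizeObserver` watches the container width and sets the Page width dynamically
+- Manual jump: enter a page number → `scrollIntoView`
+
+**Two-way sync:**
+
+1. **Question → PDF (click a question, the PDF scrolls to its page)**
+   - QuestionNav click → `handleQuestionSelect(index)` → records `lastUserSelectTime = Date.now()` + `setCurrentIndex`
+   - PdfViewer sees the `currentPage` prop change → `useEffect` calls `el.scrollIntoView({ behavior: "smooth" })`
+   - Sets `programmaticScroll.current = true`, reset after 2s
+
+2. **PDF → Question (scroll the PDF, the right panel switches to the current question)**
+   - An `IntersectionObserver` watches every `<Page>` element, threshold: `[0, 0.25, 0.5, 0.75, 1]`
+   - Tracks each page's `intersectionRatio` and picks the page with the highest visible ratio
+   - If `programmaticScroll.current === true`, the callback is skipped
+   - Fires `onPageChange(bestPage)` → WorkbenchPage `handlePdfPageChange`
+   - `handlePdfPageChange`: finds the last question with `page_number <= currentPage` and updates `currentIndex`
+
+**Preventing scroll hijacking (two layers of protection):**
+- **WorkbenchPage layer (primary)**: `lastUserSelectTime` ref. For 2 seconds after the user clicks a question, `handlePdfPageChange` returns immediately and ignores every Observer callback. This fixes the case where a smooth scroll through a long document passes intermediate pages, triggers the Observer, and switches the question away
+- **PdfViewer layer (secondary)**: `programmaticScroll` ref. Observer callbacks are skipped while scrollIntoView is running; reset after 2s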
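
The page-to-question lookup and the 2-second guard can be modelled independently of React. The sketch below is a Python model of that logic, not the actual TypeScript. `ScrollSync` is a hypothetical name, timestamps are passed in explicitly, and the code assumes `page_number` never decreases in display order.

```python
from bisect import bisect_right

GUARD_SECONDS = 2.0  # matches the 2-second window described for lastUserSelectTime


class ScrollSync:
    """Model of the question/page sync; not the actual React code."""

    def __init__(self, question_pages: list[int]):
        # question_pages[i] is the page_number of question i, in display order
        # (assumed non-decreasing so that bisect applies).
        self.question_pages = question_pages
        self.current_index = 0
        self.last_user_select = float("-inf")

    def select_question(self, index: int, now: float) -> int:
        """User clicked a question: switch to it and return the page to scroll to."""
        self.current_index = index
        self.last_user_select = now
        return self.question_pages[index]

    def on_pdf_page_change(self, page: int, now: float) -> None:
        """Observer callback: ignored inside the guard window after a click."""
        if now - self.last_user_select < GUARD_SECONDS:
            return
        # Last question whose page_number <= page. Falling back to the first
        # question when none qualifies is an assumption of this sketch.
        self.current_index = max(bisect_right(self.question_pages, page) - 1, 0)
```

With `ScrollSync([1, 1, 2, 4, 4, 7])`, a click on the last question followed half a second later by an observer event for page 3 leaves the selection unchanged, which is the hijacking case the guard exists for.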
+
+---
+
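+## Core Feature 3: Answering Interaction (MC / Fill-in-the-Blank)
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `frontend/src/components/workbench/QuestionDetail.tsx` | Question display + answering interaction |
+| `frontend/src/components/workbench/AiTrioPanel.tsx` | Collapsible panels for knowledge reminder / hint / solution |
+| `frontend/src/components/shared/CollapsibleSection.tsx` | Collapsible section component |
+| `frontend/src/components/shared/KaTeXRenderer.tsx` | HTML+KaTeX renderer |
+
+### QuestionDetail Interaction Logic
+
+**Multiple choice (MC):**
+- State: `selectedOption`, `checked`
+- Click an option → highlighted blue (while unchecked)
+- Click "Check Answer" → `checked=true`
+- Correct: the option turns green + "Correct!" / Wrong: the selected option turns red, the correct option turns green, and the correct answer is shown
+- State resets automatically when switching questions (`useEffect` on `question.id`)
+
+**True/False:**
+- State: `tfAnswers: Record<string, "True" | "False">`, `tfChecked`
+- Each statement has T / F buttons on its right, toggled independently
+- The selected button is highlighted blue; once every statement is answered, "Submit Answers" becomes clickable
+- After submission the user is prompted to check the solution (per-statement correct answers are not stored separately yet)
+
+**Fill-in-the-blank:**
+- Text input + "Check" button
+- Pressing Enter also triggers the check
+- Case-insensitive comparison (`toLowerCase()`)
+- After checking, the input changes color: green (correct) / red (wrong)
+
+**Callback**: `onAnswerResult(isCorrect, userAnswer)` → WorkbenchPage → `recordAttempt` API
+
+### AiTrioPanel
+
+- Three `CollapsibleSection`s: Knowledge Reminder (blue, expanded by default), AI Hint (amber), Solution (green)
+- `CollapsibleSection` animates expand/collapse smoothly with CSS `grid-template-rows: 0fr → 1fr`
+- Content is rendered through `KaTeXRenderer` (HTML + KaTeX formulas)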
+
+---
+
+## Core Feature 4: Variant Question Generation (Similar Question)
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `backend/app/routers/questions.py` | `POST /{question_id}/variant` endpoint |
+| `backend/app/services/grader.py` | `generate_variant()`: GPT-4o generates the variant |
+| `frontend/src/components/workbench/ActionBar.tsx` | "Similar Question" button, triggered asynchronously |
+| `frontend/src/pages/WorkbenchPage.tsx` | Variants Tab state management |
+| `frontend/src/components/workbench/VariantDetail.tsx` | Variant answering view |
+
+### Backend
+
+- `POST /api/questions/{question_id}/variant`
+- Load the original question from the DB → call `generate_variant(question)` → attach the original question's `knowledge_reminder` → return
+- Model: `gpt-4o`, temperature=0.5, response_format=json_object
+- VARIANT_PROMPT requires: same topic, similar difficulty, different data/scenario, HTML output (not markdown)
+- Output fields: question_text, question_type, options (if MC), correct_answer, ai_hint, solution
+
+### Frontend Interaction (tab-based async flow)
+
+**State management (`WorkbenchPage.tsx`):**
+```typescript
+interface StoredVariant {
+  id: string;                    // placeholder ID, e.g. "variant-1"
+  sourceQuestionNumber: string;  // number of the source question
+  variant: VariantQuestion;      // generated result
+  status: "generating" | "ready";
+}
+```
+
+**Flow:**
+1. User clicks "Similar Question" → `ActionBar` calls `onVariantStart(placeholderId, questionNumber)`
+2. WorkbenchPage creates a placeholder entry with `status: "generating"`; the user can keep answering without being blocked
+3. When the API returns → `onVariantReady(placeholderId, variant)` → status updates to `ready`
+4. On failure → `onVariantFailed(placeholderId)` → the placeholder entry is removed
+
+**Three views in the right panel:**
+- **Questions Tab**: question navigation + QuestionDetail + AiTrioPanel + ActionBar
+- **Variants Tab**: variant list (Generating.../Ready); each item shows the question number and preview text
+- **Variant Detail**: after clicking "Start", the whole right panel is replaced by the VariantDetail component + a "Back" button
+
+**VariantDetail component**: purple theme, includes the full MC/fill-in-the-blank interaction + AI trio (CollapsibleSection)
+
+---
+
+## Core Feature 5: Photo Grading
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `backend/app/routers/attempts.py` | `POST /photo`: upload + OCR + grading |
+| `backend/app/services/grader.py` | `ocr_photo()` + `grade_answer()` |
+| `frontend/src/components/workbench/PhotoUpload.tsx` | Photo upload modal |
+| `frontend/src/components/workbench/ActionBar.tsx` | "Upload handwritten answer" button |
+
+### Backend Flow
+
+1. Receive the image → upload to the Supabase Storage `attempt-photos` bucket
+2. `ocr_photo(photo_bytes)`: GPT-4o Vision recognizes the handwriting
+   - Input: base64 image
+   - Output: the student's answer text (including LaTeX formulas)
+3. `grade_answer(question, student_answer)`: Qwen-plus grades the answer
+   - Input: question info + reference answer + student answer
+   - Output: `{ is_correct, score_given, feedback (HTML), error_at_step }`
+4. Write to the `user_attempts` table (including photo_url, photo_ocr_text, feedback, is_correct)
+5. A wrong answer automatically sets `in_error_book = true`
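
Steps 4 and 5 boil down to mapping the grader output onto a `user_attempts` row. The helper below is a sketch with a hypothetical name (`build_attempt_row`). Treating a missing `is_correct` as wrong is an assumption of the sketch, not documented behavior.

```python
def build_attempt_row(user_id, question_id, photo_url, ocr_text, grade):
    """Map a grading result dict onto the `user_attempts` columns."""
    # Assumption: a grader response without is_correct is treated as wrong.
    is_correct = bool(grade.get("is_correct", False))
    return {
        "user_id": user_id,
        "question_id": question_id,
        "attempt_type": "photo",
        "photo_url": photo_url,
        "photo_ocr_text": ocr_text,
        "is_correct": is_correct,
        "feedback": grade.get("feedback", ""),
        "error_at_step": grade.get("error_at_step"),
        # Wrong answers go straight into the error book.
        "in_error_book": not is_correct,
        "mastered": False,
    }
```

Building the row in one pure function keeps the error-book rule in a single place, so the "Got it wrong" path and the photo path cannot drift apart.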
+
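+### Frontend
+
+- PhotoUpload: modal dialog, supports drag-and-drop or click to choose an image
+- Preview → submit → show the OCR result + AI grading feedback
+- Available for every question type (MC / fill-in-the-blank / long question)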
+
+---
+
+## Core Feature 6: Error Book
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `backend/app/routers/attempts.py` | `GET /error-book` + `PATCH /{attempt_id}` |
+| `frontend/src/pages/ErrorBookPage.tsx` | Error book page |
+| `frontend/src/lib/api.ts` | `getErrorBook()` + `updateAttempt()` |
+
+### Backend
+
+- `GET /api/attempts/error-book?user_id=xxx`
+  - Queries `in_error_book=true AND mastered=false`
+  - JOINs `paper_questions` to return full question info
+- `PATCH /api/attempts/{attempt_id}`
+  - Updates the `in_error_book` or `mastered` flag
+
+### Frontend
+
+- List view: question info + user answer + AI feedback
+- Actions: "Review in Workbench" (navigate) / "Mastered" (mark as mastered) / "Remove" (remove from the error book)
+
+---
+
+## Core Feature 7: Attempt Records
+
+### Files
+
+| File | Purpose |
+|------|------|
+| `backend/app/routers/attempts.py` | `POST /`: record an attempt |
+| `frontend/src/components/workbench/ActionBar.tsx` | "Got it right" / "Got it wrong" buttons |
+
+### Flow
+
+- "Got it right" → `POST /api/attempts/` with `attempt_type: "select", is_correct: true`
+- "Got it wrong" → `POST /api/attempts/` with `attempt_type: "select", is_correct: false`
+  - The backend automatically sets `in_error_book = true`
+- A toast reports the result of the action
+
+---
+
+## API Summary
+
+### Papers Router (`/api/papers`)
+
+| Method | Path | Description |
+|--------|------|------|
+| GET | `/` | List all papers (optionally filtered by user_id) |
+| POST | `/upload` | Upload a paper PDF + optional answer PDF |
+| GET | `/{paper_id}` | Get a single paper's info |
+| GET | `/{paper_id}/questions` | Get all questions of a paper |
+
+### Attempts Router (`/api/attempts`)
+
+| Method | Path | Description |
+|--------|------|------|
+| POST | `/` | Record one attempt |
+| POST | `/photo` | Photo upload + OCR + AI grading |
+| GET | `/error-book?user_id=` | Get the error book |
+| PATCH | `/{attempt_id}` | Update error-book / mastered state |
+
+### Questions Router (`/api/questions`)
+
+| Method | Path | Description |
+|--------|------|------|
+| POST | `/{question_id}/variant` | Generate a variant question |
+
+---
+
+## Frontend Routes
+
+| Path | Page | File |
+|------|------|------|
+| `/` | Home: paper list | `src/pages/HomePage.tsx` |
+| `/upload` | Upload a paper | `src/pages/UploadPage.tsx` |
+| `/paper/:id` | Workbench | `src/pages/WorkbenchPage.tsx` |
+| `/error-book` | Error book | `src/pages/ErrorBookPage.tsx` |
+
+---
+
+## Frontend Component Tree (Workbench)
+
+```
+WorkbenchPage
+├── Header                          # Top navigation (course + paper title)
+├── PdfViewer                       # Left 60%: continuous-scroll PDF
+└── Right Panel (40%)
+    ├── [Questions Tab]
+    │   ├── QuestionNav             # Horizontal question nav Q1 Q2 Q3...
+    │   ├── QuestionDetail          # Question display + MC/fill-in interaction
+    │   ├── AiTrioPanel             # Knowledge/hint/solution (3x CollapsibleSection)
+    │   └── ActionBar               # Bottom buttons (right/wrong/variant/photo)
+    ├── [Variants Tab]
+    │   └── Variant Cards           # Variant list (Generating.../Ready)
+    └── [Variant Detail View]       # Replaces the whole right panel
+        ├── Back Button
+        └── VariantDetail           # Variant answering + AI trio
+```
+
+---
+
+## LLM Model Assignment
+
+| Task | Model | Provider | File |
+|------|------|----------|------|
+| Structured question splitting | gpt-4o | laozhang API | paper_processor.py |
+| Answer matching | gpt-4o | laozhang API | paper_processor.py |
+| AI trio (knowledge/hint/solution) | qwen-plus | DashScope | paper_processor.py |
+| Handwriting OCR | gpt-4o (Vision) | laozhang API | grader.py |
+| Answer grading | qwen-plus | DashScope | grader.py |
+| Variant generation | gpt-4o | laozhang API | grader.py |
+
+---
+
+## Configuration and Environment Variables
+
+> Files: `backend/app/config.py`, `.env`
+
+| Variable | Description |
+|------|------|
+| SUPABASE_URL | Supabase project URL |
+| SUPABASE_ANON_KEY | Anonymous key for the frontend |
+| SUPABASE_SERVICE_ROLE_KEY | Service Role Key for the backend (bypasses RLS) |
+| LAOZHANG_BASE_URL | GPT-4o proxy API base URL |
+| LAOZHANG_API_KEY | GPT-4o proxy API key |
+| DASHSCOPE_BASE_URL | Alibaba DashScope API base URL |
+| DASHSCOPE_API_KEY | DashScope API key |
+
+---
+
+## Full File Index
+
+### Backend (`backend/app/`)
+
+```
+main.py                        # FastAPI entry point, CORS, router registration
+config.py                      # Pydantic Settings, environment variables
+routers/
+  papers.py                    # Paper CRUD + upload that triggers processing
+  attempts.py                  # Attempt records + photo OCR grading + error book
+  questions.py                 # Variant generation
+services/
+  paper_processor.py           # Core 5-step pipeline: PDF → structuring → answer matching → AI trio
+  text_extractor.py            # PyMuPDF text extraction
+  grader.py                    # OCR + grading + variant generation (prompts + LLM calls)
+  llm_clients.py               # GPT-4o / Qwen client singletons
+  supabase_client.py           # Supabase client
+```
+
+### Frontend (`frontend/src/`)
+
+```
+App.tsx                        # React Router route definitions
+main.tsx                       # ReactDOM entry point
+lib/
+  api.ts                       # Wrappers for all API calls (9 functions)
+types/
+  api.ts                       # TypeScript type definitions
+hooks/
+  usePaper.ts                  # Polls paper status (3s interval)
+  useQuestions.ts              # Fetches the question list
+pages/
+  HomePage.tsx                 # Home: paper list
+  UploadPage.tsx               # Upload page
+  WorkbenchPage.tsx            # Workbench: core coordinating component
+  ErrorBookPage.tsx            # Error book
+components/
+  layout/
+    Header.tsx                 # Top navigation bar
+  shared/
+    KaTeXRenderer.tsx          # HTML+KaTeX formula rendering
+    CollapsibleSection.tsx     # Collapsible panel (grid animation)
+    StatusBadge.tsx            # Status badge
+  upload/
+    UploadForm.tsx             # Upload form
+    FilePickerField.tsx        # File picker
+  workbench/
+    PdfViewer.tsx              # Continuous-scroll PDF + IntersectionObserver
+    QuestionNav.tsx            # Question navigation bar
+    QuestionDetail.tsx         # Question display + MC/fill-in interaction
+    AiTrioPanel.tsx            # AI trio panel
+    ActionBar.tsx              # Bottom action buttons
+    PhotoUpload.tsx            # Photo upload modal
+    VariantDetail.tsx          # Inline variant answering
+    VariantModal.tsx           # (deprecated, replaced by VariantDetail)
+```
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -0,0 +1,16 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+# System deps for PyMuPDF
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libmupdf-dev gcc g++ && \
+    rm -rf /var/lib/apt/lists/*
+
+COPY pyproject.toml .
+RUN pip install --no-cache-dir .
+
+COPY app/ app/
+
+EXPOSE 8000
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/backend/add_progress_columns.sql
+++ b/backend/add_progress_columns.sql
@@ -0,0 +1,4 @@
+ALTER TABLE papers
+ADD COLUMN IF NOT EXISTS processing_step text DEFAULT NULL,
+ADD COLUMN IF NOT EXISTS processing_progress integer DEFAULT 0,
+ADD COLUMN IF NOT EXISTS processing_total integer DEFAULT 0;
--- a/backend/app/__init__.py
+++ b/backend/app/__init__.py
--- a/backend/app/config.py
+++ b/backend/app/config.py
@@ -0,0 +1,36 @@
+from pydantic_settings import BaseSettings
+from functools import lru_cache
+import os
+
+
+class Settings(BaseSettings):
+    # Supabase
+    supabase_url: str
+    supabase_anon_key: str
+    supabase_service_role_key: str
+
+    # LLM - laozhang (gpt-4o, gpt-4o-mini)
+    laozhang_base_url: str = "https://api.laozhang.ai/v1"
+    laozhang_api_key: str = ""
+
+    # LLM - DashScope (qwen-plus)
+    dashscope_base_url: str = "https://dashscope.aliyuncs.com/compatible-mode/v1"
+    dashscope_api_key: str = ""
+
+    # LLM - DeepSeek
+    deepseek_base_url: str = "https://api.deepseek.com/v1"
+    deepseek_api_key: str = ""
+
+    # Google Gemini (official)
+    google_gemini_api_key: str = ""
+
+    model_config = {
+        "env_file": os.path.join(os.path.dirname(__file__), "../../.env"),
+        "env_file_encoding": "utf-8",
+        "extra": "ignore",
+    }
+
+
+@lru_cache
+def get_settings() -> Settings:
+    return Settings()
--- a/backend/app/dependencies/__init__.py
+++ b/backend/app/dependencies/__init__.py
--- a/backend/app/dependencies/auth.py
+++ b/backend/app/dependencies/auth.py
@@ -0,0 +1,34 @@
+"""Auth dependency: validate Supabase JWT and return user_id"""
+
+from fastapi import Depends, HTTPException, status
+from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+from app.services.supabase_client import get_supabase
+
+bearer_scheme = HTTPBearer(auto_error=False)
+
+
+async def get_current_user_id(
+    credentials: HTTPAuthorizationCredentials | None = Depends(bearer_scheme),
+) -> str:
+    """Extract and validate Bearer token, return user_id."""
+    if not credentials:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Not authenticated",
+        )
+    token = credentials.credentials
+    sb = get_supabase()
+    try:
+        result = sb.auth.get_user(token)
+    except Exception:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Invalid or expired token",
+        )
+    user = result.user if result else None
+    if not user:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Invalid token",
+        )
+    return user.id
--- a/backend/app/main.py
+++ b/backend/app/main.py
@@ -0,0 +1,59 @@
+import asyncio
+import threading
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from app.routers import analytics, papers, attempts, questions
+
+
+def _resume_stale_papers():
+    """On startup, find papers stuck in 'processing' and resume their AI trio generation."""
+    try:
+        from app.services.supabase_client import get_supabase
+        from app.services.paper_processor import process_paper
+
+        sb = get_supabase()
+        stale = sb.table("papers").select("id").eq("status", "processing").execute().data
+        if not stale:
+            return
+
+        for p in stale:
+            paper_id = p["id"]
+            print(f"[STARTUP] Resuming processing for paper {paper_id[:8]}...")
+
+            def run(pid=paper_id):
+                asyncio.run(process_paper(pid, b"", None))
+
+            threading.Thread(target=run, daemon=True).start()
+    except Exception as e:
+        print(f"[STARTUP] Resume skipped: {e}")
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    # Startup
+    _resume_stale_papers()
+    yield
+    # Shutdown (nothing to do)
+
+
+app = FastAPI(title="PastPaper Master API", version="0.1.0", lifespan=lifespan)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # open during development; tighten before going live
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+app.include_router(papers.router, prefix="/api/papers", tags=["papers"])
+app.include_router(attempts.router, prefix="/api/attempts", tags=["attempts"])
+app.include_router(questions.router, prefix="/api/questions", tags=["questions"])
+app.include_router(analytics.router, prefix="/api/analytics", tags=["analytics"])
+
+
+@app.get("/health")
+def health():
+    return {"status": "ok"}
--- a/backend/app/routers/__init__.py
+++ b/backend/app/routers/__init__.py
--- a/backend/app/routers/analytics.py
+++ b/backend/app/routers/analytics.py
@@ -0,0 +1,285 @@
+"""Course-level analytics endpoints."""
+
+from __future__ import annotations
+
+from collections import Counter, defaultdict
+
+from fastapi import APIRouter
+
+from app.services.supabase_client import get_supabase
+
+router = APIRouter()
+
+
+DIFFICULTY_SCORE = {"easy": 1, "medium": 2, "hard": 3}
+DIFFICULTY_LABEL = {1: "Easy", 2: "Medium", 3: "Hard"}
+
+# ── Topic normalization ──────────────────────────────────────
+# Map variant spellings to canonical label
+_TOPIC_ALIASES: dict[str, str] = {
+    "numpy": "NumPy",
+    "naïve bayes": "Naive Bayes",
+    "naïve bayes classifier": "Naive Bayes",
+    "naive bayes classifier": "Naive Bayes",
+    "bayes classifier": "Naive Bayes",
+    "bayes model": "Naive Bayes",
+    "bayes' theorem": "Naive Bayes",
+    "bayes' rule": "Naive Bayes",
+    "k-nearest neighbors": "K-Nearest Neighbors (KNN)",
+    "knn": "K-Nearest Neighbors (KNN)",
+    "k-means clustering": "K-Means Clustering",
+    "k-means": "K-Means Clustering",
+    "k means": "K-Means Clustering",
+    "multilayer perceptron": "Multilayer Perceptron (MLP)",
+    "multi-layer perceptron": "Multilayer Perceptron (MLP)",
+    "multi-layer perceptron (mlp)": "Multilayer Perceptron (MLP)",
+    "mlp": "Multilayer Perceptron (MLP)",
+    "single layer perceptron": "Perceptron",
+    "convolutional neural network": "CNN",
+    "convolutional neural network (cnn)": "CNN",
+    "convolutional neural networks": "CNN",
+    "cnn architecture": "CNN",
+    "cnn properties": "CNN",
+    "python fundamentals": "Python",
+    "python programming": "Python",
+    "python implementation": "Python",
+    "advanced python programming": "Python",
+    "python programming: convolutional neural network": "CNN",
+    "cross-validation": "Cross Validation",
+    "model evaluation implementation": "Model Evaluation",
+    "digital image processing": "Image Processing",
+    "computer vision": "Image Processing",
+    "array slicing": "Array Slicing",
+    "slicing": "Array Slicing",
+    "array indexing": "Array Slicing",
+    "array reshaping": "Reshape",
+    "array views": "Array Slicing",
+    "view vs copy": "Array Slicing",
+    "boolean indexing": "Array Slicing",
+    "arange": "NumPy",
+    "newaxis": "NumPy",
+    "expand dims": "NumPy",
+    "transpose": "NumPy",
+    "type casting": "NumPy",
+    "element-wise operation": "NumPy",
+    "array reduction": "NumPy",
+    "multi-dimensional array": "NumPy",
+    "dot product": "NumPy",
+    "vectorization": "NumPy",
+    "activation functions": "Activation Function",
+    "linear activation function": "Activation Function",
+    "neural network architecture": "Neural Networks",
+    "hidden layer": "Neural Networks",
+    "deep learning": "Neural Networks",
+    "deep learning frameworks": "Neural Networks",
+    "alpha-beta pruning": "Alpha-Beta Pruning",
+    "minimax algorithm": "Minimax",
+    "ethics of ai": "AI Ethics",
+    "ethics": "AI Ethics",
+    "cosine distance": "Cosine Similarity",
+    "distance calculation": "Distance Metrics",
+    "euclidean distance": "Distance Metrics",
+    "manhattan distance": "Distance Metrics",
+    "hamming distance": "Distance Metrics",
+    "precision": "Model Evaluation",
+    "recall": "Model Evaluation",
+    "f1 score": "Model Evaluation",
+    "macro f1 score": "Model Evaluation",
+    "accuracy": "Model Evaluation",
+    "classification accuracy": "Model Evaluation",
+    "confusion matrix": "Model Evaluation",
+    "convolution operation": "Convolution",
+    "dilated convolution": "Convolution",
+    "3d convolution": "Convolution",
+    "gaussian likelihood": "Probability",
+    "gaussian distribution": "Probability",
+    "categorical likelihood": "Probability",
+    "conditional probability": "Probability",
+    "total probability theorem": "Probability",
+    "probability assumptions": "Probability",
+    "tensorflow": "Keras",
+    "model summary": "Keras",
+    "model construction": "Keras",
+    "trainable parameters": "Parameter Calculation",
+    "parameter reduction": "Parameter Calculation",
+    "output shape calculation": "Parameter Calculation",
+    "shape calculation": "Parameter Calculation",
+}
+
+
+def normalize_topic(label: str) -> str:
+    return _TOPIC_ALIASES.get(label.lower().strip(), label)
+
+
+def extract_topic_labels(question: dict) -> list[str]:
+    labels: list[str] = []
+    raw_labels: list[str] = []
+
+    analytics_topic = question.get("analytics_topic")
+    if analytics_topic:
+        raw_labels.append(analytics_topic)
+
+    for tag in question.get("topic_tags") or []:
+        if tag and tag not in raw_labels:
+            raw_labels.append(tag)
+
+    if not raw_labels:
+        for tag in question.get("topics") or []:
+            if tag and tag not in raw_labels:
+                raw_labels.append(tag)
+
+    # Normalize and deduplicate
+    seen: set[str] = set()
+    for raw in raw_labels:
+        norm = normalize_topic(raw)
+        if norm not in seen:
+            seen.add(norm)
+            labels.append(norm)
+
+    return labels
+
+
+def extract_question_family(question: dict) -> str:
+    return (
+        question.get("question_format")
+        or question.get("question_type")
+        or "unknown"
+    )
+
+
+@router.get("/courses")
+async def list_courses():
+    """Return the list of courses that have at least one paper with status 'ready'."""
+    sb = get_supabase()
+    rows = (
+        sb.table("papers")
+        .select("course_code")
+        .eq("status", "ready")
+        .execute()
+        .data
+    )
+    codes = sorted({row["course_code"] for row in rows if row.get("course_code")})
+    return codes
+
+
+@router.get("/course/{course_code}")
+async def get_course_analytics(course_code: str):
+    sb = get_supabase()
+
+    papers = (
+        sb.table("papers")
+        .select("id, course_code, year, term, exam_type, part_label, status")
+        .eq("course_code", course_code.upper())
+        .eq("status", "ready")
+        .order("year", desc=True)
+        .execute()
+        .data
+    )
+    if not papers:
+        return {
+            "course_code": course_code.upper(),
+            "kpi": {"papers": 0, "questions": 0, "topics": 0, "difficulty": "N/A"},
+            "topic_frequency": [],
+            "question_types": [],
+            "difficulty_distribution": {"easy": 0, "medium": 0, "hard": 0},
+            "high_yield_topics": [],
+        }
+
+    paper_ids = [paper["id"] for paper in papers]
+    questions = (
+        sb.table("paper_questions")
+        .select(
+            "id, paper_id, question_number, question_type, question_format, "
+            "question_text, score, topics, analytics_topic, topic_tags, difficulty"
+        )
+        .in_("paper_id", paper_ids)
+        .order("display_order")
+        .execute()
+        .data
+    )
+
+    papers_by_id = {paper["id"]: paper for paper in papers}
+    total_questions = len(questions)
+    topic_counter: Counter[str] = Counter()
+    type_counter: Counter[str] = Counter()
+    difficulty_counter: Counter[str] = Counter()
+    topic_examples: dict[str, list[dict]] = defaultdict(list)
+    difficulty_scores: list[int] = []
+    all_question_items: list[dict] = []
+
+    for question in questions:
+        question_type = extract_question_family(question)
+        type_counter[question_type] += 1
+
+        difficulty = question.get("difficulty")
+        if difficulty in DIFFICULTY_SCORE:
+            difficulty_counter[difficulty] += 1
+            difficulty_scores.append(DIFFICULTY_SCORE[difficulty])
+
+        paper = papers_by_id.get(question["paper_id"], {})
+        source_label = (
+            f"{paper.get('year') or ''} {(paper.get('term') or '').title()} "
+            f"{(paper.get('exam_type') or '').title()}"
+        ).strip()
+        if paper.get("part_label"):
+            source_label = f"{source_label} Part {paper['part_label']}"
+
+        topics = extract_topic_labels(question)
+        q_item = {
+            "paper_id": paper.get("id"),
+            "source": source_label,
+            "question_number": question["question_number"],
+            "preview": (question.get("question_text") or "")[:220],
+            "difficulty": question.get("difficulty"),
+            "question_type": question_type,
+            "year": paper.get("year"),
+            "term": paper.get("term"),
+            "exam_type": paper.get("exam_type"),
+            "topics": topics,
+        }
+        all_question_items.append(q_item)
+
+        for topic in topics:
+            topic_counter[topic] += 1
+            topic_examples[topic].append(q_item)
+
+    avg_difficulty = "N/A"
+    if difficulty_scores:
+        rounded = round(sum(difficulty_scores) / len(difficulty_scores))
+        avg_difficulty = DIFFICULTY_LABEL.get(rounded, "Medium")
+
+    topic_frequency = []
+    for topic, count in topic_counter.most_common():
+        pct = round((count / total_questions) * 100) if total_questions else 0
+        topic_frequency.append(
+            {
+                "label": topic,
+                "count": count,
+                "pct": pct,
+                "questions": topic_examples[topic],
+            }
+        )
+
+    question_types = []
+    for label, count in type_counter.most_common():
+        pct = round((count / total_questions) * 100) if total_questions else 0
+        question_types.append({"label": label, "count": count, "pct": pct})
+
+    return {
+        "course_code": course_code.upper(),
+        "kpi": {
+            "papers": len(papers),
+            "questions": total_questions,
+            "topics": len(topic_counter),
+            "difficulty": avg_difficulty,
+        },
+        "topic_frequency": topic_frequency,
+        "question_types": question_types,
+        "all_questions": all_question_items,
+        "difficulty_distribution": {
+            "easy": difficulty_counter.get("easy", 0),
+            "medium": difficulty_counter.get("medium", 0),
+            "hard": difficulty_counter.get("hard", 0),
+        },
+        "high_yield_topics": [topic for topic, _ in topic_counter.most_common(5)],
+    }
--- a/backend/app/routers/attempts.py
+++ b/backend/app/routers/attempts.py
@@ -0,0 +1,208 @@
+"""User attempt records, photo grading, and the error book."""
+
+import asyncio
+from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
+from pydantic import BaseModel
+from app.services.supabase_client import get_supabase
+from app.services.grader import ocr_photo, grade_answer
+from app.dependencies.auth import get_current_user_id
+
+router = APIRouter()
+
+
+class AttemptCreate(BaseModel):
+    question_id: str
+    attempt_type: str  # "select" | "input" | "photo"
+    user_answer: str | None = None
+    is_correct: bool | None = None
+
+
+class AttemptUpdate(BaseModel):
+    in_error_book: bool | None = None
+    mastered: bool | None = None
+
+
+@router.post("/")
+async def create_attempt(data: AttemptCreate, user_id: str = Depends(get_current_user_id)):
+    """Record a single attempt."""
+    sb = get_supabase()
+    record = {
+        "user_id": user_id,
+        "question_id": data.question_id,
+        "attempt_type": data.attempt_type,
+        "user_answer": data.user_answer,
+        "is_correct": data.is_correct,
+    }
+    # Auto add to error book if wrong
+    if data.is_correct is False:
+        record["in_error_book"] = True
+
+    result = sb.table("user_attempts").insert(record).execute()
+    return result.data[0]
+
+
+@router.post("/photo")
+async def photo_attempt(
+    question_id: str = Form(...),
+    photo: UploadFile = File(...),
+    user_id: str = Depends(get_current_user_id),
+):
+    """Photo upload -> OCR -> AI grading."""
+    sb = get_supabase()
+
+    # 1. Read photo
+    photo_bytes = await photo.read()
+
+    # 2. Upload to storage
+    storage_path = f"attempts/{user_id}/{question_id}/{photo.filename}"
+    sb.storage.from_("attempt-photos").upload(
+        storage_path, photo_bytes,
+        file_options={"content-type": photo.content_type or "image/jpeg", "upsert": "true"},
+    )
+    photo_url = sb.storage.from_("attempt-photos").get_public_url(storage_path)
+
+    # 3. OCR (run in thread pool to avoid blocking event loop)
+    ocr_text = await asyncio.to_thread(ocr_photo, photo_bytes)
+
+    # 4. Fetch question for grading context
+    q_result = sb.table("paper_questions").select("*").eq("id", question_id).execute()
+    if not q_result.data:
+        raise HTTPException(status_code=404, detail="Question not found")
+    question = q_result.data[0]
+
+    # 5. AI grading (run in thread pool)
+    grade_result = await asyncio.to_thread(grade_answer, question, ocr_text)
+
+    # 6. Save attempt
+    record = {
+        "user_id": user_id,
+        "question_id": question_id,
+        "attempt_type": "photo",
+        "photo_url": photo_url,
+        "photo_ocr_text": ocr_text,
+        "is_correct": grade_result.get("is_correct", False),
+        "feedback": grade_result.get("feedback", ""),
+        "error_at_step": grade_result.get("error_at_step"),
+        "in_error_book": not grade_result.get("is_correct", False),
+    }
+    result = sb.table("user_attempts").insert(record).execute()
+
+    return {
+        "attempt": result.data[0],
+        "ocr_text": ocr_text,
+        "grade": grade_result,
+    }
+
+
+@router.get("/error-book")
+async def get_error_book(
+    course_code: str | None = None,
+    user_id: str = Depends(get_current_user_id),
+):
+    """Fetch the user's error book."""
+    sb = get_supabase()
+    attempts = (
+        sb.table("user_attempts")
+        .select("*")
+        .eq("user_id", user_id)
+        .eq("in_error_book", True)
+        .eq("mastered", False)
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+    if not attempts:
+        return []
+
+    question_ids = list({attempt["question_id"] for attempt in attempts})
+    questions = (
+        sb.table("paper_questions")
+        .select("*")
+        .in_("id", question_ids)
+        .execute()
+        .data
+    )
+    questions_by_id = {question["id"]: question for question in questions}
+
+    paper_ids = list({question["paper_id"] for question in questions})
+    papers = (
+        sb.table("papers")
+        .select("id, course_code, year, term, exam_type, part_label")
+        .in_("id", paper_ids)
+        .execute()
+        .data
+    )
+    papers_by_id = {paper["id"]: paper for paper in papers}
+
+    enriched = []
+    for attempt in attempts:
+        question = questions_by_id.get(attempt["question_id"])
+        if not question:
+            continue
+        paper = papers_by_id.get(question["paper_id"])
+        if course_code and (not paper or paper.get("course_code") != course_code.upper()):
+            continue
+
+        enriched.append(
+            {
+                **attempt,
+                "paper_questions": {
+                    **question,
+                    "paper": paper,
+                },
+            }
+        )
+    return enriched
+
+
+@router.get("/by-paper/{paper_id}")
+async def get_paper_attempts(paper_id: str, user_id: str = Depends(get_current_user_id)):
+    """Fetch the latest grading record for each question in a paper."""
+    sb = get_supabase()
+    attempts = (
+        sb.table("user_attempts")
+        .select("question_id, is_correct, feedback, photo_ocr_text, attempt_type, created_at")
+        .eq("user_id", user_id)
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+    # Keep only photo attempts, and only the latest one per question
+    question_ids = (
+        sb.table("paper_questions")
+        .select("id")
+        .eq("paper_id", paper_id)
+        .execute()
+        .data
+    )
+    qid_set = {q["id"] for q in question_ids}
+    seen: set[str] = set()
+    result = []
+    for a in attempts:
+        if a["question_id"] not in qid_set:
+            continue
+        if a["question_id"] in seen:
+            continue
+        if a["attempt_type"] != "photo":
+            continue
+        seen.add(a["question_id"])
+        result.append(a)
+    return result
+
+
+@router.patch("/{attempt_id}")
+async def update_attempt(attempt_id: str, data: AttemptUpdate, user_id: str = Depends(get_current_user_id)):
+    """Update error-book status (e.g. mark as mastered)."""
+    sb = get_supabase()
+    update = {}
+    if data.in_error_book is not None:
+        update["in_error_book"] = data.in_error_book
+    if data.mastered is not None:
+        update["mastered"] = data.mastered
+    if not update:
+        raise HTTPException(status_code=400, detail="Nothing to update")
+    # Scope the update to the caller's own attempts
+    result = sb.table("user_attempts").update(update).eq("id", attempt_id).eq("user_id", user_id).execute()
+    if not result.data:
+        raise HTTPException(status_code=404, detail="Attempt not found")
+    return result.data[0]
--- a/backend/app/routers/papers.py
+++ b/backend/app/routers/papers.py
@@ -0,0 +1,142 @@
+"""Paper upload and processing pipeline."""
+
+import asyncio
+import threading
+from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
+from app.services.supabase_client import get_supabase
+from app.services.text_extractor import extract_pdf, get_full_text
+from app.services.paper_processor import process_paper
+from app.dependencies.auth import get_current_user_id
+
+router = APIRouter()
+
+
+def _upload_and_process_sync(
+    paper_id: str,
+    storage_path: str,
+    paper_bytes: bytes,
+    answer_bytes: bytes | None,
+):
+    """Runs in a separate thread: Storage upload followed by AI processing."""
+    sb = get_supabase()
+    try:
+        paper_storage_path = f"{storage_path}/paper.pdf"
+        sb.storage.from_("papers").upload(
+            paper_storage_path, paper_bytes,
+            file_options={"content-type": "application/pdf", "upsert": "true"},
+        )
+        paper_url = sb.storage.from_("papers").get_public_url(paper_storage_path)
+
+        update_data: dict = {"paper_file_url": paper_url}
+
+        if answer_bytes:
+            answer_storage_path = f"{storage_path}/answer.pdf"
+            sb.storage.from_("papers").upload(
+                answer_storage_path, answer_bytes,
+                file_options={"content-type": "application/pdf", "upsert": "true"},
+            )
+            update_data["answer_file_url"] = sb.storage.from_("papers").get_public_url(answer_storage_path)
+
+        sb.table("papers").update(update_data).eq("id", paper_id).execute()
+    except Exception as e:
+        print(f"[UPLOAD] Storage upload failed for paper {paper_id[:8]}: {e}")
+
+    # process_paper is async, so run it in a new event loop
+    asyncio.run(process_paper(paper_id, paper_bytes, answer_bytes))
+
+
+@router.get("/")
+async def list_papers():
+    """List papers (public assets shared by all users)."""
+    sb = get_supabase()
+    return (
+        sb.table("papers")
+        .select("id, course_code, year, term, exam_type, status, question_count, total_score, difficulty_level, processing_step, processing_progress, processing_total, created_at")
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+
+
+@router.get("/mine")
+async def my_papers(user_id: str = Depends(get_current_user_id)):
+    """Papers uploaded by the current user (including those still processing)."""
+    sb = get_supabase()
+    return (
+        sb.table("papers")
+        .select("id, course_code, year, term, exam_type, part_label, status, question_count, processing_step, processing_progress, processing_total, created_at")
+        .eq("user_id", user_id)
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+
+
+@router.post("/upload")
+async def upload_paper(
+    paper_file: UploadFile = File(...),
+    answer_file: UploadFile | None = File(None),
+    course_code: str = Form(...),
+    year: int = Form(...),
+    term: str = Form(...),
+    exam_type: str = Form(...),
+    user_id: str = Depends(get_current_user_id),
+):
+    """Upload a paper PDF (and optionally an answer PDF) and trigger background processing."""
+    sb = get_supabase()
+
+    # 1. Read file contents (already in memory, so this is fast)
+    paper_bytes = await paper_file.read()
+    answer_bytes = await answer_file.read() if answer_file else None
+
+    # 2. Create the record immediately (status=processing) so the request can return right away
+    storage_path = f"{course_code.upper()}/{year}_{term}_{exam_type}"
+    paper_record = sb.table("papers").insert({
+        "user_id": user_id,
+        "course_code": course_code.upper(),
+        "year": year,
+        "term": term,
+        "exam_type": exam_type,
+        "paper_file_url": "",   # updated after the background upload
+        "answer_file_url": None,
+        "status": "processing",
+    }).execute()
+
+    paper_id = paper_record.data[0]["id"]
+
+    # 3. Run in a separate thread so the event loop is never blocked
+    threading.Thread(
+        target=_upload_and_process_sync,
+        args=(paper_id, storage_path, paper_bytes, answer_bytes),
+        daemon=True,
+    ).start()
+
+    return {
+        "paper_id": paper_id,
+        "status": "processing",
+        "message": "Paper uploaded; processing in progress...",
+    }
+
+
+@router.get("/{paper_id}")
+async def get_paper(paper_id: str):
+    """Fetch paper info and processing status."""
+    sb = get_supabase()
+    result = sb.table("papers").select("*").eq("id", paper_id).execute()
+    if not result.data:
+        raise HTTPException(status_code=404, detail="Paper not found")
+    return result.data[0]
+
+
+@router.get("/{paper_id}/questions")
+async def get_questions(paper_id: str):
+    """Fetch all questions for a paper (including the AI trio)."""
+    sb = get_supabase()
+    result = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+    )
+    return result.data
--- a/backend/app/routers/questions.py
+++ b/backend/app/routers/questions.py
@@ -0,0 +1,325 @@
+"""Question endpoints: variant generation and similar-question retrieval."""
+
+from __future__ import annotations
+
+import asyncio
+import time
+from fastapi import APIRouter, HTTPException, Depends
+from pydantic import BaseModel
+from app.services.supabase_client import get_supabase
+from app.services.grader import generate_variant
+from app.dependencies.auth import get_current_user_id
+
+# Simple in-memory cache: question_id → (timestamp, result)
+_similar_cache: dict[str, tuple[float, list]] = {}
+_CACHE_TTL = 300  # 5 minutes
+
+
+class VariantUpdate(BaseModel):
+    favorited: bool | None = None
+
+router = APIRouter()
+
+
+def normalized_labels(values: list[str] | None) -> dict[str, str]:
+    labels: dict[str, str] = {}
+    for value in values or []:
+        if value:
+            labels[value.lower()] = value
+    return labels
+
+
+def question_family(question: dict) -> str:
+    return question.get("question_format") or question.get("question_type") or "unknown"
+
+
+def display_topics(question: dict) -> list[str]:
+    labels: list[str] = []
+    analytics_topic = question.get("analytics_topic")
+    if analytics_topic:
+        labels.append(analytics_topic)
+    for topic in question.get("topic_tags") or []:
+        if topic and topic not in labels:
+            labels.append(topic)
+    if labels:
+        return labels
+    for topic in question.get("topics") or []:
+        if topic and topic not in labels:
+            labels.append(topic)
+    return labels
+
+
+def similarity_score(
+    target: dict,
+    candidate: dict,
+    text_score: float = 0.0,
+) -> tuple[int, list[str]]:
+    score = 0
+    reasons: list[str] = []
+
+    # Primary topic bucket: 40 pts
+    target_topic = target.get("analytics_topic")
+    candidate_topic = candidate.get("analytics_topic")
+    if target_topic and target_topic == candidate_topic:
+        score += 40
+        reasons.append(f"Same topic: {target_topic}")
+
+    # Concept overlap: up to 20 pts
+    target_topics = normalized_labels(target.get("topic_tags"))
+    candidate_topics = normalized_labels(candidate.get("topic_tags"))
+    shared_topics = sorted(set(target_topics) & set(candidate_topics))
+    if shared_topics:
+        score += min(len(shared_topics) * 10, 20)
+        # Only show concept reason if analytics_topic didn't already match (avoid redundancy)
+        if not (target_topic and target_topic == candidate_topic):
+            reasons.append(
+                "Shared concept: "
+                + ", ".join(target_topics[key] for key in shared_topics[:2])
+            )
+
+    # Skill overlap: up to 20 pts
+    target_skills = normalized_labels(target.get("skill_tags"))
+    candidate_skills = normalized_labels(candidate.get("skill_tags"))
+    shared_skills = sorted(set(target_skills) & set(candidate_skills))
+    if shared_skills:
+        score += min(len(shared_skills) * 10, 20)
+        reasons.append(
+            "Shared skill: "
+            + ", ".join(target_skills[key] for key in shared_skills[:2])
+        )
+
+    # Same question format: 10 pts
+    if question_family(candidate) == question_family(target):
+        score += 10
+        reasons.append("Same format")
+
+    # Same difficulty: 5 pts
+    if candidate.get("difficulty") and candidate.get("difficulty") == target.get("difficulty"):
+        score += 5
+        reasons.append("Same difficulty")
+
+    # Full-text similarity from PostgreSQL ts_rank_cd: up to 20 pts
+    if text_score > 0:
+        text_pts = min(round(text_score * 60), 20)
+        score += text_pts
+        if text_pts >= 4:
+            reasons.append("Similar wording")
+
+    return min(score, 99), reasons
+
+
+@router.get("/variants/favorited")
+async def get_favorited_variants(user_id: str = Depends(get_current_user_id)):
+    """Fetch all variants the user has favorited (used by the Error Book)."""
+    sb = get_supabase()
+    rows = (
+        sb.table("question_variants")
+        .select("*, paper_questions(question_number, paper_id, papers(id, course_code, year, term, exam_type, part_label))")
+        .eq("user_id", user_id)
+        .eq("favorited", True)
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+    return rows
+
+
+@router.post("/{question_id}/variant")
+async def create_variant(question_id: str, user_id: str = Depends(get_current_user_id)):
+    """Generate a variant question and save it to the database."""
+    sb = get_supabase()
+    result = sb.table("paper_questions").select("*").eq("id", question_id).execute()
+    if not result.data:
+        raise HTTPException(status_code=404, detail="Question not found")
+
+    question = result.data[0]
+    variant_data = await asyncio.to_thread(generate_variant, question)
+    variant_data["knowledge_reminder"] = question.get("knowledge_reminder", "")
+
+    saved = sb.table("question_variants").insert({
+        "user_id": user_id,
+        "source_question_id": question_id,
+        "variant_data": variant_data,
+        "favorited": False,
+    }).execute()
+
+    row = saved.data[0]
+    row["source_question_number"] = question["question_number"]
+    return row
+
+
+@router.get("/{question_id}/variants")
+async def list_variants(question_id: str, user_id: str = Depends(get_current_user_id)):
+    """Fetch all of the user's variants for a given question."""
+    sb = get_supabase()
+    q_result = sb.table("paper_questions").select("question_number").eq("id", question_id).execute()
+    question_number = q_result.data[0]["question_number"] if q_result.data else ""
+
+    rows = (
+        sb.table("question_variants")
+        .select("*")
+        .eq("user_id", user_id)
+        .eq("source_question_id", question_id)
+        .order("created_at", desc=True)
+        .execute()
+        .data
+    )
+    for row in rows:
+        row["source_question_number"] = question_number
+    return rows
+
+
+@router.patch("/variant/{variant_id}")
+async def update_variant(variant_id: str, data: VariantUpdate, user_id: str = Depends(get_current_user_id)):
+    """Update a variant (favorite / unfavorite)."""
+    sb = get_supabase()
+    update: dict = {}
+    if data.favorited is not None:
+        update["favorited"] = data.favorited
+    if not update:
+        raise HTTPException(status_code=400, detail="Nothing to update")
+
+    result = (
+        sb.table("question_variants")
+        .update(update)
+        .eq("id", variant_id)
+        .eq("user_id", user_id)
+        .execute()
+    )
+    if not result.data:
+        raise HTTPException(status_code=404, detail="Variant not found")
+    return result.data[0]
+
+
+@router.delete("/variant/{variant_id}", status_code=204)
+async def delete_variant(variant_id: str, user_id: str = Depends(get_current_user_id)):
+    """Delete a variant."""
+    sb = get_supabase()
+    sb.table("question_variants").delete().eq("id", variant_id).eq("user_id", user_id).execute()
+
+
+@router.get("/{question_id}/similar")
+async def get_similar_questions(question_id: str, limit: int = 6):
+    """Retrieve similar questions from the same course."""
+    # Cache hit
+    cached = _similar_cache.get(question_id)
+    if cached and (time.time() - cached[0]) < _CACHE_TTL:
+        return cached[1][:max(1, min(limit, 12))]
+
+    sb = get_supabase()
+    result = sb.table("paper_questions").select("*, similar_questions").eq("id", question_id).execute()
+    if not result.data:
+        raise HTTPException(status_code=404, detail="Question not found")
+
+    target = result.data[0]
+
+    # Return pre-computed results immediately when available, and cache them
+    if target.get("similar_questions"):
+        precomputed = target["similar_questions"]
+        _similar_cache[question_id] = (time.time(), precomputed)
+        return precomputed[:max(1, min(limit, 12))]
+
+    # Fallback: compute on the fly for questions not yet backfilled
+    paper_result = sb.table("papers").select("id, course_code").eq("id", target["paper_id"]).execute()
+    if not paper_result.data:
+        raise HTTPException(status_code=404, detail="Paper not found")
+
+    course_code = paper_result.data[0]["course_code"]
+    papers = (
+        sb.table("papers")
+        .select("id, course_code, year, term, exam_type, part_label")
+        .eq("course_code", course_code)
+        .eq("status", "ready")
+        .execute()
+        .data
+    )
+    paper_ids = [paper["id"] for paper in papers if paper["id"] != target["paper_id"]]
+    if not paper_ids:
+        return []
+
+    papers_by_id = {paper["id"]: paper for paper in papers}
+
+    # Pre-filter by analytics_topic in DB when possible (cuts candidates from ~250 to ~30)
+    candidates_query = (
+        sb.table("paper_questions")
+        .select(
+            "id, paper_id, question_number, question_type, question_format, "
+            "question_text, score, topics, analytics_topic, topic_tags, skill_tags, "
+            "difficulty, knowledge_reminder, ai_hint, solution"
+        )
+        .in_("paper_id", paper_ids)
+    )
+    target_topic = target.get("analytics_topic")
+    if target_topic:
+        candidates_query = candidates_query.eq("analytics_topic", target_topic)
+
+    candidates = candidates_query.execute().data
+    if not candidates:
+        return []
+
+    # Batch full-text scores from PostgreSQL (skip if too many candidates — slow)
+    text_scores: dict[str, float] = {}
+    if len(candidates) <= 50:
+        try:
+            rpc_result = sb.rpc(
+                "text_similarity_scores",
+                {
+                    "query_text": target.get("question_text") or "",
+                    "candidate_ids": [c["id"] for c in candidates],
+                },
+            ).execute()
+            for row in rpc_result.data or []:
+                text_scores[row["question_id"]] = float(row["text_score"] or 0)
+        except Exception:
+            pass
+
+    ranked = []
+    for candidate in candidates:
+        text_score = text_scores.get(candidate["id"], 0.0)
+        match_percent, reasons = similarity_score(target, candidate, text_score)
+        if match_percent < 20:
+            continue
+        paper = papers_by_id.get(candidate["paper_id"], {})
+        source = (
+            f"{paper.get('year', '')} {paper.get('term', '').title()} "
+            f"{paper.get('exam_type', '').title()}"
+        ).strip()
+        if paper.get("part_label"):
+            source = f"{source} Part {paper['part_label']}"
+        ranked.append(
+            {
+                "id": candidate["id"],
+                "paper_id": candidate["paper_id"],
+                "source": source,
+                "question_number": candidate["question_number"],
+                "match_percent": match_percent,
+                "match_reasons": reasons,
+                "question_type": question_family(candidate),
+                "question_text": candidate["question_text"],
+                "topics": display_topics(candidate),
+                "difficulty": candidate.get("difficulty"),
+                "knowledge_reminder": candidate.get("knowledge_reminder", ""),
+                "ai_hint": candidate.get("ai_hint", ""),
+                "solution": candidate.get("solution", ""),
+            }
+        )
+
+    ranked.sort(key=lambda item: (-item["match_percent"], item["source"], item["question_number"]))
+
+    # Keep only the best-scoring question per paper
+    seen_papers: set[str] = set()
+    deduped = []
+    for item in ranked:
+        if item["paper_id"] not in seen_papers:
+            seen_papers.add(item["paper_id"])
+            deduped.append(item)
+
+    _similar_cache[question_id] = (time.time(), deduped)
+
+    # Persist to DB so future requests are instant
+    try:
+        sb.table("paper_questions").update({"similar_questions": deduped}).eq("id", question_id).execute()
+    except Exception:
+        pass
+
+    return deduped[:max(1, min(limit, 12))]
--- a/backend/app/services/__init__.py
+++ b/backend/app/services/__init__.py
--- a/backend/app/services/grader.py
+++ b/backend/app/services/grader.py
@@ -0,0 +1,146 @@
+"""OCR, grading, and variant generation prompts"""
+
+import json
+import base64
+from app.services.llm_clients import get_vision_client, get_deepseek_client
+
+OCR_PROMPT = """You are an expert at recognizing handwritten answers. Analyze this photo of a student's handwritten answer and extract the text and mathematical formulas.
+
+Requirements:
+- Faithfully extract what the student wrote, do not modify or correct
+- Use LaTeX format for math formulas (e.g. $x^2 + 1$)
+- If there are multiple steps, list them in original order
+- If some handwriting is unclear, mark with [unclear]
+
+Return only the extracted text, no additional explanation."""
+
+GRADING_PROMPT = """You are an expert academic grader. Grade the following student answer. ALL output must be in English.
+
+Question info:
+- Number: {question_number}
+- Type: {question_type}
+- Question: {question_text}
+- Score: {score}
+
+Reference answer / solution:
+{reference_answer}
+
+Student answer:
+{student_answer}
+
+Grade and return JSON:
+{{
+  "is_correct": true/false,
+  "score_given": 0-{score},
+  "feedback": "<HTML> Step-by-step analysis of the student's answer, pointing out correct parts and errors, using KaTeX formulas </HTML>",
+  "error_at_step": null or the step number where errors begin (integer)
+}}
+
+Grading rules:
+- MC / fill-blank: only correct if answer matches exactly
+- Long questions: give partial credit for correct steps even if the final answer is wrong
+- feedback in HTML format, supports KaTeX ($..$ inline, $$...$$ block)
+- Mark errors with <div class="common-error">...</div>
+- Identify exactly which step the error starts"""
+
+VARIANT_PROMPT = """You are an expert exam question creator. Generate a similar but different variant question based on the original below. ALL output must be in English.
+
+Original question info:
+- Type: {question_type}
+- Question: {question_text}
+- Topics: {topics}
+- Difficulty: {difficulty}
+- Reference answer: {answer}
+
+Requirements:
+- Variant must test the same knowledge points at similar difficulty
+- Data/scenario/wording must differ — don't just change numbers
+- Must provide a complete correct answer
+
+Format requirements (CRITICAL):
+- All text in HTML format, absolutely NO markdown syntax
+- Code: <pre><code class="language-xxx">...</code></pre>, NOT ```
+- Math: $...$ (inline) or $$...$$ (block), KaTeX compatible
+- Line breaks: <br>, paragraphs: <p>
+
+Return JSON:
+{{
+  "question_text": "HTML formatted variant question",
+  "question_type": "{question_type}",
+  "options": [MC only, format {{"label":"A","text":"..."}}, ...] or null,
+  "correct_answer": "Correct answer (plain text)",
+  "ai_hint": "HTML formatted hint that guides thinking WITHOUT giving the answer",
+  "solution": "HTML formatted complete step-by-step solution"
+}}"""
+
+
+def ocr_photo(photo_bytes: bytes) -> str:
+    """Gemini Vision OCR for handwritten answers"""
+    client = get_vision_client()
+    b64 = base64.b64encode(photo_bytes).decode("utf-8")
+
+    resp = client.chat.completions.create(
+        model="gemini-2.5-flash",
+        messages=[
+            {"role": "system", "content": OCR_PROMPT},
+            {"role": "user", "content": [
+                {"type": "image_url", "image_url": {
+                    "url": f"data:image/jpeg;base64,{b64}",
+                }},
+            ]},
+        ],
+        temperature=0,
+        max_tokens=2000,
+    )
+    return resp.choices[0].message.content or ""
+
+
+def grade_answer(question: dict, student_answer: str) -> dict:
+    """DeepSeek grades the student's answer"""
+    reference = question.get("raw_answer_text") or question.get("solution") or "No reference answer"
+    score = question.get("score") or "unknown"
+
+    ds = get_deepseek_client()
+    resp = ds.chat.completions.create(
+        model="deepseek-chat",
+        messages=[
+            {"role": "system", "content": GRADING_PROMPT.format(
+                question_number=question["question_number"],
+                question_type=question["question_type"],
+                question_text=question["question_text"],
+                score=score,
+                reference_answer=reference,
+                student_answer=student_answer,
+            )},
+        ],
+        temperature=0.2,
+        response_format={"type": "json_object"},
+    )
+    return json.loads(resp.choices[0].message.content)
+
+
+def generate_variant(question: dict) -> dict:
+    """DeepSeek generates a variant question"""
+    answer = (
+        question.get("correct_option")
+        or question.get("correct_answer")
+        or question.get("raw_answer_text")
+        or "N/A"
+    )
+
+    ds = get_deepseek_client()
+    resp = ds.chat.completions.create(
+        model="deepseek-chat",
+        messages=[
+            {"role": "system", "content": VARIANT_PROMPT.format(
+                question_type=question["question_type"],
+                question_text=question["question_text"],
+                topics=", ".join(question.get("topics") or []),
+                difficulty=question.get("difficulty", "medium"),
+                answer=answer,
+            )},
+        ],
+        temperature=0.5,
+        response_format={"type": "json_object"},
+    )
+    return json.loads(resp.choices[0].message.content)
--- a/backend/app/services/llm_clients.py
+++ b/backend/app/services/llm_clients.py
@@ -0,0 +1,74 @@
+import httpx
+from openai import OpenAI
+from app.config import get_settings
+
+_TIMEOUT = httpx.Timeout(connect=10, read=300, write=60, pool=10)
+
+_gpt_client: OpenAI | None = None
+_qwen_client: OpenAI | None = None
+_gemini_flash_client: OpenAI | None = None
+_gemini_lite_client: OpenAI | None = None
+_deepseek_client: OpenAI | None = None
+
+
+def get_gpt_client() -> OpenAI:
+    """laozhang API — gpt-4o / gpt-4o-mini"""
+    global _gpt_client
+    if _gpt_client is None:
+        s = get_settings()
+        _gpt_client = OpenAI(
+            base_url=s.laozhang_base_url,
+            api_key=s.laozhang_api_key,
+        )
+    return _gpt_client
+
+
+def get_qwen_client() -> OpenAI:
+    """DashScope — qwen-plus"""
+    global _qwen_client
+    if _qwen_client is None:
+        s = get_settings()
+        _qwen_client = OpenAI(
+            base_url=s.dashscope_base_url,
+            api_key=s.dashscope_api_key,
+        )
+    return _qwen_client
+
+
+def get_vision_client() -> OpenAI:
+    """Official Google Gemini API (vision; used for question extraction and OCR); reachable when deployed in Singapore"""
+    global _gemini_flash_client
+    if _gemini_flash_client is None:
+        s = get_settings()
+        _gemini_flash_client = OpenAI(
+            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
+            api_key=s.google_gemini_api_key,
+            timeout=_TIMEOUT,
+        )
+    return _gemini_flash_client
+
+
+def get_gemini_lite_client() -> OpenAI:
+    """laozhang: gemini-3.1-flash-lite-preview (lightweight; used for the AI trio)"""
+    global _gemini_lite_client
+    if _gemini_lite_client is None:
+        s = get_settings()
+        _gemini_lite_client = OpenAI(
+            base_url=s.laozhang_base_url,
+            api_key=s.laozhang_api_key,
+            timeout=_TIMEOUT,
+        )
+    return _gemini_lite_client
+
+
+def get_deepseek_client() -> OpenAI:
+    """DeepSeek: deepseek-chat (used for the AI trio)"""
+    global _deepseek_client
+    if _deepseek_client is None:
+        s = get_settings()
+        _deepseek_client = OpenAI(
+            base_url=s.deepseek_base_url,
+            api_key=s.deepseek_api_key,
+            timeout=_TIMEOUT,
+        )
+    return _deepseek_client
--- a/backend/app/services/paper_processor.py
+++ b/backend/app/services/paper_processor.py
@@ -0,0 +1,576 @@
+"""Paper processing pipeline: PDF → structured questions → AI trio (Vision mode)"""
+
+import asyncio
+import base64
+import io
+import json
+import re
+import traceback
+from contextlib import redirect_stdout
+import fitz  # pymupdf
+from app.services.supabase_client import get_supabase
+from app.services.llm_clients import get_vision_client, get_deepseek_client
+
+
+def strip_nulls(obj):
+    """Recursively remove \\u0000 null bytes from strings (PostgreSQL rejects them)."""
+    if isinstance(obj, str):
+        return obj.replace("\u0000", "")
+    if isinstance(obj, dict):
+        return {k: strip_nulls(v) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [strip_nulls(i) for i in obj]
+    return obj
+
+
+# ============================================
+# Prompts
+# ============================================
+
+STRUCTURE_PROMPT = """You are an expert exam paper structure analyst. You are given images of a past exam paper. Analyze every page carefully and extract all questions into structured JSON.
+All generated values must be in English. Do not output Chinese.
+
+CRITICAL RULES for question_text:
+- Each question's question_text must be FULLY SELF-CONTAINED. Include ALL context needed to solve it.
+- For sub-questions (e.g. (a)(i)), copy the ENTIRE parent question setup (variable definitions, code blocks, problem description) into the question_text, then append the specific sub-question.
+- For Python/code questions: include ALL variable definitions and import statements verbatim, exactly as they appear in the exam, preserving multi-line arrays and data structures completely.
+- Never truncate code. If a variable is defined across multiple lines (e.g. a numpy array), include every line.
+
+Output JSON format (strictly follow):
+{
+  "total_score": 100,
+  "difficulty_level": "medium",
+  "topics_summary": {"Topic A": 40, "Topic B": 30, "Topic C": 30},
+  "questions": [
+    {
+      "question_number": "1a",
+      "parent_question": "1",
+      "question_type": "mc",
+      "question_text": "Original question text...",
+      "score": 5,
+      "page_number": 1,
+      "options": [{"label": "A", "text": "Option content"}, {"label": "B", "text": "..."}],
+      "topics": ["Linked List", "Pointer"],
+      "difficulty": "easy"
+    },
+    {
+      "question_number": "2",
+      "parent_question": null,
+      "question_type": "long_question",
+      "question_text": "Original question text...",
+      "score": 15,
+      "page_number": 2,
+      "options": null,
+      "topics": ["Recursion"],
+      "difficulty": "hard"
+    }
+  ]
+}
+
+Rules:
+- question_type must be one of: "mc" (multiple choice), "true_false" (true/false), "fill_blank" (fill in blank), "long_question" (long question)
+- True/False questions MUST use "true_false" type, with options set to [{"label":"True","text":"True"},{"label":"False","text":"False"}], correct_option as "True" or "False"
+- Multiple choice must extract the options array
+- Sub-questions use parent_question to link to parent: "1a" parent is "1"
+- Independent questions without sub-questions set parent_question to null
+- page_number inferred from where the question appears
+- topics inferred from the question content
+- difficulty: "easy" | "medium" | "hard"
+- Extract ALL questions, do not miss any
+- Keep topic labels in English only
+"""
+
+ANSWER_MATCH_PROMPT = """You are an expert exam answer matching specialist. Below is the answer text for an exam paper. Extract and match answers to their corresponding question numbers.
+All generated values must be in English. Do not output Chinese.
+
+Question structure:
+{questions_json}
+
+Answer text:
+{answer_text}
+
+Output JSON format:
+{{
+  "answers": [
+    {{
+      "question_number": "1a",
+      "correct_option": "B",
+      "correct_answer": null,
+      "raw_answer_text": "Original answer text..."
+    }},
+    {{
+      "question_number": "2",
+      "correct_option": null,
+      "correct_answer": null,
+      "raw_answer_text": "Complete solution process and answer..."
+    }}
+  ]
+}}
+
+Rules:
+- For MC questions, fill correct_option (e.g. "B")
+- For fill-blank questions, fill correct_answer (e.g. "O(n log n)")
+- For long questions, only fill raw_answer_text (complete solution process)
+- Match all questions where answers can be found
+- Keep raw_answer_text faithful to the source answer, but do not add Chinese commentary
+"""
+
+ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three sections for the following exam question. ALL output must be in English.
+
+Question info:
+- Number: {question_number}
+- Type: {question_type}
+- Score: {score}
+- Question: {question_text}
+- Topics: {topics}
+{answer_section}
+
+Generate THREE sections in HTML format (supports KaTeX: block $$ ... $$ inline $ ... $):
+
+Output JSON:
+{{
+  "knowledge_reminder": "<HTML> Prerequisite knowledge points needed for this question, as a concise bullet list </HTML>",
+  "ai_hint": "<HTML> A hint that guides thinking direction WITHOUT giving away the answer </HTML>",
+  "solution": "<HTML> Complete step-by-step solution (Step 1, Step 2, ...) with derivations, formulas, and common mistake warnings </HTML>"
+}}
+
+Solution requirements:
+- Must include complete working process, not just the answer
+- Each step must have an explanation
+- If a reference answer is provided, derive the solution based on it
+- If no reference answer, work out the complete solution independently
+- For MC questions, explain why the correct option is right AND why others are wrong
+- Use <ol> or numbered steps
+- Mark common mistakes with <div class="common-error">...</div>
+
+KaTeX formula rules:
+- Block formula: $$ on its own line, with blank lines before and after
+- Inline formula: $x^2$ no line break
+- Matrix: \\begin{{bmatrix}} ... \\end{{bmatrix}}
+- Fraction: \\frac{{a}}{{b}}
+"""
+
+BATCH_ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three study sections for each question below. ALL output must be in English.
+
+For every question, return:
+- knowledge_reminder: concise prerequisite bullets in HTML
+- ai_hint: a helpful hint in HTML without revealing the final answer
+- solution: a complete step-by-step solution in HTML
+
+Return JSON in this exact format:
+{{
+  "analyses": [
+    {{
+      "question_number": "1a",
+      "knowledge_reminder": "<HTML>...</HTML>",
+      "ai_hint": "<HTML>...</HTML>",
+      "solution": "<HTML>...</HTML>"
+    }}
+  ]
+}}
+
+Rules:
+- Return one item for every provided question_number
+- Keep each item matched to the same question_number
+- All text must be in English
+- HTML only, KaTeX compatible
+- For MC questions, explain why the correct option is right and why the others are wrong
+- For long questions, show a complete derivation or reasoning chain
+- Use <ol> or numbered steps in solution when appropriate
+- Mark common mistakes with <div class="common-error">...</div>
+- CRITICAL: When a question_text contains "[Context from parent question X]" followed by "[Sub-question Y]", the parent section is background context only. You MUST solve ONLY the specific sub-question labeled [Sub-question Y]. Do NOT solve other sub-questions listed in the parent context. Give one precise answer for that single sub-question only.
+
+Questions:
+{questions_payload}
+"""
+
+
+# ============================================
+# Processing pipeline
+# ============================================
+
+RETRYABLE_ERROR_MARKERS = (
+    "429",
+    "rate limit",
+    "rate_limit",
+    "too many requests",
+    "timeout",
+    "timed out",
+    "connection",
+)
+
+
+def is_retryable_error(exc: Exception) -> bool:
+    message = str(exc).lower()
+    return any(marker in message for marker in RETRYABLE_ERROR_MARKERS)
+
+
+def pdf_to_images(pdf_bytes: bytes, dpi: int = 96) -> list[str]:
+    """Render each PDF page to a base64-encoded PNG (96 dpi balances legibility against cost)."""
+    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
+    images = []
+    mat = fitz.Matrix(dpi / 72, dpi / 72)
+    for page in doc:
+        pix = page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)
+        img_bytes = pix.tobytes("png")
+        images.append(base64.b64encode(img_bytes).decode())
+    doc.close()
+    return images
+
+
+def parse_json_response(text: str) -> dict:
+    """Parse JSON returned by the model, tolerating a markdown code-fence wrapper."""
+    text = text.strip()
+    # Strip a ```json ... ``` wrapper
+    if text.startswith("```"):
+        lines = text.splitlines()
+        text = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])
+    # Remove illegal control characters from the JSON text (0x00-0x1F except \t \n \r)
+    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', text)
+    # Repair invalid JSON escapes from the model: only fix an illegal character that follows an odd number of backslashes
+    text = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', text)
+    return json.loads(text)
+
+
+async def gemini_vision_json(
+    *,
+    system_prompt: str,
+    images: list[str],
+    user_text: str = "",
+    temperature: float = 0,
+    max_attempts: int = 6,
+) -> dict:
+    """Send images plus a prompt to the Gemini vision model and return the parsed JSON."""
+    client = get_vision_client()
+    delay_seconds = 2
+
+    content: list = []
+    for b64 in images:
+        content.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}})
+    if user_text:
+        content.append({"type": "text", "text": user_text})
+
+    for attempt in range(1, max_attempts + 1):
+        try:
+            response = client.chat.completions.create(
+                model="gemini-2.5-flash",
+                messages=[
+                    {"role": "system", "content": system_prompt + "\n\nIMPORTANT: Your entire response must be valid JSON only. No markdown, no code fences, no extra text."},
+                    {"role": "user", "content": content},
+                ],
+                temperature=temperature,
+                max_tokens=16384,
+            )
+            return parse_json_response(response.choices[0].message.content)
+        except Exception as exc:
+            if attempt == max_attempts or not is_retryable_error(exc):
+                raise
+            await asyncio.sleep(delay_seconds)
+            delay_seconds = min(delay_seconds * 2, 30)
+
+
+async def deepseek_json_completion(
+    *,
+    system_prompt: str,
+    user_prompt: str | None = None,
+    temperature: float = 0,
+    max_attempts: int = 6,
+) -> dict:
+    """DeepSeek text-only JSON completion (used to generate the AI trio)."""
+    client = get_deepseek_client()
+    delay_seconds = 2
+
+    for attempt in range(1, max_attempts + 1):
+        try:
+            messages = [{"role": "system", "content": system_prompt}]
+            if user_prompt:
+                messages.append({"role": "user", "content": user_prompt})
+
+            response = client.chat.completions.create(
+                model="deepseek-chat",
+                messages=messages,
+                temperature=temperature,
+                max_tokens=8192,
+                response_format={"type": "json_object"},
+            )
+            raw = response.choices[0].message.content
+            raw = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', raw)
+            raw = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', raw)
+            return json.loads(raw)
+        except Exception as exc:
+            if attempt == max_attempts or not is_retryable_error(exc):
+                raise
+            await asyncio.sleep(delay_seconds)
+            delay_seconds = min(delay_seconds * 2, 30)
+
+
+def chunked(items: list[dict], size: int) -> list[list[dict]]:
+    return [items[i:i + size] for i in range(0, len(items), size)]
+
+
+def _question_sort_key(qnum: str) -> tuple:
+    """Natural sort key for question numbers: 1a < 1b < ... < 1i < 1j < 2ai < 2aii < 10a"""
+    parts = re.findall(r'(\d+|[a-zA-Z]+|[()]+)', qnum)
+    key = []
+    for idx, p in enumerate(parts):
+        if p.isdigit():
+            key.append((0, int(p), ''))
+        elif p in ('(', ')'):
+            continue
+        else:
+            # Single letter (a-z): always sort alphabetically (a=1, b=2, ..., j=10)
+            if len(p) == 1 and p.isalpha():
+                key.append((1, ord(p.lower()) - ord('a') + 1, p))
+            else:
+                # Multi-letter: roman numerals for sub-sub-questions (i=1, ii=2, iii=3, ...)
+                romans = {'i':1,'ii':2,'iii':3,'iv':4,'v':5,'vi':6,'vii':7,'viii':8,'ix':9,'x':10,'xi':11,'xii':12,'xiii':13}
+                if p.lower() in romans:
+                    key.append((2, romans[p.lower()], p))
+                else:
+                    key.append((1, 0, p))
+    return tuple(key)
+
+
+def sort_questions(questions: list[dict]) -> list[dict]:
+    """Sort questions naturally by question number."""
+    return sorted(questions, key=lambda q: _question_sort_key(q.get("question_number", "")))
+
+
+def extract_code_block(text: str) -> str:
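+    """
+    Extract Python code from the question text.
+    Strategy: find each clear code-start line (import / assignment / print),
+    then carry along the following continuation lines while brackets remain unclosed, stopping at a clearly non-code line.
+    """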
+    lines = text.splitlines()
+    result = []
+    in_code = False
+    open_brackets = 0
+
+    CODE_START = re.compile(r"^\s*(import |from \w|[A-Za-z_]\w*\s*=|print\()")
+
+    for line in lines:
+        stripped = line.strip()
+
+        # Already inside a code block: keep collecting while brackets are unclosed
+        if in_code and open_brackets > 0:
+            result.append(stripped)
+            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
+            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
+            continue
+
+        # Detect a new code-start line
+        if CODE_START.match(line):
+            in_code = True
+            result.append(stripped)
+            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
+            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
+            continue
+
+        # Non-code line: leave code mode until the next code-start line
+        in_code = False
+
+    return "\n".join(result)
+
+
+# Kept for backward compatibility
+extract_code_lines = extract_code_block
+
+
+def try_exec_python(code: str, shared_ns: dict) -> str | None:
+    """
+    Execute code in the shared_ns namespace and capture stdout.
+    Return the output string, or None on failure or empty output.
+    """
+    buf = io.StringIO()
+    try:
+        with redirect_stdout(buf):
+            exec(code, shared_ns)  # noqa: S102
+        output = buf.getvalue().strip()
+        return output if output else None
+    except Exception:
+        return None
+
+async def _resume_ai_trio(sb, paper_id: str, questions: list[dict]):
+    """Generate the AI trio for questions that lack a solution, writing each result back to the DB. Supports resuming."""
+    need = [q for q in questions if not q.get("solution")]
+    if not need:
+        # Every question already has a solution; mark the paper ready
+        sb.table("papers").update({"status": "ready", "processing_step": None}).eq("id", paper_id).execute()
+        return
+
+    total_q = len(questions)
+    done_q = total_q - len(need)
+
+    # Build the payload
+    id_map = {q["question_number"]: q["id"] for q in need}
+    # The full question_text is needed to generate the AI trio
+    full_data = sb.table("paper_questions").select(
+        "id, question_number, question_type, question_text, score, correct_option, correct_answer, raw_answer_text"
+    ).eq("paper_id", paper_id).in_("id", [q["id"] for q in need]).execute().data
+
+    payloads = []
+    for q in full_data:
+        answer_section = q.get("raw_answer_text") or ""
+        if not answer_section and q.get("correct_option"):
+            answer_section = f"Correct option: {q['correct_option']}"
+        elif not answer_section and q.get("correct_answer"):
+            answer_section = f"Correct answer: {q['correct_answer']}"
+        payloads.append({
+            "question_number": q["question_number"],
+            "question_type": q["question_type"] or "long_question",
+            "score": q.get("score") or "unknown",
+            "question_text": q["question_text"] or "",
+            "reference_answer": answer_section,
+        })
+
+    batches = chunked(payloads, 3)
+    for batch_idx, batch in enumerate(batches, 1):
+        current = done_q + batch_idx * 3
+        _update_progress(sb, paper_id, f"Generating solutions ({min(current, total_q)}/{total_q} questions)", batch_idx, len(batches))
+        try:
+            result = await deepseek_json_completion(
+                system_prompt=BATCH_ANALYSIS_PROMPT.format(
+                    questions_payload=json.dumps(batch, ensure_ascii=False),
+                ),
+                temperature=0.3,
+            )
+            for item in result.get("analyses", []):
+                qnum = item.get("question_number")
+                qid = id_map.get(qnum)
+                if qid:
+                    sb.table("paper_questions").update({
+                        "knowledge_reminder": item.get("knowledge_reminder", ""),
+                        "ai_hint": item.get("ai_hint", ""),
+                        "solution": item.get("solution", ""),
+                    }).eq("id", qid).execute()
+        except Exception:
+            pass  # a failed batch must not affect the others
+        await asyncio.sleep(1)
+
+    # Mark the paper ready
+    sb.table("papers").update({"status": "ready", "processing_step": None}).eq("id", paper_id).execute()
+
+
+def _update_progress(sb, paper_id: str, step: str, progress: int = 0, total: int = 0):
+    """Write processing progress to the DB."""
+    sb.table("papers").update({
+        "processing_step": step,
+        "processing_progress": progress,
+        "processing_total": total,
+    }).eq("id", paper_id).execute()
+
+
+async def process_paper(paper_id: str, paper_bytes: bytes, answer_bytes: bytes | None):
+    """Background pipeline: PDF pages → Vision structuring → AI trio
+
+    Design principle: persist to the DB as soon as each step completes, so processing can resume.
+    """
+    sb = get_supabase()
+
+    try:
+        # Check for existing questions (resume scenario)
+        existing = sb.table("paper_questions").select("id, question_number, solution").eq("paper_id", paper_id).execute().data
+
+        if existing:
+            # Questions already exist → skip extraction and fill in the AI trio
+            await _resume_ai_trio(sb, paper_id, existing)
+            return
+
+        # ── Step 1: PDF → images ──
+        _update_progress(sb, paper_id, "Rendering PDF pages...")
+        paper_images = pdf_to_images(paper_bytes)
+
+        # ── Step 2: Vision question extraction ──
+        PAGE_BATCH = 8
+        all_questions: list = []
+        meta: dict = {}
+        num_page_batches = -(-len(paper_images) // PAGE_BATCH)
+        for i in range(0, len(paper_images), PAGE_BATCH):
+            batch_imgs = paper_images[i:i + PAGE_BATCH]
+            batch_idx = i // PAGE_BATCH + 1
+            _update_progress(sb, paper_id, f"Reading pages {i+1}-{i+len(batch_imgs)}...", batch_idx, num_page_batches)
+            batch_result = await gemini_vision_json(
+                system_prompt=STRUCTURE_PROMPT,
+                images=batch_imgs,
+                user_text=f"Pages {i+1}-{i+len(batch_imgs)} of the exam paper. Extract all questions visible on these pages.",
+                temperature=0,
+            )
+            if not meta:
+                meta = {k: batch_result.get(k) for k in ("total_score", "difficulty_level", "topics_summary")}
+            all_questions.extend(batch_result.get("questions", []))
+
+        all_questions = sort_questions(all_questions)
+        questions = all_questions
+
+        # Update the paper overview
+        sb.table("papers").update({
+            "total_score": meta.get("total_score"),
+            "question_count": len(questions),
+            "topics_summary": meta.get("topics_summary"),
+            "difficulty_level": meta.get("difficulty_level"),
+        }).eq("id", paper_id).execute()
+
+        # ── Step 3: Answer matching (batched; failures are skipped) ──
+        answers_map = {}
+        if answer_bytes:
+            _update_progress(sb, paper_id, "Matching answers...")
+            try:
+                answer_images = pdf_to_images(answer_bytes)
+                questions_json = json.dumps(
+                    [{"question_number": q["question_number"], "question_type": q["question_type"]}
+                     for q in questions], ensure_ascii=False,
+                )
+                all_answers: list = []
+                for ai in range(0, len(answer_images), 8):
+                    batch_ans_imgs = answer_images[ai:ai + 8]
+                    try:
+                        match_result = await gemini_vision_json(
+                            system_prompt=ANSWER_MATCH_PROMPT.format(
+                                questions_json=questions_json, answer_text="(See images)",
+                            ),
+                            images=batch_ans_imgs,
+                            user_text=f"Match answers to these questions: {questions_json}",
+                            temperature=0,
+                        )
+                        all_answers.extend(match_result.get("answers", []))
+                    except Exception:
+                        pass
+                answers_map = {a["question_number"]: a for a in all_answers if a.get("question_number")}
+            except Exception:
+                pass
+
+        # ── Step 4: write questions to the DB immediately (without the AI trio yet) ──
+        _update_progress(sb, paper_id, "Saving questions...")
+        for i, q in enumerate(questions):
+            qnum = q["question_number"]
+            answer = answers_map.get(qnum, {})
+            sb.table("paper_questions").insert(strip_nulls({
+                "paper_id": paper_id,
+                "question_number": qnum,
+                "parent_question": q.get("parent_question"),
+                "display_order": i,
+                "question_type": q["question_type"],
+                "question_text": q["question_text"],
+                "score": q.get("score"),
+                "page_number": q.get("page_number"),
+                "options": q.get("options"),
+                "correct_option": answer.get("correct_option"),
+                "correct_answer": answer.get("correct_answer"),
+                "raw_answer_text": answer.get("raw_answer_text"),
+                "topics": q.get("topics", []),
+                "analytics_topic": (q.get("topics") or [None])[0],
+                "topic_tags": q.get("topics", []),
+                "difficulty": q.get("difficulty"),
+            })).execute()
+
+        # ── Step 5: AI trio (updated row by row; supports resuming after interruption) ──
+        saved = sb.table("paper_questions").select("id, question_number, solution").eq("paper_id", paper_id).execute().data
+        await _resume_ai_trio(sb, paper_id, saved)
+
+    except Exception as e:
+        sb.table("papers").update({
+            "status": "error",
+            "error_message": f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()[-500:]}",
+        }).eq("id", paper_id).execute()
+        raise
--- a/backend/app/services/supabase_client.py
+++ b/backend/app/services/supabase_client.py
@@ -0,0 +1,13 @@
+from supabase import create_client, Client
+from app.config import get_settings
+
+_client: Client | None = None
+
+
+def get_supabase() -> Client:
+    """Return the Supabase client (service_role key, bypasses RLS)."""
+    global _client
+    if _client is None:
+        s = get_settings()
+        _client = create_client(s.supabase_url, s.supabase_service_role_key)
+    return _client
--- a/backend/app/services/text_extractor.py
+++ b/backend/app/services/text_extractor.py
@@ -0,0 +1,48 @@
+"""PDF text extraction: reuses the text_extractor logic from SOS."""
+
+import base64
+import fitz  # PyMuPDF
+from dataclasses import dataclass
+
+
+@dataclass
+class ExtractedContent:
+    pages_text: list[str]           # text of each page
+    page_images: dict[int, str]     # 0-based page index -> base64 PNG (image-heavy pages)
+    total_pages: int
+    has_images: bool
+
+
+def extract_pdf(file_bytes: bytes) -> ExtractedContent:
+    """Extract text and page images from a PDF."""
+    doc = fitz.open(stream=file_bytes, filetype="pdf")
+    pages_text = []
+    page_images = {}
+
+    for i, page in enumerate(doc):
+        text = page.get_text("text")
+        pages_text.append(text)
+
+        # A page with very little text is probably a scan: render it to PNG for Vision OCR
+        if len(text.strip()) < 50:
+            pix = page.get_pixmap(dpi=200)
+            img_bytes = pix.tobytes("png")
+            page_images[i] = base64.b64encode(img_bytes).decode("utf-8")
+
+    doc.close()
+
+    return ExtractedContent(
+        pages_text=pages_text,
+        page_images=page_images,
+        total_pages=len(pages_text),
+        has_images=len(page_images) > 0,
+    )
+
+
+def get_full_text(extracted: ExtractedContent) -> str:
+    """Merge the text of all non-empty pages, each under a page header."""
+    return "\n\n".join(
+        f"--- Page {i+1} ---\n{text}"
+        for i, text in enumerate(extracted.pages_text)
+        if text.strip()
+    )
--- a/backend/backfill_ai_trio_with_context.py
+++ b/backend/backfill_ai_trio_with_context.py
@@ -0,0 +1,252 @@
+"""
+Regenerate the AI trio for every question; sub-questions include their parent question's context.
+Usage: python backfill_ai_trio_with_context.py [--paper-id <id>] [--course <code>] [--missing-only]
+"""
+
+import asyncio
+import io
+import json
+import re
+import sys
+import time
+import argparse
+from contextlib import redirect_stdout
+from app.services.supabase_client import get_supabase
+from app.services.llm_clients import get_deepseek_client
+
+
+def extract_code_lines(text: str) -> str:
+    lines = (text or "").splitlines()
+    result = []
+    in_code = False
+    open_brackets = 0
+    CODE_START = re.compile(r"^\s*(import |from \w|[A-Za-z_]\w*\s*=|print\()")
+    for line in lines:
+        stripped = line.strip()
+        if in_code and open_brackets > 0:
+            result.append(stripped)
+            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
+            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
+            continue
+        if CODE_START.match(line):
+            in_code = True
+            result.append(stripped)
+            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
+            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
+            continue
+        in_code = False
+    return "\n".join(result)
+
+
+def try_exec_python(code: str, shared_ns: dict) -> str | None:
+    buf = io.StringIO()
+    try:
+        with redirect_stdout(buf):
+            exec(code, shared_ns)  # noqa: S102
+        output = buf.getvalue().strip()
+        return output if output else None
+    except Exception:
+        return None
+
+BATCH_ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three study sections for each question below. ALL output must be in English.
+
+For every question, return:
+- knowledge_reminder: concise prerequisite bullets in HTML
+- ai_hint: a helpful hint in HTML without revealing the final answer
+- solution: a complete step-by-step solution in HTML
+
+Return JSON in this exact format:
+{{
+  "analyses": [
+    {{
+      "question_number": "1a",
+      "knowledge_reminder": "<HTML>...</HTML>",
+      "ai_hint": "<HTML>...</HTML>",
+      "solution": "<HTML>...</HTML>"
+    }}
+  ]
+}}
+
+Rules:
+- Return one item for every provided question_number
+- All text must be in English
+- HTML only, KaTeX compatible (block $$ ... $$ inline $ ... $)
+- For MC questions, explain why the correct option is right and why others are wrong
+- For long questions, show a complete derivation or reasoning chain
+- Use <ol> or numbered steps in solution when appropriate
+- Mark common mistakes with <div class="common-error">...</div>
+- CRITICAL: When a question_text contains "[Context from parent question X]" followed by "[Sub-question Y]", the parent section is background context only. You MUST solve ONLY the specific sub-question labeled [Sub-question Y]. Do NOT solve other sub-questions listed in the parent context. Give one precise answer for that single sub-question only.
+
+Questions:
+{questions_payload}
+"""
+
+
+def chunked(lst, size):
+    return [lst[i:i+size] for i in range(0, len(lst), size)]
+
+
+async def deepseek_batch(batch: list[dict]) -> list[dict]:
+    client = get_deepseek_client()
+    for attempt in range(5):
+        try:
+            resp = client.chat.completions.create(
+                model="deepseek-chat",
+                messages=[{
+                    "role": "system",
+                    "content": BATCH_ANALYSIS_PROMPT.format(
+                        questions_payload=json.dumps(batch, ensure_ascii=False)
+                    )
+                }],
+                temperature=0.3,
+                max_tokens=8192,
+                response_format={"type": "json_object"},
+            )
+            raw = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', resp.choices[0].message.content)
+            raw = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', raw)
+            data = json.loads(raw)
+            return data.get("analyses", [])
+        except Exception as e:
+            print(f"  attempt {attempt+1} failed: {e}")
+            if attempt < 4:
+                await asyncio.sleep(2 ** attempt * 2)
+    return []
+
+
+async def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--paper-id", help="Only process this paper")
+    parser.add_argument("--course", help="Only process papers with this course code")
+    parser.add_argument("--missing-only", action="store_true", help="Only process questions missing solution")
+    args = parser.parse_args()
+
+    sb = get_supabase()
+
+    # Fetch all questions (with paper info for filtering)
+    query = sb.table("paper_questions").select(
+        "id, paper_id, question_number, question_type, question_text, "
+        "parent_question, score, correct_option, correct_answer, raw_answer_text, "
+        "analytics_topic, topic_tags, solution"
+    )
+    if args.paper_id:
+        query = query.eq("paper_id", args.paper_id)
+    result = query.order("paper_id").order("display_order").execute()
+    all_questions = result.data
+
+    if args.course:
+        # Filter by course via papers table
+        papers_res = sb.table("papers").select("id").eq("course_code", args.course.upper()).execute()
+        paper_ids = {p["id"] for p in papers_res.data}
+        all_questions = [q for q in all_questions if q["paper_id"] in paper_ids]
+
+    if args.missing_only:
+        all_questions = [q for q in all_questions if not q.get("solution")]
+        print(f"Questions missing solution: {len(all_questions)}")
+    else:
+        print(f"Total questions to process: {len(all_questions)}")
+
+    # Group by paper_id
+    from collections import defaultdict
+    by_paper: dict[str, list] = defaultdict(list)
+    for q in all_questions:
+        by_paper[q["paper_id"]].append(q)
+
+    total_updated = 0
+
+    for paper_id, questions in by_paper.items():
+        print(f"\nPaper {paper_id} — {len(questions)} questions")
+
+        # Any question may be the parent of another question
+        parent_text_map: dict[str, str] = {
+            q["question_number"]: q["question_text"] or ""
+            for q in questions
+        }
+
+        # Build payloads with context + Python exec
+        payloads = []
+        exec_namespaces: dict[str, dict] = {}
+
+        for q in questions:
+            parent_q = q.get("parent_question")
+            if parent_q and parent_q in parent_text_map:
+                full_text = (
+                    f"[Context from parent question {parent_q}]\n"
+                    f"{parent_text_map[parent_q]}\n\n"
+                    f"[Sub-question {q['question_number']}]\n"
+                    f"{q['question_text'] or ''}"
+                )
+            else:
+                full_text = q["question_text"] or ""
+
+            answer_section = ""
+            if q.get("raw_answer_text"):
+                answer_section = q["raw_answer_text"]
+            elif q.get("correct_option"):
+                answer_section = f"Correct option: {q['correct_option']}"
+            elif q.get("correct_answer"):
+                answer_section = f"Correct answer: {q['correct_answer']}"
+
+            # Try executing the Python code to get the real output
+            if not answer_section:
+                group_key = parent_q or q["question_number"]
+                if group_key not in exec_namespaces:
+                    ns: dict = {}
+                    try:
+                        import numpy as np
+                        ns["np"] = np
+                    except ImportError:
+                        pass
+                    # Run the parent question's setup code first
+                    if parent_q and parent_q in parent_text_map:
+                        setup = extract_code_lines(parent_text_map[parent_q])
+                        try_exec_python(setup, ns)
+                    exec_namespaces[group_key] = ns
+
+                ns = exec_namespaces[group_key]
+                sub_code = extract_code_lines(q["question_text"] or "")
+                if sub_code:
+                    exec_out = try_exec_python(sub_code, ns)
+                    if exec_out is not None:
+                        answer_section = f"Executed output: {exec_out}"
+                        print(f"    [exec] {q['question_number']}: {exec_out[:60]}")
+
+            payloads.append({
+                "_id": q["id"],
+                "question_number": q["question_number"],
+                "question_type": q["question_type"] or "long_question",
+                "score": q.get("score") or "unknown",
+                "question_text": full_text,
+                "reference_answer": answer_section,
+            })
+
+        # Process in batches of 3
+        id_map = {q["question_number"]: q["id"] for q in questions}
+
+        for batch in chunked(payloads, 3):
+            # Strip internal _id before sending to model
+            model_batch = [{k: v for k, v in p.items() if k != "_id"} for p in batch]
+            nums = [p["question_number"] for p in batch]
+            print(f"  Batch {nums} ...", end=" ", flush=True)
+
+            analyses = await deepseek_batch(model_batch)
+
+            for item in analyses:
+                qnum = item.get("question_number")
+                qid = id_map.get(qnum)
+                if not qid:
+                    continue
+                sb.table("paper_questions").update({
+                    "knowledge_reminder": item.get("knowledge_reminder"),
+                    "ai_hint": item.get("ai_hint"),
+                    "solution": item.get("solution"),
+                }).eq("id", qid).execute()
+                total_updated += 1
+
+            print(f"done ({len(analyses)} analyses returned)")
+            await asyncio.sleep(1)
+
+    print(f"\nDone. Total updated: {total_updated}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/backend/backfill_comp2211_page_y.py
+++ b/backend/backfill_comp2211_page_y.py
@@ -0,0 +1,160 @@
+"""Backfill page_y_ratio for COMP2211 subquestions."""
+
+from __future__ import annotations
+
+import re
+import time
+from pathlib import Path
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+import fitz
+import httpx
+
+from app.services.supabase_client import get_supabase
+
+
+ROOT = Path(__file__).resolve().parent.parent
+PAPERS_DIR = ROOT / "pastpaper-scraper" / "papers" / "COMP2211"
+
+PDF_BY_EXAM_KEY = {
+    "COMP2211-2022-fall-midterm": "(COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf",
+    "COMP2211-2022-spring-midterm": "(COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf",
+    "COMP2211-2022-spring-final-part-a": "(COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf",
+    "COMP2211-2022-spring-final-part-b": "(COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf",
+    "COMP2211-2023-spring-midterm": "(COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf",
+    "COMP2211-2024-spring-midterm": "(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf",
+    "COMP2211-2024-spring-final": "(COMP2211)[2024](s)final~=igk5mmg^_90365.pdf",
+}
+
+
+def marker_candidates(question_number: str) -> list[str]:
+    if "_" in question_number:
+        left, right = question_number.split("_", 1)
+        tokens: list[str] = []
+        m = re.fullmatch(r"(\d+)([a-z])", left)
+        if m:
+            tokens.append(f"({m.group(2)})")
+        elif re.fullmatch(r"\d+[a-z]+", left):
+            tokens.append("(" + re.sub(r"^\d+", "", left) + ")")
+        tokens.append(f"({right})")
+        return tokens[::-1]
+
+    m = re.fullmatch(r"(\d+)([a-z])", question_number)
+    if m:
+        return [f"({m.group(2)})", f"Problem {m.group(1)}"]
+
+    if question_number.isdigit():
+        return [f"Problem {question_number}"]
+
+    return [question_number]
+
+
+def line_matches(line_text: str, marker: str) -> bool:
+    text = re.sub(r"\s+", " ", line_text.strip())
+    if not text:
+        return False
+    if marker.startswith("("):
+        return text.startswith(marker)
+    return marker.lower() in text.lower()
+
+
+def line_y_ratio(page: fitz.Page, marker: str) -> float | None:
+    data = page.get_text("dict")
+    hits: list[float] = []
+    for block in data.get("blocks", []):
+        if block.get("type") != 0:
+            continue
+        for line in block.get("lines", []):
+            line_text = "".join(
+                span.get("text", "")
+                for span in line.get("spans", [])
+            )
+            if line_matches(line_text, marker):
+                bbox = line.get("bbox")
+                if bbox:
+                    hits.append(float(bbox[1]))
+    if not hits:
+        return None
+    y = min(hits)
+    return max(0.0, min((y - page.rect.y0) / page.rect.height, 0.98))
+
+
+def search_y_ratio(page: fitz.Page, marker: str) -> float | None:
+    ratios: list[float] = []
+    for rect in page.search_for(marker):
+        ratios.append(max(0.0, min((rect.y0 - page.rect.y0) / page.rect.height, 0.98)))
+    return min(ratios) if ratios else None
+
+
+def infer_y_ratio(page: fitz.Page, question_number: str) -> float:
+    for marker in marker_candidates(question_number):
+        ratio = line_y_ratio(page, marker)
+        if ratio is not None:
+            return ratio
+        ratio = search_y_ratio(page, marker)
+        if ratio is not None:
+            return ratio
+    return 0.05
+
+
+def main() -> None:
+    sb = get_supabase()
+    papers = (
+        sb.table("papers")
+        .select("id, source_exam_key")
+        .eq("course_code", "COMP2211")
+        .eq("source_kind", "course_library")
+        .execute()
+        .data
+        or []
+    )
+
+    updates: list[tuple[str, float]] = []
+    for paper in papers:
+        exam_key = paper["source_exam_key"]
+        pdf_name = PDF_BY_EXAM_KEY.get(exam_key)
+        if not pdf_name:
+            continue
+        pdf_path = PAPERS_DIR / pdf_name
+        doc = fitz.open(pdf_path)
+        try:
+            questions = (
+                sb.table("paper_questions")
+                .select("id, question_number, page_number")
+                .eq("paper_id", paper["id"])
+                .order("display_order")
+                .execute()
+                .data
+                or []
+            )
+            for question in questions:
+                page_number = question.get("page_number") or 1
+                page = doc[page_number - 1]
+                ratio = infer_y_ratio(page, question["question_number"])
+                updates.append((question["id"], round(ratio, 4)))
+        finally:
+            doc.close()
+
+    def apply_update(payload: tuple[str, float]) -> None:
+        question_id, ratio = payload
+        attempts = 0
+        while True:
+            try:
+                sb.table("paper_questions").update({"page_y_ratio": ratio}).eq("id", question_id).execute()
+                return
+            except httpx.HTTPError:
+                attempts += 1
+                if attempts >= 5:
+                    raise
+                time.sleep(0.4 * attempts)
+
+    with ThreadPoolExecutor(max_workers=3) as executor:
+        futures = [executor.submit(apply_update, payload) for payload in updates]
+        for future in as_completed(futures):
+            future.result()
+
+    print(f"Backfilled page_y_ratio for {len(updates)} COMP2211 questions.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/backfill_comp2211_tags.py
+++ b/backend/backfill_comp2211_tags.py
@@ -0,0 +1,365 @@
+"""Backfill COMP2211 tags to the revised retrieval schema."""
+
+from __future__ import annotations
+
+import re
+from collections import OrderedDict
+
+from app.services.supabase_client import get_supabase
+
+
+SKILL_LABELS = {
+    "concept_check": "Concept Check",
+    "code_tracing": "Code Tracing",
+    "algorithm_tracing": "Algorithm Tracing",
+    "distance_calculation": "Distance Calculation",
+    "centroid_update": "Centroid Update",
+    "weight_update": "Weight Update",
+    "decision_boundary": "Decision Boundary",
+    "implementation": "Implementation",
+    "debugging": "Debugging",
+    "model_selection": "Model Selection",
+    "concept_explanation": "Concept Explanation",
+    "architecture_reasoning": "Architecture Reasoning",
+    "convergence_reasoning": "Convergence Reasoning",
+    "generalization_reasoning": "Generalization Reasoning",
+    "classification_decision": "Classification Decision",
+}
+
+ACRONYMS = {
+    "ai": "AI",
+    "cnn": "CNN",
+    "knn": "KNN",
+    "mlp": "MLP",
+    "nb": "NB",
+    "numpy": "NumPy",
+}
+
+
+def title_case_with_acronyms(value: str) -> str:
+    words = re.split(r"[\s_]+", value.strip())
+    parts: list[str] = []
+    for word in words:
+        if not word:
+            continue
+        lowered = word.lower()
+        parts.append(ACRONYMS.get(lowered, lowered.capitalize()))
+    return " ".join(parts)
+
+
+def normalize_skill_tag(tag: str) -> str:
+    if tag in SKILL_LABELS:
+        return SKILL_LABELS[tag]
+    return title_case_with_acronyms(tag)
+
+
+def text_blob(question: dict) -> str:
+    parts = [
+        question.get("question_text") or "",
+        question.get("raw_answer_text") or "",
+        " ".join(question.get("topic_tags") or []),
+        " ".join(question.get("skill_tags") or []),
+        question.get("analytics_topic") or "",
+    ]
+    return " ".join(parts).lower()
+
+
+def has_any(text: str, phrases: list[str]) -> bool:
+    return any(phrase in text for phrase in phrases)
+
+
+def infer_analytics_topic(question: dict) -> str:
+    text = text_blob(question)
+    broad = question.get("analytics_topic") or ""
+    skills = {normalize_skill_tag(tag) for tag in (question.get("skill_tags") or [])}
+
+    if has_any(text, ["ethics", "bias", "privacy", "autonomous vehicle", "informed consent", "human participants", "ethically"]):
+        return "Ethics of AI"
+    if has_any(text, ["minimax", "alpha-beta", "alpha beta", "game tree", "tic-tac-toe", "tic tac toe"]):
+        return "Game Trees"
+    if has_any(text, ["search algorithm", "best-first", "breadth-first", "depth-first", "a* search", "a star"]):
+        return "Search Algorithms"
+    if has_any(text, ["cross validation", "d-fold", "k-fold", "train/val", "validation set", "fold "]) or broad == "Cross Validation":
+        return "Cross Validation"
+    if has_any(text, ["confusion matrix", "precision", "recall", "macro f1", "f1 score", "accuracy score", "evaluation metric"]):
+        return "Evaluation Metrics"
+    if has_any(text, ["naive bayes", "gaussian distribution", "laplace smoothing", "likelihood", "posterior probability"]) or broad == "Naive Bayes":
+        return "Naive Bayes"
+    if has_any(text, ["bayes classifier", "conditional probability", "bayesian inference", "prior probability", "posterior"]) or broad == "Bayesian Inference":
+        return "Bayesian Inference"
+    if has_any(text, ["leader clustering", "k-means", "k means", "centroid", "elbow method", "silhouette", "cluster assignments", "closest centroid", "new cluster"]):
+        return "K-Means"
+    if has_any(text, ["k-nearest", "nearest neighbors", "weighted knn", "cosine distance", "euclidean distance", "manhattan distance", "6-cross-validation error for k", "class for cosine distance"]):
+        return "KNN"
+    if has_any(text, ["multilayer perceptron", "mlp", "back propagation", "backpropagation", "hidden layer", "output layer", "dropout", "softmax", "sigmoid function", "relu as the activation"]) or broad == "MLP":
+        return "MLP"
+    if has_any(text, ["perceptron", "decision boundary", "single neuron", "weight update", "activation function f(z)", "linearly separable"]) or broad == "Perceptron":
+        return "Perceptron"
+    if has_any(text, ["convolutional neural network", "cnn", "kernel", "padding", "stride", "pooling", "dilated convolution", "3d convolution", "otsu", "histogram", "image processing", "grayscale image"]):
+        return "CNN"
+    if has_any(text, ["numpy", "python", "np.", "broadcasting", "reshape", "transpose", "mask", "vectorized", "np.arange", "np.mean", "np.dot", "np.convolve"]):
+        return "Python and NumPy"
+
+    if broad == "KNN and Clustering":
+        if (
+            has_any(text, ["k-means", "k means", "centroid", "leader clustering", "elbow", "silhouette"])
+            or "Centroid Update" in skills
+            or "Convergence Reasoning" in skills
+            or "Algorithm Tracing" in skills
+            or "Model Selection" in skills
+        ):
+            return "K-Means"
+        return "KNN"
+
+    if broad == "Perceptron and MLP":
+        if (
+            has_any(text, ["hidden layer", "backprop", "activation function", "softmax", "relu", "sigmoid", "multilayer perceptron", "mlp"])
+            or "Architecture Reasoning" in skills
+        ):
+            return "MLP"
+        return "Perceptron"
+
+    if broad == "Probabilistic Models":
+        if has_any(text, ["naive bayes", "gaussian", "laplace", "likelihood"]):
+            return "Naive Bayes"
+        return "Bayesian Inference"
+
+    if broad == "Evaluation and Validation":
+        if has_any(text, ["cross validation", "cross-validation", "k-fold", "d-fold", "validation set", "train/val"]):
+            return "Cross Validation"
+        return "Evaluation Metrics"
+
+    if broad == "Search and Games":
+        if has_any(text, ["minimax", "alpha-beta", "alpha beta", "game tree"]):
+            return "Game Trees"
+        return "Search Algorithms"
+
+    broad_map = {
+        "Vision and CNN": "CNN",
+        "Python Fundamentals": "Python and NumPy",
+        "Ethics of AI": "Ethics of AI",
+    }
+    return broad_map.get(broad, "Python and NumPy")
+
+
+TOPIC_CONCEPTS = {
+    "Naive Bayes": [
+        ("Naive Bayes", ["naive bayes"]),
+        ("Prior", ["prior"]),
+        ("Likelihood", ["likelihood"]),
+        ("Posterior", ["posterior"]),
+        ("Gaussian", ["gaussian"]),
+        ("Laplace Smoothing", ["laplace"]),
+        ("Missing Data", ["missing data", "missing value"]),
+    ],
+    "Bayesian Inference": [
+        ("Bayesian Inference", ["bayes", "conditional probability", "posterior"]),
+        ("Conditional Probability", ["conditional probability"]),
+        ("Bayes Rule", ["bayes rule", "posterior"]),
+        ("Prior", ["prior"]),
+        ("Posterior", ["posterior"]),
+    ],
+    "KNN": [
+        ("KNN", ["k-nearest", "nearest neighbors", "knn"]),
+        ("Euclidean Distance", ["euclidean distance"]),
+        ("Manhattan Distance", ["manhattan distance"]),
+        ("Cosine Distance", ["cosine distance"]),
+        ("Weighted KNN", ["weighted k-nearest", "weighted knn", "inverse of the distance"]),
+        ("Classification", ["class label", "predict", "classification"]),
+        ("Cross Validation", ["cross-validation", "cross validation"]),
+        ("Test Error", ["test error"]),
+    ],
+    "K-Means": [
+        ("K-Means", ["k-means", "k means"]),
+        ("Centroid Update", ["centroid"]),
+        ("Convergence", ["converged", "convergence"]),
+        ("Leader Clustering", ["leader clustering"]),
+        ("Outliers", ["outlier"]),
+        ("Model Selection", ["elbow method", "silhouette", "suitable k"]),
+    ],
+    "Perceptron": [
+        ("Perceptron", ["perceptron"]),
+        ("Decision Boundary", ["decision boundary", "linearly separable"]),
+        ("Weight Update", ["weight update", "∆w", "deltaw", "backward propagation"]),
+        ("Convergence", ["converged", "convergence"]),
+        ("Activation Function", ["activation function"]),
+    ],
+    "MLP": [
+        ("MLP", ["mlp", "multilayer perceptron"]),
+        ("Backpropagation", ["back propagation", "backpropagation", "backward propagation"]),
+        ("Activation Function", ["activation function", "relu", "sigmoid", "softmax"]),
+        ("Hidden Layer", ["hidden layer"]),
+        ("Output Layer", ["output layer"]),
+        ("Parameter Count", ["number of parameters", "parameter"]),
+        ("Overfitting", ["overfitting", "dropout"]),
+    ],
+    "CNN": [
+        ("CNN", ["cnn", "convolutional neural network"]),
+        ("Convolution", ["convolution", "kernel"]),
+        ("Padding", ["padding", "reflection padding", "zero padding"]),
+        ("Stride", ["stride"]),
+        ("Pooling", ["pooling", "max pooling", "average pooling"]),
+        ("Image Processing", ["image processing", "grayscale image"]),
+        ("Histogram", ["histogram"]),
+        ("Otsu Thresholding", ["otsu"]),
+        ("Dilated Convolution", ["dilated convolution"]),
+        ("3D Convolution", ["3d convolution"]),
+        ("Dropout", ["dropout"]),
+    ],
+    "Evaluation Metrics": [
+        ("Evaluation Metrics", ["evaluation", "metric"]),
+        ("Confusion Matrix", ["confusion matrix"]),
+        ("Accuracy", ["accuracy"]),
+        ("Precision", ["precision"]),
+        ("Recall", ["recall"]),
+        ("F1 Score", ["f1"]),
+        ("Macro F1", ["macro f1"]),
+    ],
+    "Cross Validation": [
+        ("Cross Validation", ["cross validation", "cross-validation", "d-fold", "k-fold"]),
+        ("Train Validation Split", ["validation set", "train", "test fold"]),
+        ("Model Selection", ["choose k", "which k", "fold"]),
+        ("Data Shuffling", ["shuffle", "shuffling"]),
+    ],
+    "Python and NumPy": [
+        ("Python and NumPy", ["numpy", "python"]),
+        ("NumPy", ["numpy", "np."]),
+        ("Broadcasting", ["broadcast"]),
+        ("Array Indexing", ["index", "slice"]),
+        ("Vectorization", ["no explicit loops", "vectorized"]),
+        ("Matrix Multiplication", ["matmul", "matrix multiplication", "@"]),
+        ("Reshape", ["reshape"]),
+        ("Transpose", ["transpose"]),
+        ("Masking", ["mask"]),
+        ("Convolution", ["convolve"]),
+    ],
+    "Search Algorithms": [
+        ("Search Algorithms", ["search"]),
+        ("Breadth-First Search", ["breadth-first", "breadth first", "bfs"]),
+        ("Depth-First Search", ["depth-first", "depth first", "dfs"]),
+        ("Best-First Search", ["best-first", "best first"]),
+        ("A* Search", ["a* search", "a star", "astar"]),
+        ("Heuristic", ["heuristic"]),
+    ],
+    "Game Trees": [
+        ("Game Trees", ["game tree", "minimax", "alpha-beta", "alpha beta"]),
+        ("Minimax", ["minimax"]),
+        ("Alpha-Beta Pruning", ["alpha-beta", "alpha beta", "pruned"]),
+        ("Utility", ["utility"]),
+    ],
+    "Ethics of AI": [
+        ("Ethics of AI", ["ethics", "ethical"]),
+        ("Bias", ["bias"]),
+        ("Privacy", ["privacy"]),
+        ("Fairness", ["fair"]),
+        ("Research Ethics", ["informed consent", "human participants"]),
+        ("Governance", ["monitoring", "production", "organizations"]),
+        ("Autonomous Vehicles", ["autonomous vehicle"]),
+    ],
+}
+
+
+TOPIC_DEFAULTS = {
+    "Naive Bayes": ["Likelihood", "Posterior"],
+    "Bayesian Inference": ["Conditional Probability", "Bayes Rule"],
+    "KNN": ["Classification", "Distance Calculation"],
+    "K-Means": ["Centroid Update", "Convergence"],
+    "Perceptron": ["Decision Boundary", "Weight Update"],
+    "MLP": ["Activation Function", "Hidden Layer"],
+    "CNN": ["Convolution", "Padding"],
+    "Evaluation Metrics": ["Confusion Matrix", "F1 Score"],
+    "Cross Validation": ["Train Validation Split", "Model Selection"],
+    "Python and NumPy": ["NumPy", "Vectorization"],
+    "Search Algorithms": ["Breadth-First Search", "Heuristic"],
+    "Game Trees": ["Minimax", "Alpha-Beta Pruning"],
+    "Ethics of AI": ["Bias", "Fairness"],
+}
+
+DEFAULT_SKILLS = {
+    "Naive Bayes": ["Probability Reasoning"],
+    "Bayesian Inference": ["Probability Reasoning"],
+    "KNN": ["Classification Decision"],
+    "K-Means": ["Centroid Update"],
+    "Perceptron": ["Decision Boundary"],
+    "MLP": ["Concept Explanation"],
+    "CNN": ["Concept Explanation"],
+    "Evaluation Metrics": ["Metric Reasoning"],
+    "Cross Validation": ["Model Selection"],
+    "Python and NumPy": ["Code Tracing"],
+    "Search Algorithms": ["Algorithm Tracing"],
+    "Game Trees": ["Game Reasoning"],
+    "Ethics of AI": ["Ethical Reasoning"],
+}
+
+
+def unique_keep_order(values: list[str]) -> list[str]:
+    return list(OrderedDict((value, None) for value in values if value).keys())
+
+
+def build_topic_tags(question: dict, analytics_topic: str) -> list[str]:
+    text = text_blob(question)
+    tags: list[str] = [analytics_topic]
+    for label, keywords in TOPIC_CONCEPTS.get(analytics_topic, []):
+        if label == analytics_topic:
+            continue
+        if has_any(text, keywords):
+            tags.append(label)
+    for default in TOPIC_DEFAULTS.get(analytics_topic, []):
+        if len(unique_keep_order(tags)) >= 2:
+            break
+        tags.append(default)
+    tags = unique_keep_order(tags)
+    return tags[:5]
+
+
+def build_skill_tags(question: dict, analytics_topic: str) -> list[str]:
+    raw = question.get("skill_tags") or []
+    converted = unique_keep_order([normalize_skill_tag(tag) for tag in raw])
+    if not converted:
+        converted = DEFAULT_SKILLS.get(analytics_topic, ["Concept Check"])
+    return converted[:3]
+
+
+def main() -> None:
+    sb = get_supabase()
+    papers = (
+        sb.table("papers")
+        .select("id")
+        .eq("course_code", "COMP2211")
+        .eq("source_kind", "course_library")
+        .execute()
+        .data
+    )
+    paper_ids = [paper["id"] for paper in papers]
+    if not paper_ids:
+        print("No COMP2211 course-library papers found.")
+        return
+
+    questions = (
+        sb.table("paper_questions")
+        .select("id, paper_id, question_number, question_text, raw_answer_text, analytics_topic, topic_tags, skill_tags, topics")
+        .in_("paper_id", paper_ids)
+        .order("paper_id")
+        .order("display_order")
+        .execute()
+        .data
+    )
+
+    for question in questions:
+        analytics_topic = infer_analytics_topic(question)
+        topic_tags = build_topic_tags(question, analytics_topic)
+        skill_tags = build_skill_tags(question, analytics_topic)
+        payload = {
+            "analytics_topic": analytics_topic,
+            "topic_primary": analytics_topic,
+            "topic_tags": topic_tags,
+            "topics": topic_tags,
+            "skill_tags": skill_tags,
+        }
+        sb.table("paper_questions").update(payload).eq("id", question["id"]).execute()
+
+    print(f"Backfilled {len(questions)} COMP2211 questions.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/backfill_null_ai_trio.py
+++ b/backend/backfill_null_ai_trio.py
@@ -0,0 +1,169 @@
+"""Backfill AI trio for questions where knowledge_reminder IS NULL.
+
+For each question, generates fields in two separate LLM calls to avoid token truncation:
+  Call 1 → knowledge_reminder + ai_hint  (short, ~500 tokens output)
+  Call 2 → solution                      (long, up to 4096 tokens output)
+
+Run from the backend directory:
+    uv run python backfill_null_ai_trio.py [--dry-run]
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import sys
+from app.services.supabase_client import get_supabase
+from app.services.paper_processor import qwen_json_completion
+
+
+KNOWLEDGE_HINT_PROMPT = """\
+You are an expert tutor. Given a past-paper question, produce two short study aids in English.
+
+Return JSON exactly:
+{{
+  "knowledge_reminder": "2-4 sentences summarising the key concept or formula the student must recall.",
+  "ai_hint": "1-3 sentence nudge that guides WITHOUT giving the answer away."
+}}
+
+Question:
+{payload}
+"""
+
+SOLUTION_PROMPT = """\
+You are an expert tutor. Given a past-paper question and its reference answer, write a clear, \
+step-by-step model solution in English. Show all working. Be thorough but stop when the answer \
+is complete — do not pad.
+
+Return JSON exactly:
+{{
+  "solution": "<full step-by-step solution as a single string, use \\n for line breaks>"
+}}
+
+Question:
+{payload}
+"""
+
+
+def build_payload(q: dict) -> dict:
+    ref = ""
+    if q.get("raw_answer_text"):
+        ref = q["raw_answer_text"]
+    elif q.get("correct_option"):
+        ref = f"Correct option: {q['correct_option']}"
+    elif q.get("correct_answer"):
+        ref = f"Correct answer: {q['correct_answer']}"
+
+    return {
+        "question_number": q["question_number"],
+        "question_type": q["question_type"] or "long_question",
+        "score": q.get("score") or "unknown",
+        "question_text": q.get("question_text") or "",
+        "topics": q.get("topics") or [],
+        "reference_answer": ref,
+    }
+
+
+async def process_one(sb, q: dict, dry_run: bool) -> bool:
+    payload_str = json.dumps(build_payload(q), ensure_ascii=False)
+    row_id = q["id"]
+    qnum = q["question_number"]
+
+    if dry_run:
+        print(f"    [dry-run] would process {qnum}")
+        return True
+
+    update: dict = {}
+
+    # ── Call 1: knowledge_reminder + ai_hint ─────────────────────────
+    try:
+        r1 = await qwen_json_completion(
+            system_prompt=KNOWLEDGE_HINT_PROMPT.format(payload=payload_str),
+            temperature=0.3,
+            max_tokens=1024,
+        )
+        if r1.get("knowledge_reminder"):
+            update["knowledge_reminder"] = r1["knowledge_reminder"]
+        if r1.get("ai_hint"):
+            update["ai_hint"] = r1["ai_hint"]
+    except Exception as e:
+        print(f"    WARN call-1 failed for {qnum}: {e}")
+
+    await asyncio.sleep(1)
+
+    # ── Call 2: solution ──────────────────────────────────────────────
+    try:
+        r2 = await qwen_json_completion(
+            system_prompt=SOLUTION_PROMPT.format(payload=payload_str),
+            temperature=0.3,
+            max_tokens=4096,
+        )
+        if r2.get("solution"):
+            update["solution"] = r2["solution"]
+    except Exception as e:
+        print(f"    WARN call-2 failed for {qnum}: {e}")
+
+    if not update:
+        print(f"    SKIP {qnum}: both calls returned nothing")
+        return False
+
+    sb.table("paper_questions").update(update).eq("id", row_id).execute()
+    return True
+
+
+async def backfill(dry_run: bool = False) -> None:
+    sb = get_supabase()
+
+    papers = (
+        sb.table("papers")
+        .select("id")
+        .eq("course_code", "COMP2211")
+        .eq("source_kind", "course_library")
+        .execute()
+        .data
+    )
+    paper_ids = [p["id"] for p in papers]
+    if not paper_ids:
+        print("No COMP2211 course-library papers found.")
+        return
+
+    questions = (
+        sb.table("paper_questions")
+        .select("id, paper_id, question_number, question_type, score, question_text, topics, raw_answer_text, correct_option, correct_answer")
+        .in_("paper_id", paper_ids)
+        .is_("knowledge_reminder", "null")
+        .order("paper_id")
+        .order("display_order")
+        .execute()
+        .data
+    )
+
+    if not questions:
+        print("No NULL questions found — all done!")
+        return
+
+    print(f"Found {len(questions)} questions with NULL knowledge_reminder.")
+
+    # Group by paper for cleaner output
+    from collections import defaultdict
+    by_paper: dict[str, list] = defaultdict(list)
+    for q in questions:
+        by_paper[q["paper_id"]].append(q)
+
+    total_updated = 0
+    for paper_idx, (paper_id, qs) in enumerate(by_paper.items(), 1):
+        print(f"\n[{paper_idx}/{len(by_paper)}] paper_id={paper_id} — {len(qs)} NULL questions")
+        for q in qs:
+            print(f"  Processing {q['question_number']}...", end=" ", flush=True)
+            ok = await process_one(sb, q, dry_run)
+            if ok:
+                total_updated += 1
+                print("done")
+            await asyncio.sleep(1.5)
+
+    print(f"\nDone. {total_updated}/{len(questions)} questions updated.")
+
+
+if __name__ == "__main__":
+    dry_run = "--dry-run" in sys.argv
+    asyncio.run(backfill(dry_run=dry_run))
--- a/backend/backfill_similar_questions.py
+++ b/backend/backfill_similar_questions.py
@@ -0,0 +1,135 @@
+"""Pre-compute similar_questions for all COMP2211 course-library questions.
+
+For each question, runs the API's similarity scoring (with the full-text score fixed at 0)
+and writes the result into paper_questions.similar_questions (JSONB). The API can then
+return this pre-computed value directly instead of recomputing it on every request.
+
+Run from the backend directory:
+    uv run python backfill_similar_questions.py [--dry-run]
+"""
+
+from __future__ import annotations
+
+import sys
+from collections import Counter
+from app.services.supabase_client import get_supabase
+from app.routers.questions import (
+    similarity_score,
+    question_family,
+    display_topics,
+)
+
+
+def run(dry_run: bool = False) -> None:
+    sb = get_supabase()
+
+    # Fetch all ready COMP2211 papers
+    papers = (
+        sb.table("papers")
+        .select("id, year, term, exam_type, part_label")
+        .eq("course_code", "COMP2211")
+        .eq("status", "ready")
+        .execute()
+        .data
+    )
+    if not papers:
+        print("No ready COMP2211 papers found.")
+        return
+
+    papers_by_id = {p["id"]: p for p in papers}
+    paper_ids = list(papers_by_id.keys())
+
+    # Fetch all questions for these papers
+    all_questions = (
+        sb.table("paper_questions")
+        .select(
+            "id, paper_id, question_number, question_type, question_format, "
+            "question_text, score, topics, analytics_topic, topic_tags, skill_tags, "
+            "difficulty, knowledge_reminder, ai_hint, solution"
+        )
+        .in_("paper_id", paper_ids)
+        .execute()
+        .data
+    )
+    print(f"Found {len(all_questions)} questions across {len(papers)} papers.")
+
+    # Batching full-text scores through the RPC is impractical here, so every pair is
+    # scored with text_score = 0 and the ranking relies on tag/topic overlap alone.
+
+    updated = 0
+    skipped = 0
+
+    for i, target in enumerate(all_questions, 1):
+        target_paper_id = target["paper_id"]
+        target_topic = target.get("analytics_topic")
+
+        # Candidates: same course, different paper
+        candidates = [
+            q for q in all_questions
+            if q["paper_id"] != target_paper_id
+        ]
+
+        # Pre-filter by analytics_topic if available
+        if target_topic:
+            candidates = [c for c in candidates if c.get("analytics_topic") == target_topic]
+
+        if not candidates:
+            skipped += 1
+            print(f"  [{i}/{len(all_questions)}] {target['question_number']} — no candidates, skip")
+            continue
+
+        ranked = []
+        for candidate in candidates:
+            match_percent, reasons = similarity_score(target, candidate, text_score=0.0)
+            if match_percent < 20:
+                continue
+            paper = papers_by_id.get(candidate["paper_id"], {})
+            source = (
+                f"{paper.get('year') or ''} {(paper.get('term') or '').title()} "
+                f"{(paper.get('exam_type') or '').title()}"
+            ).strip()
+            if paper.get("part_label"):
+                source = f"{source} Part {paper['part_label']}"
+            ranked.append({
+                "id": candidate["id"],
+                "paper_id": candidate["paper_id"],
+                "source": source,
+                "question_number": candidate["question_number"],
+                "match_percent": match_percent,
+                "match_reasons": reasons,
+                "question_type": question_family(candidate),
+                "question_text": candidate["question_text"],
+                "topics": display_topics(candidate),
+                "difficulty": candidate.get("difficulty"),
+                "knowledge_reminder": candidate.get("knowledge_reminder", ""),
+                "ai_hint": candidate.get("ai_hint", ""),
+                "solution": candidate.get("solution", ""),
+            })
+
+        ranked.sort(key=lambda item: (-item["match_percent"], item["source"], item["question_number"]))
+
+        # Deduplicate: best per paper
+        seen_papers: set[str] = set()
+        deduped = []
+        for item in ranked:
+            if item["paper_id"] not in seen_papers:
+                seen_papers.add(item["paper_id"])
+                deduped.append(item)
+        deduped = deduped[:12]
+
+        print(f"  [{i}/{len(all_questions)}] {target['question_number']} → {len(deduped)} similar", end="")
+
+        if dry_run:
+            print(" [dry-run]")
+            continue
+
+        sb.table("paper_questions").update({"similar_questions": deduped}).eq("id", target["id"]).execute()
+        updated += 1
+        print()
+
+    print(f"\nDone. {updated} updated, {skipped} skipped (no candidates).")
+
+
+if __name__ == "__main__":
+    dry_run = "--dry-run" in sys.argv
+    run(dry_run=dry_run)
--- a/backend/backfill_vision.py
+++ b/backend/backfill_vision.py
@@ -0,0 +1,238 @@
+"""
+Reprocess all papers whose status is already "ready", using Vision mode:
+- Fetch PDF from Supabase Storage → images → Vision question extraction → exec → AI trio → update DB
+
+Usage:
+  python backfill_vision.py --course COMP2211
+  python backfill_vision.py --paper-id <uuid>
+"""
+
+import asyncio
+import argparse
+import requests
+from app.services.supabase_client import get_supabase
+from app.services.paper_processor import (
+    process_paper,
+    strip_nulls,
+    pdf_to_images,
+    gemini_vision_json,
+    deepseek_json_completion,
+    parse_json_response,
+    extract_code_lines,
+    try_exec_python,
+    chunked,
+    sort_questions,
+    STRUCTURE_PROMPT,
+    ANSWER_MATCH_PROMPT,
+    BATCH_ANALYSIS_PROMPT,
+)
+import json
+import traceback
+
+
+async def reprocess_paper(paper: dict):
+    """Reprocess a single paper (Vision mode)."""
+    sb = get_supabase()
+    paper_id = paper["id"]
+    label = f"{paper['course_code']} {paper['year']} {paper['term']} {paper['exam_type']}"
+    print(f"\n=== {label} ({paper_id[:8]}) ===")
+
+    # 1. Fetch the PDF
+    try:
+        pdf_bytes = requests.get(paper["paper_file_url"], timeout=60).content
+    except Exception as e:
+        print(f"  SKIP: failed to fetch PDF: {e}")
+        return
+
+    answer_bytes = None
+    if paper.get("answer_file_url"):
+        try:
+            answer_bytes = requests.get(paper["answer_file_url"], timeout=60).content
+        except Exception as e:
+            print(f"  WARN: failed to fetch answer PDF, continuing without it: {e}")
+
+    # 2. PDF → images (render once and reuse the result)
+    print("  Rendering PDF pages...", end=" ", flush=True)
+    paper_images = pdf_to_images(pdf_bytes)
+    print(f"done ({len(paper_images)} pages)")
+
+    # 3. Vision question extraction (in batches of 8 pages)
+    PAGE_BATCH = 8
+    all_questions: list = []
+    meta: dict = {}
+    print(f"  Vision extraction ({len(paper_images)} pages, {-(-len(paper_images)//PAGE_BATCH)} batches)...")
+    for i in range(0, len(paper_images), PAGE_BATCH):
+        batch_imgs = paper_images[i:i + PAGE_BATCH]
+        print(f"    Pages {i+1}-{i+len(batch_imgs)}...", end=" ", flush=True)
+        try:
+            batch_result = await gemini_vision_json(
+                system_prompt=STRUCTURE_PROMPT,
+                images=batch_imgs,
+                user_text=f"Pages {i+1}-{i+len(batch_imgs)} of the exam paper. Extract all questions visible on these pages.",
+                temperature=0,
+            )
+            if not meta:
+                meta = {k: batch_result.get(k) for k in ("total_score", "difficulty_level", "topics_summary")}
+            qs = batch_result.get("questions", [])
+            all_questions.extend(qs)
+            print(f"done ({len(qs)} questions)")
+        except Exception as e:
+            print(f"FAILED: {e}")
+    structure = {**meta, "questions": all_questions}
+    questions = sort_questions(all_questions)
+    print(f"  Total: {len(questions)} questions extracted")
+
+    # 4. Answer matching
+    answers_map = {}
+    if answer_bytes:
+        print("  Vision answer matching...", end=" ", flush=True)
+        answer_images = pdf_to_images(answer_bytes)
+        questions_json = json.dumps(
+            [{"question_number": q["question_number"], "question_type": q["question_type"]}
+             for q in questions], ensure_ascii=False
+        )
+        try:
+            match_result = await gemini_vision_json(
+                system_prompt=ANSWER_MATCH_PROMPT.format(
+                    questions_json=questions_json, answer_text="(See images)"
+                ),
+                images=answer_images,
+                user_text=f"Match answers to these questions: {questions_json}",
+                temperature=0,
+            )
+            answers_map = {a["question_number"]: a for a in match_result.get("answers", [])}
+            print(f"done ({len(answers_map)} matched)")
+        except Exception as e:
+            print(f"FAILED: {e}")
+
+    # 5. Build payloads (execute Python snippets when no reference answer is available)
+    import numpy as np
+    exec_namespaces: dict = {}
+    batched_payloads = []
+
+    for q in questions:
+        qnum = q["question_number"]
+        answer = answers_map.get(qnum, {})
+        full_text = q["question_text"] or ""
+
+        answer_section = ""
+        if answer.get("raw_answer_text"):
+            answer_section = answer["raw_answer_text"]
+        elif answer.get("correct_option"):
+            answer_section = f"Correct option: {answer['correct_option']}"
+        elif answer.get("correct_answer"):
+            answer_section = f"Correct answer: {answer['correct_answer']}"
+
+        if not answer_section:
+            parent_q = q.get("parent_question")
+            group_key = parent_q or qnum
+            if group_key not in exec_namespaces:
+                ns: dict = {"np": np}
+                setup = extract_code_lines(full_text)
+                try_exec_python(setup, ns)
+                exec_namespaces[group_key] = ns
+            ns = exec_namespaces[group_key]
+            print_lines = [l.strip() for l in full_text.splitlines() if l.strip().startswith("print(")]
+            if print_lines:
+                out = try_exec_python(print_lines[-1], ns)
+                if out is not None:
+                    answer_section = f"Executed output: {out}"
+                    print(f"    [exec] {qnum}: {out[:60]}")
+
+        batched_payloads.append({
+            "question_number": qnum,
+            "question_type": q["question_type"],
+            "score": q.get("score", "unknown"),
+            "question_text": full_text,
+            "topics": q.get("topics", []),
+            "reference_answer": answer_section,
+        })
+
+    # 6. AI trio
+    print(f"  Generating AI trio ({len(batched_payloads)} questions, {len(list(chunked(batched_payloads, 3)))} batches)...")
+    analyses: dict = {}
+    for batch in chunked(batched_payloads, 3):
+        nums = [p["question_number"] for p in batch]
+        print(f"    Batch {nums}...", end=" ", flush=True)
+        try:
+            result = await deepseek_json_completion(
+                system_prompt=BATCH_ANALYSIS_PROMPT.format(
+                    questions_payload=json.dumps(batch, ensure_ascii=False)
+                ),
+                temperature=0.3,
+            )
+            for item in result.get("analyses", []):
+                if item.get("question_number"):
+                    analyses[item["question_number"]] = item
+            print(f"done ({len(result.get('analyses', []))})")
+        except Exception as e:
+            print(f"FAILED: {e}")
+        await asyncio.sleep(1)
+
+    # 7. Delete the old questions, then insert the new ones (runs even if extraction returned none)
+    print("  Writing to DB...", end=" ", flush=True)
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+
+    for i, q in enumerate(questions):
+        qnum = q["question_number"]
+        answer = answers_map.get(qnum, {})
+        analysis = analyses.get(qnum, {})
+        sb.table("paper_questions").insert(strip_nulls({
+            "paper_id": paper_id,
+            "question_number": qnum,
+            "parent_question": q.get("parent_question"),
+            "display_order": i,
+            "question_type": q["question_type"],
+            "question_text": q["question_text"],
+            "score": q.get("score"),
+            "page_number": q.get("page_number"),
+            "options": q.get("options"),
+            "correct_option": answer.get("correct_option"),
+            "correct_answer": answer.get("correct_answer"),
+            "raw_answer_text": answer.get("raw_answer_text"),
+            "topics": q.get("topics", []),
+            "analytics_topic": (q.get("topics") or [None])[0],
+            "topic_tags": q.get("topics", []),
+            "difficulty": q.get("difficulty"),
+            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
+            "ai_hint": analysis.get("ai_hint", ""),
+            "solution": analysis.get("solution", ""),
+        })).execute()
+
+    sb.table("papers").update({
+        "question_count": len(questions),
+        "total_score": structure.get("total_score"),
+        "topics_summary": structure.get("topics_summary"),
+        "difficulty_level": structure.get("difficulty_level"),
+    }).eq("id", paper_id).execute()
+
+    print(f"done ({len(questions)} questions written)")
+
+
+async def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--course", help="Course code")
+    parser.add_argument("--paper-id", help="Single paper ID")
+    args = parser.parse_args()
+
+    sb = get_supabase()
+    query = sb.table("papers").select("*").eq("status", "ready")
+    if args.paper_id:
+        query = query.eq("id", args.paper_id)
+    elif args.course:
+        query = query.eq("course_code", args.course.upper())
+    papers = query.order("created_at").execute().data
+
+    print(f"Papers to reprocess: {len(papers)}")
+    for paper in papers:
+        try:
+            await reprocess_paper(paper)
+        except Exception as e:
+            print(f"  ERROR: {e}")
+            traceback.print_exc()
+
+    print("\nAll done.")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/backend/fill_manual_study_aids.py
+++ b/backend/fill_manual_study_aids.py
@@ -0,0 +1,29 @@
+"""Deprecated: study aids must come from LLM output, not template fillers."""
+
+from __future__ import annotations
+
+import sys
+
+
+MESSAGE = """
+fill_manual_study_aids.py is intentionally disabled.
+
+Reason:
+- knowledge_reminder / ai_hint / solution must be generated by LLM
+- template-based filler content polluted the COMP2211 course library
+
+Use one of these paths instead:
+1. Regenerate study aids through the real LLM pipeline in app/services/paper_processor.py
+2. Rebuild paper_questions from a reviewed source and then run LLM generation
+
+This script must not be used to backfill production study aids.
+""".strip()
+
+
+def main() -> None:
+    print(MESSAGE, file=sys.stderr)
+    raise SystemExit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/import_course_manifest.py
+++ b/backend/import_course_manifest.py
@@ -0,0 +1,240 @@
+"""Import a canonical course manifest into Supabase-backed papers."""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+from pathlib import Path
+from typing import Any
+
+from app.services.paper_processor import process_paper
+from app.services.supabase_client import get_supabase
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Import a canonical course paper manifest into Supabase."
+    )
+    parser.add_argument(
+        "--manifest",
+        type=Path,
+        required=True,
+        help="Path to the manifest JSON file.",
+    )
+    parser.add_argument(
+        "--papers-root",
+        type=Path,
+        required=True,
+        help="Root folder that contains the course PDF files referenced by the manifest.",
+    )
+    parser.add_argument(
+        "--user-id",
+        required=False,
+        help="Existing auth.users UUID used as the owner of imported course-library rows.",
+    )
+    parser.add_argument(
+        "--course-code",
+        help="Optional filter to only import entries from one course.",
+    )
+    parser.add_argument(
+        "--exam-key",
+        action="append",
+        dest="exam_keys",
+        default=[],
+        help="Optional exam_key filter. Repeat the flag to import multiple entries.",
+    )
+    parser.add_argument(
+        "--process",
+        action="store_true",
+        help="Run the full paper processing pipeline after the files are uploaded.",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Print what would be imported without uploading or writing database rows.",
+    )
+    return parser.parse_args()
+
+
+def load_manifest(path: Path) -> list[dict[str, Any]]:
+    with path.open("r", encoding="utf-8") as f:
+        data = json.load(f)
+    if not isinstance(data, list):
+        raise ValueError("Manifest must be a JSON array.")
+    return data
+
+
+def should_import(entry: dict[str, Any], args: argparse.Namespace) -> bool:
+    if args.course_code and entry.get("course_code") != args.course_code:
+        return False
+    if args.exam_keys and entry.get("exam_key") not in set(args.exam_keys):
+        return False
+    return bool(entry.get("importable"))
+
+
+def resolve_file_path(root: Path, filename: str | None) -> Path | None:
+    if not filename:
+        return None
+
+    direct = root / filename
+    if direct.exists():
+        return direct
+
+    all_files = [candidate for candidate in root.iterdir() if candidate.is_file()]
+
+    def normalize(name: str) -> str:
+        return name.replace(" (1)", "")
+
+    target_name = normalize(filename)
+    normalized = [candidate for candidate in all_files if normalize(candidate.name) == target_name]
+    if len(normalized) == 1:
+        return normalized[0]
+
+    path = Path(filename)
+    normalized_stem = normalize(path.stem)
+    suffix = path.suffix
+    stem_matches = [
+        candidate
+        for candidate in all_files
+        if candidate.suffix == suffix and normalize(candidate.stem) == normalized_stem
+    ]
+    if len(stem_matches) == 1:
+        return stem_matches[0]
+
+    return None
+
+
+def read_file_bytes(root: Path, filename: str | None) -> bytes | None:
+    if not filename:
+        return None
+    path = resolve_file_path(root, filename)
+    if path is None or not path.exists():
+        raise FileNotFoundError(f"Referenced file does not exist under {root}: {filename}")
+    return path.read_bytes()
+
+
+def build_storage_path(entry: dict[str, Any], kind: str) -> str:
+    exam_key = entry["exam_key"]
+    return f"course-library/{entry['course_code']}/{exam_key}/{kind}.pdf"
+
+
+def upsert_paper_record(
+    entry: dict[str, Any],
+    user_id: str | None,
+    paper_url: str,
+    answer_url: str | None,
+) -> str:
+    sb = get_supabase()
+    payload = {
+        "user_id": user_id,
+        "course_code": entry["course_code"],
+        "year": entry["year"],
+        "term": entry["term"],
+        "exam_type": entry["exam_type"],
+        "part_label": entry.get("part_label"),
+        "paper_file_url": paper_url,
+        "answer_file_url": answer_url,
+        "status": "processing",
+        "source_kind": "course_library",
+        "source_exam_key": entry["exam_key"],
+        "source_question_filename": entry.get("question_pdf"),
+        "source_answer_filename": entry.get("primary_answer_pdf"),
+    }
+
+    existing = (
+        sb.table("papers")
+        .select("id")
+        .eq("source_kind", "course_library")
+        .eq("source_exam_key", entry["exam_key"])
+        .limit(1)
+        .execute()
+        .data
+    )
+    if existing:
+        paper_id = existing[0]["id"]
+        sb.table("papers").update(payload).eq("id", paper_id).execute()
+        return paper_id
+
+    created = sb.table("papers").insert(payload).execute().data
+    return created[0]["id"]
+
+
+def reset_existing_processed_data(paper_id: str) -> None:
+    sb = get_supabase()
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+    sb.table("papers").update(
+        {
+            "status": "processing",
+            "error_message": None,
+            "paper_extracted_text": None,
+            "answer_extracted_text": None,
+            "total_score": None,
+            "question_count": None,
+            "topics_summary": None,
+            "difficulty_level": None,
+        }
+    ).eq("id", paper_id).execute()
+
+
+async def import_entry(
+    entry: dict[str, Any],
+    args: argparse.Namespace,
+) -> None:
+    paper_bytes = read_file_bytes(args.papers_root, entry.get("question_pdf"))
+    answer_bytes = read_file_bytes(args.papers_root, entry.get("primary_answer_pdf"))
+
+    if paper_bytes is None:
+        raise ValueError(f"Importable entry is missing question PDF: {entry['exam_key']}")
+
+    if args.dry_run:
+        print(
+            f"[dry-run] {entry['exam_key']}: "
+            f"question={entry.get('question_pdf')} answer={entry.get('primary_answer_pdf')}"
+        )
+        return
+
+    sb = get_supabase()
+    paper_path = build_storage_path(entry, "paper")
+    sb.storage.from_("papers").upload(
+        paper_path,
+        paper_bytes,
+        file_options={"content-type": "application/pdf", "upsert": "true"},
+    )
+    paper_url = sb.storage.from_("papers").get_public_url(paper_path)
+
+    answer_url = None
+    if answer_bytes:
+        answer_path = build_storage_path(entry, "answer")
+        sb.storage.from_("papers").upload(
+            answer_path,
+            answer_bytes,
+            file_options={"content-type": "application/pdf", "upsert": "true"},
+        )
+        answer_url = sb.storage.from_("papers").get_public_url(answer_path)
+
+    paper_id = upsert_paper_record(entry, args.user_id, paper_url, answer_url)
+    print(f"Imported metadata for {entry['exam_key']} -> paper_id={paper_id}")
+
+    if args.process:
+        reset_existing_processed_data(paper_id)
+        await process_paper(paper_id, paper_bytes, answer_bytes)
+        print(f"Processed {entry['exam_key']}")
+
+
+async def main() -> None:
+    args = parse_args()
+    manifest = load_manifest(args.manifest)
+    entries = [entry for entry in manifest if should_import(entry, args)]
+
+    if not entries:
+        print("No manifest entries matched the provided filters.")
+        return
+
+    print(f"Preparing to import {len(entries)} manifest entries.")
+    for entry in entries:
+        await import_entry(entry, args)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/backend/pyproject.toml
+++ b/backend/pyproject.toml
@@ -0,0 +1,17 @@
+[project]
+name = "pastpaper-master-backend"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+    "fastapi>=0.115.0",
+    "uvicorn[standard]>=0.30.0",
+    "python-dotenv>=1.0.0",
+    "python-multipart>=0.0.9",
+    "supabase>=2.0.0",
+    "openai>=1.50.0",
+    "PyMuPDF>=1.24.0",
+    "pydantic>=2.0.0",
+    "pydantic-settings>=2.0.0",
+    "httpx>=0.27.0",
+    "numpy>=2.4.4",
+]
--- a/backend/regen_ai_trio_comp2211.py
+++ b/backend/regen_ai_trio_comp2211.py
@@ -0,0 +1,174 @@
+"""Regenerate AI trio (knowledge_reminder, ai_hint, solution) for all COMP2211 course-library questions.
+
+Reads existing paper_questions rows and runs the same BATCH_ANALYSIS_PROMPT used by
+paper_processor.py — but does UPDATE instead of INSERT, so question structure is untouched.
+
+Run from the backend directory:
+    uv run python regen_ai_trio_comp2211.py
+
+Pass --dry-run to print batches without calling the LLM or writing to the database.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import sys
+from app.services.supabase_client import get_supabase
+from app.services.paper_processor import BATCH_ANALYSIS_PROMPT, qwen_json_completion, chunked
+
+
+def build_reference_answer(q: dict) -> str:
+    if q.get("raw_answer_text"):
+        return q["raw_answer_text"]
+    if q.get("correct_option"):
+        return f"Correct option: {q['correct_option']}"
+    if q.get("correct_answer"):
+        return f"Correct answer: {q['correct_answer']}"
+    return ""
+
+
+async def regen(dry_run: bool = False) -> None:
+    sb = get_supabase()
+
+    papers = (
+        sb.table("papers")
+        .select("id")
+        .eq("course_code", "COMP2211")
+        .eq("source_kind", "course_library")
+        .execute()
+        .data
+    )
+    paper_ids = [p["id"] for p in papers]
+    if not paper_ids:
+        print("No COMP2211 course-library papers found.")
+        return
+
+    questions = (
+        sb.table("paper_questions")
+        .select("id, paper_id, question_number, question_type, score, question_text, topics, raw_answer_text, correct_option, correct_answer")
+        .in_("paper_id", paper_ids)
+        .order("paper_id")
+        .order("display_order")
+        .execute()
+        .data
+    )
+    print(f"Found {len(questions)} questions across {len(paper_ids)} papers.")
+
+    payloads = [
+        {
+            "question_number": q["question_number"],
+            "question_type": q["question_type"] or "long_question",
+            "score": q.get("score") or "unknown",
+            "question_text": q.get("question_text") or "",
+            "topics": q.get("topics") or [],
+            "reference_answer": build_reference_answer(q),
+        }
+        for q in questions
+    ]
+
+    id_by_qnum_paper: dict[tuple[str, str], str] = {
+        (q["paper_id"], q["question_number"]): q["id"]
+        for q in questions
+    }
+    paper_id_by_qnum: dict[str, str] = {
+        q["question_number"]: q["paper_id"] for q in questions
+    }
+
+    # Group payloads by paper so batches don't mix papers (cleaner context for LLM)
+    from collections import defaultdict
+    payloads_by_paper: dict[str, list[tuple[str, dict]]] = defaultdict(list)
+    for q, payload in zip(questions, payloads):
+        payloads_by_paper[q["paper_id"]].append((q["id"], payload))
+
+    total_updated = 0
+    total_papers = len(payloads_by_paper)
+
+    for paper_idx, (paper_id, items) in enumerate(payloads_by_paper.items(), 1):
+        ids = [item[0] for item in items]
+        batch_payloads = [item[1] for item in items]
+
+        print(f"\n[{paper_idx}/{total_papers}] paper_id={paper_id} — {len(batch_payloads)} questions")
+
+        for batch_idx, batch in enumerate(chunked(batch_payloads, 3), 1):
+            print(f"  Batch {batch_idx}: questions {[b['question_number'] for b in batch]}", end="", flush=True)
+
+            if dry_run:
+                print(" [dry-run, skipped]")
+                continue
+
+            batch_start = (batch_idx - 1) * 3
+            batch_ids = ids[batch_start: batch_start + 3]
+
+            async def run_single(row_id: str, payload: dict) -> bool:
+                try:
+                    r = await qwen_json_completion(
+                        system_prompt=BATCH_ANALYSIS_PROMPT.format(
+                            questions_payload=json.dumps([payload], ensure_ascii=False),
+                        ),
+                        temperature=0.3,
+                        max_tokens=8192,
+                    )
+                    items = r.get("analyses", [])
+                    if not items:
+                        return False
+                    analysis = items[0]
+                    sb.table("paper_questions").update({
+                        "knowledge_reminder": analysis.get("knowledge_reminder", ""),
+                        "ai_hint": analysis.get("ai_hint", ""),
+                        "solution": analysis.get("solution", ""),
+                    }).eq("id", row_id).execute()
+                    return True
+                except Exception:
+                    return False
+
+            try:
+                result = await qwen_json_completion(
+                    system_prompt=BATCH_ANALYSIS_PROMPT.format(
+                        questions_payload=json.dumps(batch, ensure_ascii=False),
+                    ),
+                    temperature=0.3,
+                    max_tokens=8192,
+                )
+                analyses = {str(item.get("question_number", "")): item for item in result.get("analyses", [])}
+                written = 0
+                for row_id, payload in zip(batch_ids, batch):
+                    qnum = payload["question_number"]
+                    analysis = analyses.get(qnum)
+                    if not analysis:
+                        # fallback: retry this single question alone
+                        ok = await run_single(row_id, payload)
+                        if ok:
+                            written += 1
+                            total_updated += 1
+                        else:
+                            print(f"\n  SKIP: {qnum}")
+                    else:
+                        sb.table("paper_questions").update({
+                            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
+                            "ai_hint": analysis.get("ai_hint", ""),
+                            "solution": analysis.get("solution", ""),
+                        }).eq("id", row_id).execute()
+                        written += 1
+                        total_updated += 1
+                print(f" → {written} written")
+            except Exception as exc:
+                # batch failed entirely; retry each question individually
+                print(f" [batch error: {exc!r}, retrying 1-by-1]")
+                written = 0
+                for row_id, payload in zip(batch_ids, batch):
+                    ok = await run_single(row_id, payload)
+                    if ok:
+                        written += 1
+                        total_updated += 1
+                    await asyncio.sleep(1)
+                print(f" → {written}/{len(batch)} written")
+
+            await asyncio.sleep(2.5)
+
+    print(f"\nDone. {total_updated} questions updated.")
+
+
+if __name__ == "__main__":
+    dry_run = "--dry-run" in sys.argv
+    asyncio.run(regen(dry_run=dry_run))
--- a/backend/regenerate_analysis.py
+++ b/backend/regenerate_analysis.py
@@ -0,0 +1,69 @@
+"""Re-generate AI trio (knowledge_reminder, ai_hint, solution) in English for existing questions."""
+
+import json
+import asyncio
+from app.services.supabase_client import get_supabase
+from app.services.llm_clients import get_qwen_client
+from app.services.paper_processor import ANALYSIS_PROMPT
+
+
+async def regenerate_for_paper(paper_id: str):
+    sb = get_supabase()
+    qwen = get_qwen_client()
+
+    questions = sb.table("paper_questions").select("*").eq("paper_id", paper_id).order("display_order").execute().data
+    print(f"Found {len(questions)} questions for paper {paper_id[:8]}")
+
+    for q in questions:
+        qnum = q["question_number"]
+        print(f"  Regenerating Q{qnum}...", end=" ", flush=True)
+
+        answer_section = ""
+        if q.get("raw_answer_text"):
+            answer_section = f"- Reference answer: {q['raw_answer_text']}"
+        elif q.get("correct_option"):
+            answer_section = f"- Correct option: {q['correct_option']}"
+        elif q.get("correct_answer"):
+            answer_section = f"- Correct answer: {q['correct_answer']}"
+
+        resp = qwen.chat.completions.create(
+            model="qwen-plus",
+            messages=[
+                {"role": "system", "content": ANALYSIS_PROMPT.format(
+                    question_number=qnum,
+                    question_type=q["question_type"],
+                    score=q.get("score") or "unknown",
+                    question_text=q["question_text"],
+                    topics=", ".join(q.get("topics") or []),
+                    answer_section=answer_section,
+                )},
+            ],
+            temperature=0.3,
+            response_format={"type": "json_object"},
+        )
+        analysis = json.loads(resp.choices[0].message.content)
+
+        sb.table("paper_questions").update({
+            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
+            "ai_hint": analysis.get("ai_hint", ""),
+            "solution": analysis.get("solution", ""),
+        }).eq("id", q["id"]).execute()
+
+        print("done")
+
+    print(f"All questions regenerated for paper {paper_id[:8]}")
+
+
+async def main():
+    sb = get_supabase()
+    papers = sb.table("papers").select("id,course_code,year,term").eq("status", "ready").order("created_at", desc=True).execute().data
+
+    for p in papers:
+        print(f"\n=== {p['course_code']} {p['year']} {p['term']} ===")
+        await regenerate_for_paper(p["id"])
+
+    print("\nAll done!")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/backend/split_comp2211_2022_spring_final_part_a.py
+++ b/backend/split_comp2211_2022_spring_final_part_a.py
@@ -0,0 +1,224 @@
+"""Split COMP2211 Spring 2022 final part A into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2022-spring-final-part-a"
+TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
+PROBLEM_SEED_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "pastpaper-scraper"
+    / "reviews"
+    / "COMP2211"
+    / "problem_seed.json"
+)
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    analytics_topic: str | None = None
+    topic_primary: str | None = None
+    topic_tags: tuple[str, ...] | None = None
+    skill_tags: tuple[str, ...] | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    analytics_topic: str | None = None,
+    topic_primary: str | None = None,
+    topic_tags: tuple[str, ...] | None = None,
+    skill_tags: tuple[str, ...] | None = None,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        analytics_topic=analytics_topic,
+        topic_primary=topic_primary,
+        topic_tags=topic_tags,
+        skill_tags=skill_tags,
+        page_number=page_number,
+    )
+
+
+CHILDREN: list[ChildSpec] = [
+    ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
+    ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "architecture_reasoning"), page_number=2),
+    ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "activation_selection"), page_number=2),
+    ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "metric_reasoning"), page_number=2),
+    ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "hardware_reasoning"), page_number=2),
+    ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "image_processing"), page_number=2),
+    ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "cnn_architecture"), page_number=2),
+    ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "regularization"), page_number=2),
+    ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "game_reasoning"), page_number=2),
+    ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "pruning_reasoning"), page_number=2),
+    ChildSpec("2a", "2", "2", ("a",), 6.5, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "probability_reasoning", "classification_decision"), page_number=4),
+    ChildSpec("2b", "2", "2", ("b",), 7.5, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "algorithm_tracing", "classification_decision"), page_number=4),
+    short_answer("3a", "3", "3", ("a",), 3, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("concept_explanation", "metric_reasoning"), page_number=6),
+    short_answer("3b", "3", "3", ("b",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation", "activation_selection"), page_number=6),
+    short_answer("3c", "3", "3", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("architecture_reasoning", "output_layer_design"), page_number=6),
+    short_answer("3d", "3", "3", ("d",), 3, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation", "optimization_reasoning"), page_number=6),
+    short_answer("3e_i", "3e", "3", ("e", "i"), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("optimization_reasoning",), page_number=6),
+    short_answer("3e_ii", "3e", "3", ("e", "ii"), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("optimization_reasoning",), page_number=6),
+    short_answer("3f", "3", "3", ("f",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("regularization", "concept_explanation"), page_number=6),
+    ChildSpec("4a_i", "4a", "4", ("a", "i"), 2, "fill_blank", "fill_blank", page_number=7),
+    ChildSpec("4a_ii", "4a", "4", ("a", "ii"), 2, "long_question", "long_answer", page_number=7),
+    ChildSpec("4b_i", "4b", "4", ("b", "i"), 3, "fill_blank", "fill_blank", page_number=7),
+    ChildSpec("4b_ii", "4b", "4", ("b", "ii"), 4, "fill_blank", "fill_blank", page_number=7),
+    ChildSpec("4b_iii", "4b", "4", ("b", "iii"), 4, "long_question", "long_answer", page_number=7),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    current = text.strip()
+    carried_intro: list[str] = []
+    for depth, marker in enumerate(path):
+        intro, sections = split_sections(current)
+        if depth == 0 and intro:
+            carried_intro.append(intro)
+        current = sections.get(marker, current)
+    return "\n".join(part for part in [*carried_intro, current] if part).strip()
+
+
+def extract_true_false_answers(answer_text: str) -> dict[str, str]:
+    answers: dict[str, str] = {}
+    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text))
+    for match in matches:
+        answers[match.group(1)] = match.group(2)
+    return answers
+
+
+def derive_correct_answer(answer_text: str) -> str | None:
+    if not answer_text:
+        return None
+    tail = answer_text.split("Answer:", 1)[1] if "Answer:" in answer_text else answer_text
+    lines = [line.strip() for line in tail.splitlines() if line.strip()]
+    if not lines:
+        return None
+    first = lines[0]
+    if first.lower().startswith("marking scheme"):
+        return None
+    if len(first) <= 240:
+        return first
+    return None
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
+    return {
+        row["question_number"]: row
+        for row in data
+        if row["source_exam_key"] == EXAM_KEY
+    }
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
+    paper_id = paper["id"]
+
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = load_seed_rows()
+    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)
+
+        correct_option = None
+        correct_answer = None
+        options = None
+        if child.question_type == "true_false":
+            correct_option = tf_answers.get(child.path[0])
+            options = TRUE_FALSE_OPTIONS
+        elif child.question_type == "fill_blank":
+            correct_answer = derive_correct_answer(raw_answer_text)
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": correct_option,
+                "correct_answer": correct_answer,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
+                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
+                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/split_comp2211_2022_spring_final_part_b.py
+++ b/backend/split_comp2211_2022_spring_final_part_b.py
@@ -0,0 +1,232 @@
+"""Split COMP2211 Spring 2022 final part B into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2022-spring-final-part-b"
+PROBLEM_SEED_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "pastpaper-scraper"
+    / "reviews"
+    / "COMP2211"
+    / "problem_seed.json"
+)
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    analytics_topic: str | None = None
+    topic_primary: str | None = None
+    topic_tags: tuple[str, ...] | None = None
+    skill_tags: tuple[str, ...] | None = None
+    options: tuple[tuple[str, str], ...] | None = None
+    correct_option: str | None = None
+    correct_answer: str | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    analytics_topic: str | None = None,
+    topic_primary: str | None = None,
+    topic_tags: tuple[str, ...] | None = None,
+    skill_tags: tuple[str, ...] | None = None,
+    correct_answer: str | None = None,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        analytics_topic=analytics_topic,
+        topic_primary=topic_primary,
+        topic_tags=topic_tags,
+        skill_tags=skill_tags,
+        correct_answer=correct_answer,
+        page_number=page_number,
+    )
+
+
+def mc(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    options: tuple[tuple[str, str], ...],
+    correct_option: str,
+    analytics_topic: str,
+    skill_tags: tuple[str, ...],
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="mc",
+        question_format="mc",
+        analytics_topic=analytics_topic,
+        topic_primary=analytics_topic,
+        topic_tags=(analytics_topic,),
+        skill_tags=skill_tags,
+        options=options,
+        correct_option=correct_option,
+        page_number=page_number,
+    )
+
+
+ETHICS_ABCD = (
+    ("A", "A"),
+    ("B", "B"),
+    ("C", "C"),
+    ("D", "D"),
+)
+
+
+CHILDREN: list[ChildSpec] = [
+    ChildSpec("1a", "1", "1", ("a",), 1.5, "long_question", "long_answer", page_number=2),
+    short_answer("1b", "1", "1", ("b",), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("concept_explanation", "data_augmentation"), page_number=2),
+    ChildSpec("1c", "1", "1", ("c",), 4.5, "long_question", "long_answer", page_number=2),
+    short_answer("1d", "1", "1", ("d",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "parameter_reduction"), page_number=3),
+    ChildSpec("1e", "1", "1", ("e",), 2.5, "fill_blank", "fill_blank", correct_answer="1558656", page_number=3),
+    ChildSpec("1f_i", "1f", "1", ("f", "i"), 2.5, "fill_blank", "fill_blank", correct_answer="2071656", page_number=3),
+    ChildSpec("1f_ii", "1f", "1", ("f", "ii"), 2.5, "fill_blank", "fill_blank", correct_answer="150529000", page_number=4),
+    short_answer("1g", "1", "1", ("g",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "comparison"), page_number=4),
+    ChildSpec("2a", "2", "2", ("a",), 9, "long_question", "coding", page_number=5),
+    short_answer("2b", "2", "2", ("b",), 4, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "regression_reasoning"), page_number=6),
+    ChildSpec("3a", "3", "3", ("a",), 3.5, "long_question", "long_answer", page_number=9),
+    short_answer("3b", "3", "3", ("b",), 0.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("game_reasoning",), correct_answer="E-a", page_number=9),
+    short_answer("3c", "3", "3", ("c",), 1.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("concept_explanation", "game_reasoning"), page_number=9),
+    short_answer("3d", "3", "3", ("d",), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning_reasoning",), correct_answer="E-j and E-f", page_number=9),
+    mc("4a", "4", "4", ("a",), 1, options=ETHICS_ABCD, correct_option="C", analytics_topic="Ethics of AI", skill_tags=("concept_check", "ethical_reasoning"), page_number=10),
+    mc("4b", "4", "4", ("b",), 1, options=ETHICS_ABCD, correct_option="A", analytics_topic="Ethics of AI", skill_tags=("concept_check", "bias_reasoning"), page_number=10),
+    mc("4c", "4", "4", ("c",), 1, options=ETHICS_ABCD, correct_option="C", analytics_topic="Ethics of AI", skill_tags=("concept_check", "ethical_reasoning"), page_number=10),
+    mc("4d", "4", "4", ("d",), 1, options=ETHICS_ABCD, correct_option="B", analytics_topic="Ethics of AI", skill_tags=("concept_check", "bias_reasoning"), page_number=10),
+    short_answer("4e", "4", "4", ("e",), 3, analytics_topic="Ethics of AI", topic_primary="Ethics of AI", topic_tags=("Ethics of AI",), skill_tags=("argumentation", "concept_explanation"), page_number=11),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    current = text.strip()
+    carried_intro: list[str] = []
+    for depth, marker in enumerate(path):
+        intro, sections = split_sections(current)
+        if depth == 0 and intro:
+            carried_intro.append(intro)
+        current = sections.get(marker, current)
+    return "\n".join(part for part in [*carried_intro, current] if part).strip()
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
+    return {
+        row["question_number"]: row
+        for row in data
+        if row["source_exam_key"] == EXAM_KEY
+    }
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
+    paper_id = paper["id"]
+
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = load_seed_rows()
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)
+        options = None
+        if child.options:
+            options = [{"label": label, "text": text} for label, text in child.options]
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": child.correct_option,
+                "correct_answer": child.correct_answer,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
+                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
+                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/split_comp2211_2022_spring_midterm.py
+++ b/backend/split_comp2211_2022_spring_midterm.py
@@ -0,0 +1,233 @@
+"""Split COMP2211 Spring 2022 midterm top-level problems into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2022-spring-midterm"
+TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        page_number=page_number,
+    )
+
+
+CHILDREN: list[ChildSpec] = [
+    *[
+        ChildSpec(f"1{letter}", "1", "1", (letter,), 1.5, "true_false", page_number=2)
+        for letter in "abcdefghij"
+    ],
+    ChildSpec("2a_i", "2a", "2", ("a", "i"), 1, "fill_blank", page_number=4),
+    ChildSpec("2a_ii", "2a", "2", ("a", "ii"), 1, "fill_blank", page_number=4),
+    ChildSpec("2a_iii", "2a", "2", ("a", "iii"), 1, "fill_blank", page_number=4),
+    ChildSpec("2a_iv", "2a", "2", ("a", "iv"), 1, "fill_blank", page_number=4),
+    ChildSpec("2a_v", "2a", "2", ("a", "v"), 1, "fill_blank", page_number=4),
+    ChildSpec("2b", "2", "2", ("b",), 2, "fill_blank", page_number=4),
+    ChildSpec("2c", "2", "2", ("c",), 9, "long_question", "coding", page_number=5),
+    ChildSpec("3a", "3", "3", ("a",), 2, "fill_blank", page_number=7),
+    ChildSpec("3b_i", "3b", "3", ("b", "i"), 1.75, "fill_blank", page_number=7),
+    ChildSpec("3b_ii", "3b", "3", ("b", "ii"), 1.75, "fill_blank", page_number=7),
+    ChildSpec("3b_iii", "3b", "3", ("b", "iii"), 1.75, "fill_blank", page_number=7),
+    ChildSpec("3b_iv", "3b", "3", ("b", "iv"), 1.75, "fill_blank", page_number=7),
+    short_answer("3c", "3", "3", ("c",), 2, page_number=8),
+    ChildSpec("4a", "4", "4", ("a",), 3, "long_question", "long_answer", page_number=9),
+    short_answer("4b_i", "4b", "4", ("b", "i"), 3, page_number=9),
+    short_answer("4b_ii", "4b", "4", ("b", "ii"), 3, page_number=9),
+    ChildSpec("4c_i", "4c", "4", ("c", "i"), 2, "long_question", "long_answer", page_number=10),
+    ChildSpec("4c_ii", "4c", "4", ("c", "ii"), 3, "long_question", "long_answer", page_number=10),
+    ChildSpec("5a", "5", "5", ("a",), 4.5, "long_question", "long_answer", page_number=11),
+    ChildSpec("5b", "5", "5", ("b",), 1.5, "fill_blank", page_number=11),
+    ChildSpec("5c", "5", "5", ("c",), 4.5, "long_question", "long_answer", page_number=11),
+    short_answer("5d", "5", "5", ("d",), 1.5, page_number=11),
+    ChildSpec("6a", "6", "6", ("a",), 8, "long_question", "long_answer", page_number=12),
+    short_answer("6b", "6", "6", ("b",), 2, page_number=13),
+    ChildSpec("6c", "6", "6", ("c",), 10, "long_question", "coding", page_number=13),
+    short_answer("7a", "7", "7", ("a",), 4, page_number=14),
+    short_answer("7b", "7", "7", ("b",), 6, page_number=14),
+    ChildSpec("7c", "7", "7", ("c",), 2, "fill_blank", page_number=15),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")
+PROBLEM_SEED_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "pastpaper-scraper"
+    / "reviews"
+    / "COMP2211"
+    / "problem_seed.json"
+)
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
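+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    # MARKER_RE matches "(a)" and "(i)" alike, so nested markers show up as flat siblings; walk them in order.
+    matches, parts, start = list(MARKER_RE.finditer(text)), [], 0
+    for marker in path:
+        idx = next((i for i in range(start, len(matches)) if matches[i].group(1) == marker), None)
+        if idx is None:
+            break  # deeper marker missing: keep whatever was found so far
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        parts.append(text[matches[idx].start() : end].strip())
+        start = idx + 1
+    intro = text[: matches[0].start()].strip() if parts else ""
+    return "\n".join(part for part in [intro, *parts] if part).strip() if parts else text.strip()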
+
+
+def extract_true_false_answers(answer_text: str) -> dict[str, str]:
+    answers: dict[str, str] = {}
+    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text))
+    for match in matches:
+        answers[match.group(1)] = match.group(2)
+    return answers
+
+
+def derive_correct_answer(answer_text: str) -> str | None:
+    if not answer_text:
+        return None
+    if "Answer:" in answer_text:
+        tail = answer_text.split("Answer:", 1)[1]
+    else:
+        tail = answer_text
+    lines = [line.strip() for line in tail.splitlines() if line.strip()]
+    if not lines:
+        return None
+    first = lines[0]
+    if first.lower().startswith("marking scheme"):
+        return None
+    if len(first) <= 240:
+        return first
+    return None
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
+    return {
+        row["question_number"]: row
+        for row in data
+        if row["source_exam_key"] == EXAM_KEY
+    }
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = (
+        sb.table("papers")
+        .select("id")
+        .eq("source_exam_key", EXAM_KEY)
+        .execute()
+        .data[0]
+    )
+    paper_id = paper["id"]
+
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = load_seed_rows()
+    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)
+
+        correct_option = None
+        correct_answer = None
+        options = None
+        if child.question_type == "true_false":
+            marker = child.path[0]
+            correct_option = tf_answers.get(marker)
+            options = TRUE_FALSE_OPTIONS
+        elif child.question_type == "fill_blank":
+            correct_answer = derive_correct_answer(raw_answer_text)
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": correct_option,
+                "correct_answer": correct_answer,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or parent.get("topics"),
+                "topic_primary": existing.get("topic_primary") or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or parent.get("topic_tags"),
+                "skill_tags": existing.get("skill_tags") or parent.get("skill_tags"),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/split_comp2211_2023_spring_midterm.py
+++ b/backend/split_comp2211_2023_spring_midterm.py
@@ -0,0 +1,268 @@
+"""Split COMP2211 Spring 2023 midterm into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2023-spring-midterm"
+PROBLEM_SEED_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "pastpaper-scraper"
+    / "reviews"
+    / "COMP2211"
+    / "problem_seed.json"
+)
+TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    analytics_topic: str | None = None
+    topic_primary: str | None = None
+    topic_tags: tuple[str, ...] | None = None
+    skill_tags: tuple[str, ...] | None = None
+    options: tuple[tuple[str, str], ...] | None = None
+    correct_option: str | None = None
+    correct_answer: str | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    analytics_topic: str | None = None,
+    topic_primary: str | None = None,
+    topic_tags: tuple[str, ...] | None = None,
+    skill_tags: tuple[str, ...] | None = None,
+    correct_answer: str | None = None,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        analytics_topic=analytics_topic,
+        topic_primary=topic_primary,
+        topic_tags=topic_tags,
+        skill_tags=skill_tags,
+        correct_answer=correct_answer,
+        page_number=page_number,
+    )
+
+
+def mc(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    options: tuple[tuple[str, str], ...],
+    correct_option: str,
+    analytics_topic: str,
+    skill_tags: tuple[str, ...],
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="mc",
+        question_format="mc",
+        analytics_topic=analytics_topic,
+        topic_primary=analytics_topic,
+        topic_tags=(analytics_topic,),
+        skill_tags=skill_tags,
+        options=options,
+        correct_option=correct_option,
+        page_number=page_number,
+    )
+
+
+ABCDE = (("A", "A"), ("B", "B"), ("C", "C"), ("D", "D"), ("E", "E"))
+
+
+CHILDREN: list[ChildSpec] = [
+    ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=3),
+    ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=3),
+    ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
+    ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "distance_reasoning"), page_number=3),
+    ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "validation_reasoning"), page_number=3),
+    ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
+    ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "robustness_reasoning"), page_number=3),
+    ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "decision_boundary"), page_number=3),
+    ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "optimization_reasoning"), page_number=3),
+    ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "expressiveness_reasoning"), page_number=3),
+    short_answer("2a_i", "2a", "2", ("a", "i"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_ii", "2a", "2", ("a", "ii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_iii", "2a", "2", ("a", "iii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_iv", "2a", "2", ("a", "iv"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_v", "2a", "2", ("a", "v"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("indexing", "code_tracing"), page_number=4),
+    short_answer("2a_vi", "2a", "2", ("a", "vi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("indexing", "error_reasoning"), page_number=5),
+    short_answer("2a_vii", "2a", "2", ("a", "vii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("masking", "code_tracing"), page_number=5),
+    short_answer("2a_viii", "2a", "2", ("a", "viii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("aggregation", "code_tracing"), page_number=5),
+    short_answer("2a_ix", "2a", "2", ("a", "ix"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("transpose", "code_tracing"), page_number=5),
+    short_answer("2b_i", "2b", "2", ("b", "i"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "code_tracing"), page_number=6),
+    short_answer("2b_ii", "2b", "2", ("b", "ii"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "error_reasoning"), page_number=6),
+    short_answer("2b_iii", "2b", "2", ("b", "iii"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "code_tracing"), page_number=6),
+    ChildSpec("2c", "2", "2", ("c",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "geometry_reasoning"), page_number=7),
+    short_answer("3", "3", "3", (), 8, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("concept_explanation", "missing_data_reasoning"), page_number=9),
+    ChildSpec("4a", "4", "4", ("a",), 8, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "classification_decision"), page_number=10),
+    short_answer("4b", "4", "4", ("b",), 6, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("distance_reasoning", "comparison"), page_number=11),
+    ChildSpec("5a", "5", "5", ("a",), 7, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "algorithm_tracing"), page_number=12),
+    ChildSpec("5b", "5", "5", ("b",), 7, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("centroid_update", "algorithm_tracing"), page_number=12),
+    short_answer("5c", "5", "5", ("c",), 5, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("concept_explanation", "model_selection"), page_number=14),
+    short_answer("6a", "6", "6", ("a",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("convergence_reasoning",), page_number=15),
+    mc("6b", "6", "6", ("b",), 2, options=ABCDE, correct_option="D", analytics_topic="Perceptron and MLP", skill_tags=("generalization_reasoning",), page_number=15),
+    short_answer("6c", "6", "6", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("activation_reasoning",), page_number=16),
+    ChildSpec("6d", "6", "6", ("d",), 6, "long_question", "coding", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("debugging", "implementation", "weight_update"), page_number=16),
+    short_answer("7a", "7", "7", ("a",), 4, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("decision_boundary", "linearity_reasoning"), page_number=18),
+    short_answer("7b", "7", "7", ("b",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("decision_boundary", "linearity_reasoning"), page_number=18),
+    ChildSpec("7c", "7", "7", ("c",), 10, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("architecture_reasoning", "parameter_design"), page_number=19),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")  # [a-z]+ already covers roman numerals such as (i), (iv)
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
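+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    matches, parts, start = list(MARKER_RE.finditer(text)), [], 0
+    for marker in path:  # MARKER_RE matches nested markers as flat siblings, so walk them in document order
+        idx = next((i for i in range(start, len(matches)) if matches[i].group(1) == marker), None)
+        if idx is None:
+            break  # deeper marker missing: keep whatever was found so far
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        parts, start = [*parts, text[matches[idx].start() : end].strip()], idx + 1
+    return "\n".join(p for p in [text[: matches[0].start()].strip(), *parts] if p).strip() if parts else text.strip()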
+
+
+def extract_true_false_answers(answer_text: str) -> dict[str, str]:
+    answers: dict[str, str] = {}
+    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?T\s*F", answer_text))
+    if matches:  # a marker followed by both "T" and "F" singles out neither option
+        return answers  # still empty at this point
+    for match in re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text):
+        answers[match.group(1)] = match.group(2)
+    if answers:
+        return answers
+    lines = [line.strip() for line in answer_text.splitlines() if line.strip()]
+    current = None
+    for line in lines:
+        m = re.fullmatch(r"\(([a-j])\)", line)
+        if m:
+            current = m.group(1)
+            continue
+        if current and line in {"T", "F"}:
+            answers[current] = line
+            current = None
+    return answers
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
+    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
+    paper_id = paper["id"]
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = load_seed_rows()
+    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
+
+        options = None
+        correct_option = child.correct_option
+        if child.options:
+            options = [{"label": label, "text": text} for label, text in child.options]
+        if child.question_type == "true_false":
+            options = TRUE_FALSE_OPTIONS
+            correct_option = tf_answers.get(child.path[0])
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": correct_option,
+                "correct_answer": child.correct_answer,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
+                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
+                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/split_comp2211_2024_spring_final.py
+++ b/backend/split_comp2211_2024_spring_final.py
@@ -0,0 +1,242 @@
+"""Split COMP2211 Spring 2024 final into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2024-spring-final"
+PROBLEM_SEED_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "pastpaper-scraper"
+    / "reviews"
+    / "COMP2211"
+    / "problem_seed.json"
+)
+TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    analytics_topic: str | None = None
+    topic_primary: str | None = None
+    topic_tags: tuple[str, ...] | None = None
+    skill_tags: tuple[str, ...] | None = None
+    options: tuple[tuple[str, str], ...] | None = None
+    correct_option: str | None = None
+    correct_answer: str | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    analytics_topic: str | None = None,
+    topic_primary: str | None = None,
+    topic_tags: tuple[str, ...] | None = None,
+    skill_tags: tuple[str, ...] | None = None,
+    correct_answer: str | None = None,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        analytics_topic=analytics_topic,
+        topic_primary=topic_primary,
+        topic_tags=topic_tags,
+        skill_tags=skill_tags,
+        correct_answer=correct_answer,
+        page_number=page_number,
+    )
+
+
+CHILDREN: list[ChildSpec] = [
+    ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "code_tracing"), page_number=2),
+    ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=2),
+    ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
+    ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
+    ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "activation_reasoning"), page_number=2),
+    ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "image_processing"), page_number=2),
+    ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "cnn_complexity"), page_number=2),
+    ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "regularization"), page_number=2),
+    ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "pruning_reasoning"), page_number=2),
+    ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Ethics of AI", "Ethics of AI", ("Ethics of AI",), ("concept_check", "research_ethics"), page_number=2),
+    ChildSpec("2a", "2", "2", ("a",), 4, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "masking"), page_number=3),
+    ChildSpec("2b", "2", "2", ("b",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "convolution", "array_manipulation"), page_number=4),
+    short_answer("3a_i", "3a", "3", ("a", "i"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
+    short_answer("3a_ii", "3a", "3", ("a", "ii"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
+    short_answer("3a_iii", "3a", "3", ("a", "iii"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
+    short_answer("3a_iv", "3a", "3", ("a", "iv"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
+    short_answer("3b_i", "3b", "3", ("b", "i"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
+    short_answer("3b_ii", "3b", "3", ("b", "ii"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
+    short_answer("3b_iii", "3b", "3", ("b", "iii"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
+    short_answer("3c", "3", "3", ("c",), 1.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("linearity_reasoning", "classification_decision"), page_number=6),
+    short_answer("4a_i", "4a", "4", ("a", "i"), 2.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("parameter_counting",), page_number=7),
+    short_answer("4a_ii", "4a", "4", ("a", "ii"), 2.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("model_selection",), page_number=7),
+    short_answer("4b", "4", "4", ("b",), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation",), page_number=7),
+    short_answer("4c", "4", "4", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("activation_reasoning", "optimization_reasoning"), page_number=7),
+    ChildSpec("4d_i", "4d", "4", ("d", "i"), 1.5, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("forward_pass", "activation_reasoning"), page_number=8),
+    ChildSpec("4d_ii", "4d", "4", ("d", "ii"), 1.5, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("backpropagation", "weight_update"), page_number=8),
+    ChildSpec("5a", "5", "5", ("a",), 4.5, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("histogram_reasoning", "image_transform"), page_number=9),
+    ChildSpec("5b", "5", "5", ("b",), 3, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("thresholding", "manual_computation"), page_number=10),
+    ChildSpec("5c", "5", "5", ("c",), 2, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("padding", "manual_construction"), page_number=10),
+    short_answer("5d_i", "5d", "5", ("d", "i"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
+    short_answer("5d_ii", "5d", "5", ("d", "ii"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
+    short_answer("5d_iii", "5d", "5", ("d", "iii"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
+    short_answer("5e", "5", "5", ("e",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("concept_explanation", "local_vs_global"), page_number=11),
+    ChildSpec("6a", "6", "6", ("a",), 10, "long_question", "coding", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("implementation", "convolution", "debugging"), page_number=12),
+    ChildSpec("6b", "6", "6", ("b",), 3, "long_question", "coding", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("implementation", "regularization"), page_number=15),
+    short_answer("7a_i", "7a", "7", ("a", "i"), 1, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("cnn_architecture",), page_number=16),
+    short_answer("7a_ii", "7a", "7", ("a", "ii"), 4, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("shape_reasoning", "parameter_counting"), page_number=16),
+    short_answer("7a_iii", "7a", "7", ("a", "iii"), 3, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("overfitting", "regularization"), page_number=16),
+    ChildSpec("7b", "7", "7", ("b",), 5, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("manual_computation", "cnn_forward_pass"), page_number=17),
+    short_answer("7c_i", "7c", "7", ("c", "i"), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("shape_reasoning", "3d_convolution"), page_number=17),
+    short_answer("7c_ii", "7c", "7", ("c", "ii"), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("parameter_counting", "3d_convolution"), page_number=17),
+    short_answer("7c_iii", "7c", "7", ("c", "iii"), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("parameter_counting", "3d_convolution"), page_number=17),
+    short_answer("8a_i", "8a", "8", ("a", "i"), 1, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("tree_search", "manual_tracing"), page_number=18),
+    short_answer("8a_ii", "8a", "8", ("a", "ii"), 3, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning", "manual_tracing"), page_number=18),
+    short_answer("8a_iii", "8a", "8", ("a", "iii"), 1, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("game_reasoning",), page_number=18),
+    short_answer("8b_i", "8b", "8", ("b", "i"), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("utility_reasoning",), page_number=18),
+    short_answer("8b_ii", "8b", "8", ("b", "ii"), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning_reasoning", "concept_explanation"), page_number=18),
+    short_answer("9", "9", "9", (), 3, analytics_topic="Ethics of AI", topic_primary="Ethics of AI", topic_tags=("Ethics of AI",), skill_tags=("concept_explanation", "governance"), page_number=19),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")  # line-leading markers such as "(a)" or "(iv)"
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    if not path:
+        return text.strip()
+    current = text.strip()
+    carried_intro: list[str] = []
+    for depth, marker in enumerate(path):
+        intro, sections = split_sections(current)
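+        if depth == 0 and intro:  # carry only the top-level question stem as shared context
+            carried_intro.append(intro)
+        current = sections.get(marker, current)  # fall back to the unsplit text when the marker is missing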
+    return "\n".join(part for part in [*carried_intro, current] if part).strip()
+
+
+def extract_true_false_answers(answer_text: str) -> dict[str, str]:
+    answers: dict[str, str] = {}
+    table_match = re.search(r"Answer\s+(T\s+F\s+T\s+F\s+F\s+T\s+F\s+F\s+F\s+T)", answer_text, re.S)  # matches one specific T/F sequence only; other layouts fall through to the generic scan below
+    if table_match:
+        seq = re.findall(r"[TF]", table_match.group(1))
+        if len(seq) == 10:
+            for idx, val in enumerate(seq):
+                answers[chr(ord("a") + idx)] = val
+            return answers
+    seq = re.findall(r"\b([TF])\b", answer_text)
+    if len(seq) >= 10:
+        for idx, val in enumerate(seq[:10]):
+            answers[chr(ord("a") + idx)] = val
+    return answers
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text())
+    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
+    paper_id = paper["id"]
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = load_seed_rows()
+    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
+
+        options = None
+        correct_option = child.correct_option
+        if child.question_type == "true_false":
+            options = TRUE_FALSE_OPTIONS
+            correct_option = tf_answers.get(child.path[0])
+        elif child.options:
+            options = [{"label": label, "text": text} for label, text in child.options]
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": correct_option,
+                "correct_answer": child.correct_answer,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
+                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
+                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()  # not atomic: if the insert below fails, this paper is left with no questions
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/split_comp2211_2024_spring_midterm.py
+++ b/backend/split_comp2211_2024_spring_midterm.py
@@ -0,0 +1,291 @@
+"""Rebuild COMP2211 Spring 2024 midterm into subquestions."""
+
+from __future__ import annotations
+
+import json
+import re
+from dataclasses import dataclass
+from pathlib import Path
+
+import fitz
+
+from app.services.supabase_client import get_supabase
+
+
+EXAM_KEY = "COMP2211-2024-spring-midterm"
+ROOT = Path(__file__).resolve().parent.parent
+QUESTION_PDF = ROOT / "pastpaper-scraper" / "papers" / "COMP2211" / "(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf"
+ANSWER_PDF = ROOT / "pastpaper-scraper" / "papers" / "COMP2211" / "(COMP2211)[2024](s)midterm~=ubrzkjmz^_90406.pdf"
+PROBLEM_SEED_PATH = ROOT / "pastpaper-scraper" / "reviews" / "COMP2211" / "problem_seed.json"
+TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
+
+
+@dataclass(frozen=True)
+class ChildSpec:
+    question_number: str
+    parent_question: str
+    top_level_number: str
+    path: tuple[str, ...]
+    score: float
+    question_type: str
+    question_format: str | None = None
+    analytics_topic: str | None = None
+    topic_primary: str | None = None
+    topic_tags: tuple[str, ...] | None = None
+    skill_tags: tuple[str, ...] | None = None
+    page_number: int = 1
+
+
+def short_answer(
+    question_number: str,
+    parent_question: str,
+    top_level_number: str,
+    path: tuple[str, ...],
+    score: float,
+    *,
+    analytics_topic: str | None = None,
+    topic_primary: str | None = None,
+    topic_tags: tuple[str, ...] | None = None,
+    skill_tags: tuple[str, ...] | None = None,
+    page_number: int,
+) -> ChildSpec:
+    return ChildSpec(
+        question_number=question_number,
+        parent_question=parent_question,
+        top_level_number=top_level_number,
+        path=path,
+        score=score,
+        question_type="long_question",
+        question_format="short_answer",
+        analytics_topic=analytics_topic,
+        topic_primary=topic_primary,
+        topic_tags=topic_tags,
+        skill_tags=skill_tags,
+        page_number=page_number,
+    )
+
+
+CHILDREN: list[ChildSpec] = [
+    ChildSpec("1a", "1", "1", ("a",), 0.5, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "code_tracing"), page_number=3),
+    ChildSpec("1b", "1", "1", ("b",), 0.5, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "broadcasting"), page_number=3),
+    ChildSpec("1c", "1", "1", ("c",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
+    ChildSpec("1d", "1", "1", ("d",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "tie_reasoning"), page_number=3),
+    ChildSpec("1e", "1", "1", ("e",), 0.5, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "cross_validation"), page_number=3),
+    ChildSpec("1f", "1", "1", ("f",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "clustering_property"), page_number=3),
+    ChildSpec("1g", "1", "1", ("g",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "robustness_reasoning"), page_number=3),
+    ChildSpec("1h", "1", "1", ("h",), 0.5, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "decision_boundary"), page_number=3),
+    ChildSpec("1i", "1", "1", ("i",), 0.5, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "optimization_reasoning"), page_number=3),
+    ChildSpec("1j", "1", "1", ("j",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "clustering_property"), page_number=3),
+    short_answer("2a_i", "2a", "2", ("a", "i"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_ii", "2a", "2", ("a", "ii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
+    short_answer("2a_iii", "2a", "2", ("a", "iii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("array_manipulation",), page_number=5),
+    short_answer("2a_iv", "2a", "2", ("a", "iv"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("array_construction",), page_number=5),
+    short_answer("2a_v", "2a", "2", ("a", "v"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("aggregation",), page_number=5),
+    short_answer("2a_vi", "2a", "2", ("a", "vi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("transpose",), page_number=6),
+    short_answer("2a_vii", "2a", "2", ("a", "vii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("matrix_multiplication",), page_number=6),
+    short_answer("2a_viii", "2a", "2", ("a", "viii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("dot_product",), page_number=6),
+    short_answer("2a_ix", "2a", "2", ("a", "ix"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting",), page_number=6),
+    short_answer("2a_x", "2a", "2", ("a", "x"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("error_reasoning",), page_number=7),
+    short_answer("2a_xi", "2a", "2", ("a", "xi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting",), page_number=7),
+    short_answer("2a_xii", "2a", "2", ("a", "xii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("slicing",), page_number=7),
+    short_answer("2a_xiii", "2a", "2", ("a", "xiii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("views_vs_copies",), page_number=7),
+    ChildSpec("2b", "2", "2", ("b",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "similarity_computation"), page_number=8),
+    ChildSpec("3a", "3", "3", ("a",), 5.5, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("manual_computation", "metric_reasoning"), page_number=10),
+    short_answer("3b", "3", "3", ("b",), 1, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("metric_reasoning",), page_number=11),
+    ChildSpec("3c", "3", "3", ("c",), 2.5, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("manual_computation", "metric_reasoning"), page_number=11),
+    short_answer("3d", "3", "3", ("d",), 1, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("metric_reasoning",), page_number=12),
+    ChildSpec("3e", "3", "3", ("e",), 6, "long_question", "coding", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("implementation", "metrics", "vectorization"), page_number=12),
+    ChildSpec("4a", "4", "4", ("a",), 4, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "gaussian_nb"), page_number=15),
+    ChildSpec("4b", "4", "4", ("b",), 3, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "likelihood_reasoning"), page_number=15),
+    ChildSpec("4c", "4", "4", ("c",), 4, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("laplace_smoothing", "likelihood_reasoning"), page_number=16),
+    short_answer("4d", "4", "4", ("d",), 2, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("prior_reasoning",), page_number=17),
+    ChildSpec("4e", "4", "4", ("e",), 3, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("posterior_reasoning", "classification_decision"), page_number=17),
+    ChildSpec("5a", "5", "5", ("a",), 3, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "weighted_knn"), page_number=18),
+    ChildSpec("5b", "5", "5", ("b",), 13, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("cross_validation", "manual_tracing", "model_selection"), page_number=18),
+    short_answer("5c", "5", "5", ("c",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("test_error", "model_selection"), page_number=20),
+    ChildSpec("6a", "6", "6", ("a",), 6, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("manual_computation", "clustering"), page_number=21),
+    ChildSpec("6b", "6", "6", ("b",), 6, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("manual_computation", "clustering"), page_number=22),
+    short_answer("6c", "6", "6", ("c",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("outlier_reasoning",), page_number=22),
+    short_answer("6d", "6", "6", ("d",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("model_selection", "threshold_reasoning"), page_number=22),
+    ChildSpec("7", "7", "7", (), 10, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("cross_validation", "data_leakage_reasoning"), page_number=23),
+]
+
+
+MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")  # line-leading markers such as "(a)" or "(iv)"
+
+
+def split_sections(text: str) -> tuple[str, dict[str, str]]:
+    matches = list(MARKER_RE.finditer(text))
+    if not matches:
+        return text.strip(), {}
+    intro = text[: matches[0].start()].strip()
+    sections: dict[str, str] = {}
+    for idx, match in enumerate(matches):
+        marker = match.group(1)
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
+        sections[marker] = text[match.start() : end].strip()
+    return intro, sections
+
+
+def extract_segment(text: str, path: tuple[str, ...]) -> str:
+    if not path:
+        return text.strip()
+    current = text.strip()
+    carried_intro: list[str] = []
+    for depth, marker in enumerate(path):
+        intro, sections = split_sections(current)
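+        if depth == 0 and intro:  # carry only the top-level question stem as shared context
+            carried_intro.append(intro)
+        current = sections.get(marker, current)  # fall back to the unsplit text when the marker is missing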
+    return "\n".join(part for part in [*carried_intro, current] if part).strip()
+
+
+def extract_pages(pdf_path: Path, start: int, end: int) -> str:
+    doc = fitz.open(pdf_path)
+    try:
+        return "\n".join(doc[i].get_text("text") for i in range(start - 1, end))
+    finally:
+        doc.close()
+
+
+def load_seed_rows() -> dict[str, dict]:
+    data = json.loads(PROBLEM_SEED_PATH.read_text())
+    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}
+
+
+def build_source_rows(existing_rows: dict[str, dict]) -> dict[str, dict]:
+    seed_rows = load_seed_rows()
+    rows = dict(seed_rows)
+    if "5" in rows:
+        rows["5"] = {
+            **rows["5"],
+            "question_text": extract_pages(QUESTION_PDF, 18, 20),
+            "raw_answer_text": extract_pages(ANSWER_PDF, 21, 25),
+            "page_number": 18,
+            "analytics_topic": "KNN and Clustering",
+            "topic_primary": "KNN and Clustering",
+            "topic_tags": ["KNN and Clustering"],
+            "skill_tags": ["manual_computation", "distance_calculation", "algorithm_tracing"],
+            "difficulty": "medium",
+        }
+    else:
+        rows["5"] = {
+            "question_number": "5",  # the seed has no row for Q5 here, so seed_rows["5"] would raise KeyError
+            "question_text": extract_pages(QUESTION_PDF, 18, 20),
+            "raw_answer_text": extract_pages(ANSWER_PDF, 21, 25),
+            "page_number": 18,
+        }
+    if "7" in rows:
+        rows["7"] = {
+            **rows["7"],
+            "question_text": extract_pages(QUESTION_PDF, 23, 24),
+            "raw_answer_text": extract_pages(ANSWER_PDF, 31, 34),
+            "page_number": 23,
+            "analytics_topic": "Evaluation and Validation",
+            "topic_primary": "Evaluation and Validation",
+            "topic_tags": ["Evaluation and Validation"],
+            "skill_tags": ["cross_validation", "data_leakage_reasoning"],
+            "difficulty": "medium",
+        }
+    else:
+        rows["7"] = {
+            "question_number": "7",  # the seed has no row for Q7 here, so seed_rows["7"] would raise KeyError
+            "question_text": extract_pages(QUESTION_PDF, 23, 24),
+            "raw_answer_text": extract_pages(ANSWER_PDF, 31, 34),
+            "page_number": 23,
+        }
+    return rows
+
+
+def extract_true_false_answers(answer_text: str) -> dict[str, str]:
+    answers: dict[str, str] = {}
+    table_match = re.search(r"Answer\s+([TF\s]+)", answer_text, re.S)
+    if table_match:
+        seq = re.findall(r"[TF]", table_match.group(1))
+        if len(seq) >= 10:
+            for idx, val in enumerate(seq[:10]):
+                answers[chr(ord("a") + idx)] = val
+            return answers
+    lines = [line.strip() for line in answer_text.splitlines() if line.strip()]
+    current_letter: str | None = None
+    for line in lines:
+        m = re.fullmatch(r"\(([a-j])\)", line)
+        if m:
+            current_letter = m.group(1)
+            continue
+        if current_letter and line in {"T", "F"}:
+            answers[current_letter] = line
+            current_letter = None
+    if answers:
+        return answers
+    seq = re.findall(r"\b([TF])\b", answer_text)
+    if len(seq) >= 10:
+        for idx, val in enumerate(seq[:10]):
+            answers[chr(ord("a") + idx)] = val
+    return answers
+
+
+def main() -> None:
+    sb = get_supabase()
+    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
+    paper_id = paper["id"]
+    current_rows = (
+        sb.table("paper_questions")
+        .select("*")
+        .eq("paper_id", paper_id)
+        .order("display_order")
+        .execute()
+        .data
+    )
+    existing_by_number = {row["question_number"]: row for row in current_rows}
+    parent_rows = build_source_rows(existing_by_number)
+    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
+
+    inserts = []
+    for display_order, child in enumerate(CHILDREN, start=1):
+        parent = parent_rows[child.top_level_number]
+        existing = existing_by_number.get(child.question_number, {})
+        question_text = extract_segment(parent["question_text"] or "", child.path)
+        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
+        options = None
+        correct_option = None
+        if child.question_type == "true_false":
+            options = TRUE_FALSE_OPTIONS
+            correct_option = tf_answers.get(child.path[0])
+
+        inserts.append(
+            {
+                "paper_id": paper_id,
+                "question_number": child.question_number,
+                "parent_question": child.parent_question,
+                "display_order": display_order,
+                "question_type": child.question_type,
+                "question_format": child.question_format,
+                "question_text": question_text,
+                "score": child.score,
+                "page_number": child.page_number,
+                "page_y_ratio": existing.get("page_y_ratio"),
+                "options": options,
+                "correct_option": correct_option,
+                "correct_answer": None,
+                "raw_answer_text": raw_answer_text,
+                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
+                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
+                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
+                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
+                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
+                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
+                "knowledge_reminder": existing.get("knowledge_reminder", ""),
+                "ai_hint": existing.get("ai_hint", ""),
+                "solution": existing.get("solution", ""),
+            }
+        )
+
+    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()  # not atomic: if the insert below fails, this paper is left with no questions
+    sb.table("paper_questions").insert(inserts).execute()
+    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
+    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/upload_course_library_pdfs.py
+++ b/backend/upload_course_library_pdfs.py
@@ -0,0 +1,121 @@
+"""Upload COMP2211 course-library PDFs to Supabase Storage.
+
+Run from the backend directory:
+    uv run python upload_course_library_pdfs.py
+
+Each entry maps a storage path (inside the `papers` bucket) to the local
+source file under pastpaper-scraper/papers/COMP2211/.
+"""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+# ---------------------------------------------------------------------------
+# Manifest: (storage_path, local_filename)
+# storage_path is relative inside the `papers` bucket.
+# local_filename is relative to PAPERS_DIR below.
+# ---------------------------------------------------------------------------
+MANIFEST: list[tuple[str, str]] = [
+    (
+        "course-library/COMP2211/COMP2211-2022-fall-midterm/paper.pdf",
+        "(COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-fall-midterm/answer.pdf",
+        "(COMP2211)[2022](f)midterm~=yjz8dxdd^_18747.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-midterm/paper.pdf",
+        "(COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-midterm/answer.pdf",
+        "(COMP2211)[2022](s)midterm~=6ma030^_89587.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-final-part-a/paper.pdf",
+        "(COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-final-part-a/answer.pdf",
+        "(COMP2211)[2022](s)final~=ajou6^_82011.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-final-part-b/paper.pdf",
+        "(COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2022-spring-final-part-b/answer.pdf",
+        "(COMP2211)[2022](s)final~=ajou6^_51199.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2023-spring-midterm/paper.pdf",
+        "(COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2023-spring-midterm/answer.pdf",
+        "(COMP2211)[2023](s)midterm~clchanbg^_17297.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2024-spring-midterm/paper.pdf",
+        "(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2024-spring-midterm/answer.pdf",
+        "(COMP2211)[2024](s)midterm~=ubrzkjmz^_90406.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2024-spring-final/paper.pdf",
+        "(COMP2211)[2024](s)final~=igk5mmg^_90365.pdf",
+    ),
+    (
+        "course-library/COMP2211/COMP2211-2024-spring-final/answer.pdf",
+        "(COMP2211)[2024](s)final~=igk5mmg^_58857.pdf",
+    ),
+]
+
+PAPERS_DIR = (
+    Path(__file__).parent.parent
+    / "pastpaper-scraper"
+    / "papers"
+    / "COMP2211"
+)
+
+
+def main() -> None:
+    from app.services.supabase_client import get_supabase
+
+    sb = get_supabase()
+    bucket = sb.storage.from_("papers")
+
+    ok = 0
+    skipped = 0
+    failed = 0
+
+    for storage_path, local_name in MANIFEST:
+        local_file = PAPERS_DIR / local_name
+        if not local_file.exists():
+            print(f"  MISSING local file: {local_name}")
+            skipped += 1
+            continue
+
+        data = local_file.read_bytes()
+        try:
+            bucket.upload(
+                storage_path,
+                data,
+                file_options={"content-type": "application/pdf", "upsert": "true"},
+            )
+            print(f"  OK  {storage_path}")
+            ok += 1
+        except Exception as exc:
+            print(f"  ERR {storage_path}: {exc}")
+            failed += 1
+
+    print(f"\nDone: {ok} uploaded, {skipped} skipped, {failed} failed.")
+
+
+if __name__ == "__main__":
+    main()
--- a/backend/uv.lock
+++ b/backend/uv.lock
--- a/deploy.md
+++ b/deploy.md
@@ -0,0 +1,92 @@
+# Deploying to Tencent Cloud
+
+## 1. Prepare the server
+
+```bash
+# After logging in over SSH, install Docker
+curl -fsSL https://get.docker.com | sh
+sudo systemctl enable docker && sudo systemctl start docker
+
+# Install docker-compose
+sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
+sudo chmod +x /usr/local/bin/docker-compose
+```
+
+## 2. Upload the code
+
+```bash
+# Package locally (excluding node_modules, .venv, __pycache__, and .git)
+cd "/Users/soda/Desktop/PastPaper Master"
+tar --exclude='node_modules' --exclude='.venv' --exclude='__pycache__' --exclude='.git' \
+    -czf pastpaper.tar.gz .
+
+# Upload to the server
+scp pastpaper.tar.gz root@<SERVER_IP>:/opt/pastpaper/
+
+# Extract on the server
+ssh root@<SERVER_IP>
+cd /opt/pastpaper && tar xzf pastpaper.tar.gz
+```
+
+## 3. Configure environment variables
+
+```bash
+# Edit .env and confirm that every key is correct
+vi /opt/pastpaper/.env
+```
+
+Required variables:
+- `SUPABASE_URL`, `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_ROLE_KEY`
+- `DASHSCOPE_BASE_URL`, `DASHSCOPE_API_KEY`
+- `DEEPSEEK_BASE_URL`, `DEEPSEEK_API_KEY`
+- `LAOZHANG_BASE_URL`, `LAOZHANG_API_KEY` (fallback)
+- `GOOGLE_GEMINI_API_KEY` (if available in the server's region)
+
+## 4. Build and start
+
+```bash
+cd /opt/pastpaper
+docker-compose up -d --build
+```
+
+## 5. Verify
+
+```bash
+# Check container status
+docker-compose ps
+
+# Check backend health
+curl http://localhost/health
+
+# View logs
+docker-compose logs -f backend
+docker-compose logs -f frontend
+```
+
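+## 6. Domain + HTTPS (optional)
+
+If you have a domain, point its DNS record at the server IP in the Tencent Cloud console, then:
+
+```bash
+# Install certbot
+apt install -y certbot python3-certbot-nginx
+
+# Obtain a certificate (first change server_name in nginx.conf to your domain)
+certbot --nginx -d your-domain.com
+```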
+
+## Common operations commands
+
+```bash
+# Restart
+docker-compose restart
+
+# Rebuild after updating the code
+docker-compose up -d --build
+
+# View backend logs
+docker-compose logs -f backend
+
+# Open a shell in the backend container
+docker-compose exec backend bash
+```
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,10 @@
+services:
+  backend:
+    build: ./backend
+    env_file: .env
+    ports:
+      - "8001:8000"
+    restart: unless-stopped
+    dns:
+      - 8.8.8.8
+      - 1.1.1.1
--- a/docs/PAGE_NUMBER_BACKFILL.md
+++ b/docs/PAGE_NUMBER_BACKFILL.md
@@ -0,0 +1,152 @@
+# Sub-question Page Number Backfill — Requirements
+
+## Problem
+
+All six `split_comp2211_*.py` scripts create sub-questions by inheriting `page_number`
+from their parent question:
+
+```python
+"page_number": parent.get("page_number"),
+```
+
+This is wrong whenever the parent question spans multiple pages. For example, Q1 True/False
+has 10 statements (a–j); if (a)–(f) are on page 1 and (g)–(j) are on page 2, all ten
+inherit page 1 from the parent. Clicking Q1h in the UI scrolls to page 1 instead of page 2.
+
+## Goal
+
+Every `ChildSpec` in every split script should carry its own correct `page_number`.
+When the script runs, it writes that page number to the database instead of inheriting
+from the parent.
+
+## Files to modify
+
+```
+backend/split_comp2211_2022_fall_midterm.py      ← does not exist yet; parent is seed SQL
+backend/split_comp2211_2022_spring_midterm.py
+backend/split_comp2211_2022_spring_final_part_a.py
+backend/split_comp2211_2022_spring_final_part_b.py
+backend/split_comp2211_2023_spring_midterm.py
+backend/split_comp2211_2024_spring_midterm.py
+backend/split_comp2211_2024_spring_final.py
+```
+
+Note: `2022-fall-midterm` sub-questions were inserted directly via the seed SQL
+(`supabase/seeds/comp2211_problem_level_questions.sql`), not via a split script.
+Their page numbers must be fixed directly in that SQL file or via a separate UPDATE.
+
+## How to determine page numbers
+
+Use PyMuPDF (`import pymupdf` — already in the venv) to search for question markers
+in the local PDF files. The PDFs are at:
+
+```
+../pastpaper-scraper/papers/COMP2211/<filename>
+```
+
+Filename mapping (from `upload_course_library_pdfs.py`):
+
+| Exam key | Local paper PDF |
+|----------|----------------|
+| COMP2211-2022-fall-midterm | (COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf |
+| COMP2211-2022-spring-midterm | (COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf |
+| COMP2211-2022-spring-final-part-a | (COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf |
+| COMP2211-2022-spring-final-part-b | (COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf |
+| COMP2211-2023-spring-midterm | (COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf |
+| COMP2211-2024-spring-midterm | (COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf |
+| COMP2211-2024-spring-final | (COMP2211)[2024](s)final~=igk5mmg^_90365.pdf |
+
+### Suggested search strategy
+
+```python
+import pymupdf
+
+doc = pymupdf.open("path/to/paper.pdf")
+for page_num, page in enumerate(doc, start=1):
+    text = page.get_text()
+    print(f"--- Page {page_num} ---")
+    print(text[:500])
+```
+
+Search for markers like:
+- `"(a)"`, `"(b)"`, ... for True/False sub-statements
+- `"Q2(a)"`, `"2(a)"`, `"Question 2"` for major sub-questions
+- `"(i)"`, `"(ii)"` for nested sub-questions
+
+Page numbers are 1-indexed (matching the `page_number` field in the database).
+
+## Code changes per split script
+
+### Step 1 — Add `page_number` field to `ChildSpec`
+
+Each script has its own `ChildSpec` dataclass. Add the field with a default so
+existing call sites don't break immediately:
+
+```python
+@dataclass(frozen=True)
+class ChildSpec:
+    ...
+    page_number: int = 1   # add this field
+```
+
+### Step 2 — Set correct page numbers in each `ChildSpec` instance
+
+Fill in the actual page after inspecting the PDF:
+
+```python
+ChildSpec("1a", "1", "1", ("a",), 1.5, "true_false", page_number=1),
+ChildSpec("1b", "1", "1", ("b",), 1.5, "true_false", page_number=1),
+...
+ChildSpec("1h", "1", "1", ("h",), 1.5, "true_false", page_number=2),
+```
+
+### Step 3 — Write `page_number` in the upsert payload
+
+Find where the script builds the INSERT/upsert dict and replace the inherited value:
+
+```python
+# Before:
+"page_number": parent.get("page_number"),
+
+# After:
+"page_number": child.page_number,
+```
+
+### Step 4 — Update existing rows in the database
+
+After modifying the scripts, run each script once — they already use upsert/update
+semantics, so re-running overwrites the old (inherited) page numbers with the correct ones.
+
+If a script does INSERT-only (not upsert), add a separate UPDATE pass:
+
+```python
+sb.table("paper_questions").update({"page_number": child.page_number}) \
+  .eq("paper_id", paper_id) \
+  .eq("question_number", child.question_number) \
+  .execute()
+```
+
+## 2022-fall-midterm (seed SQL)
+
+Sub-questions for this paper are in:
+`supabase/seeds/comp2211_problem_level_questions.sql`
+
+The seed has a `page_number` column in the VALUES rows. Find all rows for
+`COMP2211-2022-fall-midterm` and correct the values. Then run a direct UPDATE
+against the live database:
+
+```sql
+-- Example — adjust actual page numbers after inspecting the PDF
+UPDATE paper_questions
+SET page_number = 2
+WHERE paper_id = (SELECT id FROM papers WHERE source_exam_key = 'COMP2211-2022-fall-midterm')
+  AND question_number IN ('1g', '1h', '1i', '1j');
+```
+
+## Definition of Done
+
+- [ ] Every `ChildSpec` in every split script has an explicit `page_number`
+- [ ] No script uses `parent.get("page_number")` for the upsert payload
+- [ ] All six scripts have been re-run against the live database
+- [ ] 2022-fall-midterm sub-questions updated via SQL
+- [ ] Spot-check: clicking Q1h in a paper where Q1 spans 2 pages scrolls to page 2 in the UI
--- a/docs/TAGGING_REQUIREMENTS.md
+++ b/docs/TAGGING_REQUIREMENTS.md
@@ -0,0 +1,243 @@
+# Tag Schema & Similar Question Retrieval — Requirements
+
+## Background
+
+Current state of `paper_questions` tagging for COMP2211:
+
+- `analytics_topic`: 8 coarse buckets (e.g. "KNN and Clustering" covers both KNN and K-Means)
+- `topic_tags`: redundant copy of `analytics_topic`, adds no information
+- `skill_tags`: fine-grained snake_case labels (e.g. `centroid_update`, `distance_calculation`), not shown to users
+- `question_text`: at subquestion level, but currently stores **parent problem header text**, not the actual subquestion statement
+
+The result is that similar question retrieval conflates KNN and K-Means, cannot distinguish "write code" from "trace algorithm", and produces low-precision recommendations.
+
+---
+
+## Goal
+
+Every subquestion should carry enough structured metadata that the retrieval system can return **topically and skill-wise identical questions across different exam years**, rather than just questions from the same broad topic bucket.
+
+Precision target: a question on K-Means centroid update should retrieve other K-Means centroid update questions, not KNN distance questions.
+
+---
+
+## Field Definitions (revised)
+
+### `analytics_topic` — single string, primary retrieval bucket
+
+Granularity: **algorithm or concept level**, not course-section level.
+
+Allowed values for COMP2211 (replace current 8-bucket system):
+
+| New value | Replaces / splits |
+|-----------|-------------------|
+| `Naive Bayes` | Probabilistic Models (partial) |
+| `Bayesian Inference` | Probabilistic Models (partial) |
+| `KNN` | KNN and Clustering (partial) |
+| `K-Means` | KNN and Clustering (partial) |
+| `Perceptron` | Perceptron and MLP (partial) |
+| `MLP` | Perceptron and MLP (partial) |
+| `CNN` | Vision and CNN |
+| `Evaluation Metrics` | Evaluation and Validation (partial) |
+| `Cross Validation` | Evaluation and Validation (partial) |
+| `Python and NumPy` | Python Fundamentals |
+| `Search Algorithms` | Search and Games (partial) |
+| `Game Trees` | Search and Games (partial) |
+| `Ethics of AI` | Ethics of AI (unchanged) |
+
+Rules:
+- One value per question — pick the **most specific** algorithm being tested
+- If a subquestion genuinely spans two algorithms, pick the one being asked to compute/demonstrate
+- `True/False` is **not** a valid analytics_topic (it is a format, not a topic)
+
+---
+
+### `topic_tags` — string array, secondary topic labels
+
+Granularity: **concept and variant level** within the algorithm.
+
+Purpose: catch cross-topic overlaps and concept aliases.
+
+Examples:
+
+```
+analytics_topic = "K-Means"
+topic_tags = ["K-Means", "Centroid Update", "Convergence"]
+
+analytics_topic = "KNN"
+topic_tags = ["KNN", "Euclidean Distance", "Classification"]
+
+analytics_topic = "Naive Bayes"
+topic_tags = ["Naive Bayes", "Prior", "Likelihood", "Posterior"]
+
+analytics_topic = "Evaluation Metrics"
+topic_tags = ["Evaluation Metrics", "Precision", "Recall", "F1 Score"]
+
+analytics_topic = "MLP"
+topic_tags = ["MLP", "Backpropagation", "Activation Function", "Hidden Layer"]
+
+analytics_topic = "Python and NumPy"
+topic_tags = ["NumPy", "Broadcasting", "Array Indexing", "Vectorization"]
+```
+
+Rules:
+- First element should match or alias `analytics_topic`
+- Include concept names a student would search for ("F1 Score", not "metric_reasoning")
+- 2–5 tags per question; avoid over-tagging
+- Human-readable, title-case, no underscores
+
+---
+
+### `skill_tags` — string array, task type labels
+
+Granularity: **what the student must do**, not what the topic is.
+
+Current values are acceptable in meaning but must be converted to human-readable form.
+
+Rename convention: `snake_case` → `Title Case with spaces`
+
+| Old | New |
+|-----|-----|
+| `concept_check` | `Concept Check` |
+| `code_tracing` | `Code Tracing` |
+| `algorithm_tracing` | `Algorithm Tracing` |
+| `distance_calculation` | `Distance Calculation` |
+| `centroid_update` | `Centroid Update` |
+| `weight_update` | `Weight Update` |
+| `decision_boundary` | `Decision Boundary` |
+| `implementation` | `Implementation` |
+| `debugging` | `Debugging` |
+| `model_selection` | `Model Selection` |
+| `concept_explanation` | `Concept Explanation` |
+| `architecture_reasoning` | `Architecture Reasoning` |
+| `convergence_reasoning` | `Convergence Reasoning` |
+| `generalization_reasoning` | `Generalization Reasoning` |
+| `classification_decision` | `Classification Decision` |
+
+Rules:
+- 1–3 tags per question
+- Describes the **task type**, not the subject matter
+- These are used for retrieval ranking, not primary display
+
+---
+
+### `question_text` — the actual subquestion statement
+
+Current problem: subquestions store the **parent problem header** as `question_text`, not the individual statement.
+
+Required fix per subquestion type:
+
+| Type | What `question_text` should contain |
+|------|-------------------------------------|
+| True/False subquestion (Q1a–Q1j) | The specific T/F statement being judged |
+| Code output (Q2a_i–Q2a_v) | The specific code snippet + "What is the output?" |
+| Calculation subquestion (Q4a, Q5a) | The specific sub-task, e.g. "Compute the Euclidean distance between..." |
+| Written explanation (Q3, Q5c) | The full question prompt for that part |
+
+This is a **data extraction quality issue**. The backfill script must extract the correct per-subquestion text from the source PDF or from `raw_answer_text`.
+
+---
+
+## Backfill Requirements
+
+### Script: `backfill_comp2211_tags.py`
+
+Target: all `paper_questions` where `paper_id` in the COMP2211 course library.
+
+For each question:
+
+1. **Re-classify `analytics_topic`** using the new value list above
+   - Use `question_text` + existing `topic_tags` + `skill_tags` as signals
+   - If `analytics_topic` is currently `"KNN and Clustering"`:
+     - Look at `skill_tags` and `question_text`
+     - If `skill_tags` contains `centroid_update` or `algorithm_tracing`, or the text contains "K-Means" / "centroid" → set `"K-Means"`
+     - Otherwise → set `"KNN"`
+   - If `analytics_topic` is currently `"Perceptron and MLP"`:
+     - If `question_text` or `skill_tags` references hidden layer, backprop, activation function → `"MLP"`
+     - Otherwise → `"Perceptron"`
+   - If `analytics_topic` is currently `"Probabilistic Models"`:
+     - If the text mentions Naive Bayes → `"Naive Bayes"`
+     - Otherwise → `"Bayesian Inference"`
+   - If `analytics_topic` is currently `"Evaluation and Validation"`:
+     - If the text mentions cross-validation or a train/val split → `"Cross Validation"`
+     - Otherwise → `"Evaluation Metrics"`
+   - If `analytics_topic` is currently `"Search and Games"`:
+     - If the text mentions minimax, alpha-beta, or a game tree → `"Game Trees"`
+     - Otherwise → `"Search Algorithms"`
+
+2. **Rebuild `topic_tags`** — do not copy `analytics_topic`; derive from question content
+
+3. **Rename `skill_tags`** — convert all snake_case values to Title Case per the mapping table above
+
+4. **Do not overwrite `question_text`** in this pass (separate task)
+
+---
+
+## Retrieval Algorithm Changes (backend `questions.py`)
+
+### Separate topic and skill contributions
+
+Current `similarity_score()` merges `analytics_topic`, `topic_tags`, and `skill_tags` into one set. This causes skill tags like `centroid_update` to appear as "Shared topic: centroid_update" in the UI.
+
+Required split:
+
+```python
+def similarity_score(target, candidate):
+    score = 0
+    reasons = []
+
+    # 1. analytics_topic exact match: 40 pts
+    if target.get("analytics_topic") and target["analytics_topic"] == candidate.get("analytics_topic"):
+        score += 40
+        reasons.append(f"Same topic: {target['analytics_topic']}")
+
+    # 2. topic_tags overlap: up to 20 pts (10 per shared tag, max 2)
+    target_tt = set(t.lower() for t in (target.get("topic_tags") or []))
+    candidate_tt = set(t.lower() for t in (candidate.get("topic_tags") or []))
+    shared_tt = target_tt & candidate_tt
+    tt_pts = min(len(shared_tt) * 10, 20)
+    if tt_pts:
+        score += tt_pts
+        reasons.append(f"Shared concept: {', '.join(sorted(shared_tt)[:2])}")
+
+    # 3. skill_tags overlap: up to 20 pts (10 per shared tag, max 2)
+    target_st = set(t.lower() for t in (target.get("skill_tags") or []))
+    candidate_st = set(t.lower() for t in (candidate.get("skill_tags") or []))
+    shared_st = target_st & candidate_st
+    st_pts = min(len(shared_st) * 10, 20)
+    if st_pts:
+        score += st_pts
+        reasons.append(f"Shared skill: {', '.join(sorted(shared_st)[:2])}")
+
+    # 4. Same question format: 10 pts
+    if question_family(candidate) == question_family(target):
+        score += 10
+        reasons.append("Same format")
+
+    # 5. Same difficulty: 5 pts
+    if candidate.get("difficulty") and candidate["difficulty"] == target.get("difficulty"):
+        score += 5
+        reasons.append("Same difficulty")
+
+    # 6. Full-text similarity: up to 20 pts (from tsvector RPC)
+    # (injected externally, not computed here)
+
+    return min(score, 99), reasons
+```
+
+### Threshold and display
+
+- Filter: drop candidates with `match_percent < 20` (threshold raised from 10; intended to ensure `analytics_topic` at least partially matches)
+- UI display: show `match_reasons` chips, but replace snake_case with Title Case before display
+
+---
+
+## Definition of Done
+
+- [ ] All COMP2211 questions have `analytics_topic` from the new value list
+- [ ] No `analytics_topic` value of `"KNN and Clustering"`, `"Perceptron and MLP"`, `"Probabilistic Models"`, `"Evaluation and Validation"`, `"Search and Games"` remains
+- [ ] `topic_tags` contains 2–5 human-readable concept names, not a copy of `analytics_topic`
+- [ ] `skill_tags` values are Title Case with spaces
+- [ ] Similar question retrieval returns 0 cross-algorithm false positives between KNN and K-Means
+- [ ] `match_reasons` chips in the UI show no underscores
+- [ ] Retrieval threshold enforces `analytics_topic` match as a hard or near-hard requirement
--- a/frontend/Dockerfile
+++ b/frontend/Dockerfile
@@ -0,0 +1,12 @@
+FROM node:20-alpine AS build
+
+WORKDIR /app
+COPY package.json package-lock.json ./
+RUN npm ci
+COPY . .
+RUN npm run build
+
+FROM nginx:alpine
+COPY --from=build /app/dist /usr/share/nginx/html
+COPY nginx.conf /etc/nginx/conf.d/default.conf
+EXPOSE 80
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -0,0 +1,13 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <link rel="icon" type="image/jpeg" href="/favicon.jpg" />
+    <title>PastPaper Master</title>
+  </head>
+  <body>
+    <div id="root"></div>
+    <script type="module" src="/src/main.tsx"></script>
+  </body>
+</html>
--- a/frontend/nginx.conf
+++ b/frontend/nginx.conf
@@ -0,0 +1,27 @@
+server {
+    listen 80;
+    server_name pastpaper.knowit.top;
+
+    root /usr/share/nginx/html;
+    index index.html;
+
+    # SPA fallback
+    location / {
+        try_files $uri $uri/ /index.html;
+    }
+
+    # API proxy to backend
+    location /api/ {
+        proxy_pass http://backend:8000/api/;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_read_timeout 300s;
+        client_max_body_size 50M;
+    }
+
+    # Health check proxy
+    location /health {
+        proxy_pass http://backend:8000/health;
+    }
+}
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -0,0 +1,30 @@
+{
+  "name": "frontend",
+  "version": "1.0.0",
+  "description": "",
+  "type": "module",
+  "scripts": {
+    "dev": "vite",
+    "build": "tsc && vite build",
+    "preview": "vite preview"
+  },
+  "dependencies": {
+    "@supabase/supabase-js": "^2.103.0",
+    "katex": "^0.16.38",
+    "pdfjs-dist": "^5.5.207",
+    "react": "^19.2.4",
+    "react-dom": "^19.2.4",
+    "react-pdf": "^10.4.1",
+    "react-router-dom": "^7.13.1"
+  },
+  "devDependencies": {
+    "@tailwindcss/vite": "^4.2.1",
+    "@types/katex": "^0.16.8",
+    "@types/react": "^19.2.14",
+    "@types/react-dom": "^19.2.3",
+    "@vitejs/plugin-react": "^4.7.0",
+    "tailwindcss": "^4.2.1",
+    "typescript": "^5.9.3",
+    "vite": "^7.3.1"
+  }
+}
--- a/frontend/public/favicon.jpg
+++ b/frontend/public/favicon.jpg
--- a/frontend/src/App.tsx
+++ b/frontend/src/App.tsx
@@ -0,0 +1,30 @@
+import { Navigate, Routes, Route } from "react-router-dom";
+import { useAuth } from "./contexts/AuthContext";
+import ProcessingBanner from "./components/layout/ProcessingBanner";
+import LoginPage from "./pages/LoginPage";
+import HomePage from "./pages/HomePage";
+import UploadPage from "./pages/UploadPage";
+import WorkbenchPage from "./pages/WorkbenchPage";
+import ErrorBookPage from "./pages/ErrorBookPage";
+import AnalyticsPage from "./pages/AnalyticsPage";
+
+export default function App() {
+  const { session, loading } = useAuth();
+
+  if (loading) return <div className="min-h-screen bg-gray-50 flex items-center justify-center"><div className="text-gray-400 text-sm">Loading...</div></div>;
+
+  return (
+    <>
+    <ProcessingBanner />
+    <Routes>
+      <Route path="/login" element={session ? <Navigate to="/" replace /> : <LoginPage />} />
+      <Route path="/" element={<HomePage />} />
+      <Route path="/upload" element={<UploadPage />} />
+      <Route path="/paper/:id" element={<WorkbenchPage />} />
+      <Route path="/error-book" element={<ErrorBookPage />} />
+      <Route path="/analytics" element={<AnalyticsPage />} />
+      <Route path="/analytics/:courseCode" element={<AnalyticsPage />} />
+    </Routes>
+    </>
+  );
+}
--- a/frontend/src/components/layout/Header.tsx
+++ b/frontend/src/components/layout/Header.tsx
@@ -0,0 +1,69 @@
+import { Link } from "react-router-dom";
+import { useAuth } from "@/contexts/AuthContext";
+
+export default function Header({
+  courseCode,
+  paperTitle,
+}: {
+  courseCode?: string;
+  paperTitle?: string;
+}) {
+  const { user, signOut } = useAuth();
+
+  return (
+    <header className="h-14 border-b border-gray-200 bg-white flex items-center px-6 shrink-0">
+      <Link to="/" className="text-lg font-bold text-blue-600 mr-6">
+        PastPaper Master
+      </Link>
+      {courseCode && (
+        <div className="flex items-center gap-2 text-sm text-gray-600">
+          <span className="bg-blue-50 text-blue-700 px-2 py-0.5 rounded font-medium">
+            {courseCode}
+          </span>
+          {paperTitle && <span>{paperTitle}</span>}
+          <Link
+            to={`/analytics/${courseCode}`}
+            className="ml-2 flex items-center gap-1 px-2.5 py-1 text-xs font-medium text-indigo-600 bg-indigo-50 rounded hover:bg-indigo-100 transition-colors"
+          >
+            <svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+              <path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
+            </svg>
+            AI Analytics
+          </Link>
+        </div>
+      )}
+      <div className="ml-auto flex items-center gap-4 text-sm">
+        <Link to="/" className="text-gray-500 hover:text-gray-800">
+          My Papers
+        </Link>
+        <Link to="/error-book" className="text-gray-500 hover:text-gray-800">
+          Error Book
+        </Link>
+        <Link to="/analytics" className="text-gray-500 hover:text-gray-800">
+          Analytics
+        </Link>
+        <Link to="/upload" className="text-blue-600 hover:text-blue-800 font-medium">
+          Upload
+        </Link>
+        {user ? (
+          <div className="flex items-center gap-3 pl-4 border-l border-gray-200">
+            <span className="text-xs text-gray-400">{user.email}</span>
+            <button
+              onClick={signOut}
+              className="text-xs text-gray-500 hover:text-gray-800 px-2 py-1 rounded hover:bg-gray-100"
+            >
+              Sign out
+            </button>
+          </div>
+        ) : (
+          <Link
+            to="/login"
+            className="text-sm text-blue-600 hover:text-blue-800 font-medium pl-4 border-l border-gray-200"
+          >
+            Sign in
+          </Link>
+        )}
+      </div>
+    </header>
+  );
+}
--- a/frontend/src/components/layout/ProcessingBanner.tsx
+++ b/frontend/src/components/layout/ProcessingBanner.tsx
@@ -0,0 +1,183 @@
+import { useEffect, useRef, useState } from "react";
+import { Link } from "react-router-dom";
+import { myPapers } from "@/lib/api";
+import { useAuth } from "@/contexts/AuthContext";
+import type { Paper } from "@/types/api";
+
+interface Notification {
+  paperId: string;
+  label: string;
+}
+
+const POLL_MS = 4000;
+
+export default function ProcessingBanner() {
+  const { user } = useAuth();
+  const [processing, setProcessing] = useState<Paper[]>([]);
+  const [doneNotifs, setDoneNotifs] = useState<Notification[]>([]);
+  const [expanded, setExpanded] = useState(false);
+  const knownIds = useRef<Set<string>>(new Set());
+
+  // Drag state
+  const [pos, setPos] = useState({ x: window.innerWidth - 220, y: 24 });
+  const dragging = useRef(false);
+  const dragOffset = useRef({ x: 0, y: 0 });
+  const widgetRef = useRef<HTMLDivElement>(null);
+
+  useEffect(() => {
+    if (!user) return;
+    let cancelled = false;
+
+    const poll = async () => {
+      try {
+        const papers = await myPapers();
+        if (cancelled) return;
+
+        const inProgress = papers.filter((p) => p.status === "processing" || p.status === "uploaded");
+        setProcessing(inProgress);
+
+        papers
+          .filter((p) => p.status === "ready" && knownIds.current.has(p.id))
+          .forEach((p) => {
+            knownIds.current.delete(p.id);
+            const label = `${p.course_code} ${p.year} ${p.term} ${p.exam_type}`;
+            setDoneNotifs((prev) => [...prev, { paperId: p.id, label }]);
+            setTimeout(() => {
+              setDoneNotifs((prev) => prev.filter((n) => n.paperId !== p.id));
+            }, 8000);
+          });
+
+        inProgress.forEach((p) => knownIds.current.add(p.id));
+      } catch {
+        // silent
+      }
+    };
+
+    poll();
+    const interval = setInterval(poll, POLL_MS);
+    return () => { cancelled = true; clearInterval(interval); };
+  }, [user]);
+
+  // Drag handlers
+  const onMouseDown = (e: React.MouseEvent) => {
+    // Only drag on the header bar
+    dragging.current = true;
+    dragOffset.current = {
+      x: e.clientX - pos.x,
+      y: e.clientY - pos.y,
+    };
+    e.preventDefault();
+  };
+
+  useEffect(() => {
+    const onMouseMove = (e: MouseEvent) => {
+      if (!dragging.current) return;
+      setPos({
+        x: Math.max(0, Math.min(window.innerWidth - 200, e.clientX - dragOffset.current.x)),
+        y: Math.max(0, Math.min(window.innerHeight - 60, e.clientY - dragOffset.current.y)),
+      });
+    };
+    const onMouseUp = () => { dragging.current = false; };
+    window.addEventListener("mousemove", onMouseMove);
+    window.addEventListener("mouseup", onMouseUp);
+    return () => {
+      window.removeEventListener("mousemove", onMouseMove);
+      window.removeEventListener("mouseup", onMouseUp);
+    };
+  }, []);
+
+  if (!user || (processing.length === 0 && doneNotifs.length === 0)) return null;
+
+  const total = processing.length + doneNotifs.length;
+
+  return (
+    <div
+      ref={widgetRef}
+      className="fixed z-50 select-none"
+      style={{ left: pos.x, top: pos.y }}
+    >
+      {/* ── Header / collapsed pill ── */}
+      <div
+        onMouseDown={onMouseDown}
+        onClick={() => setExpanded((v) => !v)}
+        className="flex items-center gap-2 bg-gray-900 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg cursor-grab active:cursor-grabbing"
+        style={{ minWidth: 180 }}
+      >
+        <span className="w-3 h-3 border-2 border-white border-t-transparent rounded-full animate-spin shrink-0" />
+        <span className="flex-1 font-medium">
+          {processing.length > 0
+            ? `${processing.length} processing…`
+            : `${doneNotifs.length} ready`}
+        </span>
+        {doneNotifs.length > 0 && (
+          <span className="w-4 h-4 flex items-center justify-center bg-green-500 rounded-full text-[10px] font-bold shrink-0">
+            {doneNotifs.length}
+          </span>
+        )}
+        <span className="text-gray-400 text-[10px] shrink-0">{expanded ? "▲" : "▼"}</span>
+      </div>
+
+      {/* ── Expanded panel ── */}
+      {expanded && (
+        <div className="mt-1.5 flex flex-col gap-1.5" style={{ minWidth: 240 }}>
+          {processing.map((p) => {
+            const step = p.processing_step;
+            const progress = p.processing_progress || 0;
+            const total = p.processing_total || 0;
+            const pct = total > 0 ? Math.round((progress / total) * 100) : 0;
+            return (
+              <div
+                key={p.id}
+                className="bg-gray-900 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg"
+              >
+                <div className="flex items-center gap-2.5 mb-1.5">
+                  <span className="w-3 h-3 border-2 border-white border-t-transparent rounded-full animate-spin shrink-0" />
+                  <span className="truncate">
+                    <span className="font-semibold">{p.course_code}</span>{" "}
+                    {p.year} {p.term} {p.exam_type}
+                  </span>
+                </div>
+                {step && (
+                  <>
+                    <div className="text-[10px] text-gray-400 mb-1 truncate">{step}</div>
+                    {total > 0 && (
+                      <div className="h-1.5 bg-gray-700 rounded-full overflow-hidden">
+                        <div className="h-full bg-blue-400 rounded-full transition-all duration-500" style={{ width: `${pct}%` }} />
+                      </div>
+                    )}
+                  </>
+                )}
+              </div>
+            );
+          })}
+
+          {doneNotifs.map((n) => (
+            <div
+              key={n.paperId}
+              className="flex items-center gap-2.5 bg-green-600 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg"
+            >
+              <span className="text-sm leading-none">✓</span>
+              <span className="flex-1 truncate font-semibold">{n.label}</span>
+              <Link
+                to={`/paper/${n.paperId}`}
+                className="shrink-0 underline font-semibold hover:text-green-100"
+                onClick={(e) => e.stopPropagation()}
+              >
+                Open →
+              </Link>
+              <button
+                onClick={(e) => {
+                  e.stopPropagation();
+                  setDoneNotifs((prev) => prev.filter((x) => x.paperId !== n.paperId));
+                }}
+                className="shrink-0 text-green-200 hover:text-white"
+              >
+                ×
+              </button>
+            </div>
+          ))}
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/shared/CollapsibleSection.tsx
+++ b/frontend/src/components/shared/CollapsibleSection.tsx
@@ -0,0 +1,65 @@
+import { useState } from "react";
+
+const schemes = {
+  blue: {
+    border: "border-blue-200",
+    bg: "bg-blue-50",
+    text: "text-blue-800",
+    icon: "text-blue-500",
+  },
+  amber: {
+    border: "border-amber-200",
+    bg: "bg-amber-50",
+    text: "text-amber-800",
+    icon: "text-amber-500",
+  },
+  green: {
+    border: "border-green-200",
+    bg: "bg-green-50",
+    text: "text-green-800",
+    icon: "text-green-500",
+  },
+} as const;
+
+export default function CollapsibleSection({
+  title,
+  colorScheme,
+  defaultOpen = false,
+  children,
+}: {
+  title: string;
+  colorScheme: keyof typeof schemes;
+  defaultOpen?: boolean;
+  children: React.ReactNode;
+}) {
+  const [isOpen, setIsOpen] = useState(defaultOpen);
+  const s = schemes[colorScheme];
+
+  return (
+    <div className={`rounded-lg border ${s.border} mb-3`}>
+      <button
+        onClick={() => setIsOpen(!isOpen)}
+        className={`w-full flex items-center justify-between p-3 rounded-t-lg ${s.bg} cursor-pointer`}
+      >
+        <span className={`font-semibold text-sm ${s.text}`}>{title}</span>
+        <svg
+          className={`w-4 h-4 ${s.icon} transition-transform duration-200 ${isOpen ? "rotate-180" : ""}`}
+          fill="none"
+          viewBox="0 0 24 24"
+          stroke="currentColor"
+          strokeWidth={2}
+        >
+          <path strokeLinecap="round" strokeLinejoin="round" d="M19 9l-7 7-7-7" />
+        </svg>
+      </button>
+      <div
+        className="grid transition-[grid-template-rows] duration-300 ease-in-out"
+        style={{ gridTemplateRows: isOpen ? "1fr" : "0fr" }}
+      >
+        <div className="overflow-hidden">
+          <div className="p-3">{children}</div>
+        </div>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/components/shared/KaTeXRenderer.tsx
+++ b/frontend/src/components/shared/KaTeXRenderer.tsx
@@ -0,0 +1,86 @@
+import { useMemo } from "react";
+import katex from "katex";
+
+/**
+ * Pre-render all LaTeX in an HTML string at the string level,
+ * then set innerHTML. This avoids DOM-based auto-render issues
+ * where delimiters get split across text nodes or special chars
+ * like # cause silent failures.
+ */
+function renderLatexInString(html: string): string {
+  // Strip <code class="latex"> and <pre class="latex"> wrappers
+  let s = html
+    .replace(/<code[^>]*class="latex"[^>]*>(.*?)<\/code>/gs, "$1")
+    .replace(/<pre[^>]*class="latex"[^>]*>(.*?)<\/pre>/gs, "$1");
+
+  // 1) Render display math: $$...$$ and \[...\]
+  s = s.replace(/\$\$([\s\S]+?)\$\$/g, (_match, tex: string) => {
+    return renderTex(tex.trim(), true);
+  });
+  s = s.replace(/\\\[([\s\S]+?)\\\]/g, (_match, tex: string) => {
+    return renderTex(tex.trim(), true);
+  });
+
+  // 2) Render inline math: $...$ and \(...\)
+  //    Negative lookbehind for \ to avoid matching \$ escapes
+  //    Also avoid matching $$ (already handled above)
+  s = s.replace(/(?<![\\$])\$(?!\$)((?:[^$\\]|\\.)+?)\$/g, (_match, tex: string) => {
+    return renderTex(tex, false);
+  });
+  s = s.replace(/\\\(([\s\S]+?)\\\)/g, (_match, tex: string) => {
+    return renderTex(tex, false);
+  });
+
+  return s;
+}
+
+function decodeHtmlEntities(s: string): string {
+  return s
+    .replace(/&lt;/g, "<")
+    .replace(/&gt;/g, ">")
+    .replace(/&quot;/g, '"')
+    .replace(/&#39;/g, "'")
+    .replace(/&nbsp;/g, " ")
+    .replace(/&amp;/g, "&"); // decoded last so "&amp;lt;" yields "&lt;", not "<"
+}
+
+function renderTex(tex: string, displayMode: boolean): string {
+  // Decode HTML entities that might appear in DB-sourced HTML
+  let cleaned = decodeHtmlEntities(tex);
+  // Sanitize common issues that cause KaTeX to fail:
+  // 1) # and % inside \text{} — escape them
+  cleaned = cleaned.replace(/\\text\{([^}]*)\}/g, (_m, inner: string) => {
+    return "\\text{" + inner.replace(/#/g, "\\#").replace(/%/g, "\\%") + "}";
+  });
+  // 2) Standalone # outside \text{} in math — escape it
+  cleaned = cleaned.replace(/(?<!\\)#(?!\\)/g, "\\#");
+
+  try {
+    return katex.renderToString(cleaned, {
+      displayMode,
+      throwOnError: false,
+      trust: true,
+      strict: false,
+    });
+  } catch {
+    // Fallback: show the raw LaTeX in a styled span
+    return `<span class="katex-error" style="color:#E11D48;font-size:0.85em">${tex}</span>`;
+  }
+}
+
+export default function KaTeXRenderer({
+  html,
+  className,
+}: {
+  html: string;
+  className?: string;
+}) {
+  const rendered = useMemo(() => renderLatexInString(html), [html]);
+
+  return (
+    <div
+      className={`kb-html-content text-sm ${className ?? ""}`}
+      dangerouslySetInnerHTML={{ __html: rendered }}
+    />
+  );
+}
--- a/frontend/src/components/shared/StatusBadge.tsx
+++ b/frontend/src/components/shared/StatusBadge.tsx
@@ -0,0 +1,15 @@
+const statusConfig = {
+  uploaded: { label: "Uploaded", bg: "bg-gray-100", text: "text-gray-600" },
+  processing: { label: "Processing...", bg: "bg-blue-100", text: "text-blue-700" },
+  ready: { label: "Ready", bg: "bg-green-100", text: "text-green-700" },
+  error: { label: "Error", bg: "bg-red-100", text: "text-red-700" },
+} as const;
+
+export default function StatusBadge({ status }: { status: string }) {
+  const config = statusConfig[status as keyof typeof statusConfig] ?? statusConfig.uploaded;
+  return (
+    <span className={`inline-block px-2 py-0.5 rounded-full text-xs font-medium ${config.bg} ${config.text}`}>
+      {config.label}
+    </span>
+  );
+}
--- a/frontend/src/components/upload/FilePickerField.tsx
+++ b/frontend/src/components/upload/FilePickerField.tsx
@@ -0,0 +1,63 @@
+import { useRef, useState } from "react";
+
+export default function FilePickerField({
+  label,
+  required,
+  file,
+  onFileChange,
+}: {
+  label: string;
+  required?: boolean;
+  file: File | null;
+  onFileChange: (file: File | null) => void;
+}) {
+  const inputRef = useRef<HTMLInputElement>(null);
+  const [isDragging, setIsDragging] = useState(false);
+
+  const handleDrop = (e: React.DragEvent) => {
+    e.preventDefault();
+    setIsDragging(false);
+    const f = e.dataTransfer.files[0];
+    if (f?.type === "application/pdf") onFileChange(f);
+  };
+
+  return (
+    <div>
+      <label className="block text-sm font-medium text-gray-700 mb-1">
+        {label} {required && <span className="text-red-500">*</span>}
+      </label>
+      <div
+        className={`border-2 border-dashed rounded-lg p-6 text-center cursor-pointer transition-colors
+          ${isDragging ? "border-blue-400 bg-blue-50" : "border-gray-300 hover:border-gray-400"}`}
+        onClick={() => inputRef.current?.click()}
+        onDragOver={(e) => { e.preventDefault(); setIsDragging(true); }}
+        onDragLeave={() => setIsDragging(false)}
+        onDrop={handleDrop}
+      >
+        <input
+          ref={inputRef}
+          type="file"
+          accept=".pdf"
+          className="hidden"
+          onChange={(e) => onFileChange(e.target.files?.[0] ?? null)}
+        />
+        {file ? (
+          <div className="flex items-center justify-center gap-2">
+            <span className="text-blue-600 font-medium text-sm">{file.name}</span>
+            <button
+              type="button"
+              onClick={(e) => { e.stopPropagation(); onFileChange(null); }}
+              className="text-gray-400 hover:text-red-500 text-xs"
+            >
+              Remove
+            </button>
+          </div>
+        ) : (
+          <div className="text-gray-400 text-sm">
+            Click or drag PDF file here
+          </div>
+        )}
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/components/upload/UploadForm.tsx
+++ b/frontend/src/components/upload/UploadForm.tsx
@@ -0,0 +1,184 @@
+import { useState, useCallback } from "react";
+import { useNavigate } from "react-router-dom";
+import { uploadPaper } from "@/lib/api";
+import FilePickerField from "./FilePickerField";
+
+/** Try to extract course code, year, term, exam type from filename */
+function parseFilename(name: string): {
+  courseCode?: string;
+  year?: number;
+  term?: string;
+  examType?: string;
+} {
+  const result: ReturnType<typeof parseFilename> = {};
+
+  // Remove extension
+  const base = name.replace(/\.[^.]+$/, "").replace(/[_\-]+/g, " ");
+
+  // Course code: 2-4 letters (any case) + 4 digits + optional letter (e.g. COMP2211, MATH1014H)
+  const courseMatch = base.match(/([A-Za-z]{2,4}\s*\d{4}[A-Za-z]?)/i);
+  if (courseMatch) {
+    result.courseCode = courseMatch[1].replace(/\s/g, "").toUpperCase();
+  }
+
+  // Year: 4-digit (2010-2029) or 2-digit (15-29, mapped to 2015-2029)
+  const year4 = base.match(/\b(20[1-2]\d)\b/);
+  if (year4) {
+    result.year = Number(year4[1]);
+  } else {
+    // Check every standalone 2-digit number, not only the first one
+    const year2 = (base.match(/\b\d{2}\b/g) ?? []).map(Number).find((y) => y >= 15 && y <= 29);
+    if (year2 !== undefined) {
+      result.year = 2000 + year2;
+    }
+  }
+
+  // Term
+  const lower = base.toLowerCase();
+  if (/spring|spr/i.test(lower)) result.term = "spring";
+  else if (/fall|aut/i.test(lower)) result.term = "fall";
+  else if (/summer|sum/i.test(lower)) result.term = "summer";
+
+  // Exam type
+  if (/mid/i.test(lower)) result.examType = "midterm";
+  else if (/final|fin/i.test(lower)) result.examType = "final";
+  else if (/quiz/i.test(lower)) result.examType = "quiz";
+
+  return result;
+}
+
+export default function UploadForm() {
+  const navigate = useNavigate();
+  const [paperFile, setPaperFile] = useState<File | null>(null);
+  const [answerFile, setAnswerFile] = useState<File | null>(null);
+  const [courseCode, setCourseCode] = useState("");
+  const [year, setYear] = useState(new Date().getFullYear());
+  const [term, setTerm] = useState("fall");
+  const [examType, setExamType] = useState("midterm");
+  const [submitting, setSubmitting] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+  const [autoFilled, setAutoFilled] = useState(false);
+
+  const handlePaperFile = useCallback((file: File | null) => {
+    setPaperFile(file);
+    if (!file) { setAutoFilled(false); return; }
+
+    const parsed = parseFilename(file.name);
+    const filled: string[] = [];
+
+    if (parsed.courseCode) { setCourseCode(parsed.courseCode); filled.push("course"); }
+    if (parsed.year) { setYear(parsed.year); filled.push("year"); }
+    if (parsed.term) { setTerm(parsed.term); filled.push("term"); }
+    if (parsed.examType) { setExamType(parsed.examType); filled.push("type"); }
+
+    setAutoFilled(filled.length > 0);
+  }, []);
+
+  const handleSubmit = async (e: React.FormEvent) => {
+    e.preventDefault();
+    if (!paperFile || !courseCode) return;
+
+    setSubmitting(true);
+    setError(null);
+
+    try {
+      const fd = new FormData();
+      fd.append("paper_file", paperFile);
+      if (answerFile) fd.append("answer_file", answerFile);
+      fd.append("course_code", courseCode);
+      fd.append("year", String(year));
+      fd.append("term", term);
+      fd.append("exam_type", examType);
+
+      const result = await uploadPaper(fd);
+      navigate(`/paper/${result.paper_id}`);
+    } catch (err) {
+      setError(err instanceof Error ? err.message : "Upload failed");
+      setSubmitting(false);
+    }
+  };
+
+  return (
+    <form onSubmit={handleSubmit} className="max-w-lg mx-auto space-y-5">
+      <FilePickerField
+        label="Paper PDF"
+        required
+        file={paperFile}
+        onFileChange={handlePaperFile}
+      />
+      {autoFilled && (
+        <div className="text-xs text-green-600 bg-green-50 px-3 py-1.5 rounded-lg -mt-3">
+          Auto-filled from filename — please verify below
+        </div>
+      )}
+      <FilePickerField
+        label="Answer / Solution PDF (optional)"
+        file={answerFile}
+        onFileChange={setAnswerFile}
+      />
+
+      <div>
+        <label className="block text-sm font-medium text-gray-700 mb-1">
+          Course Code <span className="text-red-500">*</span>
+        </label>
+        <input
+          type="text"
+          value={courseCode}
+          onChange={(e) => setCourseCode(e.target.value.toUpperCase())}
+          placeholder="e.g. COMP2011"
+          className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
+          required
+        />
+      </div>
+
+      <div className="grid grid-cols-3 gap-3">
+        <div>
+          <label className="block text-sm font-medium text-gray-700 mb-1">Year</label>
+          <input
+            type="number"
+            value={year}
+            onChange={(e) => setYear(Number(e.target.value))}
+            className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
+          />
+        </div>
+        <div>
+          <label className="block text-sm font-medium text-gray-700 mb-1">Term</label>
+          <select
+            value={term}
+            onChange={(e) => setTerm(e.target.value)}
+            className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
+          >
+            <option value="fall">Fall</option>
+            <option value="spring">Spring</option>
+            <option value="summer">Summer</option>
+          </select>
+        </div>
+        <div>
+          <label className="block text-sm font-medium text-gray-700 mb-1">Exam Type</label>
+          <select
+            value={examType}
+            onChange={(e) => setExamType(e.target.value)}
+            className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
+          >
+            <option value="midterm">Midterm</option>
+            <option value="final">Final</option>
+            <option value="quiz">Quiz</option>
+          </select>
+        </div>
+      </div>
+
+      {error && (
+        <div className="text-red-600 text-sm bg-red-50 p-3 rounded-lg">{error}</div>
+      )}
+
+      <button
+        type="submit"
+        disabled={!paperFile || !courseCode || submitting}
+        className="w-full bg-blue-600 text-white py-2.5 rounded-lg font-medium text-sm
+          hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
+      >
+        {submitting ? "Uploading..." : "Upload & Analyze"}
+      </button>
+    </form>
+  );
+}
--- a/frontend/src/components/workbench/ActionBar.tsx
+++ b/frontend/src/components/workbench/ActionBar.tsx
@@ -0,0 +1,58 @@
+import type { Question } from "@/types/api";
+
+export default function ActionBar({
+  question,
+  onGenerateVariant,
+  isGenerating,
+  onPhotoOpen,
+  answerState,
+}: {
+  question: Question | null;
+  onGenerateVariant: () => void;
+  isGenerating: boolean;
+  onPhotoOpen: () => void;
+  answerState?: "correct" | "wrong" | null;
+}) {
+  if (!question) return null;
+
+  const isLong = question.question_type === "long_question" || question.question_type === "long_answer" || question.question_type === "coding";
+
+  return (
+    <div className="border-t border-gray-200 bg-white px-4 py-3 shrink-0 space-y-2">
+      {/* Answer state feedback (for non-long questions, driven by QuestionDetail) */}
+      {answerState && (
+        <div className={`text-center text-sm font-medium py-1.5 rounded-lg ${
+          answerState === "correct"
+            ? "bg-green-50 text-green-600"
+            : "bg-red-50 text-red-600"
+        }`}>
+          {answerState === "correct" ? "Correct!" : "Added to error book"}
+        </div>
+      )}
+
+      {/* Long question: Upload handwritten answer */}
+      {isLong && (
+        <button
+          onClick={onPhotoOpen}
+          className="w-full py-2.5 rounded-lg text-sm font-medium bg-blue-600 text-white hover:bg-blue-700 transition-colors"
+        >
+          Upload handwritten answer
+        </button>
+      )}
+
+      {/* Generate variant — always available */}
+      <button
+        onClick={onGenerateVariant}
+        disabled={isGenerating}
+        className="w-full py-2 rounded-lg text-sm font-medium bg-purple-50 text-purple-700 border border-purple-200 hover:bg-purple-100 disabled:opacity-50 transition-colors"
+      >
+        {isGenerating ? (
+          <span className="flex items-center justify-center gap-2">
+            <span className="w-3 h-3 border-2 border-purple-600 border-t-transparent rounded-full animate-spin" />
+            Generating...
+          </span>
+        ) : "Generate Variant"}
+      </button>
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/AiTrioPanel.tsx
+++ b/frontend/src/components/workbench/AiTrioPanel.tsx
@@ -0,0 +1,21 @@
+import type { Question } from "@/types/api";
+import CollapsibleSection from "@/components/shared/CollapsibleSection";
+import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
+
+export default function AiTrioPanel({ question }: { question: Question }) {
+  return (
+    <div>
+      <CollapsibleSection title="Knowledge Reminder" colorScheme="blue" defaultOpen>
+        <KaTeXRenderer html={question.knowledge_reminder} />
+      </CollapsibleSection>
+
+      <CollapsibleSection title="AI Hint" colorScheme="amber">
+        <KaTeXRenderer html={question.ai_hint} />
+      </CollapsibleSection>
+
+      <CollapsibleSection title="Solution" colorScheme="green">
+        <KaTeXRenderer html={question.solution} />
+      </CollapsibleSection>
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/PdfViewer.tsx
+++ b/frontend/src/components/workbench/PdfViewer.tsx
@@ -0,0 +1,170 @@
+import { useState, useRef, useEffect, useCallback } from "react";
+import { Document, Page, pdfjs } from "react-pdf";
+import "react-pdf/dist/Page/AnnotationLayer.css";
+import "react-pdf/dist/Page/TextLayer.css";
+
+pdfjs.GlobalWorkerOptions.workerSrc = `https://unpkg.com/pdfjs-dist@${pdfjs.version}/build/pdf.worker.min.mjs`;
+
+export default function PdfViewer({
+  fileUrl,
+  currentPage,
+  onPageChange,
+}: {
+  fileUrl: string;
+  currentPage?: number;
+  onPageChange?: (page: number) => void;
+}) {
+  const [numPages, setNumPages] = useState(0);
+  const [containerWidth, setContainerWidth] = useState(0);
+  const containerRef = useRef<HTMLDivElement>(null);
+  const scrollRef = useRef<HTMLDivElement>(null);
+  const pageRefs = useRef<Map<number, HTMLDivElement>>(new Map());
+  const [jumpPage, setJumpPage] = useState("");
+  const programmaticScroll = useRef(false);
+
+  // Resize observer for container width
+  useEffect(() => {
+    if (!containerRef.current) return;
+    const observer = new ResizeObserver((entries) => {
+      setContainerWidth(entries[0].contentRect.width);
+    });
+    observer.observe(containerRef.current);
+    return () => observer.disconnect();
+  }, []);
+
+  // Scroll to page when currentPage changes (programmatic)
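+  useEffect(() => {
+    const el = currentPage && currentPage >= 1 ? pageRefs.current.get(currentPage) : undefined;
+    if (!el) return;
+    programmaticScroll.current = true;
+    el.scrollIntoView({ behavior: "smooth", block: "start" });
+    // Cancel the pending timer on re-run so a stale timeout cannot clear the flag mid-scroll
+    const timer = setTimeout(() => { programmaticScroll.current = false; }, 2000);
+    return () => { clearTimeout(timer); programmaticScroll.current = false; };
+  }, [currentPage]);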
+
+  // IntersectionObserver to detect visible page on user scroll
+  useEffect(() => {
+    if (numPages === 0 || !scrollRef.current) return;
+
+    const visiblePages = new Map<number, number>();
+
+    const observer = new IntersectionObserver(
+      (entries) => {
+        for (const entry of entries) {
+          const pageNum = Number(entry.target.getAttribute("data-page"));
+          if (entry.isIntersecting) {
+            visiblePages.set(pageNum, entry.intersectionRatio);
+          } else {
+            visiblePages.delete(pageNum);
+          }
+        }
+
+        // Don't fire callback during programmatic scroll
+        if (programmaticScroll.current) return;
+
+        // Find the page with the highest visibility ratio
+        let bestPage = 0;
+        let bestRatio = 0;
+        for (const [page, ratio] of visiblePages) {
+          if (ratio > bestRatio) {
+            bestRatio = ratio;
+            bestPage = page;
+          }
+        }
+        if (bestPage > 0) {
+          onPageChange?.(bestPage);
+        }
+      },
+      {
+        root: scrollRef.current,
+        threshold: [0, 0.25, 0.5, 0.75, 1],
+      },
+    );
+
+    for (const [, el] of pageRefs.current) {
+      observer.observe(el);
+    }
+
+    return () => observer.disconnect();
+  }, [numPages, onPageChange]);
+
+  const setPageRef = useCallback((pageNum: number, el: HTMLDivElement | null) => {
+    if (el) {
+      el.setAttribute("data-page", String(pageNum));
+      pageRefs.current.set(pageNum, el);
+    } else {
+      pageRefs.current.delete(pageNum);
+    }
+  }, []);
+
+  const handleJump = () => {
+    const p = parseInt(jumpPage, 10);
+    if (p >= 1 && p <= numPages) {
+      const el = pageRefs.current.get(p);
+      el?.scrollIntoView({ behavior: "smooth", block: "start" });
+    }
+    setJumpPage("");
+  };
+
+  return (
+    <div ref={containerRef} className="h-full flex flex-col bg-gray-100">
+      {/* Page controls */}
+      <div className="flex items-center justify-center gap-3 py-2 bg-white border-b border-gray-200 text-sm shrink-0">
+        <span className="text-gray-600">
+          {numPages} pages
+        </span>
+        <span className="text-gray-300">|</span>
+        <span className="text-gray-600">
+          Go to{" "}
+          <input
+            type="number"
+            value={jumpPage}
+            onChange={(e) => setJumpPage(e.target.value)}
+            onKeyDown={(e) => { if (e.key === "Enter") handleJump(); }}
+            placeholder="#"
+            className="w-12 text-center border border-gray-300 rounded px-1 py-0.5 text-sm"
+            min={1}
+            max={numPages}
+          />
+        </span>
+      </div>
+
+      {/* All pages scrollable */}
+      <div ref={scrollRef} className="flex-1 overflow-auto">
+        <Document
+          file={fileUrl}
+          onLoadSuccess={({ numPages: n }) => setNumPages(n)}
+          loading={
+            <div className="flex items-center justify-center h-64 text-gray-400">
+              Loading PDF...
+            </div>
+          }
+          error={
+            <div className="flex items-center justify-center h-64 text-red-400">
+              Failed to load PDF
+            </div>
+          }
+        >
+          {numPages > 0 &&
+            Array.from({ length: numPages }, (_, i) => i + 1).map((pageNum) => (
+              <div
+                key={pageNum}
+                ref={(el) => setPageRef(pageNum, el)}
+                className="flex justify-center mb-2"
+              >
+                <div className="bg-white shadow-sm">
+                  <Page
+                    pageNumber={pageNum}
+                    width={containerWidth > 0 ? containerWidth - 48 : undefined}
+                    renderAnnotationLayer
+                    renderTextLayer
+                  />
+                </div>
+              </div>
+            ))}
+        </Document>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/PhotoUpload.tsx
+++ b/frontend/src/components/workbench/PhotoUpload.tsx
@@ -0,0 +1,90 @@
+import { useState, useRef } from "react";
+import { uploadPhoto } from "@/lib/api";
+import type { UserAttempt } from "@/types/api";
+
+export default function PhotoUpload({
+  questionId,
+  onClose,
+  onSubmitted,
+}: {
+  questionId: string;
+  onClose: () => void;
+  onSubmitted: (promise: Promise<{ attempt: UserAttempt; ocr_text: string; grade: { is_correct: boolean; score_given?: number; feedback: string } }>) => void;
+}) {
+  const [file, setFile] = useState<File | null>(null);
+  const [preview, setPreview] = useState<string | null>(null);
+  const [submitting, setSubmitting] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+  const inputRef = useRef<HTMLInputElement>(null);
+
+  const handleFile = (f: File) => {
+    setFile(f);
+    setPreview(URL.createObjectURL(f));
+    setError(null);
+  };
+
+  const handleSubmit = () => {
+    if (!file || submitting) return;
+    setSubmitting(true);
+    const promise = uploadPhoto(questionId, file);
+    // Close modal immediately, let parent handle the async result
+    onSubmitted(promise);
+    onClose();
+  };
+
+  return (
+    <div className="fixed inset-0 bg-black/40 flex items-center justify-center z-50 p-4">
+      <div className="bg-white rounded-xl shadow-xl max-w-lg w-full max-h-[90vh] overflow-y-auto">
+        <div className="p-5">
+          <div className="flex items-center justify-between mb-4">
+            <h3 className="text-lg font-semibold text-gray-900">Upload Answer Photo</h3>
+            <button onClick={onClose} className="text-gray-400 hover:text-gray-600 text-xl">&times;</button>
+          </div>
+
+          {!preview ? (
+            <div
+              onClick={() => inputRef.current?.click()}
+              className="border-2 border-dashed border-gray-300 rounded-lg p-8 text-center cursor-pointer hover:border-blue-400 transition-colors"
+            >
+              <div className="text-3xl mb-2">📷</div>
+              <p className="text-sm text-gray-600">Click to take photo or select image</p>
+              <input
+                ref={inputRef}
+                type="file"
+                accept="image/*"
+                capture="environment"
+                className="hidden"
+                onChange={(e) => {
+                  const f = e.target.files?.[0];
+                  if (f) handleFile(f);
+                }}
+              />
+            </div>
+          ) : (
+            <div className="space-y-3">
+              <img src={preview} alt="Preview" className="w-full rounded-lg border" />
+              {error && (
+                <div className="text-sm text-red-600 bg-red-50 rounded-lg p-3">{error}</div>
+              )}
+              <div className="flex gap-2">
+                <button
+                  onClick={() => { if (preview) URL.revokeObjectURL(preview); setFile(null); setPreview(null); }}
+                  className="flex-1 py-2 rounded-lg text-sm border border-gray-200 text-gray-600 hover:bg-gray-50"
+                >
+                  Retake
+                </button>
+                <button
+                  onClick={handleSubmit}
+                  disabled={submitting}
+                  className="flex-1 py-2 rounded-lg text-sm bg-blue-600 text-white font-medium hover:bg-blue-700 disabled:opacity-50"
+                >
+                  Submit for Grading
+                </button>
+              </div>
+            </div>
+          )}
+        </div>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/QuestionDetail.tsx
+++ b/frontend/src/components/workbench/QuestionDetail.tsx
@@ -0,0 +1,260 @@
+import { useState, useEffect } from "react";
+import type { Question } from "@/types/api";
+import { subquestionLabel } from "@/lib/questionGroups";
+
+const typeLabels: Record<string, string> = {
+  mc: "Multiple Choice",
+  true_false: "True / False",
+  fill_blank: "Fill in Blank",
+  long_question: "Long Question",
+  long_answer: "Long Answer",
+  short_answer: "Short Answer",
+  coding: "Coding",
+};
+
+const difficultyColors: Record<string, string> = {
+  easy: "bg-green-100 text-green-700",
+  medium: "bg-yellow-100 text-yellow-700",
+  hard: "bg-red-100 text-red-700",
+};
+
+export default function QuestionDetail({
+  question,
+  onAnswerResult,
+}: {
+  question: Question;
+  onAnswerResult?: (isCorrect: boolean, userAnswer: string) => void;
+}) {
+  const [selectedOption, setSelectedOption] = useState<string | null>(null);
+  const [checked, setChecked] = useState(false);
+  const [fillAnswer, setFillAnswer] = useState("");
+  const [fillChecked, setFillChecked] = useState(false);
+  // True/False: the single answer selected for this question ("True", "False", or null)
+  const [tfAnswer, setTfAnswer] = useState<"True" | "False" | null>(null);
+  const [tfChecked, setTfChecked] = useState(false);
+
+  // Reset state when question changes
+  useEffect(() => {
+    setSelectedOption(null);
+    setChecked(false);
+    setFillAnswer("");
+    setFillChecked(false);
+    setTfAnswer(null);
+    setTfChecked(false);
+  }, [question.id]);
+
+  const isCorrectMc = checked && selectedOption === question.correct_option;
+  const isCorrectFill =
+    fillChecked &&
+    question.correct_answer != null &&
+    fillAnswer.trim().toLowerCase() === question.correct_answer.trim().toLowerCase();
+
+  const handleMcCheck = () => {
+    if (!selectedOption) return;
+    setChecked(true);
+    const correct = selectedOption === question.correct_option;
+    onAnswerResult?.(correct, selectedOption);
+  };
+
+  const handleFillCheck = () => {
+    if (!fillAnswer.trim()) return;
+    setFillChecked(true);
+    const correct =
+      question.correct_answer != null &&
+      fillAnswer.trim().toLowerCase() === question.correct_answer.trim().toLowerCase();
+    onAnswerResult?.(correct, fillAnswer.trim());
+  };
+
+  const getOptionStyle = (label: string) => {
+    if (!checked) {
+      return label === selectedOption
+        ? "border-blue-400 bg-blue-50"
+        : "border-gray-200 hover:bg-gray-50";
+    }
+    if (label === question.correct_option) return "border-green-400 bg-green-50";
+    if (label === selectedOption) return "border-red-400 bg-red-50";
+    return "border-gray-200 opacity-50";
+  };
+
+  return (
+    <div className="mb-4">
+      {/* Header row */}
+      <div className="flex items-center gap-2 mb-2 flex-wrap">
+        <span className="text-base font-bold text-gray-900">
+          Q{question.question_number.match(/^\d+/)?.[0] ?? question.question_number}
+        </span>
+        {question.question_number.replace(/^\d+/, "") && (
+          <span className="text-xs px-2 py-0.5 rounded bg-gray-100 text-gray-600">
+            {subquestionLabel(question)}
+          </span>
+        )}
+        <span className="text-xs px-2 py-0.5 rounded bg-blue-100 text-blue-700">
+          {typeLabels[question.question_type] ?? question.question_type}
+        </span>
+        {question.score != null && (
+          <span className="text-xs text-gray-500">{question.score} pts</span>
+        )}
+        {question.difficulty && (
+          <span
+            className={`text-xs px-2 py-0.5 rounded ${difficultyColors[question.difficulty] ?? ""}`}
+          >
+            {question.difficulty}
+          </span>
+        )}
+      </div>
+
+      {/* Topics */}
+      {question.topics && question.topics.length > 0 && (
+        <div className="flex gap-1 mb-3 flex-wrap">
+          {question.topics.map((t) => (
+            <span key={t} className="text-xs bg-gray-100 text-gray-600 px-2 py-0.5 rounded-full">
+              {t}
+            </span>
+          ))}
+        </div>
+      )}
+
+      {/* MC options */}
+      {question.question_type === "mc" && question.options && (
+        <>
+          <div className="mt-3 space-y-1.5">
+            {question.options.map((opt) => (
+              <button
+                key={opt.label}
+                onClick={() => { if (!checked) setSelectedOption(opt.label); }}
+                className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
+                disabled={checked}
+              >
+                <span className={`font-semibold shrink-0 w-6 ${
+                  checked && opt.label === question.correct_option ? "text-green-600" :
+                  checked && opt.label === selectedOption ? "text-red-600" :
+                  "text-blue-600"
+                }`}>
+                  {opt.label}.
+                </span>
+                <span className="text-gray-700">{opt.text}</span>
+                {checked && opt.label === question.correct_option && (
+                  <span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
+                )}
+              </button>
+            ))}
+          </div>
+          {!checked && selectedOption && (
+            <button
+              onClick={handleMcCheck}
+              className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 transition-colors"
+            >
+              Check Answer
+            </button>
+          )}
+          {checked && (
+            <div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
+              {isCorrectMc ? "Correct!" : `Wrong — the answer is ${question.correct_option}`}
+            </div>
+          )}
+        </>
+      )}
+
+      {/* True/False */}
+      {question.question_type === "true_false" && (() => {
+        // Normalize T/F/True/False to "true"/"false"
+        const normTF = (v: string | null | undefined): string => {
+          if (!v) return "";
+          const l = v.trim().toLowerCase();
+          if (l === "t" || l === "true") return "true";
+          if (l === "f" || l === "false") return "false";
+          return l;
+        };
+        const correctNorm = normTF(question.correct_option ?? question.correct_answer);
+        const correctDisplay = correctNorm === "true" ? "True" : correctNorm === "false" ? "False" : "N/A";
+        return (
+        <>
+          <div className="mt-3 flex gap-2">
+            {(["True", "False"] as const).map((val) => {
+              const isSelected = tfAnswer === val;
+              const isCorrectVal = tfChecked && normTF(val) === correctNorm;
+              const isWrongVal = tfChecked && isSelected && !isCorrectVal;
+              return (
+                <button
+                  key={val}
+                  onClick={() => { if (!tfChecked) setTfAnswer(val); }}
+                  disabled={tfChecked}
+                  className={`flex-1 py-2 rounded-lg border text-sm font-semibold transition-colors ${
+                    isCorrectVal
+                      ? "border-green-400 bg-green-50 text-green-700"
+                      : isWrongVal
+                        ? "border-red-400 bg-red-50 text-red-700"
+                        : isSelected
+                          ? "border-blue-400 bg-blue-50 text-blue-700"
+                          : "border-gray-200 text-gray-600 hover:bg-gray-50"
+                  }`}
+                >
+                  {val === "True" ? "T — True" : "F — False"}
+                </button>
+              );
+            })}
+          </div>
+          {!tfChecked && tfAnswer && (
+            <button
+              onClick={() => {
+                setTfChecked(true);
+                const isCorrect = normTF(tfAnswer) === correctNorm;
+                onAnswerResult?.(isCorrect, tfAnswer);
+              }}
+              className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 transition-colors"
+            >
+              Check Answer
+            </button>
+          )}
+          {tfChecked && (
+            <div className={`mt-2 text-sm font-medium ${
+              normTF(tfAnswer) === correctNorm ? "text-green-600" : "text-red-600"
+            }`}>
+              {normTF(tfAnswer) === correctNorm
+                ? "Correct!"
+                : `Wrong — the answer is ${correctDisplay}`}
+            </div>
+          )}
+        </>
+        );
+      })()}
+
+      {/* Fill-blank input */}
+      {question.question_type === "fill_blank" && (
+        <div className="mt-3">
+          <div className="flex gap-2">
+            <input
+              type="text"
+              value={fillAnswer}
+              onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
+              placeholder="Type your answer..."
+              disabled={fillChecked}
+              className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
+                fillChecked
+                  ? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
+                  : "border-gray-300"
+              }`}
+              onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
+            />
+            {!fillChecked && (
+              <button
+                onClick={handleFillCheck}
+                disabled={!fillAnswer.trim()}
+                className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50 transition-colors"
+              >
+                Check
+              </button>
+            )}
+          </div>
+          {fillChecked && (
+            <div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
+              {isCorrectFill
+                ? "Correct!"
+                : `Wrong — the answer is: ${question.correct_answer ?? "N/A"}`}
+            </div>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/QuestionNav.tsx
+++ b/frontend/src/components/workbench/QuestionNav.tsx
@@ -0,0 +1,56 @@
+import type { Question } from "@/types/api";
+import type { QuestionGroup } from "@/lib/questionGroups";
+import { subquestionLabel } from "@/lib/questionGroups";
+
+export default function QuestionNav({
+  groups,
+  currentGroupKey,
+  currentQuestionId,
+  onSelectGroup,
+  onSelectQuestion,
+}: {
+  groups: QuestionGroup[];
+  currentGroupKey: string | null;
+  currentQuestionId: string | null;
+  onSelectGroup: (groupKey: string) => void;
+  onSelectQuestion: (questionId: string) => void;
+}) {
+  const activeGroup = groups.find((group) => group.key === currentGroupKey) ?? null;
+
+  return (
+    <div className="border-b border-gray-200 bg-white px-4 py-2 shrink-0">
+      <div className="flex gap-1.5 overflow-x-auto hide-scrollbar">
+        {groups.map((group) => (
+          <button
+            key={group.key}
+            onClick={() => onSelectGroup(group.key)}
+            className={`px-3 py-1.5 rounded-lg text-xs font-medium whitespace-nowrap transition-colors
+              ${group.key === currentGroupKey
+                ? "bg-blue-600 text-white"
+                : "bg-gray-100 text-gray-600 hover:bg-gray-200"
+              }`}
+          >
+            {group.label}
+          </button>
+        ))}
+      </div>
+      {activeGroup && activeGroup.questions.length > 1 && (
+        <div className="flex gap-1.5 overflow-x-auto hide-scrollbar mt-2">
+          {activeGroup.questions.map((question) => (
+            <button
+              key={question.id}
+              onClick={() => onSelectQuestion(question.id)}
+              className={`px-2.5 py-1 rounded-md text-[11px] font-medium whitespace-nowrap transition-colors
+                ${question.id === currentQuestionId
+                  ? "bg-blue-50 text-blue-700 border border-blue-200"
+                  : "bg-gray-50 text-gray-500 border border-gray-200 hover:bg-gray-100"
+                }`}
+            >
+              {subquestionLabel(question)}
+            </button>
+          ))}
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/SimilarHistoryPanel.tsx
+++ b/frontend/src/components/workbench/SimilarHistoryPanel.tsx
@@ -0,0 +1,130 @@
+import { useEffect, useState } from "react";
+import { Link } from "react-router-dom";
+
+import { getSimilarQuestions } from "@/lib/api";
+import type { Question, SimilarQuestion } from "@/types/api";
+
+const typeLabel: Record<string, string> = {
+  mc: "MC",
+  true_false: "T/F",
+  fill_blank: "Fill",
+  long_question: "Long",
+  long_answer: "Long",
+  short_answer: "Short",
+  coding: "Code",
+};
+
+function matchColor(percent: number): string {
+  if (percent >= 80) return "bg-green-100 text-green-700";
+  if (percent >= 60) return "bg-amber-100 text-amber-700";
+  return "bg-gray-100 text-gray-600";
+}
+
+function cleanReason(reason: string): string {
+  // "Shared topic: foo_bar, baz_qux" → "Shared topic: Foo Bar, Baz Qux"
+  return reason.replace(/_/g, " ").replace(/:\s*(.+)$/, (_, rest) =>
+    ": " + rest.split(",").map((s: string) =>
+      s.trim().replace(/\b\w/g, (c: string) => c.toUpperCase())
+    ).join(", ")
+  );
+}
+
+export default function SimilarHistoryPanel({ question }: { question: Question }) {
+  const [items, setItems] = useState<SimilarQuestion[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+  const [isOpen, setIsOpen] = useState(true);
+
+  useEffect(() => {
+    let cancelled = false;
+    setLoading(true);
+    setError(null);
+    setItems([]);
+    getSimilarQuestions(question.id)
+      .then((data) => {
+        if (cancelled) return;
+        setItems(data);
+        setLoading(false);
+      })
+      .catch((err: unknown) => {
+        if (cancelled) return;
+        setError(err instanceof Error ? err.message : "Failed to load.");
+        setLoading(false);
+      });
+    return () => { cancelled = true; };
+  }, [question.id]);
+
+  return (
+    <div className="rounded-lg border border-blue-200 mb-3 overflow-hidden">
+      <button
+        onClick={() => setIsOpen((open) => !open)}
+        className="w-full flex items-center justify-between p-3 bg-blue-50"
+      >
+        <div className="flex items-center gap-2">
+          <span className="w-5 h-5 flex items-center justify-center rounded bg-blue-600 text-white text-xs font-bold">S</span>
+          <span className="font-semibold text-sm text-blue-800">Similar Questions</span>
+        </div>
+        <span className="text-xs text-blue-600">{loading ? "…" : items.length}</span>
+      </button>
+
+      {isOpen && (
+        <div className="p-2 space-y-1.5 bg-white">
+          {loading && <div className="text-xs text-gray-400 px-1 py-2">Loading…</div>}
+          {!loading && error && (
+            <div className="text-xs text-red-600 bg-red-50 border border-red-200 rounded px-3 py-2">{error}</div>
+          )}
+          {!loading && !error && items.length === 0 && (
+            <div className="text-xs text-gray-400 px-1 py-2">No similar questions found.</div>
+          )}
+
+          {items.map((item) => (
+            <Link
+              key={item.id}
+              to={`/paper/${item.paper_id}`}
+              className="flex items-center gap-2 px-2.5 py-2 rounded-lg border border-gray-100 hover:border-blue-200 hover:bg-blue-50/40 transition-colors"
+            >
+              {/* Match % badge */}
+              <span className={`shrink-0 text-[11px] font-bold px-1.5 py-0.5 rounded ${matchColor(item.match_percent)}`}>
+                {item.match_percent}%
+              </span>
+
+              {/* Main info */}
+              <div className="flex-1 min-w-0">
+                <div className="flex items-center gap-1.5 flex-wrap">
+                  <span className="text-xs font-semibold text-gray-700">{item.source}</span>
+                  <span className="text-xs text-gray-400">·</span>
+                  <span className="text-xs text-gray-500">Q{item.question_number}</span>
+                  {item.question_type && (
+                    <>
+                      <span className="text-xs text-gray-400">·</span>
+                      <span className="text-xs text-gray-500">{typeLabel[item.question_type] ?? item.question_type}</span>
+                    </>
+                  )}
+                </div>
+
+                {/* Topics + reasons in one row */}
+                <div className="flex gap-1 flex-wrap mt-1">
+                  {item.topics.slice(0, 2).map((topic) => (
+                    <span key={topic} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500">
+                      {topic}
+                    </span>
+                  ))}
+                  {item.match_reasons
+                    ?.filter((r) => !r.startsWith("Same format") && !r.startsWith("Same difficulty"))
+                    .slice(0, 2)
+                    .map((reason) => (
+                      <span key={reason} className="text-[10px] px-1.5 py-0.5 rounded bg-blue-50 text-blue-500">
+                        {cleanReason(reason)}
+                      </span>
+                    ))}
+                </div>
+              </div>
+
+              <span className="text-gray-300 text-xs shrink-0">›</span>
+            </Link>
+          ))}
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/VariantDetail.tsx
+++ b/frontend/src/components/workbench/VariantDetail.tsx
@@ -0,0 +1,148 @@
+import { useState } from "react";
+import type { VariantQuestion } from "@/types/api";
+import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
+import CollapsibleSection from "@/components/shared/CollapsibleSection";
+
+export default function VariantDetail({
+  variant,
+}: {
+  variant: VariantQuestion;
+}) {
+  const [selectedOption, setSelectedOption] = useState<string | null>(null);
+  const [checked, setChecked] = useState(false);
+  const [fillAnswer, setFillAnswer] = useState("");
+  const [fillChecked, setFillChecked] = useState(false);
+
+  const isMc = (variant.question_type === "mc" || variant.question_type === "true_false") && variant.options;
+
+  const handleMcCheck = () => {
+    if (!selectedOption) return;
+    setChecked(true);
+  };
+
+  const handleFillCheck = () => {
+    if (!fillAnswer.trim()) return;
+    setFillChecked(true);
+  };
+
+  const isCorrectMc = checked && selectedOption === variant.correct_answer;
+  const isCorrectFill =
+    fillChecked &&
+    fillAnswer.trim().toLowerCase() === variant.correct_answer.trim().toLowerCase();
+
+  const getOptionStyle = (label: string) => {
+    if (!checked) {
+      return label === selectedOption
+        ? "border-blue-400 bg-blue-50"
+        : "border-gray-200 hover:bg-gray-50";
+    }
+    if (label === variant.correct_answer) return "border-green-400 bg-green-50";
+    if (label === selectedOption) return "border-red-400 bg-red-50";
+    return "border-gray-200 opacity-50";
+  };
+
+  return (
+    <div>
+      {/* Header */}
+      <div className="flex items-center gap-2 mb-3">
+        <span className="w-5 h-5 flex items-center justify-center bg-purple-600 text-white text-xs font-bold rounded-full">V</span>
+        <span className="text-sm font-semibold text-gray-900">Similar Question</span>
+        <span className="text-xs px-2 py-0.5 rounded bg-purple-100 text-purple-700">
+          {variant.question_type}
+        </span>
+      </div>
+
+      {/* Question text */}
+      <div className="text-sm text-gray-800 leading-relaxed bg-purple-50 rounded-lg p-3 border border-purple-200 mb-4">
+        <KaTeXRenderer html={variant.question_text} />
+      </div>
+
+      {/* MC options */}
+      {isMc && variant.options && (
+        <>
+          <div className="space-y-1.5">
+            {variant.options.map((opt) => (
+              <button
+                key={opt.label}
+                onClick={() => { if (!checked) setSelectedOption(opt.label); }}
+                disabled={checked}
+                className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
+              >
+                <span className="font-semibold shrink-0 w-6 text-blue-600">{opt.label}.</span>
+                <span className="text-gray-700">{opt.text}</span>
+                {checked && opt.label === variant.correct_answer && (
+                  <span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
+                )}
+              </button>
+            ))}
+          </div>
+          {!checked && selectedOption && (
+            <button
+              onClick={handleMcCheck}
+              className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700"
+            >
+              Check Answer
+            </button>
+          )}
+          {checked && (
+            <div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
+              {isCorrectMc ? "Correct!" : `Wrong — the answer is ${variant.correct_answer}`}
+            </div>
+          )}
+        </>
+      )}
+
+      {/* Non-MC input */}
+      {!isMc && (
+        <div className="mb-3">
+          <div className="flex gap-2">
+            <input
+              type="text"
+              value={fillAnswer}
+              onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
+              placeholder="Type your answer..."
+              disabled={fillChecked}
+              className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
+                fillChecked
+                  ? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
+                  : "border-gray-300"
+              }`}
+              onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
+            />
+            {!fillChecked && (
+              <button
+                onClick={handleFillCheck}
+                disabled={!fillAnswer.trim()}
+                className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50"
+              >
+                Check
+              </button>
+            )}
+          </div>
+          {fillChecked && (
+            <div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
+              {isCorrectFill ? "Correct!" : `Answer: ${variant.correct_answer}`}
+            </div>
+          )}
+        </div>
+      )}
+
+      {/* AI Trio */}
+      <div className="mt-4 space-y-2">
+        {variant.knowledge_reminder && (
+          <CollapsibleSection title="Knowledge Reminder" colorScheme="blue">
+            <KaTeXRenderer html={variant.knowledge_reminder} />
+          </CollapsibleSection>
+        )}
+        {variant.ai_hint && (
+          <CollapsibleSection title="AI Hint" colorScheme="amber">
+            <KaTeXRenderer html={variant.ai_hint} />
+          </CollapsibleSection>
+        )}
+        <CollapsibleSection title="Solution" colorScheme="green">
+          <KaTeXRenderer html={variant.solution} />
+        </CollapsibleSection>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/components/workbench/VariantModal.tsx
+++ b/frontend/src/components/workbench/VariantModal.tsx
@@ -0,0 +1,189 @@
+import { useState } from "react";
+import type { VariantQuestion } from "@/types/api";
+import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
+
+export default function VariantModal({
+  variant,
+  onClose,
+}: {
+  variant: VariantQuestion;
+  onClose: () => void;
+}) {
+  const [selectedOption, setSelectedOption] = useState<string | null>(null);
+  const [checked, setChecked] = useState(false);
+  const [fillAnswer, setFillAnswer] = useState("");
+  const [fillChecked, setFillChecked] = useState(false);
+  const [showKnowledge, setShowKnowledge] = useState(false);
+  const [showHint, setShowHint] = useState(false);
+  const [showSolution, setShowSolution] = useState(false);
+
+  const isMc = (variant.question_type === "mc" || variant.question_type === "true_false") && variant.options;
+
+  const handleMcCheck = () => {
+    if (!selectedOption) return;
+    setChecked(true);
+  };
+
+  const handleFillCheck = () => {
+    if (!fillAnswer.trim()) return;
+    setFillChecked(true);
+  };
+
+  const isCorrectMc = checked && selectedOption === variant.correct_answer;
+  const isCorrectFill =
+    fillChecked &&
+    fillAnswer.trim().toLowerCase() === variant.correct_answer.trim().toLowerCase();
+
+  const getOptionStyle = (label: string) => {
+    if (!checked) {
+      return label === selectedOption
+        ? "border-blue-400 bg-blue-50"
+        : "border-gray-200 hover:bg-gray-50";
+    }
+    if (label === variant.correct_answer) return "border-green-400 bg-green-50";
+    if (label === selectedOption) return "border-red-400 bg-red-50";
+    return "border-gray-200 opacity-50";
+  };
+
+  return (
+    <div className="fixed inset-0 bg-black/40 flex items-center justify-center z-50 p-4">
+      <div className="bg-white rounded-xl shadow-xl max-w-lg w-full max-h-[90vh] overflow-y-auto">
+        <div className="p-5">
+          <div className="flex items-center justify-between mb-4">
+            <h3 className="text-lg font-semibold text-gray-900">Similar Question</h3>
+            <button onClick={onClose} aria-label="Close" className="text-gray-400 hover:text-gray-600 text-xl">&times;</button>
+          </div>
+
+          {/* Question text */}
+          <div className="text-sm text-gray-800 leading-relaxed bg-gray-50 rounded-lg p-3 border border-gray-200 mb-3">
+            <KaTeXRenderer html={variant.question_text} />
+          </div>
+
+          {/* MC options */}
+          {isMc && variant.options && (
+            <>
+              <div className="space-y-1.5">
+                {variant.options.map((opt) => (
+                  <button
+                    key={opt.label}
+                    onClick={() => { if (!checked) setSelectedOption(opt.label); }}
+                    disabled={checked}
+                    className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
+                  >
+                    <span className="font-semibold shrink-0 w-6 text-blue-600">{opt.label}.</span>
+                    <span className="text-gray-700">{opt.text}</span>
+                    {checked && opt.label === variant.correct_answer && (
+                      <span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
+                    )}
+                  </button>
+                ))}
+              </div>
+              {!checked && selectedOption && (
+                <button
+                  onClick={handleMcCheck}
+                  className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700"
+                >
+                  Check Answer
+                </button>
+              )}
+              {checked && (
+                <div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
+                  {isCorrectMc ? "Correct!" : `Wrong — the answer is ${variant.correct_answer}`}
+                </div>
+              )}
+            </>
+          )}
+
+          {/* Non-MC input */}
+          {!isMc && (
+            <div className="mt-1">
+              <div className="flex gap-2">
+                <input
+                  type="text"
+                  value={fillAnswer}
+                  onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
+                  placeholder="Type your answer..."
+                  disabled={fillChecked}
+                  className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
+                    fillChecked
+                      ? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
+                      : "border-gray-300"
+                  }`}
+                  onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
+                />
+                {!fillChecked && (
+                  <button
+                    onClick={handleFillCheck}
+                    disabled={!fillAnswer.trim()}
+                    className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50"
+                  >
+                    Check
+                  </button>
+                )}
+              </div>
+              {fillChecked && (
+                <div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
+                  {isCorrectFill ? "Correct!" : `Answer: ${variant.correct_answer}`}
+                </div>
+              )}
+            </div>
+          )}
+
+          {/* AI Trio: Knowledge / Hint / Solution */}
+          <div className="mt-4 border-t pt-3 space-y-2">
+            {variant.knowledge_reminder && (
+              <div>
+                <button
+                  onClick={() => setShowKnowledge((open) => !open)}
+                  className="text-sm text-blue-600 hover:text-blue-800 font-medium"
+                >
+                  {showKnowledge ? "▾ Hide Knowledge" : "▸ Knowledge Reminder"}
+                </button>
+                {showKnowledge && (
+                  <div className="mt-2 bg-blue-50 rounded-lg p-3 text-sm border border-blue-200">
+                    <KaTeXRenderer html={variant.knowledge_reminder} />
+                  </div>
+                )}
+              </div>
+            )}
+            {variant.ai_hint && (
+              <div>
+                <button
+                  onClick={() => setShowHint((open) => !open)}
+                  className="text-sm text-amber-600 hover:text-amber-800 font-medium"
+                >
+                  {showHint ? "▾ Hide Hint" : "▸ AI Hint"}
+                </button>
+                {showHint && (
+                  <div className="mt-2 bg-amber-50 rounded-lg p-3 text-sm border border-amber-200">
+                    <KaTeXRenderer html={variant.ai_hint} />
+                  </div>
+                )}
+              </div>
+            )}
+            <div>
+              <button
+                onClick={() => setShowSolution((open) => !open)}
+                className="text-sm text-green-600 hover:text-green-800 font-medium"
+              >
+                {showSolution ? "▾ Hide Solution" : "▸ Solution"}
+              </button>
+              {showSolution && (
+                <div className="mt-2 bg-green-50 rounded-lg p-3 text-sm border border-green-200">
+                  <KaTeXRenderer html={variant.solution} />
+                </div>
+              )}
+            </div>
+          </div>
+
+          <button
+            onClick={onClose}
+            className="mt-4 w-full py-2 rounded-lg text-sm bg-gray-100 text-gray-700 font-medium hover:bg-gray-200"
+          >
+            Close
+          </button>
+        </div>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/contexts/AuthContext.tsx
+++ b/frontend/src/contexts/AuthContext.tsx
@@ -0,0 +1,49 @@
+import { createContext, useContext, useEffect, useState } from "react";
+import type { Session, User } from "@supabase/supabase-js";
+import { supabase } from "@/lib/supabase";
+
+interface AuthContextValue {
+  session: Session | null;
+  user: User | null;
+  loading: boolean;
+  signOut: () => Promise<void>;
+}
+
+const AuthContext = createContext<AuthContextValue>({
+  session: null,
+  user: null,
+  loading: true,
+  signOut: async () => {},
+});
+
+export function AuthProvider({ children }: { children: React.ReactNode }) {
+  const [session, setSession] = useState<Session | null>(null);
+  const [loading, setLoading] = useState(true);
+
+  useEffect(() => {
+    supabase.auth.getSession().then(({ data }) => {
+      setSession(data.session);
+      setLoading(false);
+    });
+
+    const { data: { subscription } } = supabase.auth.onAuthStateChange((_event, nextSession) => {
+      setSession(nextSession);
+    });
+
+    return () => subscription.unsubscribe();
+  }, []);
+
+  const signOut = async () => {
+    await supabase.auth.signOut();
+  };
+
+  return (
+    <AuthContext.Provider value={{ session, user: session?.user ?? null, loading, signOut }}>
+      {children}
+    </AuthContext.Provider>
+  );
+}
+
+export function useAuth() {
+  return useContext(AuthContext);
+}
--- a/frontend/src/hooks/usePaper.ts
+++ b/frontend/src/hooks/usePaper.ts
@@ -0,0 +1,43 @@
+import { useEffect, useState } from "react";
+import { getPaper } from "@/lib/api";
+import type { Paper } from "@/types/api";
+
+const POLL_INTERVAL = 3000;
+
+export function usePaper(paperId: string) {
+  const [paper, setPaper] = useState<Paper | null>(null);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  useEffect(() => {
+    let intervalId: number | null = null;
+    let cancelled = false;
+
+    const fetchPaper = async () => {
+      try {
+        const data = await getPaper(paperId);
+        if (cancelled) return;
+        setPaper(data);
+        setLoading(false);
+        if (data.status === "ready" || data.status === "error") {
+          if (intervalId !== null) clearInterval(intervalId);
+        }
+      } catch (err) {
+        if (cancelled) return;
+        setError(err instanceof Error ? err.message : "Unknown error");
+        setLoading(false);
+        if (intervalId !== null) clearInterval(intervalId);
+      }
+    };
+
+    fetchPaper();
+    intervalId = window.setInterval(fetchPaper, POLL_INTERVAL);
+
+    return () => {
+      cancelled = true;
+      if (intervalId !== null) clearInterval(intervalId);
+    };
+  }, [paperId]);
+
+  return { paper, loading, error };
+}
--- a/frontend/src/hooks/useQuestions.ts
+++ b/frontend/src/hooks/useQuestions.ts
@@ -0,0 +1,33 @@
+import { useEffect, useState } from "react";
+import { getQuestions } from "@/lib/api";
+import type { Question } from "@/types/api";
+
+export function useQuestions(paperId: string, enabled: boolean) {
+  const [questions, setQuestions] = useState<Question[]>([]);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  useEffect(() => {
+    if (!enabled) return;
+    let cancelled = false;
+    setLoading(true);
+
+    getQuestions(paperId)
+      .then((data) => {
+        if (!cancelled) {
+          setQuestions(data);
+          setLoading(false);
+        }
+      })
+      .catch((err) => {
+        if (!cancelled) {
+          setError(err instanceof Error ? err.message : "Unknown error");
+          setLoading(false);
+        }
+      });
+
+    return () => { cancelled = true; };
+  }, [paperId, enabled]);
+
+  return { questions, loading, error };
+}
--- a/frontend/src/lib/api.ts
+++ b/frontend/src/lib/api.ts
@@ -0,0 +1,190 @@
+import type {
+  CourseAnalytics,
+  Paper,
+  Question,
+  QuestionVariant,
+  SimilarQuestion,
+  UploadResponse,
+  UserAttempt,
+} from "@/types/api";
+import { supabase } from "@/lib/supabase";
+
+const API_BASE = "/api";
+
+async function authHeaders(): Promise<Record<string, string>> {
+  const { data } = await supabase.auth.getSession();
+  const token = data.session?.access_token;
+  if (!token) return {};
+  return { Authorization: `Bearer ${token}` };
+}
+
+export async function uploadPaper(formData: FormData): Promise<UploadResponse> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/papers/upload`, {
+    method: "POST",
+    headers,
+    body: formData,
+  });
+  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getPaper(paperId: string): Promise<Paper> {
+  const res = await fetch(`${API_BASE}/papers/${paperId}`);
+  if (!res.ok) throw new Error(`Paper fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getQuestions(paperId: string): Promise<Question[]> {
+  const res = await fetch(`${API_BASE}/papers/${paperId}/questions`);
+  if (!res.ok) throw new Error(`Questions fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function myPapers(): Promise<Paper[]> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/papers/mine`, { headers });
+  if (!res.ok) throw new Error(`My papers fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function listPapers(): Promise<Paper[]> {
+  const res = await fetch(`${API_BASE}/papers/`);
+  if (!res.ok) throw new Error(`List papers failed: ${res.status}`);
+  return res.json();
+}
+
+export async function recordAttempt(
+  questionId: string,
+  attemptType: string,
+  userAnswer: string | null,
+  isCorrect: boolean | null,
+): Promise<UserAttempt> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/attempts/`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json", ...headers },
+    body: JSON.stringify({
+      question_id: questionId,
+      attempt_type: attemptType,
+      user_answer: userAnswer,
+      is_correct: isCorrect,
+    }),
+  });
+  if (!res.ok) throw new Error(`Attempt save failed: ${res.status}`);
+  return res.json();
+}
+
+export async function uploadPhoto(
+  questionId: string,
+  photo: File,
+): Promise<{ attempt: UserAttempt; ocr_text: string; grade: { is_correct: boolean; score_given?: number; feedback: string; error_at_step: number | null } }> {
+  const headers = await authHeaders();
+  const fd = new FormData();
+  fd.append("question_id", questionId);
+  fd.append("photo", photo);
+  const res = await fetch(`${API_BASE}/attempts/photo`, {
+    method: "POST",
+    headers,
+    body: fd,
+  });
+  if (!res.ok) throw new Error(`Photo upload failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getPaperAttempts(paperId: string): Promise<{
+  question_id: string;
+  is_correct: boolean;
+  feedback: string | null;
+  photo_ocr_text: string | null;
+}[]> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/attempts/by-paper/${paperId}`, { headers });
+  if (!res.ok) return [];
+  return res.json();
+}
+
+export async function generateVariant(questionId: string): Promise<QuestionVariant> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/questions/${questionId}/variant`, {
+    method: "POST",
+    headers,
+  });
+  if (!res.ok) throw new Error(`Variant generation failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getVariants(questionId: string): Promise<QuestionVariant[]> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/questions/${questionId}/variants`, { headers });
+  if (!res.ok) throw new Error(`Variants fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function updateVariant(variantId: string, data: { favorited?: boolean }): Promise<QuestionVariant> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/questions/variant/${variantId}`, {
+    method: "PATCH",
+    headers: { "Content-Type": "application/json", ...headers },
+    body: JSON.stringify(data),
+  });
+  if (!res.ok) throw new Error(`Variant update failed: ${res.status}`);
+  return res.json();
+}
+
+export async function deleteVariant(variantId: string): Promise<void> {
+  const res = await fetch(`${API_BASE}/questions/variant/${variantId}`, { method: "DELETE", headers: await authHeaders() });
+  if (!res.ok) throw new Error(`Variant delete failed: ${res.status}`);
+}
+
+export async function getFavoriteVariants(): Promise<QuestionVariant[]> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/questions/variants/favorited`, { headers });
+  if (!res.ok) throw new Error(`Favorited variants fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getErrorBook(courseCode?: string): Promise<UserAttempt[]> {
+  const headers = await authHeaders();
+  const params = new URLSearchParams();
+  if (courseCode) params.set("course_code", courseCode);
+  const query = params.toString() ? `?${params.toString()}` : "";
+  const res = await fetch(`${API_BASE}/attempts/error-book${query}`, { headers });
+  if (!res.ok) throw new Error(`Error book fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function updateAttempt(
+  attemptId: string,
+  data: { in_error_book?: boolean; mastered?: boolean },
+): Promise<UserAttempt> {
+  const headers = await authHeaders();
+  const res = await fetch(`${API_BASE}/attempts/${attemptId}`, {
+    method: "PATCH",
+    headers: { "Content-Type": "application/json", ...headers },
+    body: JSON.stringify(data),
+  });
+  if (!res.ok) throw new Error(`Attempt update failed: ${res.status}`);
+  return res.json();
+}
+
+export async function listCourses(): Promise<string[]> {
+  const res = await fetch(`${API_BASE}/analytics/courses`);
+  if (!res.ok) throw new Error(`Courses fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getCourseAnalytics(courseCode: string): Promise<CourseAnalytics> {
+  const res = await fetch(`${API_BASE}/analytics/course/${encodeURIComponent(courseCode)}`);
+  if (!res.ok) throw new Error(`Analytics fetch failed: ${res.status}`);
+  return res.json();
+}
+
+export async function getSimilarQuestions(
+  questionId: string,
+  limit = 6,
+): Promise<SimilarQuestion[]> {
+  const res = await fetch(`${API_BASE}/questions/${questionId}/similar?limit=${limit}`);
+  if (!res.ok) throw new Error(`Similar question fetch failed: ${res.status}`);
+  return res.json();
+}
--- a/frontend/src/lib/questionGroups.ts
+++ b/frontend/src/lib/questionGroups.ts
@@ -0,0 +1,45 @@
+import type { Question } from "@/types/api";
+
+export interface QuestionGroup {
+  key: string;
+  label: string;
+  questions: Question[];
+  startPage: number;
+}
+
+function topLevelKey(questionNumber: string): string {
+  const match = questionNumber.match(/^\d+/);
+  return match?.[0] ?? questionNumber;
+}
+
+export function groupQuestions(questions: Question[]): QuestionGroup[] {
+  const groups = new Map<string, QuestionGroup>();
+
+  for (const question of questions) {
+    const key = topLevelKey(question.question_number);
+    const existing = groups.get(key);
+    if (existing) {
+      existing.questions.push(question);
+      existing.startPage = Math.min(existing.startPage, question.page_number ?? existing.startPage);
+      continue;
+    }
+    groups.set(key, {
+      key,
+      label: `Q${key}`,
+      questions: [question],
+      startPage: question.page_number ?? 1,
+    });
+  }
+
+  return Array.from(groups.values()).sort((a, b) => a.key.localeCompare(b.key, undefined, { numeric: true }));
+}
+
+export function subquestionLabel(question: Question): string {
+  const remainder = question.question_number.replace(/^\d+/, "");
+  if (!remainder) return "Main";
+  return remainder
+    .replace(/^_+/, "")
+    .split("_")
+    .filter(Boolean)
+    .join(".");
+}
--- a/frontend/src/lib/supabase.ts
+++ b/frontend/src/lib/supabase.ts
@@ -0,0 +1,6 @@
+import { createClient } from "@supabase/supabase-js";
+
+const supabaseUrl = import.meta.env.VITE_SUPABASE_URL as string;
+const supabaseAnonKey = import.meta.env.VITE_SUPABASE_ANON_KEY as string;
+
+export const supabase = createClient(supabaseUrl, supabaseAnonKey);
--- a/frontend/src/main.tsx
+++ b/frontend/src/main.tsx
@@ -0,0 +1,16 @@
+import { StrictMode } from "react";
+import { createRoot } from "react-dom/client";
+import { BrowserRouter } from "react-router-dom";
+import App from "./App";
+import { AuthProvider } from "./contexts/AuthContext";
+import "./styles/globals.css";
+
+createRoot(document.getElementById("root")!).render(
+  <StrictMode>
+    <BrowserRouter>
+      <AuthProvider>
+        <App />
+      </AuthProvider>
+    </BrowserRouter>
+  </StrictMode>,
+);
--- a/frontend/src/pages/AnalyticsPage.tsx
+++ b/frontend/src/pages/AnalyticsPage.tsx
@@ -0,0 +1,521 @@
+import { useEffect, useMemo, useState } from "react";
+import { Link, useNavigate, useParams } from "react-router-dom";
+
+import Header from "@/components/layout/Header";
+import { getCourseAnalytics, listCourses } from "@/lib/api";
+import type { CourseAnalytics, AnalyticsTopicQuestion } from "@/types/api";
+
+const typeLabel: Record<string, string> = {
+  mc: "Multiple Choice",
+  true_false: "True / False",
+  fill_blank: "Fill in Blank",
+  long_question: "Long Question",
+  short_answer: "Short Answer",
+  coding: "Coding",
+};
+
+const TYPE_COLORS: Record<string, string> = {
+  mc: "bg-violet-50 text-violet-700 border-violet-200",
+  true_false: "bg-amber-50 text-amber-700 border-amber-200",
+  fill_blank: "bg-teal-50 text-teal-700 border-teal-200",
+  long_question: "bg-sky-50 text-sky-700 border-sky-200",
+  short_answer: "bg-rose-50 text-rose-700 border-rose-200",
+  coding: "bg-emerald-50 text-emerald-700 border-emerald-200",
+};
+
+const DIFF_COLORS: Record<string, string> = {
+  hard: "text-red-600 bg-red-50 border-red-200",
+  medium: "text-amber-600 bg-amber-50 border-amber-200",
+  easy: "text-green-600 bg-green-50 border-green-200",
+};
+
+type QItem = AnalyticsTopicQuestion;
+type Analytics = CourseAnalytics;
+
+const PAGE_SIZE = 8;
+
+export default function AnalyticsPage() {
+  const { courseCode } = useParams<{ courseCode?: string }>();
+  const navigate = useNavigate();
+
+  const [courses, setCourses] = useState<string[]>([]);
+  const [search, setSearch] = useState("");
+
+  useEffect(() => { listCourses().then(setCourses).catch((err) => console.error("Failed to load courses:", err)); }, []);
+  const filtered = useMemo(() => {
+    const q = search.trim().toUpperCase();
+    return q ? courses.filter((c) => c.toUpperCase().includes(q)) : courses;
+  }, [courses, search]);
+
+  const normalizedCourse = courseCode?.toUpperCase();
+  const [analytics, setAnalytics] = useState<Analytics | null>(null);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  useEffect(() => {
+    if (!normalizedCourse) return;
+    let cancelled = false;
+    setLoading(true);
+    setAnalytics(null);
+    setError(null);
+    getCourseAnalytics(normalizedCourse)
+      .then((data) => { if (!cancelled) { setAnalytics(data); setLoading(false); } })
+      .catch((err) => { if (!cancelled) { setError(err instanceof Error ? err.message : "Failed to load analytics"); setLoading(false); } });
+    return () => { cancelled = true; };
+  }, [normalizedCourse]);
+
+  // ── Course picker ──
+  if (!normalizedCourse) {
+    return (
+      <div className="min-h-screen bg-gray-50">
+        <Header />
+        <main className="max-w-2xl mx-auto px-6 py-12">
+          <h1 className="text-2xl font-bold text-gray-900 mb-1">Analytics</h1>
+          <p className="text-sm text-gray-500 mb-6">Select a course to view statistics.</p>
+          <input
+            type="text"
+            placeholder="Search course code..."
+            value={search}
+            onChange={(e) => setSearch(e.target.value)}
+            className="w-full px-4 py-2.5 border border-gray-300 rounded-xl text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 mb-4"
+          />
+          {filtered.length === 0 ? (
+            <p className="text-sm text-gray-400">No courses found.</p>
+          ) : (
+            <div className="grid grid-cols-2 gap-3">
+              {filtered.map((code) => (
+                <button key={code} onClick={() => navigate(`/analytics/${code}`)}
+                  className="text-left px-4 py-3 bg-white border border-gray-200 rounded-xl hover:border-blue-400 hover:bg-blue-50 transition-colors">
+                  <span className="font-semibold text-gray-900">{code}</span>
+                </button>
+              ))}
+            </div>
+          )}
+        </main>
+      </div>
+    );
+  }
+
+  // ── Dashboard ──
+  return (
+    <div className="min-h-screen bg-gray-50">
+      <Header />
+      <main className="max-w-7xl mx-auto px-6 py-8">
+        <div className="mb-6 flex items-center gap-3">
+          <button onClick={() => navigate("/analytics")} className="text-sm text-gray-400 hover:text-gray-600">← All courses</button>
+          <span className="text-gray-300">/</span>
+          <h1 className="text-2xl font-bold text-gray-900">{normalizedCourse}</h1>
+        </div>
+
+        {loading && <div className="text-sm text-gray-400">Loading analytics...</div>}
+        {error && <div className="text-sm text-red-600">{error}</div>}
+
+        {!loading && !error && analytics && (
+          <>
+            {/* KPI row */}
+            <section className="grid grid-cols-4 gap-4 mb-6">
+              <KpiCard label="Papers" value={analytics.kpi.papers} />
+              <KpiCard label="Questions" value={analytics.kpi.questions} />
+              <KpiCard label="Topics" value={analytics.kpi.topics} />
+              <KpiCard label="Avg Difficulty" value={analytics.kpi.difficulty} />
+            </section>
+
+            {/* Main area: left = search, right = charts */}
+            <section className="grid grid-cols-[5fr_2fr] gap-6">
+              {/* Left: Global search */}
+              <GlobalSearch questions={analytics.all_questions} topics={analytics.topic_frequency.map((t) => t.label)} />
+
+              {/* Right: Interactive charts + stats */}
+              <div className="space-y-5">
+                <InteractiveChart
+                  topicData={analytics.topic_frequency.slice(0, 8).map((t) => ({ label: t.label, value: t.count }))}
+                  typeData={analytics.question_types.map((t) => ({ label: typeLabel[t.label] ?? t.label, value: t.count }))}
+                  diffData={[
+                    { label: "Easy", value: analytics.difficulty_distribution.easy },
+                    { label: "Medium", value: analytics.difficulty_distribution.medium },
+                    { label: "Hard", value: analytics.difficulty_distribution.hard },
+                  ].filter((d) => d.value > 0)}
+                />
+
+                <Panel title="High Yield Topics">
+                  {analytics.high_yield_topics.length === 0 ? (
+                    <div className="text-sm text-gray-400">No data yet.</div>
+                  ) : (
+                    <ul className="space-y-2">
+                      {analytics.high_yield_topics.map((t, i) => (
+                        <li key={t} className="flex items-center gap-3 text-sm text-gray-700">
+                          <span className="w-6 h-6 rounded-full bg-red-50 text-red-600 flex items-center justify-center text-xs font-semibold">{i + 1}</span>
+                          <span>{t}</span>
+                        </li>
+                      ))}
+                    </ul>
+                  )}
+                </Panel>
+              </div>
+            </section>
+          </>
+        )}
+      </main>
+    </div>
+  );
+}
+
+// ── Global Search Engine ──
+function GlobalSearch({ questions, topics }: { questions: QItem[]; topics: string[] }) {
+  const [search, setSearch] = useState("");
+  const [topicFilter, setTopicFilter] = useState<string | null>(null);
+  const [typeFilter, setTypeFilter] = useState<string | null>(null);
+  const [yearFilter, setYearFilter] = useState<number | null>(null);
+  const [termFilter, setTermFilter] = useState<string | null>(null);
+  const [diffFilter, setDiffFilter] = useState<string | null>(null);
+  const [visibleCount, setVisibleCount] = useState(PAGE_SIZE);
+
+  const types = useMemo(() => [...new Set(questions.map((q) => q.question_type))].sort(), [questions]);
+  const years = useMemo(() => [...new Set(questions.map((q) => q.year).filter(Boolean))].sort((a, b) => (b ?? 0) - (a ?? 0)) as number[], [questions]);
+  const terms = useMemo(() => {
+    const order = ["spring", "summer", "fall", "winter"];
+    return [...new Set(questions.map((q) => q.term).filter(Boolean))].sort((a, b) => order.indexOf(a!) - order.indexOf(b!)) as string[];
+  }, [questions]);
+  const diffs = useMemo(() => [...new Set(questions.map((q) => q.difficulty).filter(Boolean))] as string[], [questions]);
+
+  const filtered = useMemo(() => {
+    const q = search.toLowerCase();
+    return questions.filter((item) => {
+      if (topicFilter && !item.topics?.includes(topicFilter)) return false;
+      if (typeFilter && item.question_type !== typeFilter) return false;
+      if (yearFilter && item.year !== yearFilter) return false;
+      if (termFilter && item.term !== termFilter) return false;
+      if (diffFilter && item.difficulty !== diffFilter) return false;
+      if (q && !item.preview.toLowerCase().includes(q) && !item.source.toLowerCase().includes(q) && !item.question_number.toLowerCase().includes(q) && !item.topics?.some((t) => t.toLowerCase().includes(q))) return false;
+      return true;
+    });
+  }, [questions, search, topicFilter, typeFilter, yearFilter, termFilter, diffFilter]);
+
+  const activeCount = [topicFilter, typeFilter, yearFilter, termFilter, diffFilter].filter(Boolean).length;
+
+  useEffect(() => setVisibleCount(PAGE_SIZE), [search, topicFilter, typeFilter, yearFilter, termFilter, diffFilter]);
+
+  const visible = filtered.slice(0, visibleCount);
+  const hasMore = visibleCount < filtered.length;
+
+  return (
+    <div className="bg-white border border-gray-200 rounded-2xl p-6">
+      <h2 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-4">Question Search</h2>
+
+      {/* Search bar */}
+      <div className="relative mb-3">
+        <input
+          type="text"
+          value={search}
+          onChange={(e) => setSearch(e.target.value)}
+          placeholder="Search questions, topics, papers..."
+          className="w-full pl-9 pr-3 py-2.5 text-sm border border-gray-200 rounded-xl bg-gray-50 focus:bg-white focus:outline-none focus:ring-2 focus:ring-blue-400"
+        />
+        <span className="absolute left-3 top-1/2 -translate-y-1/2 text-gray-400">🔍</span>
+      </div>
+
+      {/* Filter rows */}
+      <div className="space-y-2 mb-3">
+        {/* Topic */}
+        <FilterRow label="Topic">
+          <TopicCombobox topics={topics} value={topicFilter} onChange={setTopicFilter} />
+        </FilterRow>
+
+        {/* Type + Year + Term + Difficulty in one row */}
+        <div className="flex items-center gap-3 flex-wrap">
+          <FilterRow label="Type">
+            <div className="flex gap-1 flex-wrap">
+              {types.map((t) => (
+                <Pill key={t} label={typeLabel[t] ?? t} active={typeFilter === t}
+                  color={TYPE_COLORS[t]} onClick={() => setTypeFilter(typeFilter === t ? null : t)} />
+              ))}
+            </div>
+          </FilterRow>
+
+          <FilterRow label="Year">
+            <div className="flex gap-1 flex-wrap">
+              {years.map((y) => (
+                <Pill key={y} label={String(y)} active={yearFilter === y}
+                  onClick={() => setYearFilter(yearFilter === y ? null : y)} />
+              ))}
+            </div>
+          </FilterRow>
+
+          <FilterRow label="Term">
+            <div className="flex gap-1 flex-wrap">
+              {terms.map((t) => (
+                <Pill key={t} label={t.charAt(0).toUpperCase() + t.slice(1)} active={termFilter === t}
+                  onClick={() => setTermFilter(termFilter === t ? null : t)} />
+              ))}
+            </div>
+          </FilterRow>
+
+          <FilterRow label="Diff">
+            <div className="flex gap-1">
+              {(["easy", "medium", "hard"] as const).filter((d) => diffs.includes(d)).map((d) => (
+                <Pill key={d} label={d.charAt(0).toUpperCase() + d.slice(1)} active={diffFilter === d}
+                  color={DIFF_COLORS[d]} onClick={() => setDiffFilter(diffFilter === d ? null : d)} />
+              ))}
+            </div>
+          </FilterRow>
+        </div>
+      </div>
+
+      {/* Results count + clear */}
+      <div className="flex items-center justify-between mb-3 pb-3 border-b border-gray-100">
+        <span className="text-xs text-gray-400">
+          {filtered.length} question{filtered.length !== 1 ? "s" : ""}
+          {activeCount > 0 || search ? " matched" : ""}
+        </span>
+        {(activeCount > 0 || search) && (
+          <button onClick={() => { setTopicFilter(null); setTypeFilter(null); setYearFilter(null); setTermFilter(null); setDiffFilter(null); setSearch(""); }}
+            className="text-xs text-blue-500 hover:text-blue-700">Clear all</button>
+        )}
+      </div>
+
+      {/* Results */}
+      <div className="space-y-2">
+        {visible.map((q, i) => (
+          <QuestionCard key={`${q.paper_id}-${q.question_number}-${i}`} question={q} />
+        ))}
+      </div>
+
+      {hasMore && (
+        <button onClick={() => setVisibleCount((v) => v + PAGE_SIZE)}
+          className="w-full mt-3 py-2 text-xs text-blue-600 hover:text-blue-700 bg-blue-50 rounded-xl font-medium">
+          Show more ({filtered.length - visibleCount} remaining)
+        </button>
+      )}
+      {filtered.length === 0 && (
+        <div className="text-center py-6 text-sm text-gray-400">No questions match your search.</div>
+      )}
+    </div>
+  );
+}
+
+// ── Interactive Pie Chart ──
+const PIE_PALETTE = [
+  "#3B82F6", "#8B5CF6", "#F59E0B", "#10B981", "#EF4444",
+  "#EC4899", "#06B6D4", "#F97316", "#6366F1", "#14B8A6",
+];
+
+function InteractiveChart({ topicData, typeData, diffData }: {
+  topicData: { label: string; value: number }[];
+  typeData: { label: string; value: number }[];
+  diffData: { label: string; value: number }[];
+}) {
+  const [view, setView] = useState<"topic" | "type" | "difficulty">("topic");
+  const [hovered, setHovered] = useState<number | null>(null);
+
+  const data = view === "topic" ? topicData : view === "type" ? typeData : diffData;
+  const colors = view === "difficulty"
+    ? ["#10B981", "#F59E0B", "#EF4444"]
+    : PIE_PALETTE;
+
+  const total = data.reduce((s, d) => s + d.value, 0);
+
+  // Build conic-gradient
+  let cumPct = 0;
+  const segments = data.map((d, i) => {
+    const pct = total ? (d.value / total) * 100 : 0;
+    const start = cumPct;
+    cumPct += pct;
+    return { ...d, pct, start, end: cumPct, color: colors[i % colors.length] };
+  });
+
+  const gradient = segments
+    .map((s) => `${s.color} ${s.start}% ${s.end}%`)
+    .join(", ");
+
+  return (
+    <section className="bg-white border border-gray-200 rounded-2xl p-5">
+      {/* Tab switcher */}
+      <div className="flex gap-1 mb-4">
+        {(["topic", "type", "difficulty"] as const).map((t) => (
+          <button key={t} onClick={() => { setView(t); setHovered(null); }}
+            className={`text-xs px-3 py-1.5 rounded-lg font-medium transition-colors ${
+              view === t ? "bg-gray-900 text-white" : "bg-gray-100 text-gray-500 hover:text-gray-700"
+            }`}>
+            {t === "topic" ? "Topics" : t === "type" ? "Types" : "Difficulty"}
+          </button>
+        ))}
+      </div>
+
+      {/* Pie */}
+      <div className="flex items-center gap-4">
+        <div className="relative w-36 h-36 shrink-0">
+          <div
+            className="w-full h-full rounded-full"
+            style={{ background: `conic-gradient(${gradient})` }}
+          />
+          <div className="absolute inset-3 bg-white rounded-full flex items-center justify-center">
+            {hovered !== null ? (
+              <div className="text-center">
+                <div className="text-lg font-bold text-gray-900">{segments[hovered].value}</div>
+                <div className="text-[9px] text-gray-400">{segments[hovered].pct.toFixed(0)}%</div>
+              </div>
+            ) : (
+              <div className="text-center">
+                <div className="text-lg font-bold text-gray-900">{total}</div>
+                <div className="text-[9px] text-gray-400">total</div>
+              </div>
+            )}
+          </div>
+        </div>
+
+        {/* Legend */}
+        <div className="flex-1 space-y-1 max-h-36 overflow-y-auto">
+          {segments.map((s, i) => (
+            <div
+              key={s.label}
+              onMouseEnter={() => setHovered(i)}
+              onMouseLeave={() => setHovered(null)}
+              className={`flex items-center gap-2 px-2 py-1 rounded-lg cursor-default transition-colors ${
+                hovered === i ? "bg-gray-50" : ""
+              }`}
+            >
+              <span className="w-2.5 h-2.5 rounded-full shrink-0" style={{ backgroundColor: s.color }} />
+              <span className="text-xs text-gray-700 flex-1 truncate">{s.label}</span>
+              <span className="text-xs text-gray-400 tabular-nums">{s.value}</span>
+            </div>
+          ))}
+        </div>
+      </div>
+    </section>
+  );
+}
+
+// ── Shared components ──
+function QuestionCard({ question: q }: { question: QItem }) {
+  const typeColor = TYPE_COLORS[q.question_type] ?? "bg-gray-50 text-gray-600 border-gray-200";
+  const cleanPreview = (q.preview || "")
+    .replace(/^Problem\s+\d+\s*\[.*?\]\s*/i, "")
+    .replace(/^(True\/False Questions?\s*)?Indicate whether.*?(answer\.\s*)/i, "")
+    .trim();
+
+  return (
+    <Link to={`/paper/${q.paper_id}`}
+      className="flex items-start gap-3 bg-gray-50 border border-gray-200 rounded-xl px-3.5 py-2.5 hover:border-blue-300 hover:bg-white hover:shadow-sm transition-all group">
+      <span className="shrink-0 inline-flex items-center justify-center w-8 h-8 rounded-lg bg-blue-600 text-white text-xs font-bold mt-0.5">
+        {q.question_number}
+      </span>
+      <div className="flex-1 min-w-0">
+        <div className="flex items-center gap-1.5 mb-1 flex-wrap">
+          <span className="text-xs font-medium text-blue-600">{q.source}</span>
+          <span className="text-gray-300">·</span>
+          <span className={`text-[10px] px-1.5 py-0.5 rounded border font-medium ${typeColor}`}>
+            {typeLabel[q.question_type] ?? q.question_type}
+          </span>
+          {q.difficulty && (
+            <>
+              <span className="text-gray-300">·</span>
+              <span className={`text-[10px] px-1.5 py-0.5 rounded border font-medium ${DIFF_COLORS[q.difficulty] ?? ""}`}>
+                {q.difficulty}
+              </span>
+            </>
+          )}
+          {q.topics?.slice(0, 2).map((t) => (
+            <span key={t} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500 border border-gray-200">{t}</span>
+          ))}
+        </div>
+        <p className="text-xs text-gray-600 line-clamp-2 leading-relaxed">{cleanPreview || q.preview}</p>
+      </div>
+      <span className="shrink-0 text-gray-300 group-hover:text-blue-500 text-sm pt-1">→</span>
+    </Link>
+  );
+}
+
+function FilterRow({ label, children }: { label: string; children: React.ReactNode }) {
+  return (
+    <div className="flex items-center gap-1.5">
+      <span className="text-[10px] text-gray-400 w-10 shrink-0">{label}</span>
+      {children}
+    </div>
+  );
+}
+
+function Pill({ label, active, color, onClick }: { label: string; active: boolean; color?: string; onClick: () => void }) {
+  return (
+    <button onClick={onClick}
+      className={`text-[10px] px-2 py-1 rounded-full border font-medium transition-colors whitespace-nowrap ${
+        active ? (color ?? "bg-blue-50 text-blue-700 border-blue-200") : "bg-white text-gray-400 border-gray-200 hover:text-gray-600"
+      }`}>
+      {label}
+    </button>
+  );
+}
+
+function KpiCard({ label, value }: { label: string; value: string | number }) {
+  return (
+    <div className="bg-white border border-gray-200 rounded-2xl p-5">
+      <div className="text-2xl font-semibold text-gray-900">{value}</div>
+      <div className="text-xs uppercase tracking-wide text-gray-400 mt-2">{label}</div>
+    </div>
+  );
+}
+
+function Panel({ title, children }: { title: string; children: React.ReactNode }) {
+  return (
+    <section className="bg-white border border-gray-200 rounded-2xl p-5">
+      <h2 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-4">{title}</h2>
+      {children}
+    </section>
+  );
+}
+
+function TopicCombobox({ topics, value, onChange }: { topics: string[]; value: string | null; onChange: (v: string | null) => void }) {
+  const [input, setInput] = useState("");
+  const [open, setOpen] = useState(false);
+
+  const filtered = useMemo(() => {
+    const q = input.toLowerCase();
+    return q ? topics.filter((t) => t.toLowerCase().includes(q)) : topics;
+  }, [topics, input]);
+
+  const handleSelect = (t: string | null) => {
+    onChange(t);
+    setInput(t ?? "");
+    setOpen(false);
+  };
+
+  return (
+    <div className="relative">
+      <div className="flex items-center gap-1">
+        <input
+          type="text"
+          value={value ? (input || value) : input}
+          onChange={(e) => { setInput(e.target.value); setOpen(true); if (!e.target.value) onChange(null); }}
+          onFocus={() => setOpen(true)}
+          placeholder="All Topics"
+          className="text-xs border border-gray-200 rounded-lg px-2 py-1.5 bg-white focus:outline-none focus:ring-1 focus:ring-blue-400 w-48"
+        />
+        {value && (
+          <button aria-label="Clear topic filter" onClick={() => { onChange(null); setInput(""); }} className="text-gray-400 hover:text-gray-600 text-xs">✕</button>
+        )}
+      </div>
+      {open && filtered.length > 0 && (
+        <div className="absolute z-20 top-full mt-1 w-56 max-h-48 overflow-y-auto bg-white border border-gray-200 rounded-lg shadow-lg">
+          {filtered.map((t) => (
+            <button
+              key={t}
+              onClick={() => handleSelect(t)}
+              className={`w-full text-left px-3 py-1.5 text-xs hover:bg-blue-50 transition-colors ${value === t ? "bg-blue-50 text-blue-700 font-medium" : "text-gray-700"}`}
+            >
+              {t}
+            </button>
+          ))}
+        </div>
+      )}
+      {open && <div className="fixed inset-0 z-10" onClick={() => setOpen(false)} />}
+    </div>
+  );
+}
+
+function DiffStat({ label, value }: { label: string; value: number }) {
+  return (
+    <div className="bg-gray-50 rounded-xl px-3 py-4">
+      <div className="text-xl font-semibold text-gray-900">{value}</div>
+      <div className="text-xs uppercase tracking-wide text-gray-400 mt-1">{label}</div>
+    </div>
+  );
+}
--- a/frontend/src/pages/ErrorBookPage.tsx
+++ b/frontend/src/pages/ErrorBookPage.tsx
@@ -0,0 +1,296 @@
+import { useEffect, useMemo, useState } from "react";
+import { Link } from "react-router-dom";
+
+import Header from "@/components/layout/Header";
+import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
+import { getErrorBook, updateAttempt, getFavoriteVariants, updateVariant } from "@/lib/api";
+import { useAuth } from "@/contexts/AuthContext";
+import type { UserAttempt, QuestionVariant } from "@/types/api";
+
+const typeLabel: Record<string, string> = {
+  mc: "Multiple Choice",
+  true_false: "True / False",
+  fill_blank: "Fill in Blank",
+  long_question: "Long Question",
+  short_answer: "Short Answer",
+  coding: "Coding",
+};
+
+const TYPE_COLORS: Record<string, string> = {
+  mc: "bg-violet-50 text-violet-700",
+  true_false: "bg-amber-50 text-amber-700",
+  fill_blank: "bg-teal-50 text-teal-700",
+  long_question: "bg-sky-50 text-sky-700",
+  short_answer: "bg-rose-50 text-rose-700",
+  coding: "bg-emerald-50 text-emerald-700",
+};
+
+const DIFF_COLORS: Record<string, string> = {
+  easy: "text-green-600",
+  medium: "text-amber-600",
+  hard: "text-red-600",
+};
+
+export default function ErrorBookPage() {
+  const { user } = useAuth();
+  const [entries, setEntries] = useState<UserAttempt[]>([]);
+  const [favoriteVariants, setFavoriteVariants] = useState<QuestionVariant[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+  const [courseFilter, setCourseFilter] = useState<string>("all");
+
+  useEffect(() => {
+    if (!user) { setLoading(false); return; }
+    let cancelled = false;
+    setLoading(true);
+    Promise.all([getErrorBook(), getFavoriteVariants()])
+      .then(([attempts, variants]) => {
+        if (cancelled) return;
+        setEntries(attempts);
+        setFavoriteVariants(variants);
+        setLoading(false);
+      })
+      .catch((err) => {
+        if (cancelled) return;
+        setError(err instanceof Error ? err.message : "Failed to load error book");
+        setLoading(false);
+      });
+    return () => { cancelled = true; };
+  }, [user]);
+
+  const courses = useMemo(
+    () => Array.from(new Set(
+      entries.map((e) => e.paper_questions?.paper?.course_code).filter((v): v is string => Boolean(v)),
+    )).sort(),
+    [entries],
+  );
+
+  const filteredEntries = useMemo(() => {
+    if (courseFilter === "all") return entries;
+    return entries.filter((e) => e.paper_questions?.paper?.course_code === courseFilter);
+  }, [courseFilter, entries]);
+
+  async function handleMarkMastered(attemptId: string) {
+    await updateAttempt(attemptId, { mastered: true });
+    setEntries((prev) => prev.filter((e) => e.id !== attemptId));
+  }
+
+  async function handleRemove(attemptId: string) {
+    await updateAttempt(attemptId, { in_error_book: false });
+    setEntries((prev) => prev.filter((e) => e.id !== attemptId));
+  }
+
+  async function handleUnfavoriteVariant(variantId: string) {
+    await updateVariant(variantId, { favorited: false });
+    setFavoriteVariants((prev) => prev.filter((v) => v.id !== variantId));
+  }
+
+  return (
+    <div className="min-h-screen bg-gray-50">
+      <Header />
+      <main className="max-w-4xl mx-auto px-6 py-8">
+        {/* Header */}
+        <div className="flex items-end justify-between gap-4 mb-6">
+          <div>
+            <h1 className="text-2xl font-bold text-gray-900">Error Book</h1>
+            <p className="text-sm text-gray-500 mt-1">Review your mistakes and track progress.</p>
+          </div>
+          <div className="flex gap-3 text-sm">
+            <StatCard label="To Review" value={filteredEntries.length} color="red" />
+            <StatCard label="Courses" value={courses.length} color="blue" />
+          </div>
+        </div>
+
+        {/* Course filter */}
+        <div className="flex gap-2 mb-6 flex-wrap">
+          <Pill active={courseFilter === "all"} onClick={() => setCourseFilter("all")} label="All" />
+          {courses.map((c) => (
+            <Pill key={c} active={courseFilter === c} onClick={() => setCourseFilter(c)} label={c} />
+          ))}
+        </div>
+
+        {!user && (
+          <div className="bg-white border border-gray-200 rounded-xl p-12 text-center">
+            <div className="text-3xl mb-3">🔒</div>
+            <p className="text-gray-500 mb-4">Sign in to unlock your Error Book</p>
+            <Link to="/login" className="inline-block px-5 py-2 bg-indigo-600 text-white text-sm font-medium rounded-lg hover:bg-indigo-700 transition-colors">
+              Sign in
+            </Link>
+          </div>
+        )}
+        {user && loading && <div className="text-sm text-gray-400">Loading...</div>}
+        {user && error && <div className="text-sm text-red-600">{error}</div>}
+
+        {user && !loading && !error && filteredEntries.length === 0 && favoriteVariants.length === 0 && (
+          <div className="bg-white border border-gray-200 rounded-xl p-12 text-center">
+            <div className="text-3xl mb-3">🎉</div>
+            <p className="text-gray-500">No mistakes yet. Keep practicing!</p>
+          </div>
+        )}
+
+        {/* Saved variants */}
+        {favoriteVariants.length > 0 && (
+          <div className="mb-8">
+            <h2 className="text-xs font-semibold text-gray-400 uppercase tracking-wide mb-3">
+              Saved Variants ({favoriteVariants.length})
+            </h2>
+            <div className="space-y-2">
+              {favoriteVariants.map((v) => (
+                <div key={v.id} className="flex items-center gap-3 bg-white border border-yellow-200 rounded-xl px-4 py-3">
+                  <span className="text-yellow-400">★</span>
+                  <div className="flex-1 min-w-0">
+                    <span className="text-sm font-medium text-gray-700">Variant of Q{v.source_question_number}</span>
+                    <p className="text-xs text-gray-500 truncate">{v.variant_data.question_text?.replace(/<[^>]*>/g, "").slice(0, 100)}</p>
+                  </div>
+                  <button onClick={() => void handleUnfavoriteVariant(v.id)} className="text-xs text-gray-400 hover:text-red-500">Remove</button>
+                </div>
+              ))}
+            </div>
+          </div>
+        )}
+
+        {/* Error entries */}
+        <div className="space-y-4">
+          {filteredEntries.map((entry) => (
+            <ErrorCard
+              key={entry.id}
+              entry={entry}
+              onMastered={() => void handleMarkMastered(entry.id)}
+              onRemove={() => void handleRemove(entry.id)}
+            />
+          ))}
+        </div>
+      </main>
+    </div>
+  );
+}
+
+function ErrorCard({ entry, onMastered, onRemove }: { entry: UserAttempt; onMastered: () => void; onRemove: () => void }) {
+  const [showFeedback, setShowFeedback] = useState(true);
+  const question = entry.paper_questions;
+  if (!question) return null;
+
+  const paper = question.paper;
+  const courseCode = paper?.course_code;
+  const paperId = paper?.id;
+  const paperInfo = paper ? `${paper.year} ${paper.term} ${paper.exam_type}` : "";
+  const typeColor = TYPE_COLORS[question.question_type] ?? "bg-gray-100 text-gray-600";
+  const diffColor = DIFF_COLORS[question.difficulty ?? ""] ?? "";
+
+  // Clean preview: strip a leading "Problem N [...]" header, then cap at 200 characters
+  const preview = (question.question_text || "")
+    .replace(/^Problem\s+\d+\s*\[.*?\]\s*/i, "")
+    .slice(0, 200);
+
+  return (
+    <article className="bg-white border border-gray-200 rounded-xl overflow-hidden">
+      {/* Header */}
+      <div className="px-5 pt-4 pb-3">
+        <div className="flex items-start justify-between gap-3">
+          <div className="flex items-center gap-2 flex-wrap">
+            <span className="inline-flex items-center justify-center w-9 h-9 rounded-lg bg-red-600 text-white text-sm font-bold">
+              {question.question_number}
+            </span>
+            <div>
+              <div className="flex items-center gap-1.5">
+                <span className={`text-[11px] px-2 py-0.5 rounded-full font-medium ${typeColor}`}>
+                  {typeLabel[question.question_type] ?? question.question_type}
+                </span>
+                {question.difficulty && (
+                  <span className={`text-[11px] font-medium ${diffColor}`}>{question.difficulty}</span>
+                )}
+                {courseCode && (
+                  <Link to={`/analytics/${courseCode}`} className="text-[11px] px-2 py-0.5 rounded-full bg-blue-50 text-blue-700 hover:bg-blue-100">
+                    {courseCode}
+                  </Link>
+                )}
+              </div>
+              <div className="text-[11px] text-gray-400 mt-0.5">
+                {paperId ? <Link to={`/paper/${paperId}`} className="hover:text-blue-600">{paperInfo}</Link> : paperInfo}
+                {" · "}
+                {new Date(entry.created_at).toLocaleDateString("en-CA")}
+              </div>
+            </div>
+          </div>
+
+          {/* Score badge */}
+          {entry.feedback && (
+            <div className="flex items-center gap-1 bg-red-50 border border-red-200 rounded-lg px-2.5 py-1">
+              <span className="text-red-600 text-sm font-bold">✗</span>
+              <span className="text-xs text-red-600 font-medium">Incorrect</span>
+            </div>
+          )}
+        </div>
+
+        {/* Question preview */}
+        <p className="text-sm text-gray-600 mt-3 line-clamp-2">{preview}</p>
+
+        {/* Topics */}
+        {question.topics && question.topics.length > 0 && (
+          <div className="flex gap-1 mt-2 flex-wrap">
+            {question.topics.slice(0, 4).map((t) => (
+              <span key={t} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500">{t}</span>
+            ))}
+          </div>
+        )}
+      </div>
+
+      {/* AI Feedback section */}
+      {entry.feedback && (
+        <div className="border-t border-gray-100">
+          <button
+            onClick={() => setShowFeedback((v) => !v)}
+            className="w-full flex items-center justify-between px-5 py-2.5 text-xs font-medium text-blue-700 bg-blue-50/50 hover:bg-blue-50"
+          >
+            <span>AI Feedback</span>
+            <span>{showFeedback ? "▲" : "▼"}</span>
+          </button>
+          {showFeedback && (
+            <div className="px-5 py-4 bg-white">
+              <KaTeXRenderer html={entry.feedback} className="text-sm text-gray-700 leading-relaxed" />
+            </div>
+          )}
+        </div>
+      )}
+
+      {/* Actions */}
+      <div className="border-t border-gray-100 px-5 py-2.5 flex items-center gap-4 bg-gray-50/50">
+        {paperId && (
+          <Link to={`/paper/${paperId}`} className="text-xs font-medium text-blue-600 hover:text-blue-700">
+            Open paper →
+          </Link>
+        )}
+        <button onClick={onMastered} className="text-xs font-medium text-green-600 hover:text-green-700">
+          Mark mastered
+        </button>
+        <button onClick={onRemove} className="text-xs font-medium text-gray-400 hover:text-gray-600">
+          Remove
+        </button>
+      </div>
+    </article>
+  );
+}
+
+function StatCard({ label, value, color }: { label: string; value: number; color: string }) {
+  const bg = color === "red" ? "bg-red-50 border-red-200" : "bg-blue-50 border-blue-200";
+  const text = color === "red" ? "text-red-700" : "text-blue-700";
+  return (
+    <div className={`border rounded-xl px-4 py-2.5 ${bg}`}>
+      <div className={`text-xl font-bold ${text}`}>{value}</div>
+      <div className="text-[10px] uppercase tracking-wide text-gray-400 mt-0.5">{label}</div>
+    </div>
+  );
+}
+
+function Pill({ active, onClick, label }: { active: boolean; onClick: () => void; label: string }) {
+  return (
+    <button
+      onClick={onClick}
+      className={`px-3 py-1.5 text-xs font-medium rounded-full border transition-colors ${
+        active ? "bg-gray-900 text-white border-gray-900" : "bg-white text-gray-600 border-gray-200 hover:border-gray-300"
+      }`}
+    >
+      {label}
+    </button>
+  );
+}
--- a/frontend/src/pages/HomePage.tsx
+++ b/frontend/src/pages/HomePage.tsx
@@ -0,0 +1,705 @@
+import { useEffect, useRef, useState } from "react";
+import { Link, useNavigate } from "react-router-dom";
+import { listPapers, myPapers } from "@/lib/api";
+import { useAuth } from "@/contexts/AuthContext";
+import type { Paper } from "@/types/api";
+
+function getWorkedIds(userId: string): string[] {
+  try {
+    const parsed: unknown = JSON.parse(localStorage.getItem(`worked_papers_${userId}`) ?? "[]");
+    return Array.isArray(parsed) ? parsed.filter((id): id is string => typeof id === "string") : [];
+  } catch { return []; }
+}
+
+const fontSora = { fontFamily: "'Sora', sans-serif" };
+const fontMono = { fontFamily: "'IBM Plex Mono', monospace" };
+
+/* ── Feature cards data ── */
+const FEATURES = [
+  {
+    icon: (
+      <svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
+        <path strokeLinecap="round" strokeLinejoin="round" d="M9.813 15.904L9 18.75l-.813-2.846a4.5 4.5 0 00-3.09-3.09L2.25 12l2.846-.813a4.5 4.5 0 003.09-3.09L9 5.25l.813 2.846a4.5 4.5 0 003.09 3.09L15.75 12l-2.846.813a4.5 4.5 0 00-3.09 3.09zM18.259 8.715L18 9.75l-.259-1.035a3.375 3.375 0 00-2.455-2.456L14.25 6l1.036-.259a3.375 3.375 0 002.455-2.456L18 2.25l.259 1.035a3.375 3.375 0 002.455 2.456L21.75 6l-1.036.259a3.375 3.375 0 00-2.455 2.456z" />
+      </svg>
+    ),
+    title: "AI Analysis",
+    desc: "Every question gets knowledge reminders, hints, and step-by-step solutions.",
+    color: "#6366F1",
+  },
+  {
+    icon: (
+      <svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
+        <path strokeLinecap="round" strokeLinejoin="round" d="M12 6.042A8.967 8.967 0 006 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 016 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 016-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0018 18a8.967 8.967 0 00-6 2.292m0-14.25v14.25" />
+      </svg>
+    ),
+    title: "Smart Error Book",
+    desc: "Auto-collect mistakes with AI feedback. Review, understand, and master.",
+    color: "#E11D48",
+  },
+  {
+    icon: (
+      <svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
+        <path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
+      </svg>
+    ),
+    title: "Course Analytics",
+    desc: "Topic frequency, difficulty distribution, and high-yield focus areas.",
+    color: "#0D9488",
+  },
+  {
+    icon: (
+      <svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
+        <path strokeLinecap="round" strokeLinejoin="round" d="M19.5 12c0-1.232-.046-2.453-.138-3.662a4.006 4.006 0 00-3.7-3.7 48.678 48.678 0 00-7.324 0 4.006 4.006 0 00-3.7 3.7c-.017.22-.032.441-.046.662M19.5 12l3-3m-3 3l-3-3m-12 3c0 1.232.046 2.453.138 3.662a4.006 4.006 0 003.7 3.7 48.656 48.656 0 007.324 0 4.006 4.006 0 003.7-3.7c.017-.22.032-.441.046-.662M4.5 12l3 3m-3-3l-3 3" />
+      </svg>
+    ),
+    title: "Variant Generation",
+    desc: "Generate unlimited similar questions for extra practice on weak topics.",
+    color: "#7C3AED",
+  },
+];
+
+/* ── Filter options ── */
+const COURSE_OPTIONS = ["COMP2011", "COMP2211", "MATH1014", "PHYS1112", "MATH2023", "ELEC2100"];
+const TERM_OPTIONS = ["spring", "fall"];
+const TYPE_OPTIONS = ["midterm", "final"];
+
+/* ── Chevron SVG ── */
+function ChevronDown({ className = "" }: { className?: string }) {
+  return (
+    <svg className={className} fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
+      <path strokeLinecap="round" strokeLinejoin="round" d="M19 9l-7 7-7-7" />
+    </svg>
+  );
+}
+
+/* ── Dropdown select component ── */
+function Dropdown({
+  label,
+  value,
+  options,
+  onChange,
+}: {
+  label: string;
+  value: string | null;
+  options: { value: string; label: string }[];
+  onChange: (v: string | null) => void;
+}) {
+  const [open, setOpen] = useState(false);
+  const ref = useRef<HTMLDivElement>(null);
+
+  useEffect(() => {
+    const handler = (e: MouseEvent) => {
+      if (ref.current && !ref.current.contains(e.target as Node)) setOpen(false);
+    };
+    document.addEventListener("mousedown", handler);
+    return () => document.removeEventListener("mousedown", handler);
+  }, []);
+
+  const selected = options.find((o) => o.value === value);
+
+  return (
+    <div ref={ref} className="relative" style={{ minWidth: 150 }}>
+      <div className="text-[11px] font-semibold text-indigo-300 uppercase tracking-wider mb-1.5" style={fontSora}>
+        {label}
+      </div>
+      <button
+        onClick={() => setOpen(!open)}
+        className="w-full flex items-center justify-between bg-white px-3.5 py-2.5 text-sm cursor-pointer whitespace-nowrap"
+        style={{ borderRadius: 0, ...fontMono }}
+      >
+        <span className={`${selected ? "text-slate-800 font-semibold" : "text-slate-400"} mr-2`}>
+          {selected ? selected.label : `All ${label}s`}
+        </span>
+        <ChevronDown className={`w-4 h-4 text-slate-400 transition-transform ${open ? "rotate-180" : ""}`} />
+      </button>
+      {open && (
+        <div
+          className="absolute top-full left-0 right-0 mt-1 bg-white shadow-lg z-50 overflow-hidden"
+          style={{ borderRadius: 0, border: "1px solid #E2E8F0" }}
+        >
+          <button
+            onClick={() => { onChange(null); setOpen(false); }}
+            className={`w-full text-left px-3.5 py-2 text-sm hover:bg-indigo-50 transition-colors ${
+              !value ? "text-indigo-600 font-semibold bg-indigo-50/50" : "text-slate-500"
+            }`}
+            style={fontMono}
+          >
+            All {label}s
+          </button>
+          {options.map((o) => (
+            <button
+              key={o.value}
+              onClick={() => { onChange(o.value); setOpen(false); }}
+              className={`w-full text-left px-3.5 py-2 text-sm hover:bg-indigo-50 transition-colors ${
+                value === o.value ? "text-indigo-600 font-semibold bg-indigo-50/50" : "text-slate-600"
+              }`}
+              style={fontMono}
+            >
+              {o.label}
+            </button>
+          ))}
+        </div>
+      )}
+    </div>
+  );
+}
+
+export default function HomePage() {
+  const navigate = useNavigate();
+  const { user, signOut } = useAuth();
+  const [papers, setPapers] = useState<Paper[]>([]);
+  const [papersLoading, setPapersLoading] = useState(false);
+  const [myUploadedPapers, setMyUploadedPapers] = useState<Paper[]>([]);
+  const [workedPapers, setWorkedPapers] = useState<Paper[]>([]);
+  const [courseInput, setCourseInput] = useState("");
+  const [courseFilter, setCourseFilter] = useState<string | null>(null);
+  const [showSuggestions, setShowSuggestions] = useState(false);
+  const [termFilter, setTermFilter] = useState<string | null>(null);
+  const [typeFilter, setTypeFilter] = useState<string | null>(null);
+  const [analyzing, setAnalyzing] = useState(false);
+  const inputRef = useRef<HTMLDivElement>(null);
+
+  // Autocomplete suggestions: prefix match, so the bold slice in the dropdown covers the typed characters
+  const suggestions = courseInput.trim()
+    ? COURSE_OPTIONS.filter((c) =>
+        c.toLowerCase().startsWith(courseInput.trim().toLowerCase())
+      )
+    : [];
+
+  // Close suggestions on outside click
+  useEffect(() => {
+    const handler = (e: MouseEvent) => {
+      if (inputRef.current && !inputRef.current.contains(e.target as Node)) setShowSuggestions(false);
+    };
+    document.addEventListener("mousedown", handler);
+    return () => document.removeEventListener("mousedown", handler);
+  }, []);
+
+  useEffect(() => {
+    let cancelled = false;
+    setPapersLoading(true);
+    listPapers()
+      .then((data) => {
+        if (cancelled) return;
+        setPapers(
+          [...data].sort((a, b) => {
+            if (a.course_code !== b.course_code) return a.course_code.localeCompare(b.course_code);
+            if (a.year !== b.year) return b.year - a.year;
+            if (a.term !== b.term) return a.term.localeCompare(b.term);
+            return a.exam_type.localeCompare(b.exam_type);
+          }),
+        );
+      })
+      .catch(() => {
+        if (!cancelled) setPapers([]);
+      })
+      .finally(() => {
+        if (!cancelled) setPapersLoading(false);
+      });
+
+    return () => {
+      cancelled = true;
+    };
+  }, []);
+
+  // My Papers
+  useEffect(() => {
+    if (!user) return;
+    let cancelled = false;
+    myPapers().then((data) => {
+      if (cancelled) return;
+      setMyUploadedPapers(data.filter((p) => p.status !== "error"));
+    }).catch(() => {});
+    return () => { cancelled = true; };
+  }, [user]);
+
+  useEffect(() => {
+    if (!user || papers.length === 0) return;
+    const workedIds = new Set(getWorkedIds(user.id));
+    setWorkedPapers(papers.filter((p) => workedIds.has(p.id)));
+  }, [user, papers]);
+
+  // Filter papers
+  const hasFilter = courseFilter || termFilter || typeFilter;
+  const filteredPapers = papers.filter((p) => {
+    if (courseFilter && p.course_code !== courseFilter) return false;
+    if (termFilter && p.term !== termFilter) return false;
+    if (typeFilter && p.exam_type !== typeFilter) return false;
+    return true;
+  });
+
+  const selectCourse = (code: string) => {
+    setCourseInput(code);
+    setCourseFilter(code);
+    setShowSuggestions(false);
+  };
+
+  return (
+    <div className="min-h-screen" style={{ background: "#FAFAFA" }}>
+      {/* ══════ Nav ══════ */}
+      <nav className="bg-white border-b border-slate-200">
+        <div className="max-w-[1200px] mx-auto px-6 h-14 flex items-center justify-between">
+          <div className="flex items-center gap-2">
+            <div
+              className="w-8 h-8 flex items-center justify-center text-white text-sm font-bold"
+              style={{ background: "#6366F1", borderRadius: 0 }}
+            >
+              PM
+            </div>
+            <span className="text-lg font-bold text-slate-800" style={fontSora}>
+              PastPaper Master
+            </span>
+          </div>
+          <div className="flex items-center gap-5 text-sm" style={fontSora}>
+            <Link to="/" className="text-indigo-600 font-semibold">
+              Home
+            </Link>
+            <Link to="/analytics" className="text-slate-500 hover:text-slate-800 transition-colors">
+              Analytics
+            </Link>
+            <Link to="/error-book" className="text-slate-500 hover:text-slate-800 transition-colors">
+              Error Book
+            </Link>
+            <Link
+              to="/upload"
+              className="px-4 py-1.5 text-white text-xs font-semibold"
+              style={{ background: "#6366F1", borderRadius: 0 }}
+            >
+              Upload Paper
+            </Link>
+            {user ? (
+              <div className="flex items-center gap-3 pl-3 border-l border-slate-200">
+                <span className="text-xs text-slate-400 max-w-[140px] truncate" style={fontMono}>{user.email}</span>
+                <button
+                  onClick={() => void signOut()}
+                  className="text-xs text-slate-400 hover:text-red-500 transition-colors"
+                >
+                  Sign out
+                </button>
+              </div>
+            ) : (
+              <Link
+                to="/login"
+                className="text-sm text-indigo-600 font-semibold pl-3 border-l border-slate-200 hover:text-indigo-800 transition-colors"
+              >
+                Sign in
+              </Link>
+            )}
+          </div>
+        </div>
+      </nav>
+
+      {/* ══════ Hero + Filter ══════ */}
+      <section
+        className="relative overflow-hidden"
+        style={{ background: "linear-gradient(135deg, #1E1B4B 0%, #312E81 50%, #4338CA 100%)" }}
+      >
+        <div className="max-w-[1200px] mx-auto px-6 pt-16 pb-10 text-center relative z-10">
+          <h1
+            className="text-4xl font-bold text-white mb-4 leading-tight"
+            style={fontSora}
+          >
+            The Smartest Way to<br />
+            <span style={{ color: "#A5B4FC" }}>Master Past Papers</span>
+          </h1>
+          <p className="text-indigo-200 text-base mb-10 max-w-xl mx-auto" style={fontSora}>
+            Upload any HKUST past paper. AI breaks down every question with analysis,
+            hints, and solutions — so you study smarter, not harder.
+          </p>
+
+          {/* ── Filter row: Course input + Term dropdown + Type dropdown ── */}
+          <div className="max-w-[680px] mx-auto">
+            <div className="flex gap-3 items-end">
+              {/* Course code input with autocomplete */}
+              <div ref={inputRef} className="relative flex-1">
+                <div className="text-[11px] font-semibold text-indigo-300 uppercase tracking-wider mb-1.5 text-left" style={fontSora}>
+                  Course Code
+                </div>
+                <div className="flex bg-white" style={{ borderRadius: 0 }}>
+                  <input
+                    type="text"
+                    value={courseInput}
+                    onChange={(e) => {
+                      const v = e.target.value.toUpperCase();
+                      setCourseInput(v);
+                      setCourseFilter(COURSE_OPTIONS.includes(v) ? v : null);
+                      setShowSuggestions(true);
+                    }}
+                    onFocus={() => setShowSuggestions(true)}
+                    placeholder="e.g. COMP2011"
+                    className="flex-1 px-3.5 py-2.5 text-sm text-slate-800 outline-none bg-transparent font-semibold"
+                    style={fontMono}
+                  />
+                  {courseInput && (
+                    <button aria-label="Clear course code"
+                      onClick={() => { setCourseInput(""); setCourseFilter(null); }}
+                      className="px-2 text-slate-300 hover:text-slate-500 transition-colors"
+                    >
+                      <svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+                        <path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
+                      </svg>
+                    </button>
+                  )}
+                </div>
+                {/* Autocomplete dropdown */}
+                {showSuggestions && suggestions.length > 0 && !courseFilter && (
+                  <div
+                    className="absolute top-full left-0 right-0 mt-1 bg-white shadow-lg z-50 overflow-hidden"
+                    style={{ borderRadius: 0, border: "1px solid #E2E8F0" }}
+                  >
+                    {suggestions.map((c) => (
+                      <button
+                        key={c}
+                        onClick={() => selectCourse(c)}
+                        className="w-full text-left px-3.5 py-2.5 text-sm text-slate-700 hover:bg-indigo-50 hover:text-indigo-600 transition-colors"
+                        style={fontMono}
+                      >
+                        <span className="font-semibold">{c.slice(0, courseInput.trim().length)}</span>
+                        {c.slice(courseInput.trim().length)}
+                      </button>
+                    ))}
+                  </div>
+                )}
+              </div>
+
+              {/* Term dropdown */}
+              <Dropdown
+                label="Term"
+                value={termFilter}
+                options={[
+                  { value: "spring", label: "Spring" },
+                  { value: "fall", label: "Fall" },
+                ]}
+                onChange={setTermFilter}
+              />
+
+              {/* Exam Type dropdown */}
+              <Dropdown
+                label="Exam Type"
+                value={typeFilter}
+                options={[
+                  { value: "midterm", label: "Midterm" },
+                  { value: "final", label: "Final" },
+                ]}
+                onChange={setTypeFilter}
+              />
+
+              {/* Buttons */}
+              <div className="flex gap-2 items-end">
+                <div>
+                  <div className="mb-1.5" />
+                  <button
+                    className="px-6 py-2.5 text-white text-sm font-semibold shrink-0"
+                    style={{ background: "#6366F1", borderRadius: 0, ...fontSora }}
+                  >
+                    Search
+                  </button>
+                </div>
+                <div>
+                  <div className="mb-1.5" />
+                  <button
+                    onClick={() => {
+                      setAnalyzing(true);
+                      setTimeout(() => {
+                        if (courseFilter) navigate(`/analytics/${courseFilter}`);
+                        else navigate("/analytics");
+                      }, 1200);
+                    }}
+                    disabled={analyzing}
+                    className="px-5 py-2.5 text-sm font-semibold shrink-0 border transition-all flex items-center gap-2"
+                    style={{
+                      borderRadius: 0,
+                      background: analyzing ? "#BE123C" : courseFilter ? "#E11D48" : "transparent",
+                      color: courseFilter || analyzing ? "#fff" : "rgba(165,180,252,0.7)",
+                      borderColor: analyzing ? "#BE123C" : courseFilter ? "#E11D48" : "rgba(165,180,252,0.3)",
+                      ...fontSora,
+                    }}
+                  >
+                    {analyzing && (
+                      <svg className="w-4 h-4 animate-spin" viewBox="0 0 24 24" fill="none">
+                        <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="3" />
+                        <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
+                      </svg>
+                    )}
+                    {analyzing ? "Analyzing..." : "Analyze"}
+                  </button>
+                </div>
+              </div>
+            </div>
+
+            {/* ── Results panel ── */}
+            {hasFilter && (
+              <div
+                className="mt-3 text-left max-h-[300px] overflow-y-auto"
+                style={{ background: "rgba(255,255,255,0.06)", backdropFilter: "blur(8px)", border: "1px solid rgba(255,255,255,0.1)" }}
+              >
+                {papersLoading ? (
+                  <div className="p-6 text-center">
+                    <p className="text-indigo-300 text-sm" style={fontSora}>Loading papers...</p>
+                  </div>
+                ) : filteredPapers.length === 0 ? (
+                  <div className="p-6 text-center">
+                    <p className="text-indigo-300 text-sm" style={fontSora}>No papers match these filters</p>
+                  </div>
+                ) : (
+                  <>
+                    <div className="px-4 pt-3 pb-1 flex items-center justify-between">
+                      <span className="text-[11px] font-semibold text-indigo-400 uppercase tracking-wider" style={fontSora}>
+                        {filteredPapers.length} paper{filteredPapers.length > 1 ? "s" : ""} found
+                      </span>
+                      {courseFilter && (
+                        <Link
+                          to={`/analytics/${courseFilter}`}
+                          className="flex items-center gap-1.5 px-3 py-1 text-[11px] font-bold text-white hover:opacity-90 transition-opacity"
+                          style={{ background: "#6366F1", borderRadius: 0, ...fontMono }}
+                        >
+                          <svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+                            <path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
+                          </svg>
+                          AI Analytics · {courseFilter}
+                        </Link>
+                      )}
+                    </div>
+                    {filteredPapers.map((p) => (
+                      <button
+                        key={p.id}
+                        onClick={() => { navigate(`/paper/${p.id}`); }}
+                        className="w-full flex items-center justify-between px-4 py-3 text-left transition-colors hover:bg-white/10 cursor-pointer"
+                        style={{ borderBottom: "1px solid rgba(255,255,255,0.06)" }}
+                      >
+                        <div className="flex items-center gap-3">
+                          <div className="w-8 h-8 flex items-center justify-center shrink-0" style={{ background: "rgba(255,255,255,0.1)" }}>
+                            <svg className="w-4 h-4 text-indigo-300" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
+                              <path strokeLinecap="round" strokeLinejoin="round" d="M19.5 14.25v-2.625a3.375 3.375 0 00-3.375-3.375h-1.5A1.125 1.125 0 0113.5 7.125v-1.5a3.375 3.375 0 00-3.375-3.375H8.25m2.25 0H5.625c-.621 0-1.125.504-1.125 1.125v17.25c0 .621.504 1.125 1.125 1.125h12.75c.621 0 1.125-.504 1.125-1.125V11.25a9 9 0 00-9-9z" />
+                            </svg>
+                          </div>
+                          <div>
+                            <span className="text-sm font-bold text-white" style={fontMono}>{p.course_code}</span>
+                            <span className="text-sm text-indigo-300 capitalize ml-2" style={fontSora}>
+                              {p.year} {p.term} {p.exam_type}
+                            </span>
+                            <div className="flex gap-3 mt-0.5">
+                              {p.question_count != null && (
+                                <span className="text-[11px] text-indigo-400" style={fontMono}>{p.question_count} Qs</span>
+                              )}
+                              {p.difficulty_level && (
+                                <span className="text-[11px] text-indigo-400 capitalize" style={fontMono}>{p.difficulty_level}</span>
+                              )}
+                            </div>
+                          </div>
+                        </div>
+                        <div className="flex items-center gap-2">
+                          <span
+                            className={`px-2 py-0.5 text-[10px] font-bold border ${
+                              p.status === "ready"
+                                ? "text-emerald-400 border-emerald-400/40"
+                                : p.status === "processing"
+                                  ? "text-amber-300 border-amber-300/40"
+                                  : "text-indigo-400/60 border-indigo-400/20"
+                            }`}
+                            style={{ borderRadius: 0, ...fontMono }}
+                          >
+                            {p.status.toUpperCase()}
+                          </span>
+                          <svg className="w-4 h-4 text-indigo-400" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+                            <path strokeLinecap="round" strokeLinejoin="round" d="M8.25 4.5l7.5 7.5-7.5 7.5" />
+                          </svg>
+                        </div>
+                      </button>
+                    ))}
+                  </>
+                )}
+              </div>
+            )}
+          </div>
+
+          {/* Quick stats — real data */}
+          <div className="flex justify-center gap-8 mt-10">
+            {[
+              [String(papers.filter(p => p.status === "ready").length), "Past Papers"],
+              [String(papers.filter(p => p.status === "ready").reduce((s, p) => s + (p.question_count || 0), 0)), "Questions Analyzed"],
+              [String(new Set(papers.filter(p => p.status === "ready").map(p => p.course_code)).size), "Courses"],
+            ].map(([num, label]) => (
+              <div key={label} className="text-center">
+                <div className="text-2xl font-bold text-white" style={fontMono}>{num}</div>
+                <div className="text-xs text-indigo-300" style={fontSora}>{label}</div>
+              </div>
+            ))}
+          </div>
+        </div>
+
+        {/* Decorative grid */}
+        <div
+          className="absolute inset-0 opacity-[0.04]"
+          style={{
+            backgroundImage: "linear-gradient(#fff 1px, transparent 1px), linear-gradient(90deg, #fff 1px, transparent 1px)",
+            backgroundSize: "40px 40px",
+          }}
+        />
+      </section>
+
+      <main className="max-w-[1200px] mx-auto px-6">
+        {/* ══════ Features ══════ */}
+        <section className="py-12">
+          <h2
+            className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-6"
+            style={fontSora}
+          >
+            Platform Features
+          </h2>
+          <div className="grid grid-cols-4 gap-4">
+            {FEATURES.map((f) => (
+              <div
+                key={f.title}
+                className="bg-white border border-slate-200 p-5 hover:border-slate-300 transition-colors group"
+                style={{ borderRadius: 0 }}
+              >
+                <div
+                  className="w-10 h-10 flex items-center justify-center text-white mb-4"
+                  style={{ background: f.color, borderRadius: 0 }}
+                >
+                  {f.icon}
+                </div>
+                <h3
+                  className="text-sm font-bold text-slate-800 mb-1.5"
+                  style={fontSora}
+                >
+                  {f.title}
+                </h3>
+                <p className="text-xs text-slate-400 leading-relaxed" style={fontSora}>
+                  {f.desc}
+                </p>
+              </div>
+            ))}
+          </div>
+        </section>
+
+        {/* ══════ My Papers ══════ */}
+        {user && (
+          <section className="pb-12">
+            <h2 className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-6" style={fontSora}>
+              My Papers
+            </h2>
+            {myUploadedPapers.length === 0 && workedPapers.length === 0 ? (
+              <div className="bg-white border border-slate-200 px-6 py-8 text-center" style={{ borderRadius: 0 }}>
+                <p className="text-sm text-slate-400" style={fontSora}>No papers yet. Upload a past paper or open one to get started.</p>
+              </div>
+            ) : (
+            <div className="grid grid-cols-2 gap-6">
+              {/* Uploaded */}
+              {myUploadedPapers.length > 0 && (
+                <div>
+                  <div className="text-xs font-semibold text-slate-500 uppercase tracking-wider mb-3" style={fontSora}>
+                    Uploaded
+                  </div>
+                  <div className="space-y-2">
+                    {myUploadedPapers.map((p) => (
+                      <Link
+                        key={p.id}
+                        to={p.status === "ready" ? `/paper/${p.id}` : "#"}
+                        className="flex items-center justify-between bg-white border border-slate-200 px-4 py-3 hover:border-indigo-300 transition-colors"
+                        style={{ borderRadius: 0 }}
+                      >
+                        <div>
+                          <span className="text-sm font-bold text-slate-800" style={fontMono}>{p.course_code}</span>
+                          <span className="text-sm text-slate-500 capitalize ml-2" style={fontSora}>{p.year} {p.term} {p.exam_type}</span>
+                        </div>
+                        <span className={`text-[10px] font-bold px-2 py-0.5 border ${
+                          p.status === "ready" ? "text-emerald-600 border-emerald-300 bg-emerald-50"
+                          : p.status === "processing" ? "text-amber-600 border-amber-300 bg-amber-50"
+                          : "text-slate-400 border-slate-200"
+                        }`} style={{ borderRadius: 0, ...fontMono }}>
+                          {p.status === "processing" ? (
+                            <span className="flex items-center gap-1">
+                              <span className="w-2 h-2 border border-amber-500 border-t-transparent rounded-full animate-spin inline-block" />
+                              PROCESSING
+                            </span>
+                          ) : p.status.toUpperCase()}
+                        </span>
+                      </Link>
+                    ))}
+                  </div>
+                </div>
+              )}
+
+              {/* Worked on */}
+              {workedPapers.length > 0 && (
+                <div>
+                  <div className="text-xs font-semibold text-slate-500 uppercase tracking-wider mb-3" style={fontSora}>
+                    Recently Worked
+                  </div>
+                  <div className="space-y-2">
+                    {workedPapers.map((p) => (
+                      <Link
+                        key={p.id}
+                        to={`/paper/${p.id}`}
+                        className="flex items-center justify-between bg-white border border-slate-200 px-4 py-3 hover:border-indigo-300 transition-colors"
+                        style={{ borderRadius: 0 }}
+                      >
+                        <div>
+                          <span className="text-sm font-bold text-slate-800" style={fontMono}>{p.course_code}</span>
+                          <span className="text-sm text-slate-500 capitalize ml-2" style={fontSora}>{p.year} {p.term} {p.exam_type}</span>
+                        </div>
+                        <svg className="w-4 h-4 text-slate-300" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+                          <path strokeLinecap="round" strokeLinejoin="round" d="M8.25 4.5l7.5 7.5-7.5 7.5" />
+                        </svg>
+                      </Link>
+                    ))}
+                  </div>
+                </div>
+              )}
+            </div>
+            )}
+          </section>
+        )}
+
+        {/* ══════ CTA Banner ══════ */}
+        <section className="pb-16">
+          <div
+            className="p-8 flex items-center justify-between"
+            style={{ background: "linear-gradient(135deg, #1E1B4B, #312E81)", borderRadius: 0 }}
+          >
+            <div>
+              <h3 className="text-lg font-bold text-white mb-1" style={fontSora}>
+                Ready to ace your exams?
+              </h3>
+              <p className="text-sm text-indigo-300" style={fontSora}>
+                Upload a past paper and let AI do the heavy lifting.
+              </p>
+            </div>
+            <div className="flex gap-3">
+              <Link
+                to="/upload"
+                className="px-5 py-2.5 text-sm font-semibold text-white"
+                style={{ background: "#6366F1", borderRadius: 0, ...fontSora }}
+              >
+                Upload Paper
+              </Link>
+              <Link
+                to="/analytics"
+                className="px-5 py-2.5 text-sm font-semibold text-indigo-200 border border-indigo-400 hover:bg-indigo-900/30 transition-colors"
+                style={{ borderRadius: 0, ...fontSora }}
+              >
+                View Analytics
+              </Link>
+            </div>
+          </div>
+        </section>
+      </main>
+
+      {/* ══════ Footer ══════ */}
+      <footer className="border-t border-slate-200 bg-white">
+        <div className="max-w-[1200px] mx-auto px-6 py-6 flex items-center justify-between">
+          <span className="text-xs text-slate-400" style={fontSora}>
+            PastPaper Master &middot; HKUST &middot; 2025
+          </span>
+          <div className="flex gap-4 text-xs text-slate-400" style={fontSora}>
+            <span>About</span>
+            <span>Contact</span>
+            <span>Privacy</span>
+          </div>
+        </div>
+      </footer>
+    </div>
+  );
+}
--- a/frontend/src/pages/LoginPage.tsx
+++ b/frontend/src/pages/LoginPage.tsx
@@ -0,0 +1,90 @@
+import { useState } from "react";
+import { supabase } from "@/lib/supabase";
+
+export default function LoginPage() {
+  const [email, setEmail] = useState("");
+  const [password, setPassword] = useState("");
+  const [mode, setMode] = useState<"signin" | "signup">("signin");
+  const [error, setError] = useState<string | null>(null);
+  const [loading, setLoading] = useState(false);
+  const handleSubmit = async (e: React.FormEvent) => {
+    e.preventDefault();
+    setError(null);
+    setLoading(true);
+    try {
+      if (mode === "signin") {
+        const { error } = await supabase.auth.signInWithPassword({ email, password });
+        if (error) throw error;
+      } else {
+        const { error } = await supabase.auth.signUp({ email, password });
+        if (error) throw error;
+        // Sign in right after signup. This only succeeds when email confirmation is disabled in the Supabase dashboard.
+        const { error: signInError } = await supabase.auth.signInWithPassword({ email, password });
+        if (signInError) throw signInError;
+      }
+    } catch (err: unknown) {
+      setError(err instanceof Error ? err.message : "Something went wrong");
+    } finally {
+      setLoading(false);
+    }
+  };
+
+  return (
+    <div className="min-h-screen bg-gray-50 flex items-center justify-center">
+      <div className="bg-white rounded-2xl shadow-sm border border-gray-200 p-8 w-full max-w-sm">
+        <div className="mb-6">
+          <h1 className="text-xl font-bold text-gray-900">PastPaper Master</h1>
+          <p className="text-sm text-gray-500 mt-1">{mode === "signin" ? "Sign in to continue" : "Create your account"}</p>
+        </div>
+
+        <form onSubmit={handleSubmit} className="space-y-4">
+          <div>
+            <label htmlFor="login-email" className="block text-xs font-medium text-gray-700 mb-1">Email</label>
+            <input
+              id="login-email" type="email"
+              value={email}
+              onChange={(e) => setEmail(e.target.value)}
+              required
+              className="w-full px-3 py-2 border border-gray-300 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"
+              placeholder="you@example.com"
+            />
+          </div>
+          <div>
+            <label htmlFor="login-password" className="block text-xs font-medium text-gray-700 mb-1">Password</label>
+            <input
+              id="login-password" type="password"
+              value={password}
+              onChange={(e) => setPassword(e.target.value)}
+              required
+              minLength={6}
+              className="w-full px-3 py-2 border border-gray-300 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"
+              placeholder="••••••"
+            />
+          </div>
+
+          {error && (
+            <p className="text-xs text-red-600 bg-red-50 border border-red-200 rounded-lg px-3 py-2">{error}</p>
+          )}
+
+          <button
+            type="submit"
+            disabled={loading}
+            className="w-full py-2.5 bg-blue-600 text-white text-sm font-medium rounded-lg hover:bg-blue-700 disabled:opacity-50 transition-colors"
+          >
+            {loading ? "..." : mode === "signin" ? "Sign in" : "Create account"}
+          </button>
+        </form>
+
+        <p className="text-center text-xs text-gray-500 mt-4">
+          {mode === "signin" ? "No account? " : "Already have one? "}
+          <button
+            onClick={() => { setMode(mode === "signin" ? "signup" : "signin"); setError(null); }}
+            className="text-blue-600 hover:underline font-medium"
+          >
+            {mode === "signin" ? "Sign up" : "Sign in"}
+          </button>
+        </p>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/pages/UploadPage.tsx
+++ b/frontend/src/pages/UploadPage.tsx
@@ -0,0 +1,16 @@
+import Header from "@/components/layout/Header";
+import UploadForm from "@/components/upload/UploadForm";
+
+export default function UploadPage() {
+  return (
+    <div className="min-h-screen bg-gray-50">
+      <Header />
+      <main className="py-10 px-6">
+        <h1 className="text-xl font-bold text-center mb-8 text-gray-800">
+          Upload Past Paper
+        </h1>
+        <UploadForm />
+      </main>
+    </div>
+  );
+}
--- a/frontend/src/pages/WorkbenchPage.tsx
+++ b/frontend/src/pages/WorkbenchPage.tsx
@@ -0,0 +1,524 @@
+import { useState, useEffect, useCallback, useRef } from "react";
+import { useParams } from "react-router-dom";
+import Header from "@/components/layout/Header";
+import PdfViewer from "@/components/workbench/PdfViewer";
+import QuestionNav from "@/components/workbench/QuestionNav";
+import QuestionDetail from "@/components/workbench/QuestionDetail";
+import AiTrioPanel from "@/components/workbench/AiTrioPanel";
+import SimilarHistoryPanel from "@/components/workbench/SimilarHistoryPanel";
+import ActionBar from "@/components/workbench/ActionBar";
+import PhotoUpload from "@/components/workbench/PhotoUpload";
+import VariantDetail from "@/components/workbench/VariantDetail";
+import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
+import { usePaper } from "@/hooks/usePaper";
+import { useQuestions } from "@/hooks/useQuestions";
+import { generateVariant, getVariants, updateVariant, deleteVariant, recordAttempt, getPaperAttempts } from "@/lib/api";
+import { groupQuestions } from "@/lib/questionGroups";
+import { useAuth } from "@/contexts/AuthContext";
+import type { QuestionVariant } from "@/types/api";
+
+const WORKED_KEY = (userId: string) => `worked_papers_${userId}`;
+const WORKED_THRESHOLD_MS = 3 * 60 * 1000; // 3 minutes
+
+function markWorked(userId: string, paperId: string) {
+  try {
+    const raw = localStorage.getItem(WORKED_KEY(userId));
+    const ids: string[] = raw ? JSON.parse(raw) : [];
+    if (!ids.includes(paperId)) {
+      localStorage.setItem(WORKED_KEY(userId), JSON.stringify([...ids, paperId]));
+    }
+  } catch { /* silent */ }
+}
+
+export default function WorkbenchPage() {
+  const { id } = useParams<{ id: string }>();
+  const { user } = useAuth();
+  const { paper, loading: paperLoading, error: paperError } = usePaper(id!);
+  const isReady = paper?.status === "ready";
+  const { questions, loading: questionsLoading } = useQuestions(id!, isReady);
+  const [currentQuestionId, setCurrentQuestionId] = useState<string | null>(null);
+  const [showPhoto, setShowPhoto] = useState(false);
+  // Grading result per question
+  const [gradingResults, setGradingResults] = useState<Map<string, {
+    isCorrect: boolean;
+    feedback: string;
+    ocrText: string;
+    scoreGiven?: number;
+    loading?: boolean;
+  }>>(new Map());
+  // Track which grading panels are expanded
+  const [gradingExpanded, setGradingExpanded] = useState<Set<string>>(new Set());
+
+  // Tab state
+  const [activeTab, setActiveTab] = useState<"questions" | "variants">("questions");
+  // variants per question: questionId → QuestionVariant[]
+  const [variantMap, setVariantMap] = useState<Map<string, QuestionVariant[]>>(new Map());
+  // which question IDs have been fetched from server
+  const loadedRef = useRef<Set<string>>(new Set());
+  // generating state
+  const [isGenerating, setIsGenerating] = useState(false);
+  // Currently viewing variant (full detail view)
+  const [activeVariantId, setActiveVariantId] = useState<string | null>(null);
+
+  // Cooldown: ignore scroll-based updates for 2s after user clicks a question
+  const lastUserSelectTime = useRef(0);
+
+  const handleQuestionSelect = useCallback((questionId: string) => {
+    lastUserSelectTime.current = Date.now();
+    setCurrentQuestionId(questionId);
+  }, []);
+
+  const groups = groupQuestions(questions);
+  const currentQuestion =
+    questions.find((question) => question.id === currentQuestionId)
+    ?? questions[0]
+    ?? null;
+  const currentGroupKey = currentQuestion?.question_number.match(/^\d+/)?.[0] ?? null;
+  const paperTitle = paper
+    ? `${paper.year} ${paper.term} ${paper.exam_type}`
+    : undefined;
+
+  const currentVariants = variantMap.get(currentQuestion?.id ?? "") ?? [];
+  const activeVariant = currentVariants.find((v) => v.id === activeVariantId) ?? null;
+
+  const handleGroupSelect = useCallback((groupKey: string) => {
+    lastUserSelectTime.current = Date.now();
+    const group = groups.find((item) => item.key === groupKey);
+    if (group?.questions[0]) {
+      setCurrentQuestionId(group.questions[0].id);
+    }
+  }, [groups]);
+
+  useEffect(() => {
+    if (questions.length === 0) {
+      setCurrentQuestionId(null);
+      return;
+    }
+    setCurrentQuestionId((prev) =>
+      prev && questions.some((question) => question.id === prev) ? prev : questions[0].id,
+    );
+  }, [questions]);
+
+  // 3-minute worked tracking
+  useEffect(() => {
+    if (!id || !user) return;
+    const timer = setTimeout(() => markWorked(user.id, id), WORKED_THRESHOLD_MS);
+    return () => clearTimeout(timer);
+  }, [id, user]);
+
+  // Load historical grading results
+  useEffect(() => {
+    if (!id || !user || !isReady) return;
+    getPaperAttempts(id).then((attempts) => {
+      const map = new Map<string, { isCorrect: boolean; feedback: string; ocrText: string; scoreGiven?: number }>();
+      for (const a of attempts) {
+        map.set(a.question_id, {
+          isCorrect: a.is_correct,
+          feedback: a.feedback || "",
+          ocrText: a.photo_ocr_text || "",
+        });
+      }
+      if (map.size > 0) {
+        setGradingResults((prev) => {
+          const next = new Map(prev);
+          for (const [k, v] of map) {
+            if (!next.has(k)) next.set(k, v); // don't overwrite current session
+          }
+          return next;
+        });
+        setGradingExpanded((prev) => new Set([...prev, ...map.keys()])); // merge with panels already opened this session
+      }
+    }).catch(() => {});
+  }, [id, user, isReady]);
+
+  // Load variants for current question (once per question ID)
+  useEffect(() => {
+    if (!currentQuestionId || loadedRef.current.has(currentQuestionId)) return;
+    loadedRef.current.add(currentQuestionId);
+    getVariants(currentQuestionId)
+      .then((data) => {
+        setVariantMap((prev) => new Map(prev).set(currentQuestionId, data));
+      })
+      .catch(() => { loadedRef.current.delete(currentQuestionId); }); // allow a retry after a failed fetch
+  }, [currentQuestionId]);
+
+  // When user scrolls PDF, find the question closest to that page
+  // But ignore if user just clicked a question (2s cooldown)
+  const handlePdfPageChange = useCallback(
+    (page: number) => {
+      if (questions.length === 0) return;
+      if (Date.now() - lastUserSelectTime.current < 2000) return;
+      let best = questions[0];
+      for (let i = 0; i < questions.length; i++) {
+        if ((questions[i].page_number ?? 1) <= page) best = questions[i];
+      }
+      setCurrentQuestionId(best.id);
+    },
+    [questions],
+  );
+
+  // Track answer state per question for ActionBar feedback
+  const [answerStates, setAnswerStates] = useState<Map<string, "correct" | "wrong">>(new Map());
+
+  const handleAnswerResult = async (isCorrect: boolean, userAnswer: string) => {
+    if (!currentQuestion) return;
+    const state = isCorrect ? "correct" : "wrong";
+    setAnswerStates((prev) => new Map(prev).set(currentQuestion.id, state));
+    try {
+      const type = currentQuestion.question_type === "mc" ? "select" : "input";
+      await recordAttempt(currentQuestion.id, type, userAnswer, isCorrect);
+      // Wrong answer → auto generate variant
+      if (!isCorrect) {
+        handleGenerateVariant();
+      }
+    } catch {
+      // silent
+    }
+  };
+
+  const handleGenerateVariant = async () => {
+    if (!currentQuestion || isGenerating) return;
+    setIsGenerating(true);
+    setActiveTab("variants");
+    try {
+      const saved = await generateVariant(currentQuestion.id);
+      setVariantMap((prev) => {
+        const existing = prev.get(currentQuestion.id) ?? [];
+        return new Map(prev).set(currentQuestion.id, [saved, ...existing]);
+      });
+    } catch {
+      // silent
+    } finally {
+      setIsGenerating(false);
+    }
+  };
+
+  const handleToggleFavorite = async (v: QuestionVariant) => {
+    // Keep the current list if the request fails instead of leaving an unhandled rejection.
+    const updated = await updateVariant(v.id, { favorited: !v.favorited }).catch(() => null);
+    if (!updated) return;
+    setVariantMap((prev) => {
+      const existing = prev.get(v.source_question_id) ?? [];
+      const next = existing.map((item) => (item.id === v.id ? updated : item));
+      return new Map(prev).set(v.source_question_id, next);
+    });
+  };
+
+  const handleDeleteVariant = async (v: QuestionVariant) => {
+    try { await deleteVariant(v.id); } catch { return; } // keep local state if the request fails
+    if (activeVariantId === v.id) setActiveVariantId(null);
+    setVariantMap((prev) => {
+      const existing = prev.get(v.source_question_id) ?? [];
+      return new Map(prev).set(
+        v.source_question_id,
+        existing.filter((item) => item.id !== v.id),
+      );
+    });
+  };
+
+  if (paperLoading) {
+    return (
+      <div className="min-h-screen bg-gray-50 flex items-center justify-center">
+        <div className="text-gray-400 text-sm">Loading...</div>
+      </div>
+    );
+  }
+
+  if (paperError || !paper) {
+    return (
+      <div className="min-h-screen bg-gray-50 flex items-center justify-center">
+        <div className="text-red-500 text-sm">{paperError ?? "Paper not found"}</div>
+      </div>
+    );
+  }
+
+  return (
+    <div className="h-screen flex flex-col">
+      <Header courseCode={paper.course_code} paperTitle={paperTitle} />
+
+      {/* Processing overlay */}
+      {paper.status === "processing" && (
+        <div className="flex-1 flex items-center justify-center bg-gray-50">
+          <div className="text-center">
+            <div className="inline-block w-8 h-8 border-3 border-blue-600 border-t-transparent rounded-full animate-spin mb-4" />
+            <p className="text-gray-600 text-sm">AI is analyzing the paper...</p>
+            <p className="text-gray-400 text-xs mt-1">
+              {paper.question_count
+                ? `${paper.question_count} questions found, generating analysis...`
+                : "Extracting and structuring questions..."}
+            </p>
+          </div>
+        </div>
+      )}
+
+      {/* Error state */}
+      {paper.status === "error" && (
+        <div className="flex-1 flex items-center justify-center bg-gray-50">
+          <div className="text-center max-w-md">
+            <p className="text-red-600 font-medium mb-2">Processing Failed</p>
+            <p className="text-gray-500 text-sm">{paper.error_message}</p>
+          </div>
+        </div>
+      )}
+
+      {/* Ready — workbench */}
+      {paper.status === "ready" && (
+        <div className="flex-1 flex overflow-hidden">
+          {/* Left: PDF viewer */}
+          <div className="w-[60%] border-r border-gray-200">
+            <PdfViewer
+              fileUrl={paper.paper_file_url}
+              currentPage={currentQuestion?.page_number ?? 1}
+              onPageChange={handlePdfPageChange}
+            />
+          </div>
+
+          {/* Right: analysis panel */}
+          <div className="w-[40%] flex flex-col overflow-hidden">
+            {questionsLoading ? (
+              <div className="flex-1 flex items-center justify-center text-gray-400 text-sm">
+                Loading questions...
+              </div>
+            ) : activeVariantId && activeVariant ? (
+              /* ===== Variant Detail View ===== */
+              <>
+                <button
+                  onClick={() => setActiveVariantId(null)}
+                  className="flex items-center gap-2 px-4 py-2.5 text-sm font-medium text-blue-600 bg-gray-50 border-b border-gray-200 hover:bg-gray-100 shrink-0"
+                >
+                  <span>←</span>
+                  <span>Back to Questions</span>
+                  <span className="ml-2 px-2 py-0.5 bg-purple-100 text-purple-700 text-xs rounded-full font-medium">
+                    Variant Q{activeVariant.source_question_number}
+                  </span>
+                </button>
+                <div className="flex-1 overflow-y-auto p-4">
+                  <VariantDetail variant={activeVariant.variant_data} />
+                </div>
+              </>
+            ) : (
+              /* ===== Normal Tab View ===== */
+              <>
+                {/* Tab bar */}
+                <div className="flex border-b border-gray-200 shrink-0">
+                  <button
+                    onClick={() => setActiveTab("questions")}
+                    className={`flex-1 py-2.5 text-sm font-medium text-center transition-colors ${
+                      activeTab === "questions"
+                        ? "text-gray-900 border-b-2 border-blue-600"
+                        : "text-gray-400 hover:text-gray-600"
+                    }`}
+                  >
+                    Questions
+                  </button>
+                  <button
+                    onClick={() => setActiveTab("variants")}
+                    className={`flex-1 py-2.5 text-sm font-medium text-center transition-colors flex items-center justify-center gap-1.5 ${
+                      activeTab === "variants"
+                        ? "text-gray-900 border-b-2 border-blue-600"
+                        : "text-gray-400 hover:text-gray-600"
+                    }`}
+                  >
+                    Variants
+                    {currentVariants.length > 0 && (
+                      <span className="w-5 h-5 flex items-center justify-center bg-purple-500 text-white text-xs font-bold rounded-full">
+                        {currentVariants.length}
+                      </span>
+                    )}
+                  </button>
+                </div>
+
+                {/* Question nav — always visible */}
+                <QuestionNav
+                  groups={groups}
+                  currentGroupKey={currentGroupKey}
+                  currentQuestionId={currentQuestion?.id ?? null}
+                  onSelectGroup={handleGroupSelect}
+                  onSelectQuestion={handleQuestionSelect}
+                />
+
+                {/* Questions tab content */}
+                {activeTab === "questions" && (
+                  <>
+                    <div className="flex-1 overflow-y-auto p-4">
+                      {currentQuestion && (
+                        <>
+                          <QuestionDetail
+                            question={currentQuestion}
+                            onAnswerResult={handleAnswerResult}
+                          />
+                          {/* Grading result panel */}
+                          {gradingResults.has(currentQuestion.id) && (() => {
+                            const gr = gradingResults.get(currentQuestion.id)!;
+                            const expanded = gradingExpanded.has(currentQuestion.id);
+                            const toggleExpand = () => setGradingExpanded((prev) => {
+                              const next = new Set(prev);
+                              if (next.has(currentQuestion.id)) next.delete(currentQuestion.id); else next.add(currentQuestion.id);
+                              return next;
+                            });
+
+                            if (gr.loading) {
+                              return (
+                                <div className="mb-4 rounded-lg border border-blue-200 bg-blue-50 p-3">
+                                  <div className="flex items-center gap-2">
+                                    <span className="w-4 h-4 border-2 border-blue-600 border-t-transparent rounded-full animate-spin" />
+                                    <span className="text-sm font-medium text-blue-700">Grading your answer...</span>
+                                  </div>
+                                </div>
+                              );
+                            }
+
+                            return (
+                              <div className={`mb-4 rounded-lg border ${gr.isCorrect ? "border-green-200" : "border-red-200"}`}>
+                                <button
+                                  onClick={toggleExpand}
+                                  className={`w-full flex items-center justify-between px-3 py-2.5 rounded-t-lg ${gr.isCorrect ? "bg-green-50" : "bg-red-50"}`}
+                                >
+                                  <div className="flex items-center gap-2">
+                                    <span className="text-lg">{gr.isCorrect ? "✓" : "✗"}</span>
+                                    <span className={`font-semibold text-sm ${gr.isCorrect ? "text-green-700" : "text-red-700"}`}>
+                                      AI Grading: {gr.isCorrect ? "Correct" : "Incorrect"}
+                                      {gr.scoreGiven != null && ` — ${gr.scoreGiven} pts`}
+                                    </span>
+                                  </div>
+                                  <span className="text-gray-400 text-xs">{expanded ? "▲" : "▼"}</span>
+                                </button>
+                                {expanded && (
+                                  <div className="p-3 border-t border-gray-100 bg-white rounded-b-lg">
+                                    {gr.ocrText && (
+                                      <details className="mb-3 bg-gray-50 rounded-lg border border-gray-200">
+                                        <summary className="px-3 py-2 text-xs font-medium text-gray-500 cursor-pointer">Your Answer (OCR)</summary>
+                                        <div className="px-3 pb-3">
+                                          <KaTeXRenderer html={gr.ocrText.replace(/\n/g, "<br/>")} className="text-xs text-gray-700" />
+                                        </div>
+                                      </details>
+                                    )}
+                                    <KaTeXRenderer html={gr.feedback} className="text-gray-700 text-sm" />
+                                  </div>
+                                )}
+                              </div>
+                            );
+                          })()}
+                          <AiTrioPanel question={currentQuestion} />
+                          <SimilarHistoryPanel question={currentQuestion} />
+                        </>
+                      )}
+                    </div>
+                    <ActionBar
+                      question={currentQuestion}
+                      onGenerateVariant={handleGenerateVariant}
+                      isGenerating={isGenerating}
+                      onPhotoOpen={() => setShowPhoto(true)}
+                      answerState={currentQuestion ? answerStates.get(currentQuestion.id) ?? null : null}
+                    />
+                  </>
+                )}
+
+                {/* Variants tab content */}
+                {activeTab === "variants" && (
+                  <div className="flex-1 overflow-y-auto p-4">
+                    <div className="mb-3">
+                      <button
+                        onClick={handleGenerateVariant}
+                        disabled={!currentQuestion || isGenerating}
+                        className="w-full py-2 rounded-lg text-sm font-medium bg-purple-50 text-purple-700 border border-purple-200 hover:bg-purple-100 disabled:opacity-50 transition-colors"
+                      >
+                        {isGenerating ? (
+                          <span className="flex items-center justify-center gap-2">
+                            <span className="w-3 h-3 border-2 border-purple-600 border-t-transparent rounded-full animate-spin" />
+                            Generating...
+                          </span>
+                        ) : "+ Generate Variant"}
+                      </button>
+                    </div>
+
+                    {currentVariants.length === 0 && !isGenerating ? (
+                      <div className="text-center py-12">
+                        <p className="text-gray-400 text-sm">No variants yet for this question.</p>
+                      </div>
+                    ) : (
+                      <div className="space-y-3">
+                        {currentVariants.map((v) => (
+                          <div key={v.id} className="bg-gray-50 rounded-lg border border-gray-200 p-4">
+                            <div className="flex items-center justify-between mb-2">
+                              <span className="text-xs text-gray-400">
+                                {new Date(v.created_at).toLocaleDateString("en-CA")}
+                              </span>
+                              <div className="flex items-center gap-2">
+                                <button
+                                  onClick={() => void handleToggleFavorite(v)}
+                                  title={v.favorited ? "Unfavorite" : "Save to Error Book"}
+                                  className={`text-lg leading-none ${v.favorited ? "text-yellow-400" : "text-gray-300 hover:text-yellow-400"}`}
+                                >
+                                  ★
+                                </button>
+                                <button
+                                  onClick={() => void handleDeleteVariant(v)}
+                                  className="text-gray-300 hover:text-red-400 text-sm leading-none"
+                                  title="Delete"
+                                >
+                                  ×
+                                </button>
+                              </div>
+                            </div>
+                            <p className="text-xs text-gray-600 line-clamp-2 mb-3">
+                              {v.variant_data.question_text?.replace(/<[^>]*>/g, "").slice(0, 140)}
+                            </p>
+                            <button
+                              onClick={() => setActiveVariantId(v.id)}
+                              className="px-3 py-1.5 bg-blue-600 text-white text-xs font-medium rounded-lg hover:bg-blue-700"
+                            >
+                              Practice →
+                            </button>
+                          </div>
+                        ))}
+                      </div>
+                    )}
+                  </div>
+                )}
+              </>
+            )}
+          </div>
+        </div>
+      )}
+
+      {/* Photo upload modal */}
+      {showPhoto && currentQuestion && (() => {
+        const qid = currentQuestion.id;
+        return (
+          <PhotoUpload
+            questionId={qid}
+            onClose={() => setShowPhoto(false)}
+            onSubmitted={async (promise) => {
+              // Set loading state
+              setGradingResults((prev) => new Map(prev).set(qid, { isCorrect: false, feedback: "", ocrText: "", loading: true }));
+              setGradingExpanded((prev) => new Set(prev).add(qid));
+              try {
+                const res = await promise;
+                const { is_correct, feedback, score_given } = res.grade;
+                setGradingResults((prev) => new Map(prev).set(qid, {
+                  isCorrect: is_correct,
+                  feedback,
+                  ocrText: res.ocr_text,
+                  scoreGiven: score_given,
+                  loading: false,
+                }));
+                // Wrong → auto generate variant
+                if (!is_correct) {
+                  handleGenerateVariant();
+                }
+              } catch {
+                setGradingResults((prev) => new Map(prev).set(qid, {
+                  isCorrect: false,
+                  feedback: "Grading failed. Please try again.",
+                  ocrText: "",
+                  loading: false,
+                }));
+              }
+            }}
+          />
+        );
+      })()}
+    </div>
+  );
+}
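The expand/collapse toggle in the grading panel above copies the `Set` before changing it, so React receives a new reference instead of a mutated one. A minimal standalone sketch of that pattern follows; `toggleInSet` is a hypothetical helper name used only for illustration and is not defined anywhere in this commit.

```typescript
// Hypothetical helper (not part of the commit): returns a NEW Set with `id`
// toggled, leaving the input Set untouched so the previous state is preserved.
function toggleInSet<T>(prev: ReadonlySet<T>, id: T): Set<T> {
  const next = new Set(prev);
  if (next.has(id)) {
    next.delete(id);
  } else {
    next.add(id);
  }
  return next;
}

// Example: expand "q2", then collapse "q1".
const before = new Set(["q1"]);
const afterAdd = toggleInSet(before, "q2");
const afterRemove = toggleInSet(afterAdd, "q1");
```

With a helper like this, the updater could be written as `setGradingExpanded((prev) => toggleInSet(prev, currentQuestion.id))`, which also avoids using a ternary purely for its side effects.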
--- a/frontend/src/styles/globals.css
+++ b/frontend/src/styles/globals.css
@@ -0,0 +1,79 @@
+/* ── Google Fonts: Sora (headings) + IBM Plex Mono (data); remote @import must come first ── */
+@import url("https://fonts.googleapis.com/css2?family=Sora:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500;600&display=swap");
+
+@import "tailwindcss";
+@import "katex/dist/katex.min.css";
+
+/* Hide scrollbar on horizontal tab rows */
+.hide-scrollbar { -ms-overflow-style: none; scrollbar-width: none; }
+.hide-scrollbar::-webkit-scrollbar { display: none; }
+
+/* ── Knowledge Base HTML content styling (from SOS project) ── */
+.kb-html-content h1 { font-size: 1.25rem; font-weight: 700; margin: 0.75rem 0 0.5rem; line-height: 1.3; }
+.kb-html-content h2 { font-size: 1.1rem; font-weight: 600; margin: 0.75rem 0 0.4rem; color: #1e40af; border-bottom: 1px solid #e5e7eb; padding-bottom: 0.25rem; }
+.kb-html-content h3 { font-size: 0.95rem; font-weight: 600; margin: 0.6rem 0 0.3rem; color: #374151; }
+.kb-html-content h4 { font-size: 0.875rem; font-weight: 600; margin: 0.5rem 0 0.25rem; color: #6b7280; }
+.kb-html-content p { margin: 0.3rem 0; line-height: 1.6; }
+.kb-html-content p.summary { background: #eff6ff; border-left: 3px solid #3b82f6; padding: 0.5rem 0.75rem; border-radius: 0 0.25rem 0.25rem 0; color: #1e3a5f; margin-bottom: 0.75rem; }
+.kb-html-content ul, .kb-html-content ol { margin: 0.3rem 0 0.3rem 1.25rem; line-height: 1.6; }
+.kb-html-content ul { list-style: disc; }
+.kb-html-content ol { list-style: decimal; }
+.kb-html-content li { margin: 0.15rem 0; }
+.kb-html-content strong { font-weight: 600; color: #1e293b; }
+.kb-html-content blockquote { border-left: 3px solid #d1d5db; padding: 0.4rem 0.75rem; margin: 0.4rem 0; background: #f9fafb; color: #4b5563; font-style: italic; border-radius: 0 0.25rem 0.25rem 0; }
+.kb-html-content pre { background: #1e293b; color: #e2e8f0; padding: 0.75rem; border-radius: 0.375rem; overflow-x: auto; margin: 0.4rem 0; font-size: 0.8rem; }
+.kb-html-content code { font-family: ui-monospace, monospace; font-size: 0.85em; }
+.kb-html-content :not(pre) > code { background: #f1f5f9; padding: 0.1rem 0.3rem; border-radius: 0.2rem; color: #be185d; }
+.kb-html-content table { border-collapse: collapse; width: 100%; margin: 0.4rem 0; font-size: 0.8rem; }
+.kb-html-content th, .kb-html-content td { border: 1px solid #e5e7eb; padding: 0.35rem 0.5rem; text-align: left; }
+.kb-html-content th { background: #f3f4f6; font-weight: 600; }
+.kb-html-content section { margin: 0.5rem 0; }
+.kb-html-content .tag { display: inline-block; background: #dbeafe; color: #1e40af; padding: 0.1rem 0.5rem; border-radius: 9999px; font-size: 0.75rem; margin: 0.15rem 0.15rem; }
+.kb-html-content hr { border: none; border-top: 1px solid #e5e7eb; margin: 0.75rem 0; }
+
+/* ── Example blocks ── */
+.kb-html-content .example { background: #fffbeb; border: 1px solid #fbbf24; border-radius: 0.375rem; padding: 0.75rem; margin: 0.6rem 0; }
+.kb-html-content .example-title { font-weight: 700; color: #92400e; margin-bottom: 0.4rem; font-size: 0.9rem; }
+.kb-html-content .example-solution { border-top: 1px dashed #d97706; padding-top: 0.4rem; }
+
+/* ── LaTeX blocks ── */
+.kb-html-content pre.latex { background: #f8fafc; color: #1e293b; border: 1px solid #e2e8f0; text-align: center; font-size: 0.9rem; padding: 0.6rem; }
+.kb-html-content code.latex { background: #f1f5f9; padding: 0.1rem 0.3rem; border-radius: 0.2rem; color: #4338ca; font-size: 0.85em; }
+
+/* ── Common error block (used in solution) ── */
+.kb-html-content .common-error {
+  background: #fef2f2;
+  border: 1px solid #fca5a5;
+  border-left: 3px solid #ef4444;
+  border-radius: 0.375rem;
+  padding: 0.6rem 0.75rem;
+  margin: 0.5rem 0;
+}
+.kb-html-content .common-error::before {
+  content: "⚠ Common Mistake";
+  font-weight: 700;
+  color: #dc2626;
+  display: block;
+  margin-bottom: 0.3rem;
+  font-size: 0.85rem;
+}
+
+/* ── Figure description blocks ── */
+.kb-html-content .figure-desc {
+  background: #faf5ff;
+  border: 1px solid #d8b4fe;
+  border-left: 3px solid #a855f7;
+  border-radius: 0.375rem;
+  padding: 0.6rem 0.75rem;
+  margin: 0.5rem 0;
+}
+
+/* ── AI Supplement blocks ── */
+.kb-html-content .ai-supplement {
+  background: #f0fdf4;
+  border: 1px solid #86efac;
+  border-left: 3px solid #22c55e;
+  border-radius: 0.375rem;
+  padding: 0.6rem 0.75rem;
+  margin: 0.5rem 0;
+}
--- a/frontend/src/types/api.ts
+++ b/frontend/src/types/api.ts
@@ -0,0 +1,169 @@
+export interface Paper {
+  id: string;
+  user_id: string | null;
+  course_code: string;
+  year: number;
+  term: string;
+  exam_type: string;
+  paper_file_url: string;
+  answer_file_url: string | null;
+  status: "uploaded" | "processing" | "ready" | "error";
+  error_message: string | null;
+  total_score: number | null;
+  question_count: number | null;
+  topics_summary: Record<string, number> | null;
+  difficulty_level: string | null;
+  processing_step: string | null;
+  processing_progress: number;
+  processing_total: number;
+  created_at: string;
+  updated_at: string;
+}
+
+export interface PaperSummary {
+  id: string;
+  course_code: string;
+  year: number;
+  term: string;
+  exam_type: string;
+  part_label: string | null;
+}
+
+export interface Question {
+  id: string;
+  paper_id: string;
+  question_number: string;
+  parent_question: string | null;
+  display_order: number;
+  question_type: string;
+  question_format?: string | null;
+  question_text: string;
+  score: number | null;
+  page_number: number | null;
+  page_y_ratio?: number | null;
+  options: { label: string; text: string }[] | null;
+  correct_option: string | null;
+  correct_answer: string | null;
+  raw_answer_text: string | null;
+  topics: string[] | null;
+  topic_primary?: string | null;
+  analytics_topic?: string | null;
+  topic_tags?: string[] | null;
+  skill_tags?: string[] | null;
+  difficulty: string | null;
+  knowledge_reminder: string;
+  ai_hint: string;
+  solution: string;
+  created_at: string;
+  updated_at: string;
+  paper?: PaperSummary;
+}
+
+export interface UploadResponse {
+  paper_id: string;
+  status: string;
+  message: string;
+}
+
+export interface UserAttempt {
+  id: string;
+  user_id: string;
+  question_id: string;
+  attempt_type: string;
+  user_answer: string | null;
+  photo_url: string | null;
+  photo_ocr_text: string | null;
+  is_correct: boolean | null;
+  feedback: string | null;
+  error_at_step: number | null;
+  in_error_book: boolean;
+  mastered: boolean;
+  created_at: string;
+  paper_questions?: Question;
+  score_given?: number | null;
+}
+
+export interface VariantQuestion {
+  question_text: string;
+  question_type: string;
+  options: { label: string; text: string }[] | null;
+  correct_answer: string;
+  ai_hint: string;
+  knowledge_reminder: string;
+  solution: string;
+}
+
+export interface QuestionVariant {
+  id: string;
+  user_id: string;
+  source_question_id: string;
+  source_question_number: string;
+  variant_data: VariantQuestion;
+  favorited: boolean;
+  created_at: string;
+}
+
+export interface GradeResult {
+  is_correct: boolean;
+  feedback: string;
+  error_at_step: number | null;
+}
+
+export interface SimilarQuestion {
+  id: string;
+  paper_id: string;
+  source: string;
+  question_number: string;
+  match_percent: number;
+  match_reasons?: string[];
+  question_type: Question["question_type"];
+  question_text: string;
+  topics: string[];
+  difficulty: string | null;
+  knowledge_reminder: string;
+  ai_hint: string;
+  solution: string;
+}
+
+export interface AnalyticsTopicQuestion {
+  paper_id: string;
+  source: string;
+  question_number: string;
+  preview: string;
+  difficulty: string | null;
+  question_type: string;
+  year?: number | null;
+  term?: string | null;
+  exam_type?: string | null;
+  topics?: string[];
+}
+
+export interface AnalyticsTopicEntry {
+  label: string;
+  count: number;
+  pct: number;
+  questions: AnalyticsTopicQuestion[];
+}
+
+export interface CourseAnalytics {
+  course_code: string;
+  kpi: {
+    papers: number;
+    questions: number;
+    topics: number;
+    difficulty: string;
+  };
+  topic_frequency: AnalyticsTopicEntry[];
+  question_types: Array<{
+    label: string;
+    count: number;
+    pct: number;
+  }>;
+  difficulty_distribution: {
+    easy: number;
+    medium: number;
+    hard: number;
+  };
+  high_yield_topics: string[];
+  all_questions: AnalyticsTopicQuestion[];
+}
--- a/frontend/src/vite-env.d.ts
+++ b/frontend/src/vite-env.d.ts
@@ -0,0 +1 @@
+/// <reference types="vite/client" />
--- a/frontend/tsconfig.json
+++ b/frontend/tsconfig.json
@@ -0,0 +1,21 @@
+{
+  "compilerOptions": {
+    "target": "ES2020",
+    "useDefineForClassFields": true,
+    "lib": ["ES2020", "DOM", "DOM.Iterable"],
+    "module": "ESNext",
+    "skipLibCheck": true,
+    "moduleResolution": "bundler",
+    "allowImportingTsExtensions": true,
+    "isolatedModules": true,
+    "moduleDetection": "force",
+    "noEmit": true,
+    "jsx": "react-jsx",
+    "strict": true,
+    "baseUrl": ".",
+    "paths": {
+      "@/*": ["src/*"]
+    }
+  },
+  "include": ["src"]
+}
--- a/frontend/vite.config.ts
+++ b/frontend/vite.config.ts
@@ -0,0 +1,22 @@
+import { defineConfig } from "vite";
+import react from "@vitejs/plugin-react";
+import tailwindcss from "@tailwindcss/vite";
+import { resolve } from "path";
+
+export default defineConfig({
+  plugins: [react(), tailwindcss()],
+  resolve: {
+    alias: {
+      "@": resolve(__dirname, "src"),
+    },
+  },
+  server: {
+    port: 5173,
+    proxy: {
+      "/api": {
+        target: "http://localhost:8000",
+        changeOrigin: true,
+      },
+    },
+  },
+});
--- a/2.html
+++ b/2.html
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -0,0 +1,3 @@
+# Memory Index
+
+- [project_pastpaper_master.md](project_pastpaper_master.md): PastPaper Master project overview and current development progress
--- a/memory/project_pastpaper_master.md
+++ b/memory/project_pastpaper_master.md
@@ -0,0 +1,37 @@
+---
+name: PastPaper Master Project Overview
+description: Project tech stack, current development status, completed workstreams, and next priorities
+type: project
+---
+
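+An AI-assisted learning platform that supports practice on COMP2211 past papers. Core features: question workbench, the AI trio (knowledge reminder, hint, solution), similar-question recommendations, error book, and variant-question generation.
+
+## Tech Stack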
+- Frontend: React 19 + TypeScript + Vite 7 + Tailwind v4
+- Backend: FastAPI + Python 3.12 + uv
+- DB: Supabase PostgreSQL (RLS policies are in place; a temp user id is used for now)
+- LLM: GPT-4o (laozhang proxy) + Qwen-plus fallback
+
+## Current DB State (2026-04-10)
+COMP2211 has 7 papers with status=ready and 250 subquestion-level questions, all of which have knowledge_reminder / ai_hint / solution / analytics_topic / topic_tags / skill_tags.
+
+## Completed Work (This Session)
+**Workstream A: Similar-question retrieval + remove demo fallback**
+- `backend/app/routers/questions.py`:
+  - Added `skill_tags` to the SELECT and to the `question_topics()` computation
+  - Fixed `isinstance(target_score, int)` → `(int, float)` to support fractional NUMERIC scores
+  - `similarity_score()` now returns a `(score, reasons)` tuple
+  - Changed the filter threshold from `<= 0` to `< 10`
+  - Added a `match_reasons` field to the response
+- `frontend/src/types/api.ts`: added `match_reasons?: string[]` to `SimilarQuestion`
+- `frontend/src/components/workbench/SimilarHistoryPanel.tsx`: removed all demo fallbacks in favor of real empty/error states; displays match_reasons chips
+
+## Next Priorities (from HANDOFF_COMP2211.md)
+1. ✅ Workstream A: Similar-question retrieval + remove demo fallback (done)
+2. Workstream B: Deeper analytics (per-paper drill-down, topic frequency over time, high-frequency topics)
+3. Workstream C: LaTeX/KaTeX rendering quality (centralized normalization, removing OCR noise)
+4. Workstream D: Deduplicate user uploads (compare against papers already in course_library)
+5. Workstream E: UI/UX pass (QuestionNav, status badges, workbench hierarchy)
+
+**Why:** This is the development order recommended in the HANDOFF document, prioritizing data stability.
+**How to apply:** Start the next session from Workstream B (deeper analytics).
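The Workstream A notes describe a score-plus-reasons result and a cutoff that discards candidates scoring below 10. The sketch below illustrates only that filtering rule, in TypeScript rather than the backend's Python. The `ScoredCandidate` shape, the `keepRelevant` name, and the descending sort are assumptions made for illustration; the real scoring logic in `questions.py` is not shown in this section.

```typescript
// Assumed shape: each candidate carries the (score, reasons) pair that
// similarity_score() is described as returning.
interface ScoredCandidate {
  id: string;
  score: number;
  reasons: string[];
}

// Threshold from the notes: candidates with score < 10 are filtered out.
const MIN_SCORE = 10;

// Hypothetical helper: drops weak matches, then orders the rest best-first.
// The sort order is an assumption; the notes only specify the threshold.
function keepRelevant(candidates: ScoredCandidate[]): ScoredCandidate[] {
  return candidates
    .filter((c) => c.score >= MIN_SCORE)
    .sort((a, b) => b.score - a.score);
}

const sample: ScoredCandidate[] = [
  { id: "a", score: 9, reasons: [] },
  { id: "b", score: 10, reasons: ["same topic"] },
  { id: "c", score: 42, reasons: ["same topic", "same skill"] },
];
const kept = keepRelevant(sample);
```

Because `filter` returns a new array before `sort` runs, the input list is left in its original order.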
--- a/1
+++ b/1
--- a/pitch_script.md
+++ b/pitch_script.md
@@ -0,0 +1,25 @@
+# KnowIt Pitch — Product Demo (Pages 5-6, ~45s)
+
+## Transition In
+
+> Now let me show you the product.
+
+## Page 5 — Product Demo
+
+> This is PastPaper Master. Search any course, download past papers, and hit "AI Analyze" — our system reads every page, extracts each question, and generates knowledge reminders, hints, and full solutions automatically.
+>
+> It's powered by Gemini vision and DeepSeek, with a RAG pipeline connecting papers, recordings, and courseware.
+
+## Page 6 — Workflow
+
+> Here's the full student workflow.
+>
+> **Download** papers. **AI analysis** breaks down topics and difficulty. **Upload your answers** — AI grades them instantly with detailed feedback.
+>
+> Wrong answers go into your **mistake book**. AI generates **variant questions** on the same topic, plus retrieves **similar questions** from other exams.
+>
+> And **smart flashcards** auto-generated for quick revision — already live for pharmacology students.
+
+## Transition Out
+
+> One closed loop — find, practice, grade, review, master. Over to [name] on the market.
--- a/supabase/migrations/001_init_schema.sql
+++ b/supabase/migrations/001_init_schema.sql
@@ -0,0 +1,207 @@
+-- ============================================
+-- PastPaper Master — Initial database schema
+-- Version: 001
+-- Date: 2025-03-11
+-- ============================================
+
+-- Enable required extensions
+CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
+
+-- ============================================
+-- Table 1: papers (uploaded exam papers)
+-- ============================================
+CREATE TABLE papers (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
+
+  -- Metadata (filled in by the user at upload)
+  course_code TEXT NOT NULL,                -- "COMP2011"
+  year INTEGER NOT NULL,                    -- 2024
+  term TEXT NOT NULL CHECK (term IN ('fall', 'spring', 'summer')),
+  exam_type TEXT NOT NULL CHECK (exam_type IN ('midterm', 'final', 'quiz')),
+
+  -- Files (Supabase Storage)
+  paper_file_url TEXT NOT NULL,             -- Exam paper PDF
+  answer_file_url TEXT,                     -- Answer PDF (optional)
+
+  -- Processing status
+  status TEXT NOT NULL DEFAULT 'uploaded'
+    CHECK (status IN ('uploaded', 'processing', 'ready', 'error')),
+  error_message TEXT,                       -- Error message when processing fails
+
+  -- Extracted raw text (cached)
+  paper_extracted_text TEXT,
+  answer_extracted_text TEXT,
+
+  -- Whole-paper overview (AI-generated)
+  total_score INTEGER,
+  question_count INTEGER,
+  topics_summary JSONB,                     -- {"Linked List": 40, "Recursion": 30}
+  difficulty_level TEXT CHECK (difficulty_level IN ('easy', 'medium', 'hard')),
+
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- ============================================
+-- Table 2: paper_questions (per-question data)
+-- ============================================
+CREATE TABLE paper_questions (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  paper_id UUID NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+
+  -- Question identifiers
+  question_number TEXT NOT NULL,             -- "1", "1a", "2b"
+  parent_question TEXT,                      -- Parent question number of a subquestion: "1a" → "1"
+  display_order INTEGER NOT NULL,            -- Display order
+
+  -- Question content
+  question_type TEXT NOT NULL
+    CHECK (question_type IN ('mc', 'fill_blank', 'long_question')),
+  question_text TEXT NOT NULL,               -- Original question text
+  score INTEGER,                             -- Points
+  page_number INTEGER,                       -- PDF page number (for left/right pane sync)
+
+  -- Multiple-choice only
+  options JSONB,                             -- [{"label":"A","text":"..."},...]
+  correct_option TEXT,                       -- "B"
+
+  -- Fill-in-the-blank only
+  correct_answer TEXT,                       -- Correct answer
+  accept_variants TEXT[],                    -- Equivalent forms ["O(nlogn)","O(n log n)"]
+
+  -- Raw answer extracted from the answer PDF (all question types)
+  raw_answer_text TEXT,
+
+  -- Topic tags
+  topics TEXT[],                             -- ["Linked List","Pointer"]
+  difficulty TEXT CHECK (difficulty IN ('easy', 'medium', 'hard')),
+
+  -- AI trio (HTML + KaTeX)
+  knowledge_reminder TEXT,                   -- Knowledge reminder
+  ai_hint TEXT,                              -- AI hint
+  solution TEXT,                             -- Step-by-step solution
+
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- ============================================
+-- Table 3: user_attempts (user answer records)
+-- Implemented in Phase 4; the table structure is created up front
+-- ============================================
+CREATE TABLE user_attempts (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
+  question_id UUID NOT NULL REFERENCES paper_questions(id) ON DELETE CASCADE,
+
+  -- The user's answer
+  attempt_type TEXT NOT NULL
+    CHECK (attempt_type IN ('select', 'input', 'photo')),
+  user_answer TEXT,                          -- Selected option / typed answer
+  photo_url TEXT,                            -- Uploaded photo
+  photo_ocr_text TEXT,                       -- OCR result
+
+  -- AI verdict
+  is_correct BOOLEAN,
+  feedback TEXT,                             -- HTML: step-by-step error analysis
+  error_at_step INTEGER,                     -- Step at which the error first appears
+
+  -- Error book
+  in_error_book BOOLEAN NOT NULL DEFAULT false,
+  mastered BOOLEAN NOT NULL DEFAULT false,
+
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- ============================================
+-- Indexes
+-- ============================================
+CREATE INDEX idx_papers_user ON papers(user_id);
+CREATE INDEX idx_papers_course ON papers(course_code);
+CREATE INDEX idx_papers_status ON papers(status);
+
+CREATE INDEX idx_questions_paper ON paper_questions(paper_id);
+CREATE INDEX idx_questions_type ON paper_questions(question_type);
+CREATE INDEX idx_questions_topics ON paper_questions USING GIN(topics);
+
+CREATE INDEX idx_attempts_user ON user_attempts(user_id);
+CREATE INDEX idx_attempts_question ON user_attempts(question_id);
+CREATE INDEX idx_attempts_errorbook ON user_attempts(user_id)
+  WHERE in_error_book = true;
+
+-- ============================================
+-- RLS policies
+-- ============================================
+ALTER TABLE papers ENABLE ROW LEVEL SECURITY;
+ALTER TABLE paper_questions ENABLE ROW LEVEL SECURITY;
+ALTER TABLE user_attempts ENABLE ROW LEVEL SECURITY;
+
+-- papers: users can only see their own uploads (to be revisited when a shared library is added)
+CREATE POLICY "Users can view own papers"
+  ON papers FOR SELECT
+  USING (auth.uid() = user_id);
+
+CREATE POLICY "Users can insert own papers"
+  ON papers FOR INSERT
+  WITH CHECK (auth.uid() = user_id);
+
+CREATE POLICY "Users can update own papers"
+  ON papers FOR UPDATE
+  USING (auth.uid() = user_id);
+
+CREATE POLICY "Users can delete own papers"
+  ON papers FOR DELETE
+  USING (auth.uid() = user_id);
+
+-- paper_questions: inherits the permissions of the parent paper
+CREATE POLICY "Users can view questions of own papers"
+  ON paper_questions FOR SELECT
+  USING (
+    EXISTS (
+      SELECT 1 FROM papers
+      WHERE papers.id = paper_questions.paper_id
+      AND papers.user_id = auth.uid()
+    )
+  );
+
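+-- service_role is used by the backend to write questions (for the processing pipeline)
+-- The frontend never writes questions directly; it triggers backend processing through the API
+
+-- user_attempts: users can only read/write their own rows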
+CREATE POLICY "Users can view own attempts"
+  ON user_attempts FOR SELECT
+  USING (auth.uid() = user_id);
+
+CREATE POLICY "Users can insert own attempts"
+  ON user_attempts FOR INSERT
+  WITH CHECK (auth.uid() = user_id);
+
+CREATE POLICY "Users can update own attempts"
+  ON user_attempts FOR UPDATE
+  USING (auth.uid() = user_id);
+
+-- ============================================
+-- Trigger to auto-update updated_at
+-- ============================================
+CREATE OR REPLACE FUNCTION update_updated_at()
+RETURNS TRIGGER AS $$
+BEGIN
+  NEW.updated_at = now();
+  RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE TRIGGER papers_updated_at
+  BEFORE UPDATE ON papers
+  FOR EACH ROW EXECUTE FUNCTION update_updated_at();
+
+CREATE TRIGGER questions_updated_at
+  BEFORE UPDATE ON paper_questions
+  FOR EACH ROW EXECUTE FUNCTION update_updated_at();
+
+-- ============================================
+-- Storage bucket
+-- ============================================
+-- Create the "papers" bucket manually in the Supabase Dashboard,
+-- or create it through the API (handled during backend initialization)
--- a/supabase/migrations/002_course_library_fields.sql
+++ b/supabase/migrations/002_course_library_fields.sql
@@ -0,0 +1,38 @@
+-- ============================================
+-- PastPaper Master — Shared course library fields
+-- Version: 002
+-- Date: 2026-03-24
+-- ============================================
+
+-- Shared library / canonical import metadata on papers
+ALTER TABLE papers
+  ADD COLUMN IF NOT EXISTS source_kind TEXT NOT NULL DEFAULT 'user_upload'
+    CHECK (source_kind IN ('user_upload', 'course_library')),
+  ADD COLUMN IF NOT EXISTS source_exam_key TEXT,
+  ADD COLUMN IF NOT EXISTS part_label TEXT
+    CHECK (part_label IN ('A', 'B')),
+  ADD COLUMN IF NOT EXISTS source_question_filename TEXT,
+  ADD COLUMN IF NOT EXISTS source_answer_filename TEXT;
+
+CREATE UNIQUE INDEX IF NOT EXISTS idx_papers_course_library_exam_key
+  ON papers(source_exam_key)
+  WHERE source_kind = 'course_library' AND source_exam_key IS NOT NULL;
+
+CREATE INDEX IF NOT EXISTS idx_papers_course_lookup
+  ON papers(course_code, year, term, exam_type, part_label);
+
+-- Grading results should persist awarded score
+ALTER TABLE user_attempts
+  ADD COLUMN IF NOT EXISTS score_given INTEGER;
+
+CREATE INDEX IF NOT EXISTS idx_attempts_errorbook_active
+  ON user_attempts(user_id, created_at DESC)
+  WHERE in_error_book = true AND mastered = false;
+
+-- The backend and frontend already support true_false; schema must match.
+ALTER TABLE paper_questions
+  DROP CONSTRAINT IF EXISTS paper_questions_question_type_check;
+
+ALTER TABLE paper_questions
+  ADD CONSTRAINT paper_questions_question_type_check
+  CHECK (question_type IN ('mc', 'true_false', 'fill_blank', 'long_question'));
--- a/supabase/migrations/003_question_taxonomy_fields.sql
+++ b/supabase/migrations/003_question_taxonomy_fields.sql
@@ -0,0 +1,41 @@
+-- ============================================
+-- PastPaper Master — Question taxonomy fields
+-- Version: 003
+-- Date: 2026-03-24
+-- ============================================
+
+-- A question needs multiple classification layers:
+-- 1) question_format: how the student interacts with it
+-- 2) topic_tags / topic_primary / analytics_topic: course knowledge taxonomy
+-- 3) skill_tags: what kind of thinking task the question requires
+ALTER TABLE paper_questions
+  ADD COLUMN IF NOT EXISTS question_format TEXT
+    CHECK (
+      question_format IN (
+        'mc',
+        'true_false',
+        'fill_blank',
+        'short_answer',
+        'long_answer',
+        'coding'
+      )
+    ),
+  ADD COLUMN IF NOT EXISTS topic_primary TEXT,
+  ADD COLUMN IF NOT EXISTS analytics_topic TEXT,
+  ADD COLUMN IF NOT EXISTS topic_tags TEXT[],
+  ADD COLUMN IF NOT EXISTS skill_tags TEXT[];
+
+-- Keep the legacy topics column for backward compatibility for now.
+-- New analytics and retrieval code should gradually move to analytics_topic/topic_tags.
+
+CREATE INDEX IF NOT EXISTS idx_questions_question_format
+  ON paper_questions(question_format);
+
+CREATE INDEX IF NOT EXISTS idx_questions_analytics_topic
+  ON paper_questions(analytics_topic);
+
+CREATE INDEX IF NOT EXISTS idx_questions_topic_tags
+  ON paper_questions USING GIN(topic_tags);
+
+CREATE INDEX IF NOT EXISTS idx_questions_skill_tags
+  ON paper_questions USING GIN(skill_tags);
--- a/supabase/migrations/004_decouple_course_library_from_auth.sql
+++ b/supabase/migrations/004_decouple_course_library_from_auth.sql
@@ -0,0 +1,30 @@
+-- ============================================
+-- PastPaper Master — Decouple course library papers from auth users
+-- Version: 004
+-- Date: 2026-03-24
+-- ============================================
+
+-- Course-library papers should not depend on a concrete auth.users row.
+-- User-uploaded papers still keep user_id populated.
+ALTER TABLE papers
+  ALTER COLUMN user_id DROP NOT NULL;
+
+-- Keep existing FK so user-owned papers can still reference auth.users,
+-- while course-library rows simply use NULL.
+
+-- Tighten the intended invariant with a check constraint:
+-- - user_upload rows must have user_id
+-- - course_library rows must not have user_id
+ALTER TABLE papers
+  DROP CONSTRAINT IF EXISTS papers_source_kind_user_id_check;
+
+ALTER TABLE papers
+  ADD CONSTRAINT papers_source_kind_user_id_check
+  CHECK (
+    (source_kind = 'user_upload' AND user_id IS NOT NULL)
+    OR
+    (source_kind = 'course_library' AND user_id IS NULL)
+  );
+
+-- Existing RLS policies continue to apply to user-owned rows.
+-- Course-library rows are accessed through the backend service role.
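The check constraint in migration 004 ties `source_kind` to whether `user_id` is set. Application code that builds rows before inserting them could mirror the same rule to fail early; the sketch below is one way to express it. `PaperOwnership` and `satisfiesOwnershipInvariant` are hypothetical names introduced here, and the database constraint remains the source of truth.

```typescript
// Mirrors the two source_kind values allowed by migration 002.
type SourceKind = "user_upload" | "course_library";

// Assumed minimal row shape: only the two columns the constraint inspects.
interface PaperOwnership {
  source_kind: SourceKind;
  user_id: string | null;
}

// Hypothetical client-side mirror of papers_source_kind_user_id_check:
// user uploads must have an owner, course-library rows must not.
function satisfiesOwnershipInvariant(row: PaperOwnership): boolean {
  return (
    (row.source_kind === "user_upload" && row.user_id !== null) ||
    (row.source_kind === "course_library" && row.user_id === null)
  );
}

const libraryRow: PaperOwnership = { source_kind: "course_library", user_id: null };
const orphanUpload: PaperOwnership = { source_kind: "user_upload", user_id: null };
```

A pre-insert check like this only gives a friendlier error message; rows that bypass it are still rejected by PostgreSQL.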
--- a/supabase/migrations/005_allow_long_question_format_alias.sql
+++ b/supabase/migrations/005_allow_long_question_format_alias.sql
@@ -0,0 +1,27 @@
+-- ============================================
+-- PastPaper Master — Allow legacy long_question format alias
+-- Version: 005
+-- Date: 2026-03-24
+-- ============================================
+--
+-- Some existing seeds and older generated SQL used `long_question` in the
+-- `question_format` column, while the 003 taxonomy migration introduced
+-- `long_answer` as the canonical value. Allow both temporarily so historical
+-- inserts do not fail. New generators should continue emitting `long_answer`.
+
+ALTER TABLE paper_questions
+  DROP CONSTRAINT IF EXISTS paper_questions_question_format_check;
+
+ALTER TABLE paper_questions
+  ADD CONSTRAINT paper_questions_question_format_check
+  CHECK (
+    question_format IN (
+      'mc',
+      'true_false',
+      'fill_blank',
+      'short_answer',
+      'long_answer',
+      'long_question',
+      'coding'
+    )
+  );
--- a/supabase/migrations/006_make_scores_numeric.sql
+++ b/supabase/migrations/006_make_scores_numeric.sql
@@ -0,0 +1,17 @@
+-- ============================================
+-- PastPaper Master — Make score fields numeric
+-- Version: 006
+-- Date: 2026-04-10
+-- ============================================
+
+ALTER TABLE paper_questions
+  ALTER COLUMN score TYPE NUMERIC
+  USING score::NUMERIC;
+
+ALTER TABLE papers
+  ALTER COLUMN total_score TYPE NUMERIC
+  USING total_score::NUMERIC;
+
+ALTER TABLE user_attempts
+  ALTER COLUMN score_given TYPE NUMERIC
+  USING score_given::NUMERIC;
--- a/supabase/migrations/007_fulltext_search.sql
+++ b/supabase/migrations/007_fulltext_search.sql
@@ -0,0 +1,36 @@
+-- 007: Full-text search on paper_questions.question_text
+--
+-- Adds a tsvector generated column (auto-maintained by PostgreSQL on every
+-- INSERT/UPDATE), a GIN index for fast @@ queries, and a batch-scoring RPC
+-- used by the similar-question retrieval endpoint.
+
+ALTER TABLE paper_questions
+  ADD COLUMN IF NOT EXISTS search_text tsvector
+  GENERATED ALWAYS AS (
+    to_tsvector('english', coalesce(question_text, ''))
+  ) STORED;
+
+CREATE INDEX IF NOT EXISTS idx_pq_search_text
+  ON paper_questions USING gin(search_text);
+
+-- text_similarity_scores(query_text, candidate_ids)
+--   Returns one row per candidate ID found in paper_questions, with a
+--   ts_rank_cd score divided by 1 + log(document length) (normalization
+--   flag = 1).  Questions that share no lexemes with the query still appear
+--   with score = 0, so the caller gets a score for every existing candidate.
+CREATE OR REPLACE FUNCTION text_similarity_scores(
+  query_text    text,
+  candidate_ids uuid[]
+)
+RETURNS TABLE (question_id uuid, text_score float4)
+LANGUAGE sql STABLE AS $$
+  SELECT
+    id,
+    ts_rank_cd(
+      search_text,
+      plainto_tsquery('english', query_text),
+      1   -- divide rank by 1 + log(document length)
+    )::float4
+  FROM paper_questions
+  WHERE id = ANY(candidate_ids);
+$$;
--- a/supabase/migrations/008_add_page_y_ratio.sql
+++ b/supabase/migrations/008_add_page_y_ratio.sql
@@ -0,0 +1,2 @@
+ALTER TABLE paper_questions
+  ADD COLUMN IF NOT EXISTS page_y_ratio NUMERIC;
--- a/supabase/migrations/008_fix_storage_url_placeholder.sql
+++ b/supabase/migrations/008_fix_storage_url_placeholder.sql
@@ -0,0 +1,27 @@
+-- 008: Replace __SUPABASE_STORAGE_PUBLIC_BASE_URL__ placeholder in paper URLs
+--
+-- The course-library seed (comp2211_course_library_papers.sql) was inserted
+-- without substituting the placeholder.  This migration replaces it with the
+-- real Supabase Storage public base URL for the `papers` bucket.
+
+UPDATE papers
+SET paper_file_url = REPLACE(
+  paper_file_url,
+  '__SUPABASE_STORAGE_PUBLIC_BASE_URL__',
+  'https://pvcxipwovpwrurebouwg.supabase.co/storage/v1/object/public/papers'
+)
+WHERE paper_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';
+
+UPDATE papers
+SET answer_file_url = REPLACE(
+  answer_file_url,
+  '__SUPABASE_STORAGE_PUBLIC_BASE_URL__',
+  'https://pvcxipwovpwrurebouwg.supabase.co/storage/v1/object/public/papers'
+)
+WHERE answer_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';
+
+-- Verify: should return 0 rows
+SELECT id, course_code, year, term, exam_type, paper_file_url, answer_file_url
+FROM papers
+WHERE paper_file_url  LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%'
+   OR answer_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';