Initial commit: PastPaper Master full stack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zhao
2026-04-21 12:15:35 +07:00
commit 7a09167261
105 changed files with 24799 additions and 0 deletions

12
.gitignore vendored Normal file

@@ -0,0 +1,12 @@
.env
.env.*
node_modules/
__pycache__/
*.pyc
.DS_Store
dist/
.claude/
.venv/
pastpaper-scraper/
pastpaper/
*.pdf

Binary file not shown.


328
HANDOFF_COMP2211.md Normal file

@@ -0,0 +1,328 @@
# COMP2211 Handoff
## Current Status
`COMP2211` course-library papers are now fully loaded into Supabase and normalized to subquestion-level granularity.
Canonical papers currently in DB:
- `COMP2211-2022-fall-midterm`
- `COMP2211-2022-spring-midterm`
- `COMP2211-2022-spring-final-part-a`
- `COMP2211-2022-spring-final-part-b`
- `COMP2211-2023-spring-midterm`
- `COMP2211-2024-spring-midterm`
- `COMP2211-2024-spring-final`
All seven papers are:
- `status = ready`
- split to subquestion level
- tagged with `analytics_topic`, `topic_primary`, `topic_tags`, `skill_tags`
Question counts:
- 2022 fall midterm: `43`
- 2022 spring midterm: `38`
- 2022 spring final part A: `24`
- 2022 spring final part B: `19`
- 2023 spring midterm: `36`
- 2024 spring midterm: `42`
- 2024 spring final: `48`
## Key Files
Schema / SQL:
- [001_init_schema.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/001_init_schema.sql)
- [002_course_library_fields.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/002_course_library_fields.sql)
- [003_question_taxonomy_fields.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/003_question_taxonomy_fields.sql)
- [004_decouple_course_library_from_auth.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/004_decouple_course_library_from_auth.sql)
- [005_allow_long_question_format_alias.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/005_allow_long_question_format_alias.sql)
- [006_make_scores_numeric.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/migrations/006_make_scores_numeric.sql)
Course-library seeds:
- [comp2211_course_library_papers.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_course_library_papers.sql)
- [comp2211_problem_taxonomy_backfill.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_problem_taxonomy_backfill.sql)
- [comp2211_problem_level_questions.sql](/Users/soda/Desktop/PastPaper%20Master/supabase/seeds/comp2211_problem_level_questions.sql)
Manual splitters used for final subquestion rebuild:
- [split_comp2211_2022_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_midterm.py)
- [split_comp2211_2022_spring_final_part_a.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_final_part_a.py)
- [split_comp2211_2022_spring_final_part_b.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2022_spring_final_part_b.py)
- [split_comp2211_2023_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2023_spring_midterm.py)
- [split_comp2211_2024_spring_midterm.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2024_spring_midterm.py)
- [split_comp2211_2024_spring_final.py](/Users/soda/Desktop/PastPaper%20Master/backend/split_comp2211_2024_spring_final.py)
Deprecated filler script:
- [fill_manual_study_aids.py](/Users/soda/Desktop/PastPaper%20Master/backend/fill_manual_study_aids.py)
Audit / taxonomy references:
- [COMP2211.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/manifests/COMP2211.json)
- [COMP2211_taxonomy.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/manifests/COMP2211_taxonomy.json)
- [summary.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/summary.json)
- [problem_topics.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/problem_topics.json)
- [problem_seed.json](/Users/soda/Desktop/PastPaper%20Master/pastpaper-scraper/reviews/COMP2211/problem_seed.json)
Frontend / backend areas already adapted to real taxonomy:
- [frontend/src/pages/HomePage.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/pages/HomePage.tsx)
- [frontend/src/pages/AnalyticsPage.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/pages/ErrorBookPage.tsx)
- [frontend/src/components/workbench/SimilarHistoryPanel.tsx](/Users/soda/Desktop/PastPaper%20Master/frontend/src/components/workbench/SimilarHistoryPanel.tsx)
- [backend/app/routers/analytics.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/analytics.py)
- [backend/app/routers/questions.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/questions.py)
- [backend/app/routers/attempts.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/attempts.py)
## Important Product / Data Decisions Already Made
### Course library vs user upload
This is now separated semantically inside `papers`:
- `source_kind = 'course_library'` for platform-owned papers
- `source_kind = 'user_upload'` for user-contributed papers
Course-library papers no longer require `user_id`.
### Taxonomy model
`question_type` is not the main analytics dimension.
Current intended usage:
- `question_type` / `question_format`: rendering and answer interaction
- `analytics_topic`: normalized analytics bucket
- `topic_tags`: multi-tag topical indexing
- `skill_tags`: finer-grained retrieval / grading / similarity support
### Score field
Scores are `NUMERIC`, not integer, because many subquestions use fractional marks like `1.5`.
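A small illustration of why fractional marks argue against an integer column. The mark values below are made up for the example, and `Decimal` is used only to keep the arithmetic exact on the Python side.

```python
from decimal import Decimal

# Hypothetical marks for the subquestions of one problem.
subquestion_marks = [Decimal("1.5"), Decimal("1.5"), Decimal("2"), Decimal("0.5")]

# Exact total when fractional marks are preserved.
total = sum(subquestion_marks, Decimal("0"))

# What an integer column would effectively store if each mark were truncated.
truncated_total = sum(int(m) for m in subquestion_marks)
```

Here the exact total and the truncated total differ, which is the kind of drift a `NUMERIC` column avoids.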
## Known Issues
### 1. Similar question retrieval is still not truly production-ready
Current state:
- backend route exists
- frontend panel exists
- demo fallback still exists in the UI when retrieval returns empty / fails
What needs to be done:
- remove demo fallback behavior once real retrieval is stable
- improve ranking beyond current basic topic/type matching
- ideally add indexed text retrieval, then embeddings if needed
Recommended order:
1. build deterministic same-course retrieval first
2. rank by `analytics_topic`, `topic_tags`, `skill_tags`, `question_format`, text similarity
3. only then consider vector search
### 2. Analytics is real, but still not the final version
Current state:
- analytics already reads real DB data
- taxonomy fields are being used
Still missing:
- better topic normalization for edge cases
- per-paper and per-subtopic drill-down
- cleaner stats for mixed-format questions
- confidence around aggregated counts across all courses, not only `COMP2211`
### 3. LaTeX / math rendering is still fragile
Known symptoms:
- OCR / extracted math strings are noisy
- some generated HTML contains malformed or hard-to-read math fragments
- not all backend feedback is rendered with the same quality
What needs work:
- normalize math strings before rendering
- improve KaTeX preprocessing
- avoid dumping broken extracted formulas directly into UI
- ensure solution / feedback content is consistently rendered through the same component path
### 4. Presentation quality is still uneven
Data is now real, but UI still needs polish:
- question nav is still too weak for long real papers
- status / difficulty / topic chips can be clearer
- workbench hierarchy is inconsistent across question types
- some pages still read like an internal demo rather than a finished study product
### 5. User upload flow still lacks dedup / library filtering
This is the next big backend product task.
Desired logic:
- when user uploads a paper, compare against existing course-library papers
- if it is already covered, do not create a duplicate paper
- if it is new, ingest it as `user_upload`
- if high quality and non-duplicate, optionally promote into library workflow later
### 6. Most non-Spring-2024 study aids are contaminated by template filler content
Current state:
- `COMP2211-2022-fall-midterm` has question-level LLM-authored study aids
- `COMP2211-2024-spring-midterm` is the intended quality bar
- the remaining papers were backfilled with a deprecated template script and should not be treated as production-quality AI content
Impact:
- `knowledge_reminder` is often generic topic boilerplate
- `ai_hint` often points to a parent problem header instead of the actual subquestion
- `solution` is often just wrapped reference text, not a true worked solution
Required action:
1. detect and clear templated study aids from affected papers
2. regenerate them through the real LLM path in [paper_processor.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/services/paper_processor.py)
3. review output quality before marking the papers as complete
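One possible way to approach step 1 is a repetition heuristic: template filler tends to repeat the same text across many questions. The sketch below is an assumption about how detection could work, not the project's actual criteria; `find_templated_aids`, `min_repeats`, and the row shape are hypothetical.

```python
from collections import defaultdict


def find_templated_aids(questions, field="knowledge_reminder", min_repeats=3):
    """Return sorted ids of questions whose `field` text repeats across many questions.

    `questions` is assumed to be a list of dicts with at least `id` and `field`.
    Text is compared after collapsing whitespace and lowercasing.
    """
    by_text = defaultdict(list)
    for q in questions:
        text = " ".join((q.get(field) or "").split()).lower()
        if text:  # skip empty / missing study aids
            by_text[text].append(q["id"])

    flagged = []
    for ids in by_text.values():
        if len(ids) >= min_repeats:
            flagged.extend(ids)
    return sorted(flagged)


# Hypothetical rows for illustration only.
sample_rows = [
    {"id": "a", "knowledge_reminder": "Review the topic."},
    {"id": "b", "knowledge_reminder": "review  the topic."},
    {"id": "c", "knowledge_reminder": "Review the Topic. "},
    {"id": "d", "knowledge_reminder": "Specific hint about stride 2."},
    {"id": "e", "knowledge_reminder": None},
]
flagged_ids = find_templated_aids(sample_rows)
```

Flagged rows would still need a manual spot check before clearing, since short but legitimate reminders can also repeat.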
## Next Major Workstreams
### A. Real similar-question retrieval
Goal:
- no demo fallback
- same-course retrieval that feels trustworthy
Suggested implementation:
1. add a richer retrieval score in [questions.py](/Users/soda/Desktop/PastPaper%20Master/backend/app/routers/questions.py)
2. use:
- same `course_code`
- same `analytics_topic`
- overlapping `topic_tags`
- overlapping `skill_tags`
- same or compatible `question_format`
- lexical similarity on `question_text`
3. expose match reasons in response if useful
4. update UI to show why a question was retrieved
Potential DB improvement:
- add `search_text` / `tsvector` on `paper_questions`
- later optionally add `embedding`
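The scoring idea above could be sketched as follows. This is a minimal illustration under stated assumptions: the weights are arbitrary placeholders, each row is assumed to already carry its paper's `course_code`, and `score_candidate` / `rank_similar` are hypothetical names rather than existing functions in `questions.py`.

```python
import re

# Hypothetical weights; real values would need tuning against actual papers.
WEIGHTS = {
    "analytics_topic": 3.0,
    "topic_tags": 2.0,
    "skill_tags": 2.0,
    "question_format": 1.0,
    "text": 2.0,
}


def _tokens(text):
    """Lowercased alphanumeric tokens of a question text."""
    return set(re.findall(r"[a-z0-9]+", (text or "").lower()))


def _jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_candidate(source, candidate):
    """Return (score, reasons); candidates from another course score 0."""
    if source.get("course_code") != candidate.get("course_code"):
        return 0.0, []
    score, reasons = 0.0, []
    if source.get("analytics_topic") and source.get("analytics_topic") == candidate.get("analytics_topic"):
        score += WEIGHTS["analytics_topic"]
        reasons.append("same analytics_topic")
    for field in ("topic_tags", "skill_tags"):
        overlap = _jaccard(source.get(field) or [], candidate.get(field) or [])
        if overlap > 0:
            score += WEIGHTS[field] * overlap
            reasons.append(f"overlapping {field}")
    if source.get("question_format") and source.get("question_format") == candidate.get("question_format"):
        score += WEIGHTS["question_format"]
        reasons.append("same question_format")
    text_sim = _jaccard(_tokens(source.get("question_text")), _tokens(candidate.get("question_text")))
    if text_sim > 0:
        score += WEIGHTS["text"] * text_sim
        reasons.append("similar question_text")
    return score, reasons


def rank_similar(source, candidates, limit=5):
    """Rank candidates by score (ties broken by id), skipping the source itself."""
    scored = []
    for c in candidates:
        if c["id"] == source["id"]:
            continue
        s, reasons = score_candidate(source, c)
        if s > 0:
            scored.append((s, c["id"], reasons))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:limit]
```

Returning the `reasons` list alongside the score is what would let the UI explain why a question was retrieved.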
### B. Real paper / topic statistics
Goal:
- analytics should be fully trustworthy at subquestion level
Suggested improvements:
- topic frequency by `analytics_topic`
- question-format distribution by subquestion, not by top-level problem
- per-paper breakdown
- high-yield topic trend across years
- topic-to-question index page for drill mode
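A rough sketch of how the first three statistics could be aggregated at subquestion level. The row shape (`paper_id`, `analytics_topic`, `question_format`) follows the fields named in this document; the function name `topic_stats` and the fallback labels are assumptions for illustration.

```python
from collections import Counter, defaultdict


def topic_stats(questions):
    """Aggregate subquestion rows into topic, format, and per-paper counts."""
    topic_freq = Counter()
    format_dist = Counter()
    per_paper = defaultdict(Counter)
    for q in questions:
        topic = q.get("analytics_topic") or "Uncategorized"
        topic_freq[topic] += 1
        format_dist[q.get("question_format") or "unknown"] += 1
        per_paper[q["paper_id"]][topic] += 1
    return {
        "topic_frequency": dict(topic_freq),
        "format_distribution": dict(format_dist),
        "per_paper": {pid: dict(counts) for pid, counts in per_paper.items()},
    }


# Hypothetical rows for illustration only.
rows = [
    {"paper_id": "p1", "analytics_topic": "CNN", "question_format": "mc"},
    {"paper_id": "p1", "analytics_topic": "CNN", "question_format": "long_question"},
    {"paper_id": "p2", "analytics_topic": "Naive Bayes", "question_format": "mc"},
    {"paper_id": "p2", "analytics_topic": None, "question_format": None},
]
stats = topic_stats(rows)
```

Because every row is a subquestion, the counts are by subquestion rather than by top-level problem.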
### C. LaTeX and content rendering cleanup
Goal:
- all math-heavy content should render legibly
Suggested work:
- centralize HTML + KaTeX normalization
- strip broken OCR artifacts before render
- make study-aid content generation avoid malformed formula formatting
- ensure grading feedback and solutions share the same renderer pipeline
### D. User upload deduplication and library filtering
Goal:
- new uploads should not pollute the DB with duplicates
Suggested logic:
1. normalize upload metadata
2. compare against existing papers in same course:
- year / term / exam_type / part_label
- title similarity
- extracted first-page markers
- optional text fingerprint
3. if duplicate:
- attach to existing paper or reject with explanation
4. if not duplicate:
- create `user_upload`
- process normally
Likely schema additions later:
- content fingerprint field on `papers`
- upload provenance fields
- moderation / promotion state for community uploads
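The comparison step could look roughly like the sketch below. It assumes each paper row exposes the metadata fields listed above plus `paper_extracted_text`; `meta_key`, `text_fingerprint`, and `find_duplicate` are hypothetical helpers, and the SHA-256 fingerprint over normalized leading text is one possible choice, not a decided design.

```python
import hashlib
import re

META_FIELDS = ("course_code", "year", "term", "exam_type", "part_label")


def _norm(value):
    """Collapse whitespace and lowercase; missing values become an empty string."""
    return re.sub(r"\s+", " ", str(value or "")).strip().lower()


def meta_key(paper):
    """Normalized metadata tuple used for exact metadata matching."""
    return tuple(_norm(paper.get(f)) for f in META_FIELDS)


def text_fingerprint(text, max_chars=4000):
    """SHA-256 over the normalized leading text of the paper."""
    normalized = _norm(text)
    return hashlib.sha256(normalized[:max_chars].encode("utf-8")).hexdigest()


def find_duplicate(upload, library_papers):
    """Return (paper_id, reason) for the first match, or (None, None)."""
    upload_key = meta_key(upload)
    upload_text = upload.get("paper_extracted_text")
    upload_fp = text_fingerprint(upload_text)
    for paper in library_papers:
        if meta_key(paper) == upload_key:
            return paper["id"], "metadata"
        # Only compare fingerprints when the upload actually has text.
        if upload_text and text_fingerprint(paper.get("paper_extracted_text")) == upload_fp:
            return paper["id"], "fingerprint"
    return None, None
```

An exact hash only catches near-identical extractions; fuzzy title similarity or first-page markers would need a separate, more tolerant comparison.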
### E. UI / UX pass
Priority items:
- stronger question navigation for real papers
- clearer ready / processing / failed states
- better paper list and filtering UX
- richer workbench metadata:
- topic
- difficulty
- format
- score
- answered / wrong / mastered state
- unify visual style across analytics, error book, workbench
## Suggested Development Order
1. Remove similar-question demo fallback and ship real retrieval
2. Improve analytics and topic drill views using subquestion-level data
3. Fix LaTeX / rendering quality
4. Build upload dedup / filtering against existing library papers
5. Do a focused UI / UX pass after the real data flows are stable
## Operational Notes
### Frontend entry issue that was fixed
Homepage was previously still using mock papers and an old hardcoded `COMP2211` id.
It now reads real papers from `listPapers()`.
### Manual content generation
The current `COMP2211` three-piece study aids were filled manually through local scripts and deterministic templates, not through external LLM batch processing. This is deliberate and keeps the current dataset stable.
### If rebuilding papers again
For `COMP2211`, use the manual splitters rather than rerunning generic extraction blindly. `2024-spring-midterm` especially required reconstruction from PDF page spans because the earlier top-level extraction had already truncated `Problem 5` and `Problem 7`.
## Ready-to-Verify Checklist
If you want to sanity-check the current product quickly:
1. Open home page and filter `COMP2211`
2. Open each paper and confirm `status = ready`
3. Check question count matches:
- `43 / 38 / 24 / 19 / 36 / 42 / 48`
4. Open analytics page for `COMP2211`
5. Open several papers and verify:
- question nav loads
- AI trio exists
- topics render
- similar-question panel does not block the page

516
TECHNICAL.md Normal file

@@ -0,0 +1,516 @@
# PastPaper Master: Technical Documentation
## System Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React 19 + Vite 7) │
│ Pages: Home / Upload / Workbench / ErrorBook │
│ PDF: react-pdf v10 | Math: KaTeX 0.16 | Style: Tailwind v4 │
└────────────────────────────┬────────────────────────────────────┘
│ /api (Vite proxy → :8000)
┌────────────────────────────▼────────────────────────────────────┐
│ Backend (FastAPI + Python) │
│ Routers: papers / attempts / questions │
│ Services: paper_processor / grader / llm_clients / text_extractor│
└────────┬───────────────────┬──────────────────┬─────────────────┘
│ │ │
┌─────▼─────┐ ┌────────▼───────┐ ┌───────▼──────┐
│ Supabase │ │ GPT-4o │ │ Qwen-plus │
│ PostgreSQL │ │ (laozhang API) │ │ (DashScope) │
│ + Storage │ │ structuring/OCR/variants │ │ AI trio/grading │
└───────────┘ └────────────────┘ └──────────────┘
```
**Tech stack at a glance:**
- **Frontend**: React 19, TypeScript, Vite 7, Tailwind CSS v4, react-pdf v10, KaTeX 0.16
- **Backend**: FastAPI, Python 3.12, uv (package management)
- **Database**: Supabase (PostgreSQL + Row Level Security)
- **Storage**: Supabase Storage (buckets: `papers`, `attempt-photos`)
- **LLM**: GPT-4o (via the laozhang API proxy), Qwen-plus (Alibaba DashScope)
---
## Database Schema
> File: `supabase/migrations/001_init_schema.sql`
### Table: `papers` (exam papers)
| Field | Type | Description |
|------|------|------|
| id | UUID PK | Auto-generated |
| user_id | UUID FK → auth.users | Uploader |
| course_code | TEXT | Course code, e.g. "COMP2011" |
| year / term / exam_type | INT/TEXT/TEXT | Metadata |
| paper_file_url | TEXT | Paper PDF (Supabase Storage) |
| answer_file_url | TEXT? | Answer PDF (optional) |
| status | TEXT | `uploaded` → `processing` → `ready` / `error` |
| paper_extracted_text | TEXT | Raw text extracted by PyMuPDF (cached) |
| total_score / question_count | INT | Whole-paper overview extracted by AI |
| topics_summary | JSONB | `{"Linked List": 40, "Recursion": 30}` |
| difficulty_level | TEXT | easy / medium / hard |
### Table: `paper_questions` (per-question data)
| Field | Type | Description |
|------|------|------|
| id | UUID PK | |
| paper_id | UUID FK → papers | |
| question_number | TEXT | "1", "1a", "2b" |
| parent_question | TEXT? | Parent question number of a subquestion: "1a" → "1" |
| display_order | INT | Sort order |
| question_type | TEXT | `mc` / `true_false` / `fill_blank` / `long_question` |
| question_text | TEXT | Original question text |
| score / page_number | INT | Marks; PDF page number (used for PDF-question linking) |
| options | JSONB | MC options: `[{"label":"A","text":"..."}]` |
| correct_option | TEXT | Correct MC option |
| correct_answer | TEXT | Correct answer for fill-in-the-blank questions |
| raw_answer_text | TEXT | Raw solution text from the answer PDF |
| topics | TEXT[] | Topic tags |
| difficulty | TEXT | easy / medium / hard |
| knowledge_reminder | TEXT | AI knowledge reminder (HTML+KaTeX) |
| ai_hint | TEXT | AI approach hint (HTML+KaTeX) |
| solution | TEXT | AI full worked solution (HTML+KaTeX) |
### Table: `user_attempts` (user attempt records)
| Field | Type | Description |
|------|------|------|
| id | UUID PK | |
| user_id / question_id | UUID FK | |
| attempt_type | TEXT | `select` / `input` / `photo` |
| user_answer | TEXT | The user's selected option or typed input |
| photo_url / photo_ocr_text | TEXT | Uploaded photo and its OCR result |
| is_correct | BOOL | Judged by AI |
| feedback | TEXT | HTML step-by-step error analysis |
| error_at_step | INT | Step at which the error occurred |
| in_error_book / mastered | BOOL | Error-book status |
---
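## Core Feature 1: Paper Analysis Pipeline
### Flow overview
```
User uploads PDF → BackgroundTask → 5-step pipeline → status becomes ready
```
### Files
| File | Purpose |
|------|------|
| `backend/app/routers/papers.py` | Upload endpoint, triggers background processing |
| `backend/app/services/paper_processor.py` | **Core pipeline**, 5-step processing logic |
| `backend/app/services/text_extractor.py` | PDF → text extraction (PyMuPDF) |
| `backend/app/services/llm_clients.py` | GPT-4o / Qwen client singletons |
### The 5 pipeline steps (`paper_processor.py: process_paper()`)
**Step 1: PDF text extraction**
- Uses PyMuPDF (`fitz`) to extract text page by page
- If a page has fewer than 50 characters of text (likely a scan), the page is additionally saved as a base64 image as a fallback
- The extracted text is cached in `papers.paper_extracted_text`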
```python
# text_extractor.py (signature summary, not executable code)
# extract_pdf(file_bytes) -> ExtractedContent(pages_text, page_images, total_pages, has_images)
# get_full_text(extracted) -> "--- Page 1 ---\n{text}\n\n--- Page 2 ---\n..."
```
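To make the Step 1 rules concrete, here is a minimal sketch of the scan-detection threshold and the page-joined text format. It deliberately leaves out the PyMuPDF calls and works on already-extracted page strings; the `ExtractedContent` shape (with `total_pages` and `has_images` modelled as properties), the helper `pages_needing_images`, and the use of `strip()` before counting characters are assumptions, not the project's actual `text_extractor.py`.

```python
from dataclasses import dataclass, field

SCAN_TEXT_THRESHOLD = 50  # pages with fewer characters are treated as likely scans


@dataclass
class ExtractedContent:
    pages_text: list[str]
    # 1-based page number -> base64-encoded page image (only for sparse pages)
    page_images: dict[int, str] = field(default_factory=dict)

    @property
    def total_pages(self) -> int:
        return len(self.pages_text)

    @property
    def has_images(self) -> bool:
        return bool(self.page_images)


def pages_needing_images(pages_text, threshold=SCAN_TEXT_THRESHOLD):
    """Return 1-based page numbers whose text is too short to trust."""
    return [
        number
        for number, text in enumerate(pages_text, start=1)
        if len(text.strip()) < threshold
    ]


def get_full_text(extracted):
    """Join pages using the '--- Page N ---' separator format."""
    return "\n\n".join(
        f"--- Page {number} ---\n{text}"
        for number, text in enumerate(extracted.pages_text, start=1)
    )


# Hypothetical two-page document: one text page, one near-empty scanned page.
demo = ExtractedContent(pages_text=["x" * 60, "short"])
sparse_pages = pages_needing_images(demo.pages_text)
full_text = get_full_text(demo)
```

Keeping the threshold check separate from rendering makes it easy to test without a real PDF.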
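**Step 2: Structured question splitting with GPT-4o**
- Model: `gpt-4o`, temperature=0, response_format=json_object
- Input: full paper text
- Output: JSON containing total_score, difficulty_level, topics_summary, questions[]
- Extracted per question: question_number, parent_question, question_type, question_text, score, page_number, options, topics, difficulty
- Updates the overview fields of the `papers` table (total_score, question_count, topics_summary, difficulty_level)
**Step 3: Answer matching (if an answer PDF is provided)**
- Model: `gpt-4o`, temperature=0
- Input: question-structure JSON + answer text
- Output: per-question matches: correct_option / correct_answer / raw_answer_text
- MC → correct_option, fill-in-the-blank → correct_answer, long questions → raw_answer_text
**Step 4: Qwen generates the AI trio (per question)**
- Model: `qwen-plus`, temperature=0.3
- Called once per question; input is the question info + the reference answer
- Outputs a JSON trio:
- `knowledge_reminder`: prerequisite knowledge points (HTML+KaTeX)
- `ai_hint`: guidance on the approach without giving the answer (HTML+KaTeX)
- `solution`: complete step-by-step worked solution (HTML+KaTeX)
- Written to `paper_questions`
**Step 5: Mark complete**
- `papers.status` is updated to `ready`
- If any step raises an exception, status is set to `error` and the error message is written to `error_message`
### Key prompt design
**STRUCTURE_PROMPT** (structured question splitting)
- Restricts question_type to mc / true_false / fill_blank / long_question
- True/False questions use the `true_false` type, with options set to `[{label:"True",text:"True"},{label:"False",text:"False"}]`
- MC questions must have their options array extracted
- Subquestions are linked through parent_question (e.g. the parent of "1a" is "1")
- Requires inferring page_number, topics, difficulty
**ANSWER_MATCH_PROMPT** (answer matching)
- Input includes questions_json (question number + type) and answer_text
- Outputs different fields by question type: MC → correct_option, fill → correct_answer, long questions → raw_answer_text
**ANALYSIS_PROMPT** (AI trio)
- The solution must include the full process (Step 1, 2, 3...), not just the final answer
- For MC questions, explain why the correct option is right and why the others are wrong
- Mark common mistakes with: `<div class="common-error">...</div>`
- KaTeX rules: block-level `$$...$$`, inline `$...$`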
---
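## Core Feature 2: PDF Scrolling + Question Linking
### Files
| File | Purpose |
|------|------|
| `frontend/src/components/workbench/PdfViewer.tsx` | Continuous-scroll PDF rendering + visible-page detection |
| `frontend/src/components/workbench/QuestionNav.tsx` | Horizontal question navigation bar |
| `frontend/src/pages/WorkbenchPage.tsx` | Coordination hub for two-way linking |
### Implementation
**Layout**: PDF on the left (60%), question panel on the right (40%)
**Continuous PDF scrolling (`PdfViewer.tsx`)**
- Uses `react-pdf`'s `<Document>` + `<Page>` components
- All pages are stacked vertically in a scrollable container (no single-page switching)
- A `ResizeObserver` watches the container width and sets the Page width dynamically
- Manual jump: enter a page number → `scrollIntoView`
**Two-way linking:**
1. **Question → PDF (clicking a question scrolls the PDF to its page)**
- QuestionNav click → `handleQuestionSelect(index)` → records `lastUserSelectTime = Date.now()` + `setCurrentIndex`
- When PdfViewer's `currentPage` prop changes → a `useEffect` triggers `el.scrollIntoView({ behavior: "smooth" })`
- Sets `programmaticScroll.current = true`, reset after 2s
2. **PDF → question (scrolling the PDF switches the right panel to the current question)**
- An `IntersectionObserver` watches all `<Page>` elements, threshold: `[0, 0.25, 0.5, 0.75, 1]`
- Tracks each page's `intersectionRatio` and picks the page with the highest visible ratio
- If `programmaticScroll.current === true`, the callback is skipped
- Fires `onPageChange(bestPage)` → WorkbenchPage `handlePdfPageChange`
- `handlePdfPageChange`: finds the last question with `page_number <= currentPage` and updates `currentIndex`
**Preventing jump hijacking (two layers of protection):**
- **WorkbenchPage layer (primary)**: `lastUserSelectTime` ref. For 2 seconds after the user clicks a question, `handlePdfPageChange` returns immediately and ignores every Observer callback. This fixes the case where a smooth scroll in a long document passes through intermediate pages, triggers the Observer, and switches the question away
- **PdfViewer layer (secondary)**: `programmaticScroll` ref. Observer callbacks are skipped during scrollIntoView, and the flag resets after 2s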
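The page-to-question mapping and the 2-second guard can be sketched in a language-neutral way. The real logic lives in the TypeScript components; the Python below only illustrates the rule "last question with `page_number <= currentPage`" and the time-window check. Function names, the clamping to index 0 for pages before the first question, and the assumption that questions are ordered by page are all hypothetical.

```python
from bisect import bisect_right


def question_index_for_page(question_pages, current_page):
    """Index of the last question whose page_number <= current_page.

    `question_pages` holds one page number per question, in display order,
    and is assumed to be non-decreasing. Pages before the first question
    fall back to index 0 (an assumption made for this sketch).
    """
    index = bisect_right(question_pages, current_page) - 1
    return max(index, 0)


def should_ignore_observer(now_ms, last_user_select_ms, guard_ms=2000):
    """True while an Observer callback falls inside the post-click guard window."""
    return now_ms - last_user_select_ms < guard_ms


# Hypothetical paper: Q1 and Q2 on page 1, Q3 on page 2, Q4 on page 4.
pages = [1, 1, 2, 4]
index_on_page_3 = question_index_for_page(pages, 3)
```

With this rule, a page that contains no question start (page 3 here) keeps the question that began on the closest earlier page selected.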
---
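## Core Feature 3: Answering Interaction (MC / Fill-in-the-Blank)
### Files
| File | Purpose |
|------|------|
| `frontend/src/components/workbench/QuestionDetail.tsx` | Question display + answering interaction |
| `frontend/src/components/workbench/AiTrioPanel.tsx` | Collapsible panels for knowledge reminder / hint / solution |
| `frontend/src/components/shared/CollapsibleSection.tsx` | Collapsible section component |
| `frontend/src/components/shared/KaTeXRenderer.tsx` | HTML+KaTeX renderer |
### QuestionDetail interaction logic
**Multiple choice (MC):**
- State: `selectedOption`, `checked`
- Clicking an option highlights it in blue (before checking)
- Clicking "Check Answer" → `checked=true`
- Correct: the option turns green + "Correct!" / Wrong: the selected option turns red, the correct option turns green, and the correct answer is shown
- State resets automatically when switching questions (`useEffect` on `question.id`)
**True/False:**
- State: `tfAnswers: Record<string, "True" | "False">`, `tfChecked`
- Each statement has T / F buttons on its right, toggled independently
- The selected button is highlighted in blue; once every statement is answered, "Submit Answers" becomes available
- After submission, the user is prompted to check the solution (per-statement correct answers are not yet stored separately)
**Fill-in-the-blank (Fill Blank):**
- Text input + "Check" button
- Pressing Enter also checks the answer
- Case-insensitive comparison (`toLowerCase()`)
- After checking, the input changes colour: green (correct) / red (wrong)
**Callback**: `onAnswerResult(isCorrect, userAnswer)` → WorkbenchPage → `recordAttempt` API
### AiTrioPanel
- Three `CollapsibleSection`s: Knowledge Reminder (blue, expanded by default), AI Hint (amber), Solution (green)
- `CollapsibleSection` animates expand/collapse smoothly with CSS `grid-template-rows: 0fr → 1fr`
- Content is rendered through `KaTeXRenderer` (HTML + KaTeX formulas)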
---
## Core Feature 4: Variant Question Generation (Similar Question)
### Files
| File | Purpose |
|------|------|
| `backend/app/routers/questions.py` | `POST /{question_id}/variant` endpoint |
| `backend/app/services/grader.py` | `generate_variant()`: generates the variant with GPT-4o |
| `frontend/src/components/workbench/ActionBar.tsx` | "Similar Question" button, triggered asynchronously |
| `frontend/src/pages/WorkbenchPage.tsx` | Variants tab state management |
| `frontend/src/components/workbench/VariantDetail.tsx` | Variant answering UI |
### Backend
- `POST /api/questions/{question_id}/variant`
- Loads the original question from the DB → calls `generate_variant(question)` → attaches the original question's `knowledge_reminder` → returns
- Model: `gpt-4o`, temperature=0.5, response_format=json_object
- VARIANT_PROMPT requirements: same topic, similar difficulty, different data/scenario, HTML output (not markdown)
- Output fields: question_text, question_type, options (if MC), correct_answer, ai_hint, solution
### Frontend interaction (tab-based async flow)
**State management (`WorkbenchPage.tsx`):**
```typescript
interface StoredVariant {
  id: string;                    // placeholder ID, e.g. "variant-1"
  sourceQuestionNumber: string;  // question number of the source question
  variant: VariantQuestion;      // generated result
  status: "generating" | "ready";
}
```
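**Flow:**
1. The user clicks "Similar Question" → `ActionBar` → `onVariantStart(placeholderId, questionNumber)`
2. WorkbenchPage creates a placeholder entry with `status: "generating"`; the user can keep working on questions without being blocked
3. When the API returns → `onVariantReady(placeholderId, variant)` → status updates to `ready`
4. On failure → `onVariantFailed(placeholderId)` → the placeholder is removed
**Three views in the right panel:**
- **Questions Tab**: question navigation + QuestionDetail + AiTrioPanel + ActionBar
- **Variants Tab**: variant list (Generating.../Ready); each entry shows the question number and preview text
- **Variant Detail**: after clicking "Start", the whole right panel is replaced by the VariantDetail component + a "Back" button
**VariantDetail component**: purple theme, includes the full MC / fill-in-the-blank interaction + the AI trio (CollapsibleSection)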
---
## Core Feature 5: Photo Grading
### Files
| File | Purpose |
|------|------|
| `backend/app/routers/attempts.py` | `POST /photo`: upload + OCR + grading |
| `backend/app/services/grader.py` | `ocr_photo()` + `grade_answer()` |
| `frontend/src/components/workbench/PhotoUpload.tsx` | Photo upload modal |
| `frontend/src/components/workbench/ActionBar.tsx` | "Upload handwritten answer" button |
### Backend flow
1. Receive the image → upload it to the Supabase Storage `attempt-photos` bucket
2. `ocr_photo(photo_bytes)`: GPT-4o Vision recognizes the handwritten content
- Input: base64 image
- Output: the student's answer as text (including LaTeX formulas)
3. `grade_answer(question, student_answer)`: graded by Qwen-plus
- Input: question info + reference answer + student answer
- Output: `{ is_correct, score_given, feedback (HTML), error_at_step }`
4. Write to the `user_attempts` table (including photo_url, photo_ocr_text, feedback, is_correct)
5. Wrong answers are automatically flagged `in_error_book = true`
### Frontend
- PhotoUpload: modal dialog, supports drag-and-drop or click to select an image
- Preview → submit → show the OCR result + AI grading feedback
- Available for all question types (MC / fill-in-the-blank / long questions)
---
## Core Feature 6: Error Book
### Files
| File | Purpose |
|------|------|
| `backend/app/routers/attempts.py` | `GET /error-book` + `PATCH /{attempt_id}` |
| `frontend/src/pages/ErrorBookPage.tsx` | Error book page |
| `frontend/src/lib/api.ts` | `getErrorBook()` + `updateAttempt()` |
### Backend
- `GET /api/attempts/error-book?user_id=xxx`
- Queries `in_error_book=true AND mastered=false`
- JOINs `paper_questions` to return full question info
- `PATCH /api/attempts/{attempt_id}`
- Updates the `in_error_book` / `mastered` flags
### Frontend
- List view: question info + user answer + AI feedback
- Actions: "Review in Workbench" (navigate) / "Mastered" (mark as mastered) / "Remove" (remove from the error book)
---
## Core Feature 7: Attempt Records
### Files
| File | Purpose |
|------|------|
| `backend/app/routers/attempts.py` | `POST /`: records an attempt |
| `frontend/src/components/workbench/ActionBar.tsx` | "Got it right" / "Got it wrong" buttons |
### Flow
- "Got it right" → `POST /api/attempts/` with `attempt_type: "select", is_correct: true`
- "Got it wrong" → `POST /api/attempts/` with `attempt_type: "select", is_correct: false`
- The backend automatically sets `in_error_book = true`
- A toast reports the result of the action
---
## API Summary
### Papers Router (`/api/papers`)
| Method | Path | Description |
|--------|------|------|
| GET | `/` | List all papers (optionally filtered by user_id) |
| POST | `/upload` | Upload a paper PDF + optional answer PDF |
| GET | `/{paper_id}` | Get a single paper's info |
| GET | `/{paper_id}/questions` | Get all questions of a paper |
### Attempts Router (`/api/attempts`)
| Method | Path | Description |
|--------|------|------|
| POST | `/` | Record an attempt |
| POST | `/photo` | Photo upload + OCR + AI grading |
| GET | `/error-book?user_id=` | Get the error book |
| PATCH | `/{attempt_id}` | Update error-book / mastered status |
### Questions Router (`/api/questions`)
| Method | Path | Description |
|--------|------|------|
| POST | `/{question_id}/variant` | Generate a variant question |
---
## Frontend Routes
| Path | Page | File |
|------|------|------|
| `/` | Home: paper list | `src/pages/HomePage.tsx` |
| `/upload` | Upload paper | `src/pages/UploadPage.tsx` |
| `/paper/:id` | Workbench | `src/pages/WorkbenchPage.tsx` |
| `/error-book` | Error book | `src/pages/ErrorBookPage.tsx` |
---
## Frontend Component Tree (Workbench)
```
WorkbenchPage
├── Header                    # Top navigation (course + paper title)
├── PdfViewer                 # Left 60%: continuous-scroll PDF
└── Right Panel (40%)
    ├── [Questions Tab]
    │   ├── QuestionNav       # Horizontal question navigation Q1 Q2 Q3...
    │   ├── QuestionDetail    # Question display + MC / fill-in-the-blank interaction
    │   ├── AiTrioPanel       # Knowledge reminder / hint / solution (3x CollapsibleSection)
    │   └── ActionBar         # Bottom buttons (right / wrong / variant / photo)
    ├── [Variants Tab]
    │   └── Variant Cards     # Variant list (Generating.../Ready)
    └── [Variant Detail View] # Replaces the whole right panel
        ├── Back Button
        └── VariantDetail     # Variant answering + AI trio
```
---
## LLM Task Assignment
| Task | Model | Provider | File |
|------|------|----------|------|
| Structured question splitting | gpt-4o | laozhang API | paper_processor.py |
| Answer matching | gpt-4o | laozhang API | paper_processor.py |
| AI trio (knowledge/hint/solution) | qwen-plus | DashScope | paper_processor.py |
| Handwriting OCR | gpt-4o (Vision) | laozhang API | grader.py |
| Answer grading | qwen-plus | DashScope | grader.py |
| Variant question generation | gpt-4o | laozhang API | grader.py |
---
## Configuration and Environment Variables
> Files: `backend/app/config.py`, `.env`
| Variable | Description |
|------|------|
| SUPABASE_URL | Supabase project URL |
| SUPABASE_ANON_KEY | Anonymous key for the frontend |
| SUPABASE_SERVICE_ROLE_KEY | Service role key for the backend (bypasses RLS) |
| LAOZHANG_BASE_URL | GPT-4o proxy API base URL |
| LAOZHANG_API_KEY | GPT-4o proxy API key |
| DASHSCOPE_BASE_URL | Alibaba DashScope API |
| DASHSCOPE_API_KEY | DashScope API key |
---
## Complete File Index
### Backend (`backend/app/`)
```
main.py                   # FastAPI entry point, CORS, router registration
config.py                 # Pydantic Settings, environment variables
routers/
  papers.py               # Paper CRUD + upload-triggered processing
  attempts.py             # Attempt records + photo OCR grading + error book
  questions.py            # Variant question generation
services/
  paper_processor.py      # Core 5-step pipeline: PDF → structuring → answer matching → AI trio
  text_extractor.py       # PyMuPDF text extraction
  grader.py               # OCR + grading + variant generation (prompts + LLM calls)
  llm_clients.py          # GPT-4o / Qwen client singletons
  supabase_client.py      # Supabase client
```
### Frontend (`frontend/src/`)
```
App.tsx                   # React Router route definitions
main.tsx                  # ReactDOM entry point
lib/
  api.ts                  # Wrappers for all API calls (9 functions)
types/
  api.ts                  # TypeScript type definitions
hooks/
  usePaper.ts             # Polls paper status (3s interval)
  useQuestions.ts         # Fetches the question list
pages/
  HomePage.tsx            # Home: paper list
  UploadPage.tsx          # Upload page
  WorkbenchPage.tsx       # Workbench: core coordinating component
  ErrorBookPage.tsx       # Error book
components/
  layout/
    Header.tsx            # Top navigation bar
  shared/
    KaTeXRenderer.tsx     # HTML+KaTeX formula rendering
    CollapsibleSection.tsx # Collapsible panel (grid animation)
    StatusBadge.tsx       # Status badge
  upload/
    UploadForm.tsx        # Upload form
    FilePickerField.tsx   # File picker
  workbench/
    PdfViewer.tsx         # Continuous-scroll PDF + IntersectionObserver
    QuestionNav.tsx       # Question navigation bar
    QuestionDetail.tsx    # Question display + MC / fill-in-the-blank interaction
    AiTrioPanel.tsx       # AI trio panel
    ActionBar.tsx         # Bottom action buttons
    PhotoUpload.tsx       # Photo upload modal
    VariantDetail.tsx     # Inline variant answering
    VariantModal.tsx      # (deprecated, replaced by VariantDetail)
```

16
backend/Dockerfile Normal file

@@ -0,0 +1,16 @@
FROM python:3.12-slim
WORKDIR /app
# System deps for PyMuPDF
RUN apt-get update && apt-get install -y --no-install-recommends \
libmupdf-dev gcc g++ && \
rm -rf /var/lib/apt/lists/*
COPY pyproject.toml .
RUN pip install --no-cache-dir .
COPY app/ app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]


@@ -0,0 +1,4 @@
ALTER TABLE papers
ADD COLUMN IF NOT EXISTS processing_step text DEFAULT NULL,
ADD COLUMN IF NOT EXISTS processing_progress integer DEFAULT 0,
ADD COLUMN IF NOT EXISTS processing_total integer DEFAULT 0;

0
backend/app/__init__.py Normal file

36
backend/app/config.py Normal file

@@ -0,0 +1,36 @@
from pydantic_settings import BaseSettings
from functools import lru_cache
import os


class Settings(BaseSettings):
    # Supabase
    supabase_url: str
    supabase_anon_key: str
    supabase_service_role_key: str

    # LLM - laozhang (gpt-4o, gpt-4o-mini)
    laozhang_base_url: str = "https://api.laozhang.ai/v1"
    laozhang_api_key: str = ""

    # LLM - DashScope (qwen-plus)
    dashscope_base_url: str = "https://dashscope.aliyuncs.com/compatible-mode/v1"
    dashscope_api_key: str = ""

    # LLM - DeepSeek
    deepseek_base_url: str = "https://api.deepseek.com/v1"
    deepseek_api_key: str = ""

    # Google Gemini (official)
    google_gemini_api_key: str = ""

    model_config = {
        "env_file": os.path.join(os.path.dirname(__file__), "../../.env"),
        "env_file_encoding": "utf-8",
        "extra": "ignore",
    }


@lru_cache
def get_settings() -> Settings:
    return Settings()


@@ -0,0 +1,34 @@
"""Auth dependency: validate Supabase JWT and return user_id"""
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

from app.services.supabase_client import get_supabase

bearer_scheme = HTTPBearer(auto_error=False)


async def get_current_user_id(
    credentials: HTTPAuthorizationCredentials | None = Depends(bearer_scheme),
) -> str:
    """Extract and validate Bearer token, return user_id."""
    if not credentials:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Not authenticated",
        )
    token = credentials.credentials
    sb = get_supabase()
    try:
        result = sb.auth.get_user(token)
        user = result.user
        if not user:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="Invalid token",
            )
        return user.id
    except HTTPException:
        # Let the specific 401 raised above propagate instead of being re-wrapped.
        raise
    except Exception:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or expired token",
        )

59
backend/app/main.py Normal file

@@ -0,0 +1,59 @@
import asyncio
import threading
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.routers import analytics, papers, attempts, questions


def _resume_stale_papers():
    """On startup, find papers stuck in `processing` and resume AI trio generation."""
    try:
        from app.services.supabase_client import get_supabase
        from app.services.paper_processor import process_paper

        sb = get_supabase()
        stale = sb.table("papers").select("id").eq("status", "processing").execute().data
        if not stale:
            return
        for p in stale:
            paper_id = p["id"]
            print(f"[STARTUP] Resuming processing for paper {paper_id[:8]}...")

            # Bind paper_id as a default argument so each thread gets its own value.
            def run(pid=paper_id):
                asyncio.run(process_paper(pid, b"", None))

            threading.Thread(target=run, daemon=True).start()
    except Exception as e:
        print(f"[STARTUP] Resume skipped: {e}")


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    _resume_stale_papers()
    yield
    # Shutdown (nothing to do)


app = FastAPI(title="PastPaper Master API", version="0.1.0", lifespan=lifespan)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Wide open during development; tighten before going live
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.include_router(papers.router, prefix="/api/papers", tags=["papers"])
app.include_router(attempts.router, prefix="/api/attempts", tags=["attempts"])
app.include_router(questions.router, prefix="/api/questions", tags=["questions"])
app.include_router(analytics.router, prefix="/api/analytics", tags=["analytics"])


@app.get("/health")
def health():
    return {"status": "ok"}


@@ -0,0 +1,285 @@
"""Course-level analytics endpoints."""
from __future__ import annotations
from collections import Counter, defaultdict
from fastapi import APIRouter
from app.services.supabase_client import get_supabase
router = APIRouter()
DIFFICULTY_SCORE = {"easy": 1, "medium": 2, "hard": 3}
DIFFICULTY_LABEL = {1: "Easy", 2: "Medium", 3: "Hard"}
# ── Topic normalization ──────────────────────────────────────
# Map variant spellings to canonical label
_TOPIC_ALIASES: dict[str, str] = {
"numpy": "NumPy",
"naïve bayes": "Naive Bayes",
"naïve bayes classifier": "Naive Bayes",
"naive bayes classifier": "Naive Bayes",
"bayes classifier": "Naive Bayes",
"bayes model": "Naive Bayes",
"bayes' theorem": "Naive Bayes",
"bayes' rule": "Naive Bayes",
"k-nearest neighbors": "K-Nearest Neighbors (KNN)",
"knn": "K-Nearest Neighbors (KNN)",
"k-means clustering": "K-Means Clustering",
"k-means": "K-Means Clustering",
"k means": "K-Means Clustering",
"multilayer perceptron": "Multilayer Perceptron (MLP)",
"multi-layer perceptron": "Multilayer Perceptron (MLP)",
"multi-layer perceptron (mlp)": "Multilayer Perceptron (MLP)",
"mlp": "Multilayer Perceptron (MLP)",
"single layer perceptron": "Perceptron",
"convolutional neural network": "CNN",
"convolutional neural network (cnn)": "CNN",
"convolutional neural networks": "CNN",
"cnn architecture": "CNN",
"cnn properties": "CNN",
"python fundamentals": "Python",
"python programming": "Python",
"python implementation": "Python",
"advanced python programming": "Python",
"python programming: convolutional neural network": "CNN",
"cross-validation": "Cross Validation",
"model evaluation implementation": "Model Evaluation",
"digital image processing": "Image Processing",
"computer vision": "Image Processing",
"array slicing": "Array Slicing",
"slicing": "Array Slicing",
"array indexing": "Array Slicing",
"array reshaping": "Reshape",
"array views": "Array Slicing",
"view vs copy": "Array Slicing",
"boolean indexing": "Array Slicing",
"arange": "NumPy",
"newaxis": "NumPy",
"expand dims": "NumPy",
"transpose": "NumPy",
"type casting": "NumPy",
"element-wise operation": "NumPy",
"array reduction": "NumPy",
"multi-dimensional array": "NumPy",
"dot product": "NumPy",
"vectorization": "NumPy",
"activation functions": "Activation Function",
"linear activation function": "Activation Function",
"neural network architecture": "Neural Networks",
"hidden layer": "Neural Networks",
"deep learning": "Neural Networks",
"deep learning frameworks": "Neural Networks",
"alpha-beta pruning": "Alpha-Beta Pruning",
"minimax algorithm": "Minimax",
"ethics of ai": "AI Ethics",
"ethics": "AI Ethics",
"cosine distance": "Cosine Similarity",
"distance calculation": "Distance Metrics",
"euclidean distance": "Distance Metrics",
"manhattan distance": "Distance Metrics",
"hamming distance": "Distance Metrics",
"precision": "Model Evaluation",
"recall": "Model Evaluation",
"f1 score": "Model Evaluation",
"macro f1 score": "Model Evaluation",
"accuracy": "Model Evaluation",
"classification accuracy": "Model Evaluation",
"confusion matrix": "Model Evaluation",
"convolution operation": "Convolution",
"dilated convolution": "Convolution",
"3d convolution": "Convolution",
"gaussian likelihood": "Probability",
"gaussian distribution": "Probability",
"categorical likelihood": "Probability",
"conditional probability": "Probability",
"total probability theorem": "Probability",
"probability assumptions": "Probability",
"tensorflow": "Keras",
"model summary": "Keras",
"model construction": "Keras",
"trainable parameters": "Parameter Calculation",
"parameter reduction": "Parameter Calculation",
"output shape calculation": "Parameter Calculation",
"shape calculation": "Parameter Calculation",
}
def normalize_topic(label: str) -> str:
    return _TOPIC_ALIASES.get(label.lower().strip(), label)


def extract_topic_labels(question: dict) -> list[str]:
    labels: list[str] = []
    raw_labels: list[str] = []
    analytics_topic = question.get("analytics_topic")
    if analytics_topic:
        raw_labels.append(analytics_topic)
    for tag in question.get("topic_tags") or []:
        if tag and tag not in raw_labels:
            raw_labels.append(tag)
    if not raw_labels:
        for tag in question.get("topics") or []:
            if tag and tag not in raw_labels:
                raw_labels.append(tag)
    # Normalize and deduplicate
    seen: set[str] = set()
    for raw in raw_labels:
        norm = normalize_topic(raw)
        if norm not in seen:
            seen.add(norm)
            labels.append(norm)
    return labels


def extract_question_family(question: dict) -> str:
    return (
        question.get("question_format")
        or question.get("question_type")
        or "unknown"
    )
@router.get("/courses")
async def list_courses():
"""Return the list of courses that have at least one paper with status `ready`."""
sb = get_supabase()
rows = (
sb.table("papers")
.select("course_code")
.eq("status", "ready")
.execute()
.data
)
codes = sorted({row["course_code"] for row in rows if row.get("course_code")})
return codes
@router.get("/course/{course_code}")
async def get_course_analytics(course_code: str):
sb = get_supabase()
papers = (
sb.table("papers")
.select("id, course_code, year, term, exam_type, part_label, status")
.eq("course_code", course_code.upper())
.eq("status", "ready")
.order("year", desc=True)
.execute()
.data
)
if not papers:
return {
"course_code": course_code.upper(),
"kpi": {"papers": 0, "questions": 0, "topics": 0, "difficulty": "N/A"},
"topic_frequency": [],
"question_types": [],
"difficulty_distribution": {"easy": 0, "medium": 0, "hard": 0},
"high_yield_topics": [],
}
paper_ids = [paper["id"] for paper in papers]
questions = (
sb.table("paper_questions")
.select(
"id, paper_id, question_number, question_type, question_format, "
"question_text, score, topics, analytics_topic, topic_tags, difficulty"
)
.in_("paper_id", paper_ids)
.order("display_order")
.execute()
.data
)
papers_by_id = {paper["id"]: paper for paper in papers}
total_questions = len(questions)
topic_counter: Counter[str] = Counter()
type_counter: Counter[str] = Counter()
difficulty_counter: Counter[str] = Counter()
topic_examples: dict[str, list[dict]] = defaultdict(list)
difficulty_scores: list[int] = []
all_question_items: list[dict] = []
for question in questions:
question_type = extract_question_family(question)
type_counter[question_type] += 1
difficulty = question.get("difficulty")
if difficulty in DIFFICULTY_SCORE:
difficulty_counter[difficulty] += 1
difficulty_scores.append(DIFFICULTY_SCORE[difficulty])
paper = papers_by_id.get(question["paper_id"], {})
source_label = (
f"{paper.get('year', '')} {paper.get('term', '').title()} "
f"{paper.get('exam_type', '').title()}"
).strip()
if paper.get("part_label"):
source_label = f"{source_label} Part {paper['part_label']}"
topics = extract_topic_labels(question)
q_item = {
"paper_id": paper.get("id"),
"source": source_label,
"question_number": question["question_number"],
"preview": question["question_text"][:220],
"difficulty": question.get("difficulty"),
"question_type": question_type,
"year": paper.get("year"),
"term": paper.get("term"),
"exam_type": paper.get("exam_type"),
"topics": topics,
}
all_question_items.append(q_item)
for topic in topics:
topic_counter[topic] += 1
topic_examples[topic].append(q_item)
avg_difficulty = "N/A"
if difficulty_scores:
rounded = round(sum(difficulty_scores) / len(difficulty_scores))
avg_difficulty = DIFFICULTY_LABEL.get(rounded, "Medium")
topic_frequency = []
for topic, count in topic_counter.most_common():
pct = round((count / total_questions) * 100) if total_questions else 0
topic_frequency.append(
{
"label": topic,
"count": count,
"pct": pct,
"questions": topic_examples[topic],
}
)
question_types = []
for label, count in type_counter.most_common():
pct = round((count / total_questions) * 100) if total_questions else 0
question_types.append({"label": label, "count": count, "pct": pct})
return {
"course_code": course_code.upper(),
"kpi": {
"papers": len(papers),
"questions": total_questions,
"topics": len(topic_counter),
"difficulty": avg_difficulty,
},
"topic_frequency": topic_frequency,
"question_types": question_types,
"all_questions": all_question_items,
"difficulty_distribution": {
"easy": difficulty_counter.get("easy", 0),
"medium": difficulty_counter.get("medium", 0),
"hard": difficulty_counter.get("hard", 0),
},
"high_yield_topics": [topic for topic, _ in topic_counter.most_common(5)],
}
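
The alias table and `extract_topic_labels` together collapse variant spellings into one canonical label before counting. A small self-contained sketch of that idea is below, assuming a three-entry alias map; `ALIASES`, `normalize`, and `canonical_labels` are illustrative names, not the module's own.

```python
# Illustrative subset of an alias map: lower-cased variant -> canonical label.
ALIASES: dict[str, str] = {
    "knn": "K-Nearest Neighbors (KNN)",
    "k-nearest neighbors": "K-Nearest Neighbors (KNN)",
    "numpy": "NumPy",
}


def normalize(label: str) -> str:
    """Look the label up case-insensitively; unknown labels pass through unchanged."""
    return ALIASES.get(label.lower().strip(), label)


def canonical_labels(raw_labels: list[str]) -> list[str]:
    """Normalize each label and drop duplicates while keeping first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for raw in raw_labels:
        norm = normalize(raw)
        if norm not in seen:
            seen.add(norm)
            out.append(norm)
    return out


labels = canonical_labels(["KNN", " k-nearest neighbors ", "numpy", "Decision Trees"])
```

Deduplicating after normalization matters here: two tags on the same question that map to the same canonical label should add one to the topic count, not two.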

View File

@@ -0,0 +1,208 @@
"""User attempt records, photo grading, and the error book."""
import asyncio
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
from pydantic import BaseModel
from app.services.supabase_client import get_supabase
from app.services.grader import ocr_photo, grade_answer
from app.dependencies.auth import get_current_user_id
router = APIRouter()
class AttemptCreate(BaseModel):
question_id: str
attempt_type: str # "select" | "input" | "photo"
user_answer: str | None = None
is_correct: bool | None = None
class AttemptUpdate(BaseModel):
in_error_book: bool | None = None
mastered: bool | None = None
@router.post("/")
async def create_attempt(data: AttemptCreate, user_id: str = Depends(get_current_user_id)):
"""Record a single attempt."""
sb = get_supabase()
record = {
"user_id": user_id,
"question_id": data.question_id,
"attempt_type": data.attempt_type,
"user_answer": data.user_answer,
"is_correct": data.is_correct,
}
# Auto add to error book if wrong
if data.is_correct is False:
record["in_error_book"] = True
result = sb.table("user_attempts").insert(record).execute()
return result.data[0]
@router.post("/photo")
async def photo_attempt(
question_id: str = Form(...),
photo: UploadFile = File(...),
user_id: str = Depends(get_current_user_id),
):
"""Photo upload → OCR → AI grading."""
sb = get_supabase()
# 1. Read photo
photo_bytes = await photo.read()
# 2. Upload to storage
storage_path = f"attempts/{user_id}/{question_id}/{photo.filename}"
sb.storage.from_("attempt-photos").upload(
storage_path, photo_bytes,
file_options={"content-type": photo.content_type or "image/jpeg", "upsert": "true"},
)
photo_url = sb.storage.from_("attempt-photos").get_public_url(storage_path)
# 3. OCR (run in thread pool to avoid blocking event loop)
ocr_text = await asyncio.to_thread(ocr_photo, photo_bytes)
# 4. Fetch question for grading context
q_result = sb.table("paper_questions").select("*").eq("id", question_id).execute()
if not q_result.data:
raise HTTPException(status_code=404, detail="Question not found")
question = q_result.data[0]
# 5. AI grading (run in thread pool)
grade_result = await asyncio.to_thread(grade_answer, question, ocr_text)
# 6. Save attempt
record = {
"user_id": user_id,
"question_id": question_id,
"attempt_type": "photo",
"photo_url": photo_url,
"photo_ocr_text": ocr_text,
"is_correct": grade_result.get("is_correct", False),
"feedback": grade_result.get("feedback", ""),
"error_at_step": grade_result.get("error_at_step"),
"in_error_book": not grade_result.get("is_correct", False),
}
result = sb.table("user_attempts").insert(record).execute()
return {
"attempt": result.data[0],
"ocr_text": ocr_text,
"grade": grade_result,
}
@router.get("/error-book")
async def get_error_book(
course_code: str | None = None,
user_id: str = Depends(get_current_user_id),
):
"""Fetch the user's error book."""
sb = get_supabase()
attempts = (
sb.table("user_attempts")
.select("*")
.eq("user_id", user_id)
.eq("in_error_book", True)
.eq("mastered", False)
.order("created_at", desc=True)
.execute()
.data
)
if not attempts:
return []
question_ids = list({attempt["question_id"] for attempt in attempts})
questions = (
sb.table("paper_questions")
.select("*")
.in_("id", question_ids)
.execute()
.data
)
questions_by_id = {question["id"]: question for question in questions}
paper_ids = list({question["paper_id"] for question in questions})
papers = (
sb.table("papers")
.select("id, course_code, year, term, exam_type, part_label")
.in_("id", paper_ids)
.execute()
.data
)
papers_by_id = {paper["id"]: paper for paper in papers}
enriched = []
for attempt in attempts:
question = questions_by_id.get(attempt["question_id"])
if not question:
continue
paper = papers_by_id.get(question["paper_id"])
if course_code and (not paper or paper.get("course_code") != course_code.upper()):
continue
enriched.append(
{
**attempt,
"paper_questions": {
**question,
"paper": paper,
},
}
)
return enriched
@router.get("/by-paper/{paper_id}")
async def get_paper_attempts(paper_id: str, user_id: str = Depends(get_current_user_id)):
"""Get the latest grading record for every question in a paper."""
sb = get_supabase()
attempts = (
sb.table("user_attempts")
.select("question_id, is_correct, feedback, photo_ocr_text, attempt_type, created_at")
.eq("user_id", user_id)
.order("created_at", desc=True)
.execute()
.data
)
# Keep only photo attempts, and only the latest one per question
question_ids = (
sb.table("paper_questions")
.select("id")
.eq("paper_id", paper_id)
.execute()
.data
)
qid_set = {q["id"] for q in question_ids}
seen: set[str] = set()
result = []
for a in attempts:
if a["question_id"] not in qid_set:
continue
if a["question_id"] in seen:
continue
if a["attempt_type"] != "photo":
continue
seen.add(a["question_id"])
result.append(a)
return result
@router.patch("/{attempt_id}")
async def update_attempt(
    attempt_id: str,
    data: AttemptUpdate,
    user_id: str = Depends(get_current_user_id),
):
    """Update error-book state for an attempt (e.g. mark it as mastered)."""
    sb = get_supabase()
    update = {}
    if data.in_error_book is not None:
        update["in_error_book"] = data.in_error_book
    if data.mastered is not None:
        update["mastered"] = data.mastered
    if not update:
        raise HTTPException(status_code=400, detail="Nothing to update")
    # Scope the update to the caller's own attempts, as update_variant does
    result = (
        sb.table("user_attempts")
        .update(update)
        .eq("id", attempt_id)
        .eq("user_id", user_id)
        .execute()
    )
    if not result.data:
        raise HTTPException(status_code=404, detail="Attempt not found")
    return result.data[0]
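
The by-paper endpoint relies on the attempts arriving newest first, then keeps the first photo attempt it sees for each question in the paper. A minimal sketch of that filter on plain dictionaries follows; the sample rows and the integer `created_at` values are made up for illustration.

```python
def latest_photo_attempts(attempts: list[dict], question_ids: set[str]) -> list[dict]:
    """Return the newest photo attempt per question.

    Assumes `attempts` is already sorted newest first, as the query's
    descending order on created_at would provide.
    """
    seen: set[str] = set()
    latest: list[dict] = []
    for attempt in attempts:
        qid = attempt["question_id"]
        if qid not in question_ids or qid in seen:
            continue
        if attempt["attempt_type"] != "photo":
            continue  # non-photo attempts never claim the slot for a question
        seen.add(qid)
        latest.append(attempt)
    return latest


sample = [
    {"question_id": "q1", "attempt_type": "input", "created_at": 5},
    {"question_id": "q1", "attempt_type": "photo", "created_at": 4},
    {"question_id": "q1", "attempt_type": "photo", "created_at": 3},
    {"question_id": "q2", "attempt_type": "photo", "created_at": 2},
    {"question_id": "q9", "attempt_type": "photo", "created_at": 1},
]
picked = latest_photo_attempts(sample, {"q1", "q2"})
```

Checking the attempt type before marking a question as seen is what lets an older photo attempt surface when the newest attempt was typed rather than photographed.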

View File

@@ -0,0 +1,142 @@
"""Paper upload and processing pipeline."""
import asyncio
import threading
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
from app.services.supabase_client import get_supabase
from app.services.text_extractor import extract_pdf, get_full_text
from app.services.paper_processor import process_paper
from app.dependencies.auth import get_current_user_id
router = APIRouter()
def _upload_and_process_sync(
    paper_id: str,
    storage_path: str,
    paper_bytes: bytes,
    answer_bytes: bytes | None,
):
    """Runs in a dedicated thread: Storage upload followed by AI processing."""
    sb = get_supabase()
    try:
        paper_storage_path = f"{storage_path}/paper.pdf"
        sb.storage.from_("papers").upload(
            paper_storage_path, paper_bytes,
            file_options={"content-type": "application/pdf", "upsert": "true"},
        )
        paper_url = sb.storage.from_("papers").get_public_url(paper_storage_path)
        update_data: dict = {"paper_file_url": paper_url}
        if answer_bytes:
            answer_storage_path = f"{storage_path}/answer.pdf"
            sb.storage.from_("papers").upload(
                answer_storage_path, answer_bytes,
                file_options={"content-type": "application/pdf", "upsert": "true"},
            )
            update_data["answer_file_url"] = sb.storage.from_("papers").get_public_url(answer_storage_path)
        sb.table("papers").update(update_data).eq("id", paper_id).execute()
    except Exception as e:
        # Best effort: a failed upload must not stop processing, but leave a trace
        print(f"[UPLOAD] Storage upload failed for paper {paper_id[:8]}: {e}")
    # process_paper is async, so run it in a fresh event loop
    asyncio.run(process_paper(paper_id, paper_bytes, answer_bytes))
@router.get("/")
async def list_papers():
"""List papers (public assets shared by all users)."""
sb = get_supabase()
return (
sb.table("papers")
.select("id, course_code, year, term, exam_type, status, question_count, total_score, difficulty_level, processing_step, processing_progress, processing_total, created_at")
.order("created_at", desc=True)
.execute()
.data
)
@router.get("/mine")
async def my_papers(user_id: str = Depends(get_current_user_id)):
"""Papers uploaded by the current user (including those still in `processing`)."""
sb = get_supabase()
return (
sb.table("papers")
.select("id, course_code, year, term, exam_type, part_label, status, question_count, processing_step, processing_progress, processing_total, created_at")
.eq("user_id", user_id)
.order("created_at", desc=True)
.execute()
.data
)
@router.post("/upload")
async def upload_paper(
paper_file: UploadFile = File(...),
answer_file: UploadFile | None = File(None),
course_code: str = Form(...),
year: int = Form(...),
term: str = Form(...),
exam_type: str = Form(...),
user_id: str = Depends(get_current_user_id),
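):
"""Upload a paper PDF (and optionally an answer PDF), then trigger background processing."""
sb = get_supabase()
# 1. Read the file contents (already in memory, so this is fast)
paper_bytes = await paper_file.read()
answer_bytes = await answer_file.read() if answer_file else None
# 2. Create the record immediately (status=processing) so the request can return right away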
storage_path = f"{course_code.upper()}/{year}_{term}_{exam_type}"
paper_record = sb.table("papers").insert({
"user_id": user_id,
"course_code": course_code.upper(),
"year": year,
"term": term,
"exam_type": exam_type,
"paper_file_url": "", # filled in after the background upload
"answer_file_url": None,
"status": "processing",
}).execute()
paper_id = paper_record.data[0]["id"]
# 3. Run in a separate thread so the event loop is never blocked
threading.Thread(
target=_upload_and_process_sync,
args=(paper_id, storage_path, paper_bytes, answer_bytes),
daemon=True,
).start()
return {
"paper_id": paper_id,
"status": "processing",
"message": "Paper uploaded; processing in progress...",
}
@router.get("/{paper_id}")
async def get_paper(paper_id: str):
"""Get paper details and processing status."""
sb = get_supabase()
result = sb.table("papers").select("*").eq("id", paper_id).execute()
if not result.data:
raise HTTPException(status_code=404, detail="Paper not found")
return result.data[0]
@router.get("/{paper_id}/questions")
async def get_questions(paper_id: str):
"""Get all questions for a paper (including the AI trio)."""
sb = get_supabase()
result = (
sb.table("paper_questions")
.select("*")
.eq("paper_id", paper_id)
.order("display_order")
.execute()
)
return result.data

View File

@@ -0,0 +1,325 @@
"""Question endpoints: variant generation and similar-question retrieval."""
from __future__ import annotations
import asyncio
import time
from fastapi import APIRouter, HTTPException, Depends
from pydantic import BaseModel
from app.services.supabase_client import get_supabase
from app.services.grader import generate_variant
from app.dependencies.auth import get_current_user_id
# Simple in-memory cache: question_id → (timestamp, result)
_similar_cache: dict[str, tuple[float, list]] = {}
_CACHE_TTL = 300 # 5 minutes
class VariantUpdate(BaseModel):
favorited: bool | None = None
router = APIRouter()
def normalized_labels(values: list[str] | None) -> dict[str, str]:
labels: dict[str, str] = {}
for value in values or []:
if value:
labels[value.lower()] = value
return labels
def question_family(question: dict) -> str:
return question.get("question_format") or question.get("question_type") or "unknown"
def display_topics(question: dict) -> list[str]:
labels: list[str] = []
analytics_topic = question.get("analytics_topic")
if analytics_topic:
labels.append(analytics_topic)
for topic in question.get("topic_tags") or []:
if topic and topic not in labels:
labels.append(topic)
if labels:
return labels
for topic in question.get("topics") or []:
if topic and topic not in labels:
labels.append(topic)
return labels
def similarity_score(
target: dict,
candidate: dict,
text_score: float = 0.0,
) -> tuple[int, list[str]]:
score = 0
reasons: list[str] = []
# Primary topic bucket: 40 pts
target_topic = target.get("analytics_topic")
candidate_topic = candidate.get("analytics_topic")
if target_topic and target_topic == candidate_topic:
score += 40
reasons.append(f"Same topic: {target_topic}")
# Concept overlap: up to 20 pts
target_topics = normalized_labels(target.get("topic_tags"))
candidate_topics = normalized_labels(candidate.get("topic_tags"))
shared_topics = sorted(set(target_topics) & set(candidate_topics))
if shared_topics:
score += min(len(shared_topics) * 10, 20)
# Only show concept reason if analytics_topic didn't already match (avoid redundancy)
if not (target_topic and target_topic == candidate_topic):
reasons.append(
"Shared concept: "
+ ", ".join(target_topics[key] for key in shared_topics[:2])
)
# Skill overlap: up to 20 pts
target_skills = normalized_labels(target.get("skill_tags"))
candidate_skills = normalized_labels(candidate.get("skill_tags"))
shared_skills = sorted(set(target_skills) & set(candidate_skills))
if shared_skills:
score += min(len(shared_skills) * 10, 20)
reasons.append(
"Shared skill: "
+ ", ".join(target_skills[key] for key in shared_skills[:2])
)
# Same question format: 10 pts
if question_family(candidate) == question_family(target):
score += 10
reasons.append("Same format")
# Same difficulty: 5 pts
if candidate.get("difficulty") and candidate.get("difficulty") == target.get("difficulty"):
score += 5
reasons.append("Same difficulty")
# Full-text similarity from PostgreSQL ts_rank_cd: up to 20 pts
if text_score > 0:
text_pts = min(round(text_score * 60), 20)
score += text_pts
if text_pts >= 4:
reasons.append("Similar wording")
return min(score, 99), reasons
@router.get("/variants/favorited")
async def get_favorited_variants(user_id: str = Depends(get_current_user_id)):
"""Get all variants the user has favorited (used by the Error Book)."""
sb = get_supabase()
rows = (
sb.table("question_variants")
.select("*, paper_questions(question_number, paper_id, papers(id, course_code, year, term, exam_type, part_label))")
.eq("user_id", user_id)
.eq("favorited", True)
.order("created_at", desc=True)
.execute()
.data
)
return rows
@router.post("/{question_id}/variant")
async def create_variant(question_id: str, user_id: str = Depends(get_current_user_id)):
"""Generate a variant question and store it."""
sb = get_supabase()
result = sb.table("paper_questions").select("*").eq("id", question_id).execute()
if not result.data:
raise HTTPException(status_code=404, detail="Question not found")
question = result.data[0]
variant_data = await asyncio.to_thread(generate_variant, question)
variant_data["knowledge_reminder"] = question.get("knowledge_reminder", "")
saved = sb.table("question_variants").insert({
"user_id": user_id,
"source_question_id": question_id,
"variant_data": variant_data,
"favorited": False,
}).execute()
row = saved.data[0]
row["source_question_number"] = question["question_number"]
return row
@router.get("/{question_id}/variants")
async def list_variants(question_id: str, user_id: str = Depends(get_current_user_id)):
"""Get all of the user's variants for a question."""
sb = get_supabase()
q_result = sb.table("paper_questions").select("question_number").eq("id", question_id).execute()
question_number = q_result.data[0]["question_number"] if q_result.data else ""
rows = (
sb.table("question_variants")
.select("*")
.eq("user_id", user_id)
.eq("source_question_id", question_id)
.order("created_at", desc=True)
.execute()
.data
)
for row in rows:
row["source_question_number"] = question_number
return rows
@router.patch("/variant/{variant_id}")
async def update_variant(variant_id: str, data: VariantUpdate, user_id: str = Depends(get_current_user_id)):
"""Update a variant (favorite / unfavorite)."""
sb = get_supabase()
update: dict = {}
if data.favorited is not None:
update["favorited"] = data.favorited
if not update:
raise HTTPException(status_code=400, detail="Nothing to update")
result = (
sb.table("question_variants")
.update(update)
.eq("id", variant_id)
.eq("user_id", user_id)
.execute()
)
if not result.data:
raise HTTPException(status_code=404, detail="Variant not found")
return result.data[0]
@router.delete("/variant/{variant_id}", status_code=204)
async def delete_variant(variant_id: str, user_id: str = Depends(get_current_user_id)):
"""Delete a variant."""
sb = get_supabase()
sb.table("question_variants").delete().eq("id", variant_id).eq("user_id", user_id).execute()
@router.get("/{question_id}/similar")
async def get_similar_questions(question_id: str, limit: int = 6):
"""Retrieve similar questions from the same course."""
# Cache hit
cached = _similar_cache.get(question_id)
if cached and (time.time() - cached[0]) < _CACHE_TTL:
return cached[1][:max(1, min(limit, 12))]
sb = get_supabase()
result = sb.table("paper_questions").select("*, similar_questions").eq("id", question_id).execute()
if not result.data:
raise HTTPException(status_code=404, detail="Question not found")
target = result.data[0]
# Return pre-computed results immediately when they exist
if target.get("similar_questions"):
precomputed = target["similar_questions"]
_similar_cache[question_id] = (time.time(), precomputed)
return precomputed[:max(1, min(limit, 12))]
# Fallback: compute on the fly for questions not yet backfilled
paper_result = sb.table("papers").select("id, course_code").eq("id", target["paper_id"]).execute()
if not paper_result.data:
raise HTTPException(status_code=404, detail="Paper not found")
course_code = paper_result.data[0]["course_code"]
papers = (
sb.table("papers")
.select("id, course_code, year, term, exam_type, part_label")
.eq("course_code", course_code)
.eq("status", "ready")
.execute()
.data
)
paper_ids = [paper["id"] for paper in papers if paper["id"] != target["paper_id"]]
if not paper_ids:
return []
papers_by_id = {paper["id"]: paper for paper in papers}
# Pre-filter by analytics_topic in DB when possible (cuts candidates from ~250 to ~30)
candidates_query = (
sb.table("paper_questions")
.select(
"id, paper_id, question_number, question_type, question_format, "
"question_text, score, topics, analytics_topic, topic_tags, skill_tags, "
"difficulty, knowledge_reminder, ai_hint, solution"
)
.in_("paper_id", paper_ids)
)
target_topic = target.get("analytics_topic")
if target_topic:
candidates_query = candidates_query.eq("analytics_topic", target_topic)
candidates = candidates_query.execute().data
if not candidates:
return []
# Batch full-text scores from PostgreSQL (skip if too many candidates — slow)
text_scores: dict[str, float] = {}
if len(candidates) <= 50:
try:
rpc_result = sb.rpc(
"text_similarity_scores",
{
"query_text": target.get("question_text") or "",
"candidate_ids": [c["id"] for c in candidates],
},
).execute()
for row in rpc_result.data or []:
text_scores[row["question_id"]] = float(row["text_score"] or 0)
except Exception:
pass
ranked = []
for candidate in candidates:
text_score = text_scores.get(candidate["id"], 0.0)
match_percent, reasons = similarity_score(target, candidate, text_score)
if match_percent < 20:
continue
paper = papers_by_id.get(candidate["paper_id"], {})
source = (
f"{paper.get('year', '')} {paper.get('term', '').title()} "
f"{paper.get('exam_type', '').title()}"
).strip()
if paper.get("part_label"):
source = f"{source} Part {paper['part_label']}"
ranked.append(
{
"id": candidate["id"],
"paper_id": candidate["paper_id"],
"source": source,
"question_number": candidate["question_number"],
"match_percent": match_percent,
"match_reasons": reasons,
"question_type": question_family(candidate),
"question_text": candidate["question_text"],
"topics": display_topics(candidate),
"difficulty": candidate.get("difficulty"),
"knowledge_reminder": candidate.get("knowledge_reminder", ""),
"ai_hint": candidate.get("ai_hint", ""),
"solution": candidate.get("solution", ""),
}
)
ranked.sort(key=lambda item: (-item["match_percent"], item["source"], item["question_number"]))
# Keep only the best-scoring question per paper
seen_papers: set[str] = set()
deduped = []
for item in ranked:
if item["paper_id"] not in seen_papers:
seen_papers.add(item["paper_id"])
deduped.append(item)
_similar_cache[question_id] = (time.time(), deduped)
# Persist to DB so future requests are instant
try:
sb.table("paper_questions").update({"similar_questions": deduped}).eq("id", question_id).execute()
except Exception:
pass
return deduped[:max(1, min(limit, 12))]

View File

@@ -0,0 +1,146 @@
"""OCR, grading, and variant generation prompts"""
import json
import base64
from app.services.llm_clients import get_vision_client, get_deepseek_client
OCR_PROMPT = """You are an expert at recognizing handwritten answers. Analyze this photo of a student's handwritten answer and extract the text and mathematical formulas.
Requirements:
- Faithfully extract what the student wrote, do not modify or correct
- Use LaTeX format for math formulas (e.g. $x^2 + 1$)
- If there are multiple steps, list them in original order
- If some handwriting is unclear, mark with [unclear]
Return only the extracted text, no additional explanation."""
GRADING_PROMPT = """You are an expert academic grader. Grade the following student answer. ALL output must be in English.
Question info:
- Number: {question_number}
- Type: {question_type}
- Question: {question_text}
- Score: {score}
Reference answer / solution:
{reference_answer}
Student answer:
{student_answer}
Grade and return JSON:
{{
"is_correct": true/false,
"score_given": 0-{score},
"feedback": "<HTML> Step-by-step analysis of the student's answer, pointing out correct parts and errors, using KaTeX formulas </HTML>",
"error_at_step": null or the step number where errors begin (integer)
}}
Grading rules:
- MC / fill-blank: only correct if answer matches exactly
- Long questions: give partial credit for correct steps even if the final answer is wrong
- feedback in HTML format, supports KaTeX ($..$ inline, $$...$$ block)
- Mark errors with <div class="common-error">...</div>
- Identify exactly which step the error starts"""
VARIANT_PROMPT = """You are an expert exam question creator. Generate a similar but different variant question based on the original below. ALL output must be in English.
Original question info:
- Type: {question_type}
- Question: {question_text}
- Topics: {topics}
- Difficulty: {difficulty}
- Reference answer: {answer}
Requirements:
- Variant must test the same knowledge points at similar difficulty
- Data/scenario/wording must differ — don't just change numbers
- Must provide a complete correct answer
Format requirements (CRITICAL):
- All text in HTML format, absolutely NO markdown syntax
- Code: <pre><code class="language-xxx">...</code></pre>, NOT ```
- Math: $...$ (inline) or $$...$$ (block), KaTeX compatible
- Line breaks: <br>, paragraphs: <p>
Return JSON:
{{
"question_text": "HTML formatted variant question",
"question_type": "{question_type}",
"options": [MC only, format {{"label":"A","text":"..."}}, ...] or null,
"correct_answer": "Correct answer (plain text)",
"ai_hint": "HTML formatted hint that guides thinking WITHOUT giving the answer",
"solution": "HTML formatted complete step-by-step solution"
}}"""
def ocr_photo(photo_bytes: bytes) -> str:
    """Gemini Vision OCR for handwritten answers"""
    client = get_vision_client()
    b64 = base64.b64encode(photo_bytes).decode("utf-8")
    resp = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[
            {"role": "system", "content": OCR_PROMPT},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {
                    "url": f"data:image/jpeg;base64,{b64}",
                }},
            ]},
        ],
        temperature=0,
        max_tokens=2000,
    )
    return resp.choices[0].message.content or ""


def grade_answer(question: dict, student_answer: str) -> dict:
    """Grade a student answer with DeepSeek (`deepseek-chat`)"""
    reference = question.get("raw_answer_text") or question.get("solution") or "No reference answer"
    score = question.get("score") or "unknown"
    ds = get_deepseek_client()
    resp = ds.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": GRADING_PROMPT.format(
                question_number=question["question_number"],
                question_type=question["question_type"],
                question_text=question["question_text"],
                score=score,
                reference_answer=reference,
                student_answer=student_answer,
            )},
        ],
        temperature=0.2,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)


def generate_variant(question: dict) -> dict:
    """Generate a variant question with DeepSeek (`deepseek-chat`)"""
    answer = (
        question.get("correct_option")
        or question.get("correct_answer")
        or question.get("raw_answer_text")
        or "N/A"
    )
    ds = get_deepseek_client()
    resp = ds.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": VARIANT_PROMPT.format(
                question_type=question["question_type"],
                question_text=question["question_text"],
                # `or` guards against columns that are present but NULL
                topics=", ".join(question.get("topics") or []),
                difficulty=question.get("difficulty") or "medium",
                answer=answer,
            )},
        ],
        temperature=0.5,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

View File

@@ -0,0 +1,74 @@
import httpx
from openai import OpenAI
from app.config import get_settings
_TIMEOUT = httpx.Timeout(connect=10, read=300, write=60, pool=10)
_gpt_client: OpenAI | None = None
_qwen_client: OpenAI | None = None
_gemini_flash_client: OpenAI | None = None
_gemini_lite_client: OpenAI | None = None
_deepseek_client: OpenAI | None = None
def get_gpt_client() -> OpenAI:
"""laozhang API — gpt-4o / gpt-4o-mini"""
global _gpt_client
if _gpt_client is None:
s = get_settings()
_gpt_client = OpenAI(
base_url=s.laozhang_base_url,
api_key=s.laozhang_api_key,
)
return _gpt_client
def get_qwen_client() -> OpenAI:
"""DashScope — qwen-plus"""
global _qwen_client
if _qwen_client is None:
s = get_settings()
_qwen_client = OpenAI(
base_url=s.dashscope_base_url,
api_key=s.dashscope_api_key,
)
return _qwen_client
def get_vision_client() -> OpenAI:
"""Official Google Gemini API (vision, used for question splitting and OCR); usable when deployed in Singapore."""
global _gemini_flash_client
if _gemini_flash_client is None:
s = get_settings()
_gemini_flash_client = OpenAI(
base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
api_key=s.google_gemini_api_key,
timeout=_TIMEOUT,
)
return _gemini_flash_client
def get_gemini_lite_client() -> OpenAI:
"""laozhang: gemini-3.1-flash-lite-preview (lightweight, used for the AI trio)."""
global _gemini_lite_client
if _gemini_lite_client is None:
s = get_settings()
_gemini_lite_client = OpenAI(
base_url=s.laozhang_base_url,
api_key=s.laozhang_api_key,
timeout=_TIMEOUT,
)
return _gemini_lite_client
def get_deepseek_client() -> OpenAI:
"""DeepSeek: deepseek-chat (used for the AI trio)."""
global _deepseek_client
if _deepseek_client is None:
s = get_settings()
_deepseek_client = OpenAI(
base_url=s.deepseek_base_url,
api_key=s.deepseek_api_key,
timeout=_TIMEOUT,
)
return _deepseek_client


@@ -0,0 +1,576 @@
"""Paper processing pipeline: PDF → structured questions → AI trio (Vision mode)."""
import asyncio
import base64
import io
import json
import re
import traceback
from contextlib import redirect_stdout
import fitz # pymupdf
from app.services.supabase_client import get_supabase
from app.services.llm_clients import get_vision_client, get_deepseek_client
def strip_nulls(obj):
"""Recursively remove \\u0000 null bytes from strings (PostgreSQL rejects them)."""
if isinstance(obj, str):
return obj.replace("\u0000", "")
if isinstance(obj, dict):
return {k: strip_nulls(v) for k, v in obj.items()}
if isinstance(obj, list):
return [strip_nulls(i) for i in obj]
return obj
# ============================================
# Prompts
# ============================================
STRUCTURE_PROMPT = """You are an expert exam paper structure analyst. You are given images of a past exam paper. Analyze every page carefully and extract all questions into structured JSON.
All generated values must be in English. Do not output Chinese.
CRITICAL RULES for question_text:
- Each question's question_text must be FULLY SELF-CONTAINED. Include ALL context needed to solve it.
- For sub-questions (e.g. (a)(i)), copy the ENTIRE parent question setup (variable definitions, code blocks, problem description) into the question_text, then append the specific sub-question.
- For Python/code questions: include ALL variable definitions and import statements verbatim, exactly as they appear in the exam, preserving multi-line arrays and data structures completely.
- Never truncate code. If a variable is defined across multiple lines (e.g. a numpy array), include every line.
Output JSON format (strictly follow):
{
"total_score": 100,
"difficulty_level": "medium",
"topics_summary": {"Topic A": 40, "Topic B": 30, "Topic C": 30},
"questions": [
{
"question_number": "1a",
"parent_question": "1",
"question_type": "mc",
"question_text": "Original question text...",
"score": 5,
"page_number": 1,
"options": [{"label": "A", "text": "Option content"}, {"label": "B", "text": "..."}],
"topics": ["Linked List", "Pointer"],
"difficulty": "easy"
},
{
"question_number": "2",
"parent_question": null,
"question_type": "long_question",
"question_text": "Original question text...",
"score": 15,
"page_number": 2,
"options": null,
"topics": ["Recursion"],
"difficulty": "hard"
}
]
}
Rules:
- question_type must be one of: "mc" (multiple choice), "true_false" (true/false), "fill_blank" (fill in blank), "long_question" (long question)
- True/False questions MUST use "true_false" type, with options set to [{"label":"True","text":"True"},{"label":"False","text":"False"}], correct_option as "True" or "False"
- Multiple choice must extract the options array
- Sub-questions use parent_question to link to parent: "1a" parent is "1"
- Independent questions without sub-questions set parent_question to null
- page_number inferred from where the question appears
- topics inferred from the question content
- difficulty: "easy" | "medium" | "hard"
- Extract ALL questions, do not miss any
- Keep topic labels in English only
"""
ANSWER_MATCH_PROMPT = """You are an expert exam answer matching specialist. Below is the answer text for an exam paper. Extract and match answers to their corresponding question numbers.
All generated values must be in English. Do not output Chinese.
Question structure:
{questions_json}
Answer text:
{answer_text}
Output JSON format:
{{
"answers": [
{{
"question_number": "1a",
"correct_option": "B",
"correct_answer": null,
"raw_answer_text": "Original answer text..."
}},
{{
"question_number": "2",
"correct_option": null,
"correct_answer": null,
"raw_answer_text": "Complete solution process and answer..."
}}
]
}}
Rules:
- For MC questions, fill correct_option (e.g. "B")
- For fill-blank questions, fill correct_answer (e.g. "O(n log n)")
- For long questions, only fill raw_answer_text (complete solution process)
- Match all questions where answers can be found
- Keep raw_answer_text faithful to the source answer, but do not add Chinese commentary
"""
ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three sections for the following exam question. ALL output must be in English.
Question info:
- Number: {question_number}
- Type: {question_type}
- Score: {score}
- Question: {question_text}
- Topics: {topics}
{answer_section}
Generate THREE sections in HTML format (supports KaTeX: block $$ ... $$ inline $ ... $):
Output JSON:
{{
"knowledge_reminder": "<HTML> Prerequisite knowledge points needed for this question, as a concise bullet list </HTML>",
"ai_hint": "<HTML> A hint that guides thinking direction WITHOUT giving away the answer </HTML>",
"solution": "<HTML> Complete step-by-step solution (Step 1, Step 2, ...) with derivations, formulas, and common mistake warnings </HTML>"
}}
Solution requirements:
- Must include complete working process, not just the answer
- Each step must have an explanation
- If a reference answer is provided, derive the solution based on it
- If no reference answer, work out the complete solution independently
- For MC questions, explain why the correct option is right AND why others are wrong
- Use <ol> or numbered steps
- Mark common mistakes with <div class="common-error">...</div>
KaTeX formula rules:
- Block formula: $$ on its own line, with blank lines before and after
- Inline formula: $x^2$ no line break
- Matrix: \\begin{{bmatrix}} ... \\end{{bmatrix}}
- Fraction: \\frac{{a}}{{b}}
"""
BATCH_ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three study sections for each question below. ALL output must be in English.
For every question, return:
- knowledge_reminder: concise prerequisite bullets in HTML
- ai_hint: a helpful hint in HTML without revealing the final answer
- solution: a complete step-by-step solution in HTML
Return JSON in this exact format:
{{
"analyses": [
{{
"question_number": "1a",
"knowledge_reminder": "<HTML>...</HTML>",
"ai_hint": "<HTML>...</HTML>",
"solution": "<HTML>...</HTML>"
}}
]
}}
Rules:
- Return one item for every provided question_number
- Keep each item matched to the same question_number
- All text must be in English
- HTML only, KaTeX compatible
- For MC questions, explain why the correct option is right and why the others are wrong
- For long questions, show a complete derivation or reasoning chain
- Use <ol> or numbered steps in solution when appropriate
- Mark common mistakes with <div class="common-error">...</div>
- CRITICAL: When a question_text contains "[Context from parent question X]" followed by "[Sub-question Y]", the parent section is background context only. You MUST solve ONLY the specific sub-question labeled [Sub-question Y]. Do NOT solve other sub-questions listed in the parent context. Give one precise answer for that single sub-question only.
Questions:
{questions_payload}
"""
# ============================================
# Processing pipeline
# ============================================
RETRYABLE_ERROR_MARKERS = (
"429",
"rate limit",
"rate_limit",
"too many requests",
"timeout",
"timed out",
"connection",
)
def is_retryable_error(exc: Exception) -> bool:
message = str(exc).lower()
return any(marker in message for marker in RETRYABLE_ERROR_MARKERS)
def pdf_to_images(pdf_bytes: bytes, dpi: int = 96) -> list[str]:
"""Render each PDF page to a base64-encoded PNG; 96 dpi balances legibility against cost."""
doc = fitz.open(stream=pdf_bytes, filetype="pdf")
images = []
mat = fitz.Matrix(dpi / 72, dpi / 72)
for page in doc:
pix = page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)
img_bytes = pix.tobytes("png")
images.append(base64.b64encode(img_bytes).decode())
doc.close()
return images
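As a rough check on the `dpi / 72` scale factor used in `pdf_to_images` (PDF user space has 72 points per inch), the sketch here estimates the pixel size of a rendered page. `rendered_size` is a hypothetical helper, not part of this module, and the exact rounding PyMuPDF applies may differ by a pixel.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int = 96) -> tuple[int, int]:
    """Estimate the pixel size of a page rendered with Matrix(dpi / 72, dpi / 72)."""
    zoom = dpi / 72  # one PDF point is 1/72 inch
    return round(width_pt * zoom), round(height_pt * zoom)


# US Letter is 612 x 792 points
letter_at_96 = rendered_size(612, 792)
letter_at_200 = rendered_size(612, 792, dpi=200)
```

At 96 dpi a Letter page comes out at roughly 816 x 1056 pixels, about a quarter of the pixel count of the 200 dpi render, which is the cost side of the trade-off the docstring mentions.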
def parse_json_response(text: str) -> dict:
    """Parse JSON returned by the model, tolerating a markdown code-fence wrapper."""
    text = text.strip()
    # Strip a ```json ... ``` wrapper
    if text.startswith("```"):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])
    # Remove illegal control characters from the JSON text (0x00-0x1F except \t, \n, \r)
    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', text)
    # Repair invalid JSON escape sequences from the model: only a backslash that
    # ends an odd-length run and precedes an illegal escape character is doubled
    text = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', text)
    return json.loads(text)
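The escape-repair regex in `parse_json_response` is easiest to follow on a concrete input. The sketch here reuses the same pattern on hand-written sample strings (the sample data and the `repair_escapes` name are illustrative assumptions). It also shows a limitation that appears to follow from the pattern: LaTeX commands starting with a letter that JSON does treat as an escape, such as `\frac` or `\begin`, are left untouched and decode to control characters.

```python
import json
import re

# Same pattern parse_json_response uses
INVALID_ESCAPE = re.compile(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])')


def repair_escapes(raw: str) -> str:
    """Double a lone backslash that precedes a character JSON cannot escape."""
    return INVALID_ESCAPE.sub(r'\1\\\\\2', raw)


# \s and \a are not JSON escapes, so both are repaired; the already-escaped \\ is left alone.
raw = r'{"solution": "Use $\sum x_i$ and $\alpha$", "path": "a\\b"}'
data = json.loads(repair_escapes(raw))

# \f is a legal JSON escape (form feed), so \frac is not repaired and decodes to "\x0crac".
corrupted = json.loads(repair_escapes(r'{"s": "$\frac{1}{2}$"}'))["s"]
```

If KaTeX output with `\frac` or `\begin` turns out to render incorrectly, this decoding step is one place to look.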
async def gemini_vision_json(
*,
system_prompt: str,
images: list[str],
user_text: str = "",
temperature: float = 0,
max_attempts: int = 6,
) -> dict:
"""Send images plus a prompt to the Gemini vision model and return the parsed JSON."""
client = get_vision_client()
delay_seconds = 2
content: list = []
for b64 in images:
content.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}})
if user_text:
content.append({"type": "text", "text": user_text})
for attempt in range(1, max_attempts + 1):
try:
response = client.chat.completions.create(
model="gemini-2.5-flash",
messages=[
{"role": "system", "content": system_prompt + "\n\nIMPORTANT: Your entire response must be valid JSON only. No markdown, no code fences, no extra text."},
{"role": "user", "content": content},
],
temperature=temperature,
max_tokens=16384,
)
return parse_json_response(response.choices[0].message.content)
except Exception as exc:
if attempt == max_attempts or not is_retryable_error(exc):
raise
await asyncio.sleep(delay_seconds)
delay_seconds = min(delay_seconds * 2, 30)
async def deepseek_json_completion(
*,
system_prompt: str,
user_prompt: str | None = None,
temperature: float = 0,
max_attempts: int = 6,
) -> dict:
"""Plain-text DeepSeek JSON completion (used for AI trio generation)."""
client = get_deepseek_client()
delay_seconds = 2
for attempt in range(1, max_attempts + 1):
try:
messages = [{"role": "system", "content": system_prompt}]
if user_prompt:
messages.append({"role": "user", "content": user_prompt})
response = client.chat.completions.create(
model="deepseek-chat",
messages=messages,
temperature=temperature,
max_tokens=8192,
response_format={"type": "json_object"},
)
raw = response.choices[0].message.content
raw = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', raw)
raw = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', raw)
return json.loads(raw)
except Exception as exc:
if attempt == max_attempts or not is_retryable_error(exc):
raise
await asyncio.sleep(delay_seconds)
delay_seconds = min(delay_seconds * 2, 30)
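The retry loops in `gemini_vision_json` and `deepseek_json_completion` share one backoff schedule: start at 2 seconds, double after each retryable failure, cap at 30. A minimal sketch of the resulting sleep sequence follows; `backoff_delays` is a hypothetical helper used only for illustration.

```python
def backoff_delays(max_attempts: int = 6, start: float = 2, cap: float = 30) -> list[float]:
    """List the sleeps between attempts; there is no sleep after the final attempt."""
    delays = []
    delay = start
    for _ in range(max_attempts - 1):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


schedule = backoff_delays()
worst_case_wait = sum(schedule)
```

With the default of six attempts, a call that keeps hitting retryable errors sleeps for about a minute in total before the last exception is re-raised, not counting the time spent in the requests themselves.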
def chunked(items: list[dict], size: int) -> list[list[dict]]:
return [items[i:i + size] for i in range(0, len(items), size)]
def _question_sort_key(qnum: str) -> tuple:
"""Natural sort key for question numbers: 1a < 1b < ... < 1i < 1j < 2ai < 2aii < 10a."""
parts = re.findall(r'(\d+|[a-zA-Z]+|[()]+)', qnum)
key = []
for idx, p in enumerate(parts):
if p.isdigit():
key.append((0, int(p), ''))
elif p in ('(', ')'):
continue
else:
# Single letter (a-z): always sort alphabetically (a=1, b=2, ..., j=10)
if len(p) == 1 and p.isalpha():
key.append((1, ord(p.lower()) - ord('a') + 1, p))
else:
# Multi-letter: roman numerals for sub-sub-questions (i=1, ii=2, iii=3, ...)
romans = {'i':1,'ii':2,'iii':3,'iv':4,'v':5,'vi':6,'vii':7,'viii':8,'ix':9,'x':10,'xi':11,'xii':12,'xiii':13}
if p.lower() in romans:
key.append((2, romans[p.lower()], p))
else:
key.append((1, 0, p))
return tuple(key)
def sort_questions(questions: list[dict]) -> list[dict]:
"""Sort questions naturally by question number."""
return sorted(questions, key=lambda q: _question_sort_key(q.get("question_number", "")))
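To see what ordering `_question_sort_key` produces, here is a condensed restatement of its logic with two sample inputs (the `sort_key` name and the sample question numbers are illustrative assumptions). The second sample suggests a case worth checking: a single-letter roman numeral such as `(v)` is ranked as an ordinary letter, so it appears to sort ahead of `(ii)`.

```python
import re

ROMANS = {'i': 1, 'ii': 2, 'iii': 3, 'iv': 4, 'v': 5, 'vi': 6, 'vii': 7, 'viii': 8, 'ix': 9, 'x': 10}


def sort_key(qnum: str) -> tuple:
    key = []
    for part in re.findall(r'\d+|[a-zA-Z]+', qnum):  # parentheses are simply not matched
        if part.isdigit():
            key.append((0, int(part), ''))
        elif len(part) == 1:
            # Any single letter is ranked alphabetically, including i, v and x
            key.append((1, ord(part.lower()) - ord('a') + 1, part))
        elif part.lower() in ROMANS:
            key.append((2, ROMANS[part.lower()], part))
        else:
            key.append((1, 0, part))
    return tuple(key)


compact = sorted(["10a", "2", "1b", "1j", "2a", "1a", "1i"], key=sort_key)
roman = sorted(["1(a)(ii)", "1(a)(v)", "1(a)(i)", "1(a)(iii)"], key=sort_key)
```

For papers whose sub-sub-questions stop at `(iv)` this makes no difference; beyond that, the display order may need a context-aware rule.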
def extract_code_block(text: str) -> str:
    """
    Extract Python code from the question text.
    Strategy: collect every line that clearly starts a statement (import,
    assignment, or print), plus the continuation lines that follow it while
    a bracket opened on an earlier line is still unclosed. Collected lines
    are stripped, so indented bodies (loops, functions) are not preserved.
    """
    lines = text.splitlines()
    result = []
    in_code = False
    open_brackets = 0
    CODE_START = re.compile(r"^\s*(import |from \w|[A-Za-z_]\w*\s*=|print\()")
    for line in lines:
        stripped = line.strip()
        # Already inside a statement: keep collecting while brackets are unclosed
        if in_code and open_brackets > 0:
            result.append(stripped)
            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
            continue
        # Detect a new code start line
        if CODE_START.match(line):
            in_code = True
            result.append(stripped)
            open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
            open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
            continue
        # Not a code line: stop collecting until the next code start line
        in_code = False
    return "\n".join(result)


# Kept for backward compatibility
extract_code_lines = extract_code_block
def try_exec_python(code: str, shared_ns: dict) -> str | None:
"""
Execute code in the shared_ns namespace and capture stdout.
Return the output string, or None on failure or empty output.
"""
buf = io.StringIO()
try:
with redirect_stdout(buf):
exec(code, shared_ns) # noqa: S102
output = buf.getvalue().strip()
return output if output else None
except Exception:
return None
async def _resume_ai_trio(sb, paper_id: str, questions: list[dict]):
"""Generate the AI trio for questions that lack a solution, writing each result back to the DB. Supports resuming."""
need = [q for q in questions if not q.get("solution")]
if not need:
# Every question already has a solution: mark the paper as done
sb.table("papers").update({"status": "ready", "processing_step": None}).eq("id", paper_id).execute()
return
total_q = len(questions)
done_q = total_q - len(need)
# Build the payload
id_map = {q["question_number"]: q["id"] for q in need}
# The full question_text is needed to generate the AI trio
full_data = sb.table("paper_questions").select(
"id, question_number, question_type, question_text, score, correct_option, correct_answer, raw_answer_text"
).eq("paper_id", paper_id).in_("id", [q["id"] for q in need]).execute().data
payloads = []
for q in full_data:
answer_section = q.get("raw_answer_text") or ""
if not answer_section and q.get("correct_option"):
answer_section = f"Correct option: {q['correct_option']}"
elif not answer_section and q.get("correct_answer"):
answer_section = f"Correct answer: {q['correct_answer']}"
payloads.append({
"question_number": q["question_number"],
"question_type": q["question_type"] or "long_question",
"score": q.get("score") or "unknown",
"question_text": q["question_text"] or "",
"reference_answer": answer_section,
})
batches = chunked(payloads, 3)
for batch_idx, batch in enumerate(batches, 1):
current = done_q + batch_idx * 3
_update_progress(sb, paper_id, f"Generating solutions ({min(current, total_q)}/{total_q} questions)", batch_idx, len(batches))
try:
result = await deepseek_json_completion(
system_prompt=BATCH_ANALYSIS_PROMPT.format(
questions_payload=json.dumps(batch, ensure_ascii=False),
),
temperature=0.3,
)
for item in result.get("analyses", []):
qnum = item.get("question_number")
qid = id_map.get(qnum)
if qid:
sb.table("paper_questions").update({
"knowledge_reminder": item.get("knowledge_reminder", ""),
"ai_hint": item.get("ai_hint", ""),
"solution": item.get("solution", ""),
}).eq("id", qid).execute()
except Exception:
pass  # a failed batch does not block the others
await asyncio.sleep(1)
# Mark as done
sb.table("papers").update({"status": "ready", "processing_step": None}).eq("id", paper_id).execute()
def _update_progress(sb, paper_id: str, step: str, progress: int = 0, total: int = 0):
"""Write processing progress to the DB."""
sb.table("papers").update({
"processing_step": step,
"processing_progress": progress,
"processing_total": total,
}).eq("id", paper_id).execute()
async def process_paper(paper_id: str, paper_bytes: bytes, answer_bytes: bytes | None):
"""Background pipeline: PDF pages → Vision structuring → AI trio.
Design principle: persist to the DB as soon as each step finishes, so processing can resume.
"""
sb = get_supabase()
try:
# Check whether questions already exist (resume scenario)
existing = sb.table("paper_questions").select("id, question_number, solution").eq("paper_id", paper_id).execute().data
if existing:
# Questions exist → skip extraction and fill in the AI trio
await _resume_ai_trio(sb, paper_id, existing)
return
# ── Step 1: PDF → images ──
_update_progress(sb, paper_id, "Rendering PDF pages...")
paper_images = pdf_to_images(paper_bytes)
# ── Step 2: Vision structuring (question splitting) ──
PAGE_BATCH = 8
all_questions: list = []
meta: dict = {}
num_page_batches = -(-len(paper_images) // PAGE_BATCH)
for i in range(0, len(paper_images), PAGE_BATCH):
batch_imgs = paper_images[i:i + PAGE_BATCH]
batch_idx = i // PAGE_BATCH + 1
_update_progress(sb, paper_id, f"Reading pages {i+1}-{i+len(batch_imgs)}...", batch_idx, num_page_batches)
batch_result = await gemini_vision_json(
system_prompt=STRUCTURE_PROMPT,
images=batch_imgs,
user_text=f"Pages {i+1}-{i+len(batch_imgs)} of the exam paper. Extract all questions visible on these pages.",
temperature=0,
)
if not meta:
meta = {k: batch_result.get(k) for k in ("total_score", "difficulty_level", "topics_summary")}
all_questions.extend(batch_result.get("questions", []))
all_questions = sort_questions(all_questions)
questions = all_questions
# Update the paper overview
sb.table("papers").update({
"total_score": meta.get("total_score"),
"question_count": len(questions),
"topics_summary": meta.get("topics_summary"),
"difficulty_level": meta.get("difficulty_level"),
}).eq("id", paper_id).execute()
# ── Step 3: Answer matching (batched; failures are skipped) ──
answers_map = {}
if answer_bytes:
_update_progress(sb, paper_id, "Matching answers...")
try:
answer_images = pdf_to_images(answer_bytes)
questions_json = json.dumps(
[{"question_number": q["question_number"], "question_type": q["question_type"]}
for q in questions], ensure_ascii=False,
)
all_answers: list = []
for ai in range(0, len(answer_images), 8):
batch_ans_imgs = answer_images[ai:ai + 8]
try:
match_result = await gemini_vision_json(
system_prompt=ANSWER_MATCH_PROMPT.format(
questions_json=questions_json, answer_text="(See images)",
),
images=batch_ans_imgs,
user_text=f"Match answers to these questions: {questions_json}",
temperature=0,
)
all_answers.extend(match_result.get("answers", []))
except Exception:
pass
answers_map = {a["question_number"]: a for a in all_answers}
except Exception:
pass
# ── Step 4: Write questions to the DB immediately (without the AI trio) ──
_update_progress(sb, paper_id, "Saving questions...")
for i, q in enumerate(questions):
qnum = q["question_number"]
answer = answers_map.get(qnum, {})
sb.table("paper_questions").insert(strip_nulls({
"paper_id": paper_id,
"question_number": qnum,
"parent_question": q.get("parent_question"),
"display_order": i,
"question_type": q["question_type"],
"question_text": q["question_text"],
"score": q.get("score"),
"page_number": q.get("page_number"),
"options": q.get("options"),
"correct_option": answer.get("correct_option"),
"correct_answer": answer.get("correct_answer"),
"raw_answer_text": answer.get("raw_answer_text"),
"topics": q.get("topics", []),
"analytics_topic": (q.get("topics") or [None])[0],  # tolerate a missing, null, or empty topics list
"topic_tags": q.get("topics", []),
"difficulty": q.get("difficulty"),
})).execute()
# ── Step 5: AI trio (updated row by row; supports resuming) ──
saved = sb.table("paper_questions").select("id, question_number, solution").eq("paper_id", paper_id).execute().data
await _resume_ai_trio(sb, paper_id, saved)
except Exception as e:
sb.table("papers").update({
"status": "error",
"error_message": f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()[-500:]}",
}).eq("id", paper_id).execute()
raise


@@ -0,0 +1,13 @@
from supabase import create_client, Client
from app.config import get_settings
_client: Client | None = None
def get_supabase() -> Client:
"""Return the Supabase client (service_role, bypasses RLS)."""
global _client
if _client is None:
s = get_settings()
_client = create_client(s.supabase_url, s.supabase_service_role_key)
return _client


@@ -0,0 +1,48 @@
"""PDF text extraction: reuses the text_extractor logic from SOS."""
import base64
import fitz # PyMuPDF
from dataclasses import dataclass
@dataclass
class ExtractedContent:
pages_text: list[str]  # text of each page
page_images: dict[int, str]  # 0-based page index → base64 image (image-heavy pages)
total_pages: int
has_images: bool
def extract_pdf(file_bytes: bytes) -> ExtractedContent:
"""Extract text and images from a PDF."""
doc = fitz.open(stream=file_bytes, filetype="pdf")
pages_text = []
page_images = {}
for i, page in enumerate(doc):
text = page.get_text("text")
pages_text.append(text)
# A page with very little text may be a scan → render it as an image for Vision OCR
if len(text.strip()) < 50:
pix = page.get_pixmap(dpi=200)
img_bytes = pix.tobytes("png")
page_images[i] = base64.b64encode(img_bytes).decode("utf-8")
doc.close()
return ExtractedContent(
pages_text=pages_text,
page_images=page_images,
total_pages=len(pages_text),
has_images=len(page_images) > 0,
)
def get_full_text(extracted: ExtractedContent) -> str:
"""Join the text of all pages."""
return "\n\n".join(
f"--- Page {i+1} ---\n{text}"
for i, text in enumerate(extracted.pages_text)
if text.strip()
)


@@ -0,0 +1,252 @@
"""
Regenerate the AI trio for all questions; sub-questions carry their parent question's context.
Usage: python backfill_ai_trio_with_context.py [--paper-id <id>] [--course <code>] [--missing-only]
"""
import asyncio
import io
import json
import re
import sys
import time
import argparse
from contextlib import redirect_stdout
from app.services.supabase_client import get_supabase
from app.services.llm_clients import get_deepseek_client
def extract_code_lines(text: str) -> str:
lines = (text or "").splitlines()
result = []
in_code = False
open_brackets = 0
CODE_START = re.compile(r"^\s*(import |from \w|[A-Za-z_]\w*\s*=|print\()")
for line in lines:
stripped = line.strip()
if in_code and open_brackets > 0:
result.append(stripped)
open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
continue
if CODE_START.match(line):
in_code = True
result.append(stripped)
open_brackets += stripped.count("(") + stripped.count("[") + stripped.count("{")
open_brackets -= stripped.count(")") + stripped.count("]") + stripped.count("}")
continue
in_code = False
return "\n".join(result)
def try_exec_python(code: str, shared_ns: dict) -> str | None:
buf = io.StringIO()
try:
with redirect_stdout(buf):
exec(code, shared_ns) # noqa: S102
output = buf.getvalue().strip()
return output if output else None
except Exception:
return None
BATCH_ANALYSIS_PROMPT = """You are an expert academic answer analyst. Generate three study sections for each question below. ALL output must be in English.
For every question, return:
- knowledge_reminder: concise prerequisite bullets in HTML
- ai_hint: a helpful hint in HTML without revealing the final answer
- solution: a complete step-by-step solution in HTML
Return JSON in this exact format:
{{
"analyses": [
{{
"question_number": "1a",
"knowledge_reminder": "<HTML>...</HTML>",
"ai_hint": "<HTML>...</HTML>",
"solution": "<HTML>...</HTML>"
}}
]
}}
Rules:
- Return one item for every provided question_number
- All text must be in English
- HTML only, KaTeX compatible (block $$ ... $$ inline $ ... $)
- For MC questions, explain why the correct option is right and why others are wrong
- For long questions, show a complete derivation or reasoning chain
- Use <ol> or numbered steps in solution when appropriate
- Mark common mistakes with <div class="common-error">...</div>
- CRITICAL: When a question_text contains "[Context from parent question X]" followed by "[Sub-question Y]", the parent section is background context only. You MUST solve ONLY the specific sub-question labeled [Sub-question Y]. Do NOT solve other sub-questions listed in the parent context. Give one precise answer for that single sub-question only.
Questions:
{questions_payload}
"""
def chunked(lst, size):
return [lst[i:i+size] for i in range(0, len(lst), size)]
async def deepseek_batch(batch: list[dict]) -> list[dict]:
client = get_deepseek_client()
for attempt in range(5):
try:
resp = client.chat.completions.create(
model="deepseek-chat",
messages=[{
"role": "system",
"content": BATCH_ANALYSIS_PROMPT.format(
questions_payload=json.dumps(batch, ensure_ascii=False)
)
}],
temperature=0.3,
max_tokens=8192,
response_format={"type": "json_object"},
)
raw = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', resp.choices[0].message.content)
raw = re.sub(r'(?<!\\)((?:\\\\)*)\\([^"\\/bfnrtu])', r'\1\\\\\2', raw)
data = json.loads(raw)
return data.get("analyses", [])
except Exception as e:
print(f" attempt {attempt+1} failed: {e}")
if attempt < 4:
await asyncio.sleep(2 ** attempt * 2)
return []
async def main():
parser = argparse.ArgumentParser()
parser.add_argument("--paper-id", help="Only process this paper")
parser.add_argument("--course", help="Only process papers with this course code")
parser.add_argument("--missing-only", action="store_true", help="Only process questions missing solution")
args = parser.parse_args()
sb = get_supabase()
# Fetch all questions (with paper info for filtering)
query = sb.table("paper_questions").select(
"id, paper_id, question_number, question_type, question_text, "
"parent_question, score, correct_option, correct_answer, raw_answer_text, "
"analytics_topic, topic_tags, solution"
)
if args.paper_id:
query = query.eq("paper_id", args.paper_id)
result = query.order("paper_id").order("display_order").execute()
all_questions = result.data
if args.course:
# Filter by course via papers table
papers_res = sb.table("papers").select("id").eq("course_code", args.course.upper()).execute()
paper_ids = {p["id"] for p in papers_res.data}
all_questions = [q for q in all_questions if q["paper_id"] in paper_ids]
if args.missing_only:
all_questions = [q for q in all_questions if not q.get("solution")]
print(f"Questions missing solution: {len(all_questions)}")
else:
print(f"Total questions to process: {len(all_questions)}")
# Group by paper_id
from collections import defaultdict
by_paper: dict[str, list] = defaultdict(list)
for q in all_questions:
by_paper[q["paper_id"]].append(q)
total_updated = 0
for paper_id, questions in by_paper.items():
print(f"\nPaper {paper_id}: {len(questions)} questions")
# Any question may be the parent of another
parent_text_map: dict[str, str] = {
q["question_number"]: q["question_text"] or ""
for q in questions
}
# Build payloads with context + Python exec
payloads = []
exec_namespaces: dict[str, dict] = {}
for q in questions:
parent_q = q.get("parent_question")
if parent_q and parent_q in parent_text_map:
full_text = (
f"[Context from parent question {parent_q}]\n"
f"{parent_text_map[parent_q]}\n\n"
f"[Sub-question {q['question_number']}]\n"
f"{q['question_text'] or ''}"
)
else:
full_text = q["question_text"] or ""
answer_section = ""
if q.get("raw_answer_text"):
answer_section = q["raw_answer_text"]
elif q.get("correct_option"):
answer_section = f"Correct option: {q['correct_option']}"
elif q.get("correct_answer"):
answer_section = f"Correct answer: {q['correct_answer']}"
# Try executing the Python code to get the real output
if not answer_section:
group_key = parent_q or q["question_number"]
if group_key not in exec_namespaces:
ns: dict = {}
try:
import numpy as np
ns["np"] = np
except ImportError:
pass
# Run the parent question's setup code first
if parent_q and parent_q in parent_text_map:
setup = extract_code_lines(parent_text_map[parent_q])
try_exec_python(setup, ns)
exec_namespaces[group_key] = ns
ns = exec_namespaces[group_key]
sub_code = extract_code_lines(q["question_text"] or "")
if sub_code:
exec_out = try_exec_python(sub_code, ns)
if exec_out is not None:
answer_section = f"Executed output: {exec_out}"
print(f" [exec] {q['question_number']}: {exec_out[:60]}")
payloads.append({
"_id": q["id"],
"question_number": q["question_number"],
"question_type": q["question_type"] or "long_question",
"score": q.get("score") or "unknown",
"question_text": full_text,
"reference_answer": answer_section,
})
# Process in batches of 3
id_map = {q["question_number"]: q["id"] for q in questions}
for batch in chunked(payloads, 3):
# Strip internal _id before sending to model
model_batch = [{k: v for k, v in p.items() if k != "_id"} for p in batch]
nums = [p["question_number"] for p in batch]
print(f" Batch {nums} ...", end=" ", flush=True)
analyses = await deepseek_batch(model_batch)
for item in analyses:
qnum = item.get("question_number")
qid = id_map.get(qnum)
if not qid:
continue
sb.table("paper_questions").update({
"knowledge_reminder": item.get("knowledge_reminder"),
"ai_hint": item.get("ai_hint"),
"solution": item.get("solution"),
}).eq("id", qid).execute()
total_updated += 1
print(f"done ({len(analyses)} updated)")
await asyncio.sleep(1)
print(f"\nDone. Total updated: {total_updated}")
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,160 @@
"""Backfill page_y_ratio for COMP2211 subquestions."""
from __future__ import annotations
import re
import time
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
import fitz
import httpx
from app.services.supabase_client import get_supabase
ROOT = Path(__file__).resolve().parent.parent
PAPERS_DIR = ROOT / "pastpaper-scraper" / "papers" / "COMP2211"
PDF_BY_EXAM_KEY = {
"COMP2211-2022-fall-midterm": "(COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf",
"COMP2211-2022-spring-midterm": "(COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf",
"COMP2211-2022-spring-final-part-a": "(COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf",
"COMP2211-2022-spring-final-part-b": "(COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf",
"COMP2211-2023-spring-midterm": "(COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf",
"COMP2211-2024-spring-midterm": "(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf",
"COMP2211-2024-spring-final": "(COMP2211)[2024](s)final~=igk5mmg^_90365.pdf",
}
def marker_candidates(question_number: str) -> list[str]:
    if "_" in question_number:
        left, right = question_number.split("_", 1)
        tokens: list[str] = []
        m = re.fullmatch(r"(\d+)([a-z])", left)
        if m:
            tokens.append(f"({m.group(2)})")
        elif re.fullmatch(r"\d+[a-z]+", left):
            # Strip the leading digits outside the f-string: backslashes inside
            # f-string expressions are a syntax error before Python 3.12, and
            # r'^\\d+' would match a literal backslash rather than digits.
            letters = re.sub(r"^\d+", "", left)
            tokens.append(f"({letters})")
        tokens.append(f"({right})")
        return tokens[::-1]
    m = re.fullmatch(r"(\d+)([a-z])", question_number)
    if m:
        return [f"({m.group(2)})", f"Problem {m.group(1)}"]
    if question_number.isdigit():
        return [f"Problem {question_number}"]
    return [question_number]
def line_matches(line_text: str, marker: str) -> bool:
text = re.sub(r"\s+", " ", line_text.strip())
if not text:
return False
if marker.startswith("("):
return text.startswith(marker)
return marker.lower() in text.lower()
def line_y_ratio(page: fitz.Page, marker: str) -> float | None:
data = page.get_text("dict")
hits: list[float] = []
for block in data.get("blocks", []):
if block.get("type") != 0:
continue
for line in block.get("lines", []):
line_text = "".join(
span.get("text", "")
for span in line.get("spans", [])
)
if line_matches(line_text, marker):
bbox = line.get("bbox")
if bbox:
hits.append(float(bbox[1]))
if not hits:
return None
y = min(hits)
return max(0.0, min((y - page.rect.y0) / page.rect.height, 0.98))
def search_y_ratio(page: fitz.Page, marker: str) -> float | None:
ratios: list[float] = []
for rect in page.search_for(marker):
ratios.append(max(0.0, min((rect.y0 - page.rect.y0) / page.rect.height, 0.98)))
return min(ratios) if ratios else None
def infer_y_ratio(page: fitz.Page, question_number: str) -> float:
for marker in marker_candidates(question_number):
ratio = line_y_ratio(page, marker)
if ratio is not None:
return ratio
ratio = search_y_ratio(page, marker)
if ratio is not None:
return ratio
return 0.05


def main() -> None:
    sb = get_supabase()
    papers = (
        sb.table("papers")
        .select("id, source_exam_key")
        .eq("course_code", "COMP2211")
        .eq("source_kind", "course_library")
        .execute()
        .data
        or []
    )
    updates: list[tuple[str, float]] = []
    for paper in papers:
        exam_key = paper["source_exam_key"]
        pdf_name = PDF_BY_EXAM_KEY.get(exam_key)
        if not pdf_name:
            continue
        pdf_path = PAPERS_DIR / pdf_name
        doc = fitz.open(pdf_path)
        try:
            questions = (
                sb.table("paper_questions")
                .select("id, question_number, page_number")
                .eq("paper_id", paper["id"])
                .order("display_order")
                .execute()
                .data
                or []
            )
            for question in questions:
                page_number = question.get("page_number") or 1
                page = doc[page_number - 1]
                ratio = infer_y_ratio(page, question["question_number"])
                updates.append((question["id"], round(ratio, 4)))
        finally:
            doc.close()

    def apply_update(payload: tuple[str, float]) -> None:
        # Retry transient HTTP failures up to 5 times with a linearly growing delay.
        question_id, ratio = payload
        attempts = 0
        while True:
            try:
                sb.table("paper_questions").update({"page_y_ratio": ratio}).eq("id", question_id).execute()
                return
            except httpx.HTTPError:
                attempts += 1
                if attempts >= 5:
                    raise
                time.sleep(0.4 * attempts)

    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(apply_update, payload) for payload in updates]
        for future in as_completed(futures):
            future.result()
    print(f"Backfilled page_y_ratio for {len(updates)} COMP2211 questions.")


if __name__ == "__main__":
    main()
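
The vertical-position normalisation used by `line_y_ratio` and `search_y_ratio` can be checked without PyMuPDF. The sketch below is a minimal standalone restatement of that arithmetic; `clamp_y_ratio` is a hypothetical helper name that does not exist in the script, and the page geometry values are invented for illustration.

```python
def clamp_y_ratio(y: float, page_top: float, page_height: float, upper: float = 0.98) -> float:
    """Map an absolute y coordinate to a page-relative ratio in [0.0, upper]."""
    ratio = (y - page_top) / page_height
    return max(0.0, min(ratio, upper))


# Illustrative values only: an 800-point-tall page whose top edge sits at y = 0.
mid_page = clamp_y_ratio(200.0, 0.0, 800.0)      # a marker a quarter of the way down
above_page = clamp_y_ratio(-5.0, 0.0, 800.0)     # clamped up to the top edge
bottom_edge = clamp_y_ratio(800.0, 0.0, 800.0)   # clamped just short of the bottom edge
```

Capping the ratio at 0.98 rather than 1.0 presumably keeps a scroll target from landing exactly on the bottom edge of the page; that reading is an inference from the code, not something the script states.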


@@ -0,0 +1,365 @@
"""Backfill COMP2211 tags to the revised retrieval schema."""
from __future__ import annotations
import re
from collections import OrderedDict
from app.services.supabase_client import get_supabase
SKILL_LABELS = {
"concept_check": "Concept Check",
"code_tracing": "Code Tracing",
"algorithm_tracing": "Algorithm Tracing",
"distance_calculation": "Distance Calculation",
"centroid_update": "Centroid Update",
"weight_update": "Weight Update",
"decision_boundary": "Decision Boundary",
"implementation": "Implementation",
"debugging": "Debugging",
"model_selection": "Model Selection",
"concept_explanation": "Concept Explanation",
"architecture_reasoning": "Architecture Reasoning",
"convergence_reasoning": "Convergence Reasoning",
"generalization_reasoning": "Generalization Reasoning",
"classification_decision": "Classification Decision",
}
ACRONYMS = {
"ai": "AI",
"cnn": "CNN",
"knn": "KNN",
"mlp": "MLP",
"nb": "NB",
"numpy": "NumPy",
}
def title_case_with_acronyms(value: str) -> str:
    words = re.split(r"[\s_]+", value.strip())
    parts: list[str] = []
    for word in words:
        if not word:
            continue
        lowered = word.lower()
        parts.append(ACRONYMS.get(lowered, lowered.capitalize()))
    return " ".join(parts)


def normalize_skill_tag(tag: str) -> str:
    if tag in SKILL_LABELS:
        return SKILL_LABELS[tag]
    return title_case_with_acronyms(tag)


def text_blob(question: dict) -> str:
    parts = [
        question.get("question_text") or "",
        question.get("raw_answer_text") or "",
        " ".join(question.get("topic_tags") or []),
        " ".join(question.get("skill_tags") or []),
        question.get("analytics_topic") or "",
    ]
    return " ".join(parts).lower()


def has_any(text: str, phrases: list[str]) -> bool:
    return any(phrase in text for phrase in phrases)


def infer_analytics_topic(question: dict) -> str:
    text = text_blob(question)
    broad = question.get("analytics_topic") or ""
    skills = {normalize_skill_tag(tag) for tag in (question.get("skill_tags") or [])}

    # Keyword rules are checked in order; the first match wins.
    if has_any(text, ["ethics", "bias", "privacy", "autonomous vehicle", "informed consent", "human participants", "ethically"]):
        return "Ethics of AI"
    if has_any(text, ["minimax", "alpha-beta", "alpha beta", "game tree", "tic-tac-toe", "tic tac toe"]):
        return "Game Trees"
    if has_any(text, ["search algorithm", "best-first", "breadth-first", "depth-first", "a* search", "a star"]):
        return "Search Algorithms"
    if has_any(text, ["cross validation", "d-fold", "k-fold", "train/val", "validation set", "fold "]) or broad == "Cross Validation":
        return "Cross Validation"
    if has_any(text, ["confusion matrix", "precision", "recall", "macro f1", "f1 score", "accuracy score", "evaluation metric"]):
        return "Evaluation Metrics"
    if has_any(text, ["naive bayes", "gaussian distribution", "laplace smoothing", "likelihood", "posterior probability"]) or broad == "Naive Bayes":
        return "Naive Bayes"
    if has_any(text, ["bayes classifier", "conditional probability", "bayesian inference", "prior probability", "posterior"]) or broad == "Bayesian Inference":
        return "Bayesian Inference"
    if has_any(text, ["leader clustering", "k-means", "k means", "centroid", "elbow method", "silhouette", "cluster assignments", "closest centroid", "new cluster"]):
        return "K-Means"
    if has_any(text, ["k-nearest", "nearest neighbors", "weighted knn", "cosine distance", "euclidean distance", "manhattan distance", "6-cross-validation error for k", "class for cosine distance"]):
        return "KNN"
    if has_any(text, ["multilayer perceptron", "mlp", "back propagation", "backpropagation", "hidden layer", "output layer", "dropout", "softmax", "sigmoid function", "relu as the activation"]) or broad == "MLP":
        return "MLP"
    if has_any(text, ["perceptron", "decision boundary", "single neuron", "weight update", "activation function f(z)", "linearly separable"]) or broad == "Perceptron":
        return "Perceptron"
    if has_any(text, ["convolutional neural network", "cnn", "kernel", "padding", "stride", "pooling", "dilated convolution", "3d convolution", "otsu", "histogram", "image processing", "grayscale image"]):
        return "CNN"
    if has_any(text, ["numpy", "python", "np.", "broadcasting", "reshape", "transpose", "mask", "vectorized", "np.arange", "np.mean", "np.dot", "np.convolve"]):
        return "Python and NumPy"

    # No keyword matched: split the old broad topic into its revised sub-topic.
    if broad == "KNN and Clustering":
        if (
            has_any(text, ["k-means", "k means", "centroid", "leader clustering", "elbow", "silhouette"])
            or "Centroid Update" in skills
            or "Convergence Reasoning" in skills
            or "Algorithm Tracing" in skills
            or "Model Selection" in skills
        ):
            return "K-Means"
        return "KNN"
    if broad == "Perceptron and MLP":
        if (
            has_any(text, ["hidden layer", "backprop", "activation function", "softmax", "relu", "sigmoid", "multilayer perceptron", "mlp"])
            or "Architecture Reasoning" in skills
        ):
            return "MLP"
        return "Perceptron"
    if broad == "Probabilistic Models":
        if has_any(text, ["naive bayes", "gaussian", "laplace", "likelihood"]):
            return "Naive Bayes"
        return "Bayesian Inference"
    if broad == "Evaluation and Validation":
        if has_any(text, ["cross validation", "cross-validation", "k-fold", "d-fold", "validation set", "train/val"]):
            return "Cross Validation"
        return "Evaluation Metrics"
    if broad == "Search and Games":
        if has_any(text, ["minimax", "alpha-beta", "alpha beta", "game tree"]):
            return "Game Trees"
        return "Search Algorithms"
    broad_map = {
        "Vision and CNN": "CNN",
        "Python Fundamentals": "Python and NumPy",
        "Ethics of AI": "Ethics of AI",
    }
    return broad_map.get(broad, "Python and NumPy")
TOPIC_CONCEPTS = {
"Naive Bayes": [
("Naive Bayes", ["naive bayes"]),
("Prior", ["prior"]),
("Likelihood", ["likelihood"]),
("Posterior", ["posterior"]),
("Gaussian", ["gaussian"]),
("Laplace Smoothing", ["laplace"]),
("Missing Data", ["missing data", "missing value"]),
],
"Bayesian Inference": [
("Bayesian Inference", ["bayes", "conditional probability", "posterior"]),
("Conditional Probability", ["conditional probability"]),
("Bayes Rule", ["bayes rule", "posterior"]),
("Prior", ["prior"]),
("Posterior", ["posterior"]),
],
"KNN": [
("KNN", ["k-nearest", "nearest neighbors", "knn"]),
("Euclidean Distance", ["euclidean distance"]),
("Manhattan Distance", ["manhattan distance"]),
("Cosine Distance", ["cosine distance"]),
("Weighted KNN", ["weighted k-nearest", "weighted knn", "inverse of the distance"]),
("Classification", ["class label", "predict", "classification"]),
("Cross Validation", ["cross-validation", "cross validation"]),
("Test Error", ["test error"]),
],
"K-Means": [
("K-Means", ["k-means", "k means"]),
("Centroid Update", ["centroid"]),
("Convergence", ["converged", "convergence"]),
("Leader Clustering", ["leader clustering"]),
("Outliers", ["outlier"]),
("Model Selection", ["elbow method", "silhouette", "suitable k"]),
],
"Perceptron": [
("Perceptron", ["perceptron"]),
("Decision Boundary", ["decision boundary", "linearly separable"]),
("Weight Update", ["weight update", "∆w", "deltaw", "backward propagation"]),
("Convergence", ["converged", "convergence"]),
("Activation Function", ["activation function"]),
],
"MLP": [
("MLP", ["mlp", "multilayer perceptron"]),
("Backpropagation", ["back propagation", "backpropagation", "backward propagation"]),
("Activation Function", ["activation function", "relu", "sigmoid", "softmax"]),
("Hidden Layer", ["hidden layer"]),
("Output Layer", ["output layer"]),
("Parameter Count", ["number of parameters", "parameter"]),
("Overfitting", ["overfitting", "dropout"]),
],
"CNN": [
("CNN", ["cnn", "convolutional neural network"]),
("Convolution", ["convolution", "kernel"]),
("Padding", ["padding", "reflection padding", "zero padding"]),
("Stride", ["stride"]),
("Pooling", ["pooling", "max pooling", "average pooling"]),
("Image Processing", ["image processing", "grayscale image"]),
("Histogram", ["histogram"]),
("Otsu Thresholding", ["otsu"]),
("Dilated Convolution", ["dilated convolution"]),
("3D Convolution", ["3d convolution"]),
("Dropout", ["dropout"]),
],
"Evaluation Metrics": [
("Evaluation Metrics", ["evaluation", "metric"]),
("Confusion Matrix", ["confusion matrix"]),
("Accuracy", ["accuracy"]),
("Precision", ["precision"]),
("Recall", ["recall"]),
("F1 Score", ["f1"]),
("Macro F1", ["macro f1"]),
],
"Cross Validation": [
("Cross Validation", ["cross validation", "cross-validation", "d-fold", "k-fold"]),
("Train Validation Split", ["validation set", "train", "test fold"]),
("Model Selection", ["choose k", "which k", "fold"]),
("Data Shuffling", ["shuffle", "shuffling"]),
],
"Python and NumPy": [
("Python and NumPy", ["numpy", "python"]),
("NumPy", ["numpy", "np."]),
("Broadcasting", ["broadcast"]),
("Array Indexing", ["index", "slice"]),
("Vectorization", ["no explicit loops", "vectorized"]),
("Matrix Multiplication", ["matmul", "matrix multiplication", "@"]),
("Reshape", ["reshape"]),
("Transpose", ["transpose"]),
("Masking", ["mask"]),
("Convolution", ["convolve"]),
],
"Search Algorithms": [
("Search Algorithms", ["search"]),
("Breadth-First Search", ["breadth-first", "breadth first", "bfs"]),
("Depth-First Search", ["depth-first", "depth first", "dfs"]),
("Best-First Search", ["best-first", "best first"]),
("A* Search", ["a* search", "a star", "astar"]),
("Heuristic", ["heuristic"]),
],
"Game Trees": [
("Game Trees", ["game tree", "minimax", "alpha-beta", "alpha beta"]),
("Minimax", ["minimax"]),
("Alpha-Beta Pruning", ["alpha-beta", "alpha beta", "pruned"]),
("Utility", ["utility"]),
],
"Ethics of AI": [
("Ethics of AI", ["ethics", "ethical"]),
("Bias", ["bias"]),
("Privacy", ["privacy"]),
("Fairness", ["fair"]),
("Research Ethics", ["informed consent", "human participants"]),
("Governance", ["monitoring", "production", "organizations"]),
("Autonomous Vehicles", ["autonomous vehicle"]),
],
}
TOPIC_DEFAULTS = {
"Naive Bayes": ["Likelihood", "Posterior"],
"Bayesian Inference": ["Conditional Probability", "Bayes Rule"],
"KNN": ["Classification", "Distance Calculation"],
"K-Means": ["Centroid Update", "Convergence"],
"Perceptron": ["Decision Boundary", "Weight Update"],
"MLP": ["Activation Function", "Hidden Layer"],
"CNN": ["Convolution", "Padding"],
"Evaluation Metrics": ["Confusion Matrix", "F1 Score"],
"Cross Validation": ["Train Validation Split", "Model Selection"],
"Python and NumPy": ["NumPy", "Vectorization"],
"Search Algorithms": ["Breadth-First Search", "Heuristic"],
"Game Trees": ["Minimax", "Alpha-Beta Pruning"],
"Ethics of AI": ["Bias", "Fairness"],
}
DEFAULT_SKILLS = {
"Naive Bayes": ["Probability Reasoning"],
"Bayesian Inference": ["Probability Reasoning"],
"KNN": ["Classification Decision"],
"K-Means": ["Centroid Update"],
"Perceptron": ["Decision Boundary"],
"MLP": ["Concept Explanation"],
"CNN": ["Concept Explanation"],
"Evaluation Metrics": ["Metric Reasoning"],
"Cross Validation": ["Model Selection"],
"Python and NumPy": ["Code Tracing"],
"Search Algorithms": ["Algorithm Tracing"],
"Game Trees": ["Game Reasoning"],
"Ethics of AI": ["Ethical Reasoning"],
}
def unique_keep_order(values: list[str]) -> list[str]:
    return list(OrderedDict((value, None) for value in values if value).keys())


def build_topic_tags(question: dict, analytics_topic: str) -> list[str]:
    text = text_blob(question)
    tags: list[str] = [analytics_topic]
    for label, keywords in TOPIC_CONCEPTS.get(analytics_topic, []):
        if label == analytics_topic:
            continue
        if has_any(text, keywords):
            tags.append(label)
    # Pad with topic defaults until there are at least two distinct tags.
    for default in TOPIC_DEFAULTS.get(analytics_topic, []):
        if len(unique_keep_order(tags)) >= 2:
            break
        tags.append(default)
    tags = unique_keep_order(tags)
    return tags[:5]


def build_skill_tags(question: dict, analytics_topic: str) -> list[str]:
    raw = question.get("skill_tags") or []
    converted = unique_keep_order([normalize_skill_tag(tag) for tag in raw])
    if not converted:
        converted = DEFAULT_SKILLS.get(analytics_topic, ["Concept Check"])
    return converted[:3]


def main() -> None:
    sb = get_supabase()
    papers = (
        sb.table("papers")
        .select("id")
        .eq("course_code", "COMP2211")
        .eq("source_kind", "course_library")
        .execute()
        .data
    )
    paper_ids = [paper["id"] for paper in papers]
    if not paper_ids:
        print("No COMP2211 course-library papers found.")
        return
    questions = (
        sb.table("paper_questions")
        .select("id, paper_id, question_number, question_text, raw_answer_text, analytics_topic, topic_tags, skill_tags, topics")
        .in_("paper_id", paper_ids)
        .order("paper_id")
        .order("display_order")
        .execute()
        .data
    )
    for question in questions:
        analytics_topic = infer_analytics_topic(question)
        topic_tags = build_topic_tags(question, analytics_topic)
        skill_tags = build_skill_tags(question, analytics_topic)
        payload = {
            "analytics_topic": analytics_topic,
            "topic_primary": analytics_topic,
            "topic_tags": topic_tags,
            "topics": topic_tags,
            "skill_tags": skill_tags,
        }
        sb.table("paper_questions").update(payload).eq("id", question["id"]).execute()
    print(f"Backfilled {len(questions)} COMP2211 questions.")


if __name__ == "__main__":
    main()
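
The label normalisation and order-preserving de-duplication that the tag builders rely on can be tried in isolation. This is a minimal sketch, assuming a trimmed-down acronym table; `ACRONYMS_DEMO`, `title_case_demo` and `unique_keep_order_demo` are hypothetical names used only here, and the sample tags are invented.

```python
import re
from collections import OrderedDict

# Hypothetical, trimmed-down acronym table for illustration only.
ACRONYMS_DEMO = {"knn": "KNN", "cnn": "CNN", "numpy": "NumPy"}


def title_case_demo(value: str) -> str:
    """Split on whitespace/underscores, then title-case words unless they are known acronyms."""
    words = re.split(r"[\s_]+", value.strip())
    return " ".join(
        ACRONYMS_DEMO.get(word.lower(), word.lower().capitalize())
        for word in words
        if word
    )


def unique_keep_order_demo(values: list[str]) -> list[str]:
    """Drop empty strings and duplicates while keeping first-seen order."""
    return list(OrderedDict((value, None) for value in values if value).keys())


label = title_case_demo("weighted_knn")
tags = unique_keep_order_demo(["K-Means", "Centroid Update", "K-Means", "", "Convergence"])
```

Keeping first-seen order matters because `build_topic_tags` puts the primary topic first and then truncates to five entries, so a set-based de-duplication could drop or reorder the primary tag.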


@@ -0,0 +1,169 @@
"""Backfill AI trio for questions where knowledge_reminder IS NULL.
For each question, generates fields in two separate LLM calls to avoid token truncation:
  Call 1 → knowledge_reminder + ai_hint (short, ~500 tokens output)
  Call 2 → solution (long, up to 4096 tokens output)
Run from the backend directory:
    uv run python backfill_null_ai_trio.py [--dry-run]
"""
from __future__ import annotations
import asyncio
import json
import sys
from app.services.supabase_client import get_supabase
from app.services.paper_processor import qwen_json_completion
KNOWLEDGE_HINT_PROMPT = """\
You are an expert tutor. Given a past-paper question, produce two short study aids in English.
Return JSON exactly:
{{
"knowledge_reminder": "2-4 sentences summarising the key concept or formula the student must recall.",
"ai_hint": "1-3 sentence nudge that guides WITHOUT giving the answer away."
}}
Question:
{payload}
"""
SOLUTION_PROMPT = """\
You are an expert tutor. Given a past-paper question and its reference answer, write a clear, \
step-by-step model solution in English. Show all working. Be thorough but stop when the answer \
is complete — do not pad.
Return JSON exactly:
{{
"solution": "<full step-by-step solution as a single string, use \\n for line breaks>"
}}
Question:
{payload}
"""
def build_payload(q: dict) -> dict:
    # Prefer the raw answer text, then the MCQ option, then the short answer.
    ref = ""
    if q.get("raw_answer_text"):
        ref = q["raw_answer_text"]
    elif q.get("correct_option"):
        ref = f"Correct option: {q['correct_option']}"
    elif q.get("correct_answer"):
        ref = f"Correct answer: {q['correct_answer']}"
    return {
        "question_number": q["question_number"],
        "question_type": q["question_type"] or "long_question",
        "score": q.get("score") or "unknown",
        "question_text": q.get("question_text") or "",
        "topics": q.get("topics") or [],
        "reference_answer": ref,
    }


async def process_one(sb, q: dict, dry_run: bool) -> bool:
    payload_str = json.dumps(build_payload(q), ensure_ascii=False)
    row_id = q["id"]
    qnum = q["question_number"]
    if dry_run:
        print(f" [dry-run] would process {qnum}")
        return True
    update: dict = {}

    # ── Call 1: knowledge_reminder + ai_hint ─────────────────────────
    try:
        r1 = await qwen_json_completion(
            system_prompt=KNOWLEDGE_HINT_PROMPT.format(payload=payload_str),
            temperature=0.3,
            max_tokens=1024,
        )
        if r1.get("knowledge_reminder"):
            update["knowledge_reminder"] = r1["knowledge_reminder"]
        if r1.get("ai_hint"):
            update["ai_hint"] = r1["ai_hint"]
    except Exception as e:
        print(f" WARN call-1 failed for {qnum}: {e}")
    await asyncio.sleep(1)

    # ── Call 2: solution ──────────────────────────────────────────────
    try:
        r2 = await qwen_json_completion(
            system_prompt=SOLUTION_PROMPT.format(payload=payload_str),
            temperature=0.3,
            max_tokens=4096,
        )
        if r2.get("solution"):
            update["solution"] = r2["solution"]
    except Exception as e:
        print(f" WARN call-2 failed for {qnum}: {e}")

    if not update:
        print(f" SKIP {qnum}: both calls returned nothing")
        return False
    sb.table("paper_questions").update(update).eq("id", row_id).execute()
    return True


async def backfill(dry_run: bool = False) -> None:
    sb = get_supabase()
    papers = (
        sb.table("papers")
        .select("id")
        .eq("course_code", "COMP2211")
        .eq("source_kind", "course_library")
        .execute()
        .data
    )
    paper_ids = [p["id"] for p in papers]
    if not paper_ids:
        print("No COMP2211 course-library papers found.")
        return
    questions = (
        sb.table("paper_questions")
        .select("id, paper_id, question_number, question_type, score, question_text, topics, raw_answer_text, correct_option, correct_answer")
        .in_("paper_id", paper_ids)
        .is_("knowledge_reminder", "null")
        .order("paper_id")
        .order("display_order")
        .execute()
        .data
    )
    if not questions:
        print("No NULL questions found — all done!")
        return
    print(f"Found {len(questions)} questions with NULL knowledge_reminder.")

    # Group by paper for cleaner output
    from collections import defaultdict
    by_paper: dict[str, list] = defaultdict(list)
    for q in questions:
        by_paper[q["paper_id"]].append(q)

    total_updated = 0
    for paper_idx, (paper_id, qs) in enumerate(by_paper.items(), 1):
        # The separator between the paper id and the count was missing in the message.
        print(f"\n[{paper_idx}/{len(by_paper)}] paper_id={paper_id}: {len(qs)} NULL questions")
        for q in qs:
            print(f" Processing {q['question_number']}...", end=" ", flush=True)
            ok = await process_one(sb, q, dry_run)
            if ok:
                total_updated += 1
                print("done")
            await asyncio.sleep(1.5)
    print(f"\nDone. {total_updated}/{len(questions)} questions updated.")


if __name__ == "__main__":
    dry_run = "--dry-run" in sys.argv
    asyncio.run(backfill(dry_run=dry_run))
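
The two-call pattern in `process_one` keeps whatever the first call produced even when the second call fails. The sketch below illustrates only that merging behaviour; `fake_completion` is a hypothetical stand-in and is not the real `qwen_json_completion`, and the returned text is invented.

```python
import asyncio


async def fake_completion(kind: str) -> dict:
    """Hypothetical stand-in for an LLM call; returns canned data or raises."""
    if kind == "hint":
        # An empty ai_hint simulates a partially useful response.
        return {"knowledge_reminder": "Bayes rule relates prior and posterior.", "ai_hint": ""}
    raise RuntimeError("simulated timeout")


async def collect_update() -> dict:
    update: dict = {}
    # Call 1: keep only the non-empty fields.
    try:
        r1 = await fake_completion("hint")
        for key in ("knowledge_reminder", "ai_hint"):
            if r1.get(key):
                update[key] = r1[key]
    except Exception as exc:
        print(f"WARN call-1 failed: {exc}")
    # Call 2: a failure here must not discard what call 1 produced.
    try:
        r2 = await fake_completion("solution")
        if r2.get("solution"):
            update["solution"] = r2["solution"]
    except Exception as exc:
        print(f"WARN call-2 failed: {exc}")
    return update


result = asyncio.run(collect_update())
```

One consequence worth noting: because the script selects rows where `knowledge_reminder` is NULL, a row whose first call succeeded but whose second call failed would not be picked up again on a rerun, so its `solution` would stay empty.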


@@ -0,0 +1,135 @@
"""Pre-compute similar_questions for all COMP2211 course-library questions.
For each question, runs the same similarity logic as the API and writes the result
into paper_questions.similar_questions (JSONB). The API will then return this
pre-computed value directly with no computation overhead.
Run from the backend directory:
uv run python backfill_similar_questions.py [--dry-run]
"""
from __future__ import annotations
import sys
from collections import Counter
from app.services.supabase_client import get_supabase
from app.routers.questions import (
similarity_score,
question_family,
display_topics,
)
def run(dry_run: bool = False) -> None:
    sb = get_supabase()

    # Fetch all ready COMP2211 papers
    papers = (
        sb.table("papers")
        .select("id, year, term, exam_type, part_label")
        .eq("course_code", "COMP2211")
        .eq("status", "ready")
        .execute()
        .data
    )
    if not papers:
        print("No ready COMP2211 papers found.")
        return
    papers_by_id = {p["id"]: p for p in papers}
    paper_ids = list(papers_by_id.keys())

    # Fetch all questions for these papers
    all_questions = (
        sb.table("paper_questions")
        .select(
            "id, paper_id, question_number, question_type, question_format, "
            "question_text, score, topics, analytics_topic, topic_tags, skill_tags, "
            "difficulty, knowledge_reminder, ai_hint, solution"
        )
        .in_("paper_id", paper_ids)
        .execute()
        .data
    )
    print(f"Found {len(all_questions)} questions across {len(papers)} papers.")

    # Batch full-text scores not practical here; skip RPC, rely on tag/topic scoring
    # (text_score = 0 for all, still produces good tag-based results)
    updated = 0
    skipped = 0
    for i, target in enumerate(all_questions, 1):
        target_paper_id = target["paper_id"]
        target_topic = target.get("analytics_topic")

        # Candidates: same course, different paper
        candidates = [
            q for q in all_questions
            if q["paper_id"] != target_paper_id
        ]
        # Pre-filter by analytics_topic if available
        if target_topic:
            candidates = [c for c in candidates if c.get("analytics_topic") == target_topic]
        if not candidates:
            skipped += 1
            print(f" [{i}/{len(all_questions)}] {target['question_number']} — no candidates, skip")
            continue

        ranked = []
        for candidate in candidates:
            match_percent, reasons = similarity_score(target, candidate, text_score=0.0)
            if match_percent < 20:
                continue
            paper = papers_by_id.get(candidate["paper_id"], {})
            # Use `or ""` so a NULL column does not crash on .title().
            source = (
                f"{paper.get('year') or ''} {(paper.get('term') or '').title()} "
                f"{(paper.get('exam_type') or '').title()}"
            ).strip()
            if paper.get("part_label"):
                source = f"{source} Part {paper['part_label']}"
            ranked.append({
                "id": candidate["id"],
                "paper_id": candidate["paper_id"],
                "source": source,
                "question_number": candidate["question_number"],
                "match_percent": match_percent,
                "match_reasons": reasons,
                "question_type": question_family(candidate),
                "question_text": candidate["question_text"],
                "topics": display_topics(candidate),
                "difficulty": candidate.get("difficulty"),
                "knowledge_reminder": candidate.get("knowledge_reminder") or "",
                "ai_hint": candidate.get("ai_hint") or "",
                "solution": candidate.get("solution") or "",
            })
        ranked.sort(key=lambda item: (-item["match_percent"], item["source"], item["question_number"]))

        # Deduplicate: best per paper
        seen_papers: set[str] = set()
        deduped = []
        for item in ranked:
            if item["paper_id"] not in seen_papers:
                seen_papers.add(item["paper_id"])
                deduped.append(item)
        deduped = deduped[:12]

        # The separator between the question number and the count was missing in the message.
        print(f" [{i}/{len(all_questions)}] {target['question_number']}: {len(deduped)} similar", end="")
        if dry_run:
            print(" [dry-run]")
            continue
        sb.table("paper_questions").update({"similar_questions": deduped}).eq("id", target["id"]).execute()
        updated += 1
        print()
    print(f"\nDone. {updated} updated, {skipped} skipped (no candidates).")


if __name__ == "__main__":
    dry_run = "--dry-run" in sys.argv
    run(dry_run=dry_run)
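
The ranking and "best per paper" de-duplication step can be exercised without Supabase. This is a minimal sketch with invented sample rows; `best_per_paper` is a hypothetical helper name that does not exist in the script.

```python
def best_per_paper(ranked: list[dict], limit: int = 12) -> list[dict]:
    """Sort by score (descending), then keep only the first hit from each paper."""
    ordered = sorted(
        ranked,
        key=lambda item: (-item["match_percent"], item["source"], item["question_number"]),
    )
    seen: set[str] = set()
    kept: list[dict] = []
    for item in ordered:
        if item["paper_id"] not in seen:
            seen.add(item["paper_id"])
            kept.append(item)
    return kept[:limit]


# Invented rows for illustration; real rows carry more fields.
sample = [
    {"paper_id": "p1", "source": "2022 Fall Midterm", "question_number": "Q1(a)", "match_percent": 40},
    {"paper_id": "p1", "source": "2022 Fall Midterm", "question_number": "Q3", "match_percent": 75},
    {"paper_id": "p2", "source": "2024 Spring Final", "question_number": "Q2", "match_percent": 60},
]
picked = [(item["paper_id"], item["question_number"]) for item in best_per_paper(sample)]
```

Because only one result per paper survives, the list can never exceed the number of other papers in the course, so with a small library the cap of 12 is unlikely to be reached.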

238
backend/backfill_vision.py Normal file

@@ -0,0 +1,238 @@
"""
Reprocess all papers that are already ready, using Vision mode:
- Fetch the PDF from Supabase Storage → images → Vision question splitting → exec → AI trio → update DB
Usage:
    python backfill_vision.py --course COMP2211
    python backfill_vision.py --paper-id <uuid>
"""
import asyncio
import argparse
import requests
from app.services.supabase_client import get_supabase
from app.services.paper_processor import (
process_paper,
strip_nulls,
pdf_to_images,
gemini_vision_json,
deepseek_json_completion,
parse_json_response,
extract_code_lines,
try_exec_python,
chunked,
sort_questions,
STRUCTURE_PROMPT,
ANSWER_MATCH_PROMPT,
BATCH_ANALYSIS_PROMPT,
)
import json
import traceback
async def reprocess_paper(paper: dict):
    """Reprocess a single paper (Vision mode)."""
    sb = get_supabase()
    paper_id = paper["id"]
    label = f"{paper['course_code']} {paper['year']} {paper['term']} {paper['exam_type']}"
    print(f"\n=== {label} ({paper_id[:8]}) ===")

    # 1. Fetch the PDF
    try:
        response = requests.get(paper["paper_file_url"], timeout=60)
        response.raise_for_status()  # do not treat an HTTP error page as a PDF
        pdf_bytes = response.content
    except Exception as e:
        print(f" SKIP: failed to fetch PDF: {e}")
        return
    answer_bytes = None
    if paper.get("answer_file_url"):
        try:
            answer_response = requests.get(paper["answer_file_url"], timeout=60)
            answer_response.raise_for_status()
            answer_bytes = answer_response.content
        except Exception:
            pass  # the answer file is optional; continue without it

    # 2. PDF → images (render once and reuse the result)
    print(" Rendering pages...", end=" ", flush=True)
    paper_images = pdf_to_images(pdf_bytes)
    print(f"done ({len(paper_images)} pages)")

    # 3. Vision question extraction (in batches of 8 pages)
    PAGE_BATCH = 8
    all_questions: list = []
    meta: dict = {}
    print(f" Vision extraction ({len(paper_images)} pages, {-(-len(paper_images)//PAGE_BATCH)} batches)...")
    for i in range(0, len(paper_images), PAGE_BATCH):
        batch_imgs = paper_images[i:i + PAGE_BATCH]
        print(f" Pages {i+1}-{i+len(batch_imgs)}...", end=" ", flush=True)
        try:
            batch_result = await gemini_vision_json(
                system_prompt=STRUCTURE_PROMPT,
                images=batch_imgs,
                user_text=f"Pages {i+1}-{i+len(batch_imgs)} of the exam paper. Extract all questions visible on these pages.",
                temperature=0,
            )
            # Paper-level metadata is taken from the first batch that succeeds.
            if not meta:
                meta = {k: batch_result.get(k) for k in ("total_score", "difficulty_level", "topics_summary")}
            qs = batch_result.get("questions", [])
            all_questions.extend(qs)
            print(f"done ({len(qs)} questions)")
        except Exception as e:
            print(f"FAILED: {e}")
    structure = {**meta, "questions": all_questions}
    questions = sort_questions(all_questions)
    print(f" Total: {len(questions)} questions extracted")

    # 4. Answer matching
    answers_map = {}
    if answer_bytes:
        print(" Vision answer matching...", end=" ", flush=True)
        answer_images = pdf_to_images(answer_bytes)
        questions_json = json.dumps(
            [{"question_number": q["question_number"], "question_type": q["question_type"]}
             for q in questions], ensure_ascii=False
        )
        try:
            match_result = await gemini_vision_json(
                system_prompt=ANSWER_MATCH_PROMPT.format(
                    questions_json=questions_json, answer_text="(See images)"
                ),
                images=answer_images,
                user_text=f"Match answers to these questions: {questions_json}",
                temperature=0,
            )
            answers_map = {a["question_number"]: a for a in match_result.get("answers", [])}
            print(f"done ({len(answers_map)} matched)")
        except Exception as e:
            print(f"FAILED: {e}")

    # 5. Build payloads (exec Python where no reference answer exists)
    import numpy as np
    exec_namespaces: dict = {}
    batched_payloads = []
    for q in questions:
        qnum = q["question_number"]
        answer = answers_map.get(qnum, {})
        full_text = q["question_text"] or ""
        answer_section = ""
        if answer.get("raw_answer_text"):
            answer_section = answer["raw_answer_text"]
        elif answer.get("correct_option"):
            answer_section = f"Correct option: {answer['correct_option']}"
        elif answer.get("correct_answer"):
            answer_section = f"Correct answer: {answer['correct_answer']}"
        if not answer_section:
            # Subquestions of the same parent share one namespace, so setup code runs once per group.
            parent_q = q.get("parent_question")
            group_key = parent_q or qnum
            if group_key not in exec_namespaces:
                ns: dict = {"np": np}
                setup = extract_code_lines(full_text)
                try_exec_python(setup, ns)
                exec_namespaces[group_key] = ns
            ns = exec_namespaces[group_key]
            print_lines = [line.strip() for line in full_text.splitlines() if line.strip().startswith("print(")]
            if print_lines:
                out = try_exec_python(print_lines[-1], ns)
                if out is not None:
                    answer_section = f"Executed output: {out}"
                    print(f" [exec] {qnum}: {out[:60]}")
        batched_payloads.append({
            "question_number": qnum,
            "question_type": q["question_type"],
            "score": q.get("score", "unknown"),
            "question_text": full_text,
            "topics": q.get("topics", []),
            "reference_answer": answer_section,
        })

    # 6. AI trio
    print(f" Generating AI trio ({len(batched_payloads)} questions, {len(list(chunked(batched_payloads, 3)))} batches)...")
    analyses: dict = {}
    for batch in chunked(batched_payloads, 3):
        nums = [p["question_number"] for p in batch]
        print(f" Batch {nums}...", end=" ", flush=True)
        try:
            result = await deepseek_json_completion(
                system_prompt=BATCH_ANALYSIS_PROMPT.format(
                    questions_payload=json.dumps(batch, ensure_ascii=False)
                ),
                temperature=0.3,
            )
            for item in result.get("analyses", []):
                if item.get("question_number"):
                    analyses[item["question_number"]] = item
            print(f"done ({len(result.get('analyses', []))})")
        except Exception as e:
            print(f"FAILED: {e}")
        await asyncio.sleep(1)

    # 7. Delete the old questions and write the new ones
    if not questions:
        # Guard: if extraction produced nothing, keep the existing rows instead of wiping them.
        print(" SKIP: no questions extracted; existing rows left untouched")
        return
    print(" Writing to DB...", end=" ", flush=True)
    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    for i, q in enumerate(questions):
        qnum = q["question_number"]
        answer = answers_map.get(qnum, {})
        analysis = analyses.get(qnum, {})
        sb.table("paper_questions").insert(strip_nulls({
            "paper_id": paper_id,
            "question_number": qnum,
            "parent_question": q.get("parent_question"),
            "display_order": i,
            "question_type": q["question_type"],
            "question_text": q["question_text"],
            "score": q.get("score"),
            "page_number": q.get("page_number"),
            "options": q.get("options"),
            "correct_option": answer.get("correct_option"),
            "correct_answer": answer.get("correct_answer"),
            "raw_answer_text": answer.get("raw_answer_text"),
            "topics": q.get("topics", []),
            # `or [None]` also covers an empty topics list, which would otherwise raise IndexError.
            "analytics_topic": (q.get("topics") or [None])[0],
            "topic_tags": q.get("topics", []),
            "difficulty": q.get("difficulty"),
            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
            "ai_hint": analysis.get("ai_hint", ""),
            "solution": analysis.get("solution", ""),
        })).execute()
    sb.table("papers").update({
        "question_count": len(questions),
        "total_score": structure.get("total_score"),
        "topics_summary": structure.get("topics_summary"),
        "difficulty_level": structure.get("difficulty_level"),
    }).eq("id", paper_id).execute()
    print(f"done ({len(questions)} questions written)")


async def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--course", help="Course code")
    parser.add_argument("--paper-id", help="Single paper ID")
    args = parser.parse_args()
    sb = get_supabase()
    query = sb.table("papers").select("*").eq("status", "ready")
    if args.paper_id:
        query = query.eq("id", args.paper_id)
    elif args.course:
        query = query.eq("course_code", args.course.upper())
    papers = query.order("created_at").execute().data
    print(f"Papers to reprocess: {len(papers)}")
    for paper in papers:
        try:
            await reprocess_paper(paper)
        except Exception as e:
            print(f" ERROR: {e}")
            traceback.print_exc()
    print("\nAll done.")


if __name__ == "__main__":
    asyncio.run(main())
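
The page batching in the Vision extraction step, including the `-(-n // size)` ceiling-division idiom used in its progress message, can be sketched on its own. `page_batches` is a hypothetical helper name, and the page count of 19 is an arbitrary example.

```python
def page_batches(page_count: int, batch_size: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges, one per batch."""
    ranges: list[tuple[int, int]] = []
    for start in range(0, page_count, batch_size):
        end = min(start + batch_size, page_count)
        ranges.append((start + 1, end))
    return ranges


batches = page_batches(19, 8)
# Negating before and after floor division rounds up instead of down.
batch_count = -(-19 // 8)
```

The last batch is simply shorter than the others, which matches how slicing `paper_images[i:i + PAGE_BATCH]` behaves at the end of the list.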


@@ -0,0 +1,29 @@
"""Deprecated: study aids must come from LLM output, not template fillers."""
from __future__ import annotations
import sys
MESSAGE = """
fill_manual_study_aids.py is intentionally disabled.
Reason:
- knowledge_reminder / ai_hint / solution must be generated by LLM
- template-based filler content polluted the COMP2211 course library
Use one of these paths instead:
1. Regenerate study aids through the real LLM pipeline in app/services/paper_processor.py
2. Rebuild paper_questions from a reviewed source and then run LLM generation
This script must not be used to backfill production study aids.
""".strip()
def main() -> None:
    print(MESSAGE, file=sys.stderr)
    raise SystemExit(1)


if __name__ == "__main__":
    main()


@@ -0,0 +1,240 @@
"""Import a canonical course manifest into Supabase-backed papers."""
from __future__ import annotations
import argparse
import asyncio
import json
from pathlib import Path
from typing import Any
from app.services.paper_processor import process_paper
from app.services.supabase_client import get_supabase
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Import a canonical course paper manifest into Supabase."
    )
    parser.add_argument(
        "--manifest",
        type=Path,
        required=True,
        help="Path to the manifest JSON file.",
    )
    parser.add_argument(
        "--papers-root",
        type=Path,
        required=True,
        help="Root folder that contains the course PDF files referenced by the manifest.",
    )
    parser.add_argument(
        "--user-id",
        required=False,
        help="Existing auth.users UUID used as the owner of imported course-library rows.",
    )
    parser.add_argument(
        "--course-code",
        help="Optional filter to only import entries from one course.",
    )
    parser.add_argument(
        "--exam-key",
        action="append",
        dest="exam_keys",
        default=[],
        help="Optional exam_key filter. Repeat the flag to import multiple entries.",
    )
    parser.add_argument(
        "--process",
        action="store_true",
        help="Run the full paper processing pipeline after the files are uploaded.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Print what would be imported without uploading or writing database rows.",
    )
    return parser.parse_args()
def load_manifest(path: Path) -> list[dict[str, Any]]:
with path.open("r", encoding="utf-8") as f:
data = json.load(f)
if not isinstance(data, list):
raise ValueError("Manifest must be a JSON array.")
return data
def should_import(entry: dict[str, Any], args: argparse.Namespace) -> bool:
if args.course_code and entry.get("course_code") != args.course_code:
return False
if args.exam_keys and entry.get("exam_key") not in set(args.exam_keys):
return False
return bool(entry.get("importable"))


def resolve_file_path(root: Path, filename: str | None) -> Path | None:
    if not filename:
        return None

    # Fast path: the manifest filename exists verbatim under the root.
    direct = root / filename
    if direct.exists():
        return direct

    all_files = [candidate for candidate in root.iterdir() if candidate.is_file()]

    def normalize(name: str) -> str:
        # Drop a " (1)" duplicate-copy suffix so "x (1).pdf" compares equal to "x.pdf".
        return name.replace(" (1)", "")

    # Fallback 1: compare full names with the duplicate-copy suffix removed.
    target_name = normalize(filename)
    normalized = [candidate for candidate in all_files if normalize(candidate.name) == target_name]
    if len(normalized) == 1:
        return normalized[0]

    # Fallback 2: compare normalized stems, requiring the same extension.
    path = Path(filename)
    normalized_stem = normalize(path.stem)
    suffix = path.suffix
    stem_matches = [
        candidate
        for candidate in all_files
        if candidate.suffix == suffix and normalize(candidate.stem) == normalized_stem
    ]
    if len(stem_matches) == 1:
        return stem_matches[0]

    # No match, or an ambiguous one: the caller decides how to fail.
    return None


def read_file_bytes(root: Path, filename: str | None) -> bytes | None:
    if not filename:
        return None
    path = resolve_file_path(root, filename)
    if path is None or not path.exists():
        raise FileNotFoundError(f"Referenced file does not exist under {root}: {filename}")
    return path.read_bytes()


def build_storage_path(entry: dict[str, Any], kind: str) -> str:
    exam_key = entry["exam_key"]
    return f"course-library/{entry['course_code']}/{exam_key}/{kind}.pdf"


def upsert_paper_record(
    entry: dict[str, Any],
    user_id: str | None,
    paper_url: str,
    answer_url: str | None,
) -> str:
    sb = get_supabase()
    payload = {
        "user_id": user_id,
        "course_code": entry["course_code"],
        "year": entry["year"],
        "term": entry["term"],
        "exam_type": entry["exam_type"],
        "part_label": entry.get("part_label"),
        "paper_file_url": paper_url,
        "answer_file_url": answer_url,
        "status": "processing",
        "source_kind": "course_library",
        "source_exam_key": entry["exam_key"],
        "source_question_filename": entry.get("question_pdf"),
        "source_answer_filename": entry.get("primary_answer_pdf"),
    }

    existing = (
        sb.table("papers")
        .select("id")
        .eq("source_kind", "course_library")
        .eq("source_exam_key", entry["exam_key"])
        .limit(1)
        .execute()
        .data
    )
    if existing:
        paper_id = existing[0]["id"]
        sb.table("papers").update(payload).eq("id", paper_id).execute()
        return paper_id

    created = sb.table("papers").insert(payload).execute().data
    return created[0]["id"]


def reset_existing_processed_data(paper_id: str) -> None:
    sb = get_supabase()
    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("papers").update(
        {
            "status": "processing",
            "error_message": None,
            "paper_extracted_text": None,
            "answer_extracted_text": None,
            "total_score": None,
            "question_count": None,
            "topics_summary": None,
            "difficulty_level": None,
        }
    ).eq("id", paper_id).execute()


async def import_entry(
    entry: dict[str, Any],
    args: argparse.Namespace,
) -> None:
    paper_bytes = read_file_bytes(args.papers_root, entry.get("question_pdf"))
    answer_bytes = read_file_bytes(args.papers_root, entry.get("primary_answer_pdf"))
    if paper_bytes is None:
        raise ValueError(f"Importable entry is missing question PDF: {entry['exam_key']}")

    if args.dry_run:
        print(
            f"[dry-run] {entry['exam_key']}: "
            f"question={entry.get('question_pdf')} answer={entry.get('primary_answer_pdf')}"
        )
        return

    sb = get_supabase()
    paper_path = build_storage_path(entry, "paper")
    sb.storage.from_("papers").upload(
        paper_path,
        paper_bytes,
        file_options={"content-type": "application/pdf", "upsert": "true"},
    )
    paper_url = sb.storage.from_("papers").get_public_url(paper_path)

    answer_url = None
    if answer_bytes:
        answer_path = build_storage_path(entry, "answer")
        sb.storage.from_("papers").upload(
            answer_path,
            answer_bytes,
            file_options={"content-type": "application/pdf", "upsert": "true"},
        )
        answer_url = sb.storage.from_("papers").get_public_url(answer_path)

    paper_id = upsert_paper_record(entry, args.user_id, paper_url, answer_url)
    print(f"Imported metadata for {entry['exam_key']} -> paper_id={paper_id}")

    if args.process:
        reset_existing_processed_data(paper_id)
        await process_paper(paper_id, paper_bytes, answer_bytes)
        print(f"Processed {entry['exam_key']}")


async def main() -> None:
    args = parse_args()
    manifest = load_manifest(args.manifest)
    entries = [entry for entry in manifest if should_import(entry, args)]
    if not entries:
        print("No manifest entries matched the provided filters.")
        return

    print(f"Preparing to import {len(entries)} manifest entries.")
    for entry in entries:
        await import_entry(entry, args)


if __name__ == "__main__":
    asyncio.run(main())
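
The filename fallback used by the import script can be tried without Supabase. The sketch below is a simplified stand-in rather than the script's own code: `find_pdf` and the sample filenames are hypothetical, and only the " (1)" duplicate-suffix case is covered.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_pdf(root: Path, filename: str) -> Path | None:
    """Return the file under root matching filename, ignoring a ' (1)' suffix."""
    direct = root / filename
    if direct.exists():
        return direct

    def normalize(name: str) -> str:
        return name.replace(" (1)", "")

    matches = [
        candidate
        for candidate in root.iterdir()
        if candidate.is_file() and normalize(candidate.name) == normalize(filename)
    ]
    # Accept only an unambiguous match; otherwise report "not found".
    return matches[0] if len(matches) == 1 else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "midterm-2022 (1).pdf").write_bytes(b"%PDF-1.4")
    resolved = find_pdf(root, "midterm-2022.pdf")
    missing = find_pdf(root, "final-2022.pdf")
    resolved_name = resolved.name if resolved else None

print(resolved_name)  # midterm-2022 (1).pdf
print(missing)  # None
```

Returning `None` for an ambiguous match, instead of picking the first candidate, keeps a wrong PDF from being uploaded silently.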

17
backend/pyproject.toml Normal file

@@ -0,0 +1,17 @@
[project]
name = "pastpaper-master-backend"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.115.0",
"uvicorn[standard]>=0.30.0",
"python-dotenv>=1.0.0",
"python-multipart>=0.0.9",
"supabase>=2.0.0",
"openai>=1.50.0",
"PyMuPDF>=1.24.0",
"pydantic>=2.0.0",
"pydantic-settings>=2.0.0",
"httpx>=0.27.0",
"numpy>=2.4.4",
]


@@ -0,0 +1,174 @@
"""Regenerate AI trio (knowledge_reminder, ai_hint, solution) for all COMP2211 course-library questions.

Reads existing paper_questions rows and runs the same BATCH_ANALYSIS_PROMPT used by
paper_processor.py, but issues UPDATEs instead of INSERTs, so the question structure is untouched.

Run from the backend directory:
    uv run python regen_ai_trio_comp2211.py

Pass --dry-run to print batches without calling the LLM or writing to the database.
"""
from __future__ import annotations

import asyncio
import json
import sys
from collections import defaultdict

from app.services.supabase_client import get_supabase
from app.services.paper_processor import BATCH_ANALYSIS_PROMPT, qwen_json_completion, chunked

BATCH_SIZE = 3


def build_reference_answer(q: dict) -> str:
    if q.get("raw_answer_text"):
        return q["raw_answer_text"]
    if q.get("correct_option"):
        return f"Correct option: {q['correct_option']}"
    if q.get("correct_answer"):
        return f"Correct answer: {q['correct_answer']}"
    return ""


async def regen(dry_run: bool = False) -> None:
    sb = get_supabase()

    papers = (
        sb.table("papers")
        .select("id")
        .eq("course_code", "COMP2211")
        .eq("source_kind", "course_library")
        .execute()
        .data
    )
    paper_ids = [p["id"] for p in papers]
    if not paper_ids:
        print("No COMP2211 course-library papers found.")
        return

    questions = (
        sb.table("paper_questions")
        .select("id, paper_id, question_number, question_type, score, question_text, topics, raw_answer_text, correct_option, correct_answer")
        .in_("paper_id", paper_ids)
        .order("paper_id")
        .order("display_order")
        .execute()
        .data
    )
    print(f"Found {len(questions)} questions across {len(paper_ids)} papers.")

    # Group (row id, payload) pairs by paper so batches don't mix papers (cleaner context for the LLM).
    items_by_paper: dict[str, list[tuple[str, dict]]] = defaultdict(list)
    for q in questions:
        payload = {
            "question_number": q["question_number"],
            "question_type": q["question_type"] or "long_question",
            "score": q.get("score") or "unknown",
            "question_text": q.get("question_text") or "",
            "topics": q.get("topics") or [],
            "reference_answer": build_reference_answer(q),
        }
        items_by_paper[q["paper_id"]].append((q["id"], payload))

    def write_analysis(row_id: str, analysis: dict) -> None:
        sb.table("paper_questions").update({
            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
            "ai_hint": analysis.get("ai_hint", ""),
            "solution": analysis.get("solution", ""),
        }).eq("id", row_id).execute()

    async def analyse(payloads: list[dict]) -> list[dict]:
        result = await qwen_json_completion(
            system_prompt=BATCH_ANALYSIS_PROMPT.format(
                questions_payload=json.dumps(payloads, ensure_ascii=False),
            ),
            temperature=0.3,
            max_tokens=8192,
        )
        return result.get("analyses", [])

    async def run_single(row_id: str, payload: dict) -> bool:
        try:
            items = await analyse([payload])
            if not items:
                return False
            write_analysis(row_id, items[0])
            return True
        except Exception as exc:
            print(f"\n    single retry failed for {payload['question_number']}: {exc}")
            return False

    total_updated = 0
    total_papers = len(items_by_paper)

    for paper_idx, (paper_id, items) in enumerate(items_by_paper.items(), 1):
        ids = [row_id for row_id, _ in items]
        batch_payloads = [payload for _, payload in items]
        print(f"\n[{paper_idx}/{total_papers}] paper_id={paper_id} ({len(batch_payloads)} questions)")

        for batch_idx, batch in enumerate(chunked(batch_payloads, BATCH_SIZE), 1):
            print(f"  Batch {batch_idx}: questions {[b['question_number'] for b in batch]}", end="", flush=True)

            if dry_run:
                print(" [dry-run, skipped]")
                continue

            batch_start = (batch_idx - 1) * BATCH_SIZE
            batch_ids = ids[batch_start : batch_start + BATCH_SIZE]

            try:
                analyses = {item["question_number"]: item for item in await analyse(batch)}
                written = 0
                for row_id, payload in zip(batch_ids, batch):
                    qnum = payload["question_number"]
                    analysis = analyses.get(qnum)
                    if analysis:
                        write_analysis(row_id, analysis)
                        written += 1
                    elif await run_single(row_id, payload):
                        # Fallback: the batch response omitted this question, so it was retried alone.
                        written += 1
                    else:
                        print(f"\n    SKIP: {qnum}")
                total_updated += written
                print(f" -> {written} written")
            except Exception as exc:
                # The batch failed entirely; retry each question individually.
                print(f" [batch error: {exc}; retrying 1-by-1]")
                written = 0
                for row_id, payload in zip(batch_ids, batch):
                    if await run_single(row_id, payload):
                        written += 1
                    await asyncio.sleep(1)
                total_updated += written
                print(f"    {written}/{len(batch)} written")

            await asyncio.sleep(2.5)

    print(f"\nDone. {total_updated} questions updated.")


if __name__ == "__main__":
    asyncio.run(regen(dry_run="--dry-run" in sys.argv))
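
The batch-then-single retry strategy in this script can be sketched in isolation. Everything here is illustrative: `fake_llm` is a hypothetical stand-in for the real completion call (it omits question "2b" from multi-item batches and always fails on "9z"), and `analyse_batch` is an assumed helper name, not a function from the repository.

```python
import asyncio


async def fake_llm(batch: list) -> dict:
    """Hypothetical LLM stub: drops '2b' from multi-item batches, fails on '9z'."""
    if any(item["question_number"] == "9z" for item in batch):
        raise RuntimeError("simulated provider error")
    kept = [item for item in batch if item["question_number"] != "2b" or len(batch) == 1]
    return {"analyses": [{"question_number": i["question_number"], "solution": "..."} for i in kept]}


async def analyse_batch(batch: list) -> dict:
    """Return {question_number: analysis}, retrying omitted or failed items one at a time."""
    try:
        result = await fake_llm(batch)
        found = {a["question_number"]: a for a in result.get("analyses", [])}
    except Exception:
        found = {}  # whole batch failed: every item goes through the single retry

    for item in batch:
        qnum = item["question_number"]
        if qnum in found:
            continue
        try:
            single = await fake_llm([item])
        except Exception:
            continue  # give up on this question; a caller would report it as skipped
        if single.get("analyses"):
            found[qnum] = single["analyses"][0]
    return found


batch = [{"question_number": n} for n in ("2a", "2b", "2c")]
recovered = asyncio.run(analyse_batch(batch))
failed = asyncio.run(analyse_batch([{"question_number": "9z"}, {"question_number": "3a"}]))
print(sorted(recovered))  # ['2a', '2b', '2c']
print(sorted(failed))  # ['3a']
```

Keying the batch response by `question_number` is what lets a partially answered batch be patched per question instead of being resent as a whole.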


@@ -0,0 +1,69 @@
"""Re-generate AI trio (knowledge_reminder, ai_hint, solution) in English for existing questions."""
import asyncio
import json

from app.services.supabase_client import get_supabase
from app.services.llm_clients import get_qwen_client
from app.services.paper_processor import ANALYSIS_PROMPT


async def regenerate_for_paper(paper_id: str):
    sb = get_supabase()
    qwen = get_qwen_client()

    questions = sb.table("paper_questions").select("*").eq("paper_id", paper_id).order("display_order").execute().data
    print(f"Found {len(questions)} questions for paper {paper_id[:8]}")

    for q in questions:
        qnum = q["question_number"]
        print(f"  Regenerating Q{qnum}...", end=" ", flush=True)

        answer_section = ""
        if q.get("raw_answer_text"):
            answer_section = f"- Reference answer: {q['raw_answer_text']}"
        elif q.get("correct_option"):
            answer_section = f"- Correct option: {q['correct_option']}"
        elif q.get("correct_answer"):
            answer_section = f"- Correct answer: {q['correct_answer']}"

        resp = qwen.chat.completions.create(
            model="qwen-plus",
            messages=[
                {"role": "system", "content": ANALYSIS_PROMPT.format(
                    question_number=qnum,
                    question_type=q["question_type"],
                    # `or` also covers columns that exist but hold NULL, which dict.get defaults do not.
                    score=q.get("score") or "unknown",
                    question_text=q["question_text"],
                    topics=", ".join(q.get("topics") or []),
                    answer_section=answer_section,
                )},
            ],
            temperature=0.3,
            response_format={"type": "json_object"},
        )

        analysis = json.loads(resp.choices[0].message.content)

        sb.table("paper_questions").update({
            "knowledge_reminder": analysis.get("knowledge_reminder", ""),
            "ai_hint": analysis.get("ai_hint", ""),
            "solution": analysis.get("solution", ""),
        }).eq("id", q["id"]).execute()

        print("done")

    print(f"All questions regenerated for paper {paper_id[:8]}")


async def main():
    sb = get_supabase()
    papers = sb.table("papers").select("id,course_code,year,term").eq("status", "ready").order("created_at", desc=True).execute().data
    for p in papers:
        print(f"\n=== {p['course_code']} {p['year']} {p['term']} ===")
        await regenerate_for_paper(p["id"])
    print("\nAll done!")


if __name__ == "__main__":
    asyncio.run(main())
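
One detail matters when formatting prompt fields from rows fetched with `select("*")`: a column that is present but NULL comes back as `None`, and `dict.get` only uses its default when the key is absent. A minimal sketch, using a hypothetical row:

```python
# Hypothetical row as it might come back from the database with NULL columns.
row = {"question_number": "3b", "score": None, "topics": None}

# The default is ignored because the key exists, so None leaks through.
leaked = row.get("score", "unknown")

# `or` falls back for both missing keys and None values.
score = row.get("score") or "unknown"
topics = ", ".join(row.get("topics") or [])

print(leaked)  # None
print(score, repr(topics))  # unknown ''
```

Note that `or` also replaces other falsy values such as a score of `0`; an explicit `is None` check would be needed if zero-mark questions should keep their score.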


@@ -0,0 +1,224 @@
"""Split COMP2211 Spring 2022 final part A into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2022-spring-final-part-a"
TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
PROBLEM_SEED_PATH = (
Path(__file__).resolve().parent.parent
/ "pastpaper-scraper"
/ "reviews"
/ "COMP2211"
/ "problem_seed.json"
)


@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    analytics_topic: str | None = None
    topic_primary: str | None = None
    topic_tags: tuple[str, ...] | None = None
    skill_tags: tuple[str, ...] | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    analytics_topic: str | None = None,
    topic_primary: str | None = None,
    topic_tags: tuple[str, ...] | None = None,
    skill_tags: tuple[str, ...] | None = None,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        analytics_topic=analytics_topic,
        topic_primary=topic_primary,
        topic_tags=topic_tags,
        skill_tags=skill_tags,
        page_number=page_number,
    )


CHILDREN: list[ChildSpec] = [
ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "architecture_reasoning"), page_number=2),
ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "activation_selection"), page_number=2),
ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "metric_reasoning"), page_number=2),
ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "hardware_reasoning"), page_number=2),
ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "image_processing"), page_number=2),
ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "cnn_architecture"), page_number=2),
ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "regularization"), page_number=2),
ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "game_reasoning"), page_number=2),
ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "pruning_reasoning"), page_number=2),
ChildSpec("2a", "2", "2", ("a",), 6.5, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "probability_reasoning", "classification_decision"), page_number=4),
ChildSpec("2b", "2", "2", ("b",), 7.5, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "algorithm_tracing", "classification_decision"), page_number=4),
short_answer("3a", "3", "3", ("a",), 3, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("concept_explanation", "metric_reasoning"), page_number=6),
short_answer("3b", "3", "3", ("b",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation", "activation_selection"), page_number=6),
short_answer("3c", "3", "3", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("architecture_reasoning", "output_layer_design"), page_number=6),
short_answer("3d", "3", "3", ("d",), 3, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation", "optimization_reasoning"), page_number=6),
short_answer("3e_i", "3e", "3", ("e", "i"), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("optimization_reasoning",), page_number=6),
short_answer("3e_ii", "3e", "3", ("e", "ii"), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("optimization_reasoning",), page_number=6),
short_answer("3f", "3", "3", ("f",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("regularization", "concept_explanation"), page_number=6),
ChildSpec("4a_i", "4a", "4", ("a", "i"), 2, "fill_blank", "fill_blank", page_number=7),
ChildSpec("4a_ii", "4a", "4", ("a", "ii"), 2, "long_question", "long_answer", page_number=7),
ChildSpec("4b_i", "4b", "4", ("b", "i"), 3, "fill_blank", "fill_blank", page_number=7),
ChildSpec("4b_ii", "4b", "4", ("b", "ii"), 4, "fill_blank", "fill_blank", page_number=7),
ChildSpec("4b_iii", "4b", "4", ("b", "iii"), 4, "long_question", "long_answer", page_number=7),
]

# Matches "(a)", "(ii)", ... at the start of a line. Matching is flat: a nested roman
# numeral such as "(i)" is indistinguishable from the letter marker "(i)".
MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")


def split_sections(text: str) -> tuple[str, dict[str, str]]:
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}

    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    current = text.strip()
    carried_intro: list[str] = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        # Fall back to the enclosing text when the marker is not found.
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


def extract_true_false_answers(answer_text: str) -> dict[str, str]:
    answers: dict[str, str] = {}
    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text))
    for match in matches:
        answers[match.group(1)] = match.group(2)
    return answers


def derive_correct_answer(answer_text: str) -> str | None:
    if not answer_text:
        return None
    tail = answer_text.split("Answer:", 1)[1] if "Answer:" in answer_text else answer_text
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    if first.lower().startswith("marking scheme"):
        return None
    if len(first) <= 240:
        return first
    return None


def load_seed_rows() -> dict[str, dict]:
    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
    return {
        row["question_number"]: row
        for row in data
        if row["source_exam_key"] == EXAM_KEY
    }


def main() -> None:
    sb = get_supabase()
    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
    paper_id = paper["id"]

    current_rows = (
        sb.table("paper_questions")
        .select("*")
        .eq("paper_id", paper_id)
        .order("display_order")
        .execute()
        .data
    )
    existing_by_number = {row["question_number"]: row for row in current_rows}
    parent_rows = load_seed_rows()
    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")

    inserts = []
    for display_order, child in enumerate(CHILDREN, start=1):
        parent = parent_rows[child.top_level_number]
        existing = existing_by_number.get(child.question_number, {})
        question_text = extract_segment(parent["question_text"] or "", child.path)
        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)

        correct_option = None
        correct_answer = None
        options = None
        if child.question_type == "true_false":
            correct_option = tf_answers.get(child.path[0])
            options = TRUE_FALSE_OPTIONS
        elif child.question_type == "fill_blank":
            correct_answer = derive_correct_answer(raw_answer_text)

        inserts.append(
            {
                "paper_id": paper_id,
                "question_number": child.question_number,
                "parent_question": child.parent_question,
                "display_order": display_order,
                "question_type": child.question_type,
                "question_format": child.question_format,
                "question_text": question_text,
                "score": child.score,
                "page_number": child.page_number,
                "page_y_ratio": existing.get("page_y_ratio"),
                "options": options,
                "correct_option": correct_option,
                "correct_answer": correct_answer,
                "raw_answer_text": raw_answer_text,
                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
                "knowledge_reminder": existing.get("knowledge_reminder", ""),
                "ai_hint": existing.get("ai_hint", ""),
                "solution": existing.get("solution", ""),
            }
        )

    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("paper_questions").insert(inserts).execute()
    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")


if __name__ == "__main__":
    main()
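
A worked example helps show what `extract_segment` returns. The two functions below are copied from the script; the sample texts are made up for illustration, since the real `problem_seed.json` is not part of this commit and its exact layout is an assumption.

```python
import re

MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")


def split_sections(text):
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}
    intro = text[: matches[0].start()].strip()
    sections = {}
    for idx, match in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text, path):
    current = text.strip()
    carried_intro = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


# One-level path: the shared intro is kept and only part (b) is selected.
flat = "Answer the following.\n(a) Define overfitting.\n(b) Name one remedy."
flat_result = extract_segment(flat, ("b",))
print(flat_result)

# Two-level path where "(i)" starts its own line: the flat split ends section "e"
# at "(i)", so the lookup for "i" inside it falls back to the "(e)" text.
nested = "Consider an MLP.\n(e) Two cases:\n(i) Small rate.\n(ii) Large rate."
nested_result = extract_segment(nested, ("e", "i"))
print(nested_result)
```

With this hypothetical layout the nested call returns the intro plus the `(e)` stem but not the `(i)` text. Whether the real seed text is affected depends on how its sub-part markers are laid out, so spot-checking a few nested rows after a run is a reasonable precaution.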


@@ -0,0 +1,232 @@
"""Split COMP2211 Spring 2022 final part B into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2022-spring-final-part-b"
PROBLEM_SEED_PATH = (
Path(__file__).resolve().parent.parent
/ "pastpaper-scraper"
/ "reviews"
/ "COMP2211"
/ "problem_seed.json"
)


@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    analytics_topic: str | None = None
    topic_primary: str | None = None
    topic_tags: tuple[str, ...] | None = None
    skill_tags: tuple[str, ...] | None = None
    options: tuple[tuple[str, str], ...] | None = None
    correct_option: str | None = None
    correct_answer: str | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    analytics_topic: str | None = None,
    topic_primary: str | None = None,
    topic_tags: tuple[str, ...] | None = None,
    skill_tags: tuple[str, ...] | None = None,
    correct_answer: str | None = None,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        analytics_topic=analytics_topic,
        topic_primary=topic_primary,
        topic_tags=topic_tags,
        skill_tags=skill_tags,
        correct_answer=correct_answer,
        page_number=page_number,
    )


def mc(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    options: tuple[tuple[str, str], ...],
    correct_option: str,
    analytics_topic: str,
    skill_tags: tuple[str, ...],
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="mc",
        question_format="mc",
        analytics_topic=analytics_topic,
        topic_primary=analytics_topic,
        topic_tags=(analytics_topic,),
        skill_tags=skill_tags,
        options=options,
        correct_option=correct_option,
        page_number=page_number,
    )


ETHICS_ABCD = (
("A", "A"),
("B", "B"),
("C", "C"),
("D", "D"),
)
CHILDREN: list[ChildSpec] = [
ChildSpec("1a", "1", "1", ("a",), 1.5, "long_question", "long_answer", page_number=2),
short_answer("1b", "1", "1", ("b",), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("concept_explanation", "data_augmentation"), page_number=2),
ChildSpec("1c", "1", "1", ("c",), 4.5, "long_question", "long_answer", page_number=2),
short_answer("1d", "1", "1", ("d",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "parameter_reduction"), page_number=3),
ChildSpec("1e", "1", "1", ("e",), 2.5, "fill_blank", "fill_blank", correct_answer="1558656", page_number=3),
ChildSpec("1f_i", "1f", "1", ("f", "i"), 2.5, "fill_blank", "fill_blank", correct_answer="2071656", page_number=3),
ChildSpec("1f_ii", "1f", "1", ("f", "ii"), 2.5, "fill_blank", "fill_blank", correct_answer="150529000", page_number=4),
short_answer("1g", "1", "1", ("g",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "comparison"), page_number=4),
ChildSpec("2a", "2", "2", ("a",), 9, "long_question", "coding", page_number=5),
short_answer("2b", "2", "2", ("b",), 4, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("architecture_reasoning", "regression_reasoning"), page_number=6),
ChildSpec("3a", "3", "3", ("a",), 3.5, "long_question", "long_answer", page_number=9),
short_answer("3b", "3", "3", ("b",), 0.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("game_reasoning",), correct_answer="E-a", page_number=9),
short_answer("3c", "3", "3", ("c",), 1.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("concept_explanation", "game_reasoning"), page_number=9),
short_answer("3d", "3", "3", ("d",), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning_reasoning",), correct_answer="E-j and E-f", page_number=9),
mc("4a", "4", "4", ("a",), 1, options=ETHICS_ABCD, correct_option="C", analytics_topic="Ethics of AI", skill_tags=("concept_check", "ethical_reasoning"), page_number=10),
mc("4b", "4", "4", ("b",), 1, options=ETHICS_ABCD, correct_option="A", analytics_topic="Ethics of AI", skill_tags=("concept_check", "bias_reasoning"), page_number=10),
mc("4c", "4", "4", ("c",), 1, options=ETHICS_ABCD, correct_option="C", analytics_topic="Ethics of AI", skill_tags=("concept_check", "ethical_reasoning"), page_number=10),
mc("4d", "4", "4", ("d",), 1, options=ETHICS_ABCD, correct_option="B", analytics_topic="Ethics of AI", skill_tags=("concept_check", "bias_reasoning"), page_number=10),
short_answer("4e", "4", "4", ("e",), 3, analytics_topic="Ethics of AI", topic_primary="Ethics of AI", topic_tags=("Ethics of AI",), skill_tags=("argumentation", "concept_explanation"), page_number=11),
]

# Matches "(a)", "(ii)", ... at the start of a line. Matching is flat: a nested roman
# numeral such as "(i)" is indistinguishable from the letter marker "(i)".
MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")


def split_sections(text: str) -> tuple[str, dict[str, str]]:
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}

    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    current = text.strip()
    carried_intro: list[str] = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        # Fall back to the enclosing text when the marker is not found.
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


def load_seed_rows() -> dict[str, dict]:
    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
    return {
        row["question_number"]: row
        for row in data
        if row["source_exam_key"] == EXAM_KEY
    }


def main() -> None:
    sb = get_supabase()
    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
    paper_id = paper["id"]

    current_rows = (
        sb.table("paper_questions")
        .select("*")
        .eq("paper_id", paper_id)
        .order("display_order")
        .execute()
        .data
    )
    existing_by_number = {row["question_number"]: row for row in current_rows}
    parent_rows = load_seed_rows()

    inserts = []
    for display_order, child in enumerate(CHILDREN, start=1):
        parent = parent_rows[child.top_level_number]
        existing = existing_by_number.get(child.question_number, {})
        question_text = extract_segment(parent["question_text"] or "", child.path)
        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)

        options = None
        if child.options:
            options = [{"label": label, "text": text} for label, text in child.options]

        inserts.append(
            {
                "paper_id": paper_id,
                "question_number": child.question_number,
                "parent_question": child.parent_question,
                "display_order": display_order,
                "question_type": child.question_type,
                "question_format": child.question_format,
                "question_text": question_text,
                "score": child.score,
                "page_number": child.page_number,
                "page_y_ratio": existing.get("page_y_ratio"),
                "options": options,
                "correct_option": child.correct_option,
                "correct_answer": child.correct_answer,
                "raw_answer_text": raw_answer_text,
                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
                "knowledge_reminder": existing.get("knowledge_reminder", ""),
                "ai_hint": existing.get("ai_hint", ""),
                "solution": existing.get("solution", ""),
            }
        )

    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("paper_questions").insert(inserts).execute()
    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")


if __name__ == "__main__":
    main()
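
Because the script deletes all existing rows before inserting, a quick sanity check on the hand-written `CHILDREN` list before running it may be worthwhile. The sketch below is a hypothetical helper, not part of the repository: `Spec` is a trimmed-down stand-in for `ChildSpec`, and `check_children` is an assumed name.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    question_number: str
    top_level_number: str
    score: float


def check_children(children: list[Spec]) -> tuple[list[str], dict[str, float]]:
    """Return duplicated question numbers and the summed score per top-level question."""
    counts = Counter(child.question_number for child in children)
    duplicates = sorted(number for number, count in counts.items() if count > 1)
    totals: dict[str, float] = {}
    for child in children:
        totals[child.top_level_number] = totals.get(child.top_level_number, 0.0) + child.score
    return duplicates, totals


# Scores copied from question 4 of the part B CHILDREN list.
children = [
    Spec("4a", "4", 1),
    Spec("4b", "4", 1),
    Spec("4c", "4", 1),
    Spec("4d", "4", 1),
    Spec("4e", "4", 3),
]
duplicates, totals = check_children(children)
print(duplicates, totals)  # [] {'4': 7.0}
```

Comparing the per-question totals against the marks printed on the paper catches a mistyped score, and an empty duplicate list confirms that lookups keyed on `question_number` stay unambiguous.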


@@ -0,0 +1,233 @@
"""Split COMP2211 Spring 2022 midterm top-level problems into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2022-spring-midterm"
TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]


@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        page_number=page_number,
    )


CHILDREN: list[ChildSpec] = [
*[
ChildSpec(f"1{letter}", "1", "1", (letter,), 1.5, "true_false", page_number=2)
for letter in "abcdefghij"
],
ChildSpec("2a_i", "2a", "2", ("a", "i"), 1, "fill_blank", page_number=4),
ChildSpec("2a_ii", "2a", "2", ("a", "ii"), 1, "fill_blank", page_number=4),
ChildSpec("2a_iii", "2a", "2", ("a", "iii"), 1, "fill_blank", page_number=4),
ChildSpec("2a_iv", "2a", "2", ("a", "iv"), 1, "fill_blank", page_number=4),
ChildSpec("2a_v", "2a", "2", ("a", "v"), 1, "fill_blank", page_number=4),
ChildSpec("2b", "2", "2", ("b",), 2, "fill_blank", page_number=4),
ChildSpec("2c", "2", "2", ("c",), 9, "long_question", "coding", page_number=5),
ChildSpec("3a", "3", "3", ("a",), 2, "fill_blank", page_number=7),
ChildSpec("3b_i", "3b", "3", ("b", "i"), 1.75, "fill_blank", page_number=7),
ChildSpec("3b_ii", "3b", "3", ("b", "ii"), 1.75, "fill_blank", page_number=7),
ChildSpec("3b_iii", "3b", "3", ("b", "iii"), 1.75, "fill_blank", page_number=7),
ChildSpec("3b_iv", "3b", "3", ("b", "iv"), 1.75, "fill_blank", page_number=7),
short_answer("3c", "3", "3", ("c",), 2, page_number=8),
ChildSpec("4a", "4", "4", ("a",), 3, "long_question", "long_answer", page_number=9),
short_answer("4b_i", "4b", "4", ("b", "i"), 3, page_number=9),
short_answer("4b_ii", "4b", "4", ("b", "ii"), 3, page_number=9),
ChildSpec("4c_i", "4c", "4", ("c", "i"), 2, "long_question", "long_answer", page_number=10),
ChildSpec("4c_ii", "4c", "4", ("c", "ii"), 3, "long_question", "long_answer", page_number=10),
ChildSpec("5a", "5", "5", ("a",), 4.5, "long_question", "long_answer", page_number=11),
ChildSpec("5b", "5", "5", ("b",), 1.5, "fill_blank", page_number=11),
ChildSpec("5c", "5", "5", ("c",), 4.5, "long_question", "long_answer", page_number=11),
short_answer("5d", "5", "5", ("d",), 1.5, page_number=11),
ChildSpec("6a", "6", "6", ("a",), 8, "long_question", "long_answer", page_number=12),
short_answer("6b", "6", "6", ("b",), 2, page_number=13),
ChildSpec("6c", "6", "6", ("c",), 10, "long_question", "coding", page_number=13),
short_answer("7a", "7", "7", ("a",), 4, page_number=14),
short_answer("7b", "7", "7", ("b",), 6, page_number=14),
ChildSpec("7c", "7", "7", ("c",), 2, "fill_blank", page_number=15),
]
MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")
PROBLEM_SEED_PATH = (
Path(__file__).resolve().parent.parent
/ "pastpaper-scraper"
/ "reviews"
/ "COMP2211"
/ "problem_seed.json"
)
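def split_sections(text: str) -> tuple[str, dict[str, str]]:
    """Split text on line-leading markers such as "(a)" and return (intro, sections by marker)."""
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}
    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        # A repeated marker overwrites the earlier section stored under the same key.
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    """Return the question intro plus the section addressed by path, e.g. ("a",) or ("a", "i")."""
    intro, sections = split_sections(text)
    if not path:
        return text.strip()
    first = sections.get(path[0], "")
    if not first:
        # Marker not found: fall back to the whole text.
        return text.strip()
    if len(path) == 1:
        return "\n".join(part for part in [intro, first] if part).strip()
    # NOTE: split_sections() already cut `first` at the next line-leading marker of any
    # kind, so `first` holds no nested markers and `second` is empty here unless
    # path[1] == path[0]; two-level paths therefore resolve to the intro alone.
    child_intro, child_sections = split_sections(first)
    second = child_sections.get(path[1], "")
    return "\n".join(part for part in [intro, child_intro, second] if part).strip()


def extract_true_false_answers(answer_text: str) -> dict[str, str]:
    """Map each part letter (a-j) to its answer parsed from the answer text.

    Values are "T"/"F" (TRUE_FALSE_OPTIONS uses the labels "True"/"False").
    """
    answers: dict[str, str] = {}
    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text))
    for match in matches:
        answers[match.group(1)] = match.group(2)
    return answers


def derive_correct_answer(answer_text: str) -> str | None:
    """Return the first non-empty answer line if it is short enough to be a fill-in value."""
    if not answer_text:
        return None
    if "Answer:" in answer_text:
        tail = answer_text.split("Answer:", 1)[1]
    else:
        tail = answer_text
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    if first.lower().startswith("marking scheme"):
        return None
    # Lines longer than 240 characters are not used as a short answer.
    if len(first) <= 240:
        return first
    return None


def load_seed_rows() -> dict[str, dict]:
    """Load this exam's top-level rows from the seed file, keyed by question number."""
    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
    return {
        row["question_number"]: row
        for row in data
        if row["source_exam_key"] == EXAM_KEY
    }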
def main() -> None:
    sb = get_supabase()
    # Takes the first matching paper; raises IndexError if none exists.
    paper = (
        sb.table("papers")
        .select("id")
        .eq("source_exam_key", EXAM_KEY)
        .execute()
        .data[0]
    )
    paper_id = paper["id"]
    current_rows = (
        sb.table("paper_questions")
        .select("*")
        .eq("paper_id", paper_id)
        .order("display_order")
        .execute()
        .data
    )
    existing_by_number = {row["question_number"]: row for row in current_rows}
    parent_rows = load_seed_rows()
    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
    inserts = []
    for display_order, child in enumerate(CHILDREN, start=1):
        parent = parent_rows[child.top_level_number]
        existing = existing_by_number.get(child.question_number, {})
        question_text = extract_segment(parent["question_text"] or "", child.path)
        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path)
        correct_option = None
        correct_answer = None
        options = None
        if child.question_type == "true_false":
            marker = child.path[0]
            correct_option = tf_answers.get(marker)
            options = TRUE_FALSE_OPTIONS
        elif child.question_type == "fill_blank":
            correct_answer = derive_correct_answer(raw_answer_text)
        # Prefer values already stored on the existing row, then fall back to the parent seed row.
        inserts.append(
            {
                "paper_id": paper_id,
                "question_number": child.question_number,
                "parent_question": child.parent_question,
                "display_order": display_order,
                "question_type": child.question_type,
                "question_format": child.question_format,
                "question_text": question_text,
                "score": child.score,
                "page_number": child.page_number,
                "page_y_ratio": existing.get("page_y_ratio"),
                "options": options,
                "correct_option": correct_option,
                "correct_answer": correct_answer,
                "raw_answer_text": raw_answer_text,
                "topics": existing.get("topics") or parent.get("topics"),
                "topic_primary": existing.get("topic_primary") or parent.get("topic_primary"),
                "analytics_topic": existing.get("analytics_topic") or parent.get("analytics_topic"),
                "topic_tags": existing.get("topic_tags") or parent.get("topic_tags"),
                "skill_tags": existing.get("skill_tags") or parent.get("skill_tags"),
                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
                "knowledge_reminder": existing.get("knowledge_reminder", ""),
                "ai_hint": existing.get("ai_hint", ""),
                "solution": existing.get("solution", ""),
            }
        )
    # NOTE: delete and insert are separate requests, so a failed insert leaves the paper without rows.
    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("paper_questions").insert(inserts).execute()
    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")


if __name__ == "__main__":
    main()
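
Nested paths in CHILDREN such as ("a", "i") depend on telling a sub-part marker "(i)" apart from a top-level part "(i)". The following is a minimal sketch of one way to do that, not the script's own method. It assumes top-level parts are lettered consecutively from (a) and sub-parts use lowercase roman numerals starting at (i). The names `ROMAN`, `split_nested`, and `extract_nested` are hypothetical and do not exist in the repository.

```python
import re

MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")
# Hypothetical helper data: sub-part labels in the order they are expected to appear.
ROMAN = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"]


def split_nested(text: str) -> tuple[str, dict[tuple[str, ...], str]]:
    """Return (intro, sections) with sections keyed by ("a",) or ("a", "i")."""
    matches = list(MARKER_RE.finditer(text))
    intro = text[: matches[0].start()].strip() if matches else text.strip()
    sections: dict[tuple[str, ...], str] = {}
    parent = None      # letter of the current top-level part
    next_letter = "a"  # next top-level marker we expect to see
    next_roman = 0     # index into ROMAN of the next expected sub-part
    last_key = None
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        body = text[match.start():end].strip()
        is_top = marker == next_letter
        is_sub = parent is not None and next_roman < len(ROMAN) and marker == ROMAN[next_roman]
        # Ambiguous "(i)" right after "(h)" is read as the top-level part (i).
        if is_top and not (is_sub and next_roman > 0):
            parent, next_letter, next_roman = marker, chr(ord(marker) + 1), 0
            last_key = (marker,)
            sections[last_key] = body
        elif is_sub:
            next_roman += 1
            last_key = (parent, marker)
            sections[last_key] = body
        elif last_key is None:
            # Unexpected marker before any part: keep it with the intro.
            intro = f"{intro}\n{body}".strip()
        else:
            # Unexpected marker: keep it with the section it appeared in.
            sections[last_key] += "\n" + body
    return intro, sections


def extract_nested(text: str, path: tuple[str, ...]) -> str:
    """Return intro, then the stem of each ancestor part, then the addressed part."""
    if not path:
        return text.strip()
    intro, sections = split_nested(text)
    parts = [intro] + [sections.get(path[: depth + 1], "") for depth in range(len(path))]
    return "\n".join(part for part in parts if part)


sample = (
    "Consider the array x.\n"
    "(a) Evaluate each expression.\n"
    "(i) x[0]\n"
    "(ii) x[-1]\n"
    "(b) Explain broadcasting.\n"
    "(i) first case\n"
)
print(extract_nested(sample, ("a", "ii")))
```

Keying sections by the full path keeps the "(i)" under (a) separate from the "(i)" under (b), which a flat dictionary keyed by marker alone cannot do.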


@@ -0,0 +1,268 @@
"""Split COMP2211 Spring 2023 midterm into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2023-spring-midterm"
PROBLEM_SEED_PATH = (
Path(__file__).resolve().parent.parent
/ "pastpaper-scraper"
/ "reviews"
/ "COMP2211"
/ "problem_seed.json"
)
TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    analytics_topic: str | None = None
    topic_primary: str | None = None
    topic_tags: tuple[str, ...] | None = None
    skill_tags: tuple[str, ...] | None = None
    options: tuple[tuple[str, str], ...] | None = None
    correct_option: str | None = None
    correct_answer: str | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    analytics_topic: str | None = None,
    topic_primary: str | None = None,
    topic_tags: tuple[str, ...] | None = None,
    skill_tags: tuple[str, ...] | None = None,
    correct_answer: str | None = None,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        analytics_topic=analytics_topic,
        topic_primary=topic_primary,
        topic_tags=topic_tags,
        skill_tags=skill_tags,
        correct_answer=correct_answer,
        page_number=page_number,
    )


def mc(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    options: tuple[tuple[str, str], ...],
    correct_option: str,
    analytics_topic: str,
    skill_tags: tuple[str, ...],
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="mc",
        question_format="mc",
        analytics_topic=analytics_topic,
        topic_primary=analytics_topic,
        topic_tags=(analytics_topic,),
        skill_tags=skill_tags,
        options=options,
        correct_option=correct_option,
        page_number=page_number,
    )
ABCDE = (("A", "A"), ("B", "B"), ("C", "C"), ("D", "D"), ("E", "E"))
CHILDREN: list[ChildSpec] = [
ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=3),
ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=3),
ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "distance_reasoning"), page_number=3),
ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "validation_reasoning"), page_number=3),
ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "robustness_reasoning"), page_number=3),
ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "decision_boundary"), page_number=3),
ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "optimization_reasoning"), page_number=3),
ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "expressiveness_reasoning"), page_number=3),
short_answer("2a_i", "2a", "2", ("a", "i"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_ii", "2a", "2", ("a", "ii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_iii", "2a", "2", ("a", "iii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_iv", "2a", "2", ("a", "iv"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_v", "2a", "2", ("a", "v"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("indexing", "code_tracing"), page_number=4),
short_answer("2a_vi", "2a", "2", ("a", "vi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("indexing", "error_reasoning"), page_number=5),
short_answer("2a_vii", "2a", "2", ("a", "vii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("masking", "code_tracing"), page_number=5),
short_answer("2a_viii", "2a", "2", ("a", "viii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("aggregation", "code_tracing"), page_number=5),
short_answer("2a_ix", "2a", "2", ("a", "ix"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("transpose", "code_tracing"), page_number=5),
short_answer("2b_i", "2b", "2", ("b", "i"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "code_tracing"), page_number=6),
short_answer("2b_ii", "2b", "2", ("b", "ii"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "error_reasoning"), page_number=6),
short_answer("2b_iii", "2b", "2", ("b", "iii"), 2, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting", "code_tracing"), page_number=6),
ChildSpec("2c", "2", "2", ("c",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "geometry_reasoning"), page_number=7),
short_answer("3", "3", "3", (), 8, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("concept_explanation", "missing_data_reasoning"), page_number=9),
ChildSpec("4a", "4", "4", ("a",), 8, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "classification_decision"), page_number=10),
short_answer("4b", "4", "4", ("b",), 6, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("distance_reasoning", "comparison"), page_number=11),
ChildSpec("5a", "5", "5", ("a",), 7, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "algorithm_tracing"), page_number=12),
ChildSpec("5b", "5", "5", ("b",), 7, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("centroid_update", "algorithm_tracing"), page_number=12),
short_answer("5c", "5", "5", ("c",), 5, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("concept_explanation", "model_selection"), page_number=14),
short_answer("6a", "6", "6", ("a",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("convergence_reasoning",), page_number=15),
mc("6b", "6", "6", ("b",), 2, options=ABCDE, correct_option="D", analytics_topic="Perceptron and MLP", skill_tags=("generalization_reasoning",), page_number=15),
short_answer("6c", "6", "6", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("activation_reasoning",), page_number=16),
ChildSpec("6d", "6", "6", ("d",), 6, "long_question", "coding", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("debugging", "implementation", "weight_update"), page_number=16),
short_answer("7a", "7", "7", ("a",), 4, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("decision_boundary", "linearity_reasoning"), page_number=18),
short_answer("7b", "7", "7", ("b",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("decision_boundary", "linearity_reasoning"), page_number=18),
ChildSpec("7c", "7", "7", ("c",), 10, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("architecture_reasoning", "parameter_design"), page_number=19),
]
# Matches letter markers such as (a) as well as roman-numeral markers such as (ii).
MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")
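def split_sections(text: str) -> tuple[str, dict[str, str]]:
    """Split text on line-leading markers such as "(a)" and return (intro, sections by marker)."""
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}
    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        # A repeated marker overwrites the earlier section stored under the same key.
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    """Walk path one marker at a time and return the top-level intro plus the final section."""
    current = text.strip()
    carried_intro: list[str] = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        # NOTE: a section ends at the next line-leading marker of any kind, so once
        # path[0] has been found `current` holds no nested markers and the lookup at
        # depth 1 falls back to `current` unchanged.
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


def extract_true_false_answers(answer_text: str) -> dict[str, str]:
    """Map each part letter (a-j) to its answer parsed from the answer text.

    Values are "T"/"F" (TRUE_FALSE_OPTIONS uses the labels "True"/"False").
    """
    answers: dict[str, str] = {}
    # If any marker is followed by both "T" and "F", return no answers; the pattern
    # below would otherwise read every such row as "T".
    matches = list(re.finditer(r"(?m)^\(([a-j])\)\s*\n?T\s*F", answer_text))
    if matches:
        return answers
    for match in re.finditer(r"(?m)^\(([a-j])\)\s*\n?([TF])\b", answer_text):
        answers[match.group(1)] = match.group(2)
    if answers:
        return answers
    # Fallback: scan stripped lines for a marker followed later by a bare "T" or "F".
    lines = [line.strip() for line in answer_text.splitlines() if line.strip()]
    current = None
    for line in lines:
        m = re.fullmatch(r"\(([a-j])\)", line)
        if m:
            current = m.group(1)
            continue
        if current and line in {"T", "F"}:
            answers[current] = line
            current = None
    return answers


def load_seed_rows() -> dict[str, dict]:
    """Load this exam's top-level rows from the seed file, keyed by question number."""
    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}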
def main() -> None:
    sb = get_supabase()
    # Takes the first matching paper; raises IndexError if none exists.
    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
    paper_id = paper["id"]
    current_rows = (
        sb.table("paper_questions")
        .select("*")
        .eq("paper_id", paper_id)
        .order("display_order")
        .execute()
        .data
    )
    existing_by_number = {row["question_number"]: row for row in current_rows}
    parent_rows = load_seed_rows()
    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
    inserts = []
    for display_order, child in enumerate(CHILDREN, start=1):
        parent = parent_rows[child.top_level_number]
        existing = existing_by_number.get(child.question_number, {})
        question_text = extract_segment(parent["question_text"] or "", child.path)
        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
        options = None
        correct_option = child.correct_option
        if child.options:
            options = [{"label": label, "text": text} for label, text in child.options]
        if child.question_type == "true_false":
            options = TRUE_FALSE_OPTIONS
            correct_option = tf_answers.get(child.path[0])
        # Prefer values already stored on the existing row, then the spec, then the parent seed row.
        inserts.append(
            {
                "paper_id": paper_id,
                "question_number": child.question_number,
                "parent_question": child.parent_question,
                "display_order": display_order,
                "question_type": child.question_type,
                "question_format": child.question_format,
                "question_text": question_text,
                "score": child.score,
                "page_number": child.page_number,
                "page_y_ratio": existing.get("page_y_ratio"),
                "options": options,
                "correct_option": correct_option,
                "correct_answer": child.correct_answer,
                "raw_answer_text": raw_answer_text,
                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
                "knowledge_reminder": existing.get("knowledge_reminder", ""),
                "ai_hint": existing.get("ai_hint", ""),
                "solution": existing.get("solution", ""),
            }
        )
    # NOTE: delete and insert are separate requests, so a failed insert leaves the paper without rows.
    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("paper_questions").insert(inserts).execute()
    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")


if __name__ == "__main__":
    main()
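
Because main() deletes a paper's existing rows before inserting the regenerated ones, it may be worth checking a CHILDREN list before any write. The following is a minimal sketch under stated assumptions: specs expose `question_number`, `parent_question`, `top_level_number`, and `score` as ChildSpec does, and every question number begins with its top-level number. `Spec`, `check_specs`, and the totals passed in the usage lines are hypothetical and are not taken from the exam papers.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for ChildSpec with only the fields the checks read."""
    question_number: str
    parent_question: str
    top_level_number: str
    score: float


def check_specs(specs: list[Spec], expected_totals: dict[str, float] | None = None) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems: list[str] = []
    # Duplicate question numbers would collide when rows are looked up by number.
    counts = Counter(spec.question_number for spec in specs)
    for number, count in counts.items():
        if count > 1:
            problems.append(f"duplicate question_number {number!r} ({count} times)")
    totals: dict[str, float] = defaultdict(float)
    for spec in specs:
        # Assumed convention: "2a_i" and its parent "2a" both start with top-level "2".
        if not spec.question_number.startswith(spec.top_level_number):
            problems.append(f"{spec.question_number!r} is not under top-level {spec.top_level_number!r}")
        if not spec.parent_question.startswith(spec.top_level_number):
            problems.append(f"parent {spec.parent_question!r} is not under top-level {spec.top_level_number!r}")
        totals[spec.top_level_number] += spec.score
    # Optional: compare per-question score sums with totals supplied by the caller.
    for top, expected in (expected_totals or {}).items():
        actual = totals.get(top, 0.0)
        if abs(actual - expected) > 1e-9:
            problems.append(f"question {top}: scores sum to {actual}, expected {expected}")
    return problems


specs = [
    Spec("1a", "1", "1", 1.5),
    Spec("1b", "1", "1", 1.5),
    Spec("2a_i", "2a", "2", 1),
]
print(check_specs(specs, {"1": 3.0, "2": 1}))
```

The function only reads attributes, so a real CHILDREN list could presumably be passed in place of `specs`, with main() aborting before the delete when the returned list is not empty.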


@@ -0,0 +1,242 @@
"""Split COMP2211 Spring 2024 final into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2024-spring-final"
PROBLEM_SEED_PATH = (
Path(__file__).resolve().parent.parent
/ "pastpaper-scraper"
/ "reviews"
/ "COMP2211"
/ "problem_seed.json"
)
TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]
@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    analytics_topic: str | None = None
    topic_primary: str | None = None
    topic_tags: tuple[str, ...] | None = None
    skill_tags: tuple[str, ...] | None = None
    options: tuple[tuple[str, str], ...] | None = None
    correct_option: str | None = None
    correct_answer: str | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    analytics_topic: str | None = None,
    topic_primary: str | None = None,
    topic_tags: tuple[str, ...] | None = None,
    skill_tags: tuple[str, ...] | None = None,
    correct_answer: str | None = None,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        analytics_topic=analytics_topic,
        topic_primary=topic_primary,
        topic_tags=topic_tags,
        skill_tags=skill_tags,
        correct_answer=correct_answer,
        page_number=page_number,
    )
CHILDREN: list[ChildSpec] = [
ChildSpec("1a", "1", "1", ("a",), 1, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "code_tracing"), page_number=2),
ChildSpec("1b", "1", "1", ("b",), 1, "true_false", "true_false", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("concept_check", "classification_decision"), page_number=2),
ChildSpec("1c", "1", "1", ("c",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
ChildSpec("1d", "1", "1", ("d",), 1, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=2),
ChildSpec("1e", "1", "1", ("e",), 1, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "activation_reasoning"), page_number=2),
ChildSpec("1f", "1", "1", ("f",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "image_processing"), page_number=2),
ChildSpec("1g", "1", "1", ("g",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "cnn_complexity"), page_number=2),
ChildSpec("1h", "1", "1", ("h",), 1, "true_false", "true_false", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("concept_check", "regularization"), page_number=2),
ChildSpec("1i", "1", "1", ("i",), 1, "true_false", "true_false", "Search and Games", "Search and Games", ("Search and Games",), ("concept_check", "pruning_reasoning"), page_number=2),
ChildSpec("1j", "1", "1", ("j",), 1, "true_false", "true_false", "Ethics of AI", "Ethics of AI", ("Ethics of AI",), ("concept_check", "research_ethics"), page_number=2),
ChildSpec("2a", "2", "2", ("a",), 4, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "masking"), page_number=3),
ChildSpec("2b", "2", "2", ("b",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "convolution", "array_manipulation"), page_number=4),
short_answer("3a_i", "3a", "3", ("a", "i"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
short_answer("3a_ii", "3a", "3", ("a", "ii"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
short_answer("3a_iii", "3a", "3", ("a", "iii"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
short_answer("3a_iv", "3a", "3", ("a", "iv"), 1.5, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("manual_computation", "probability_reasoning"), page_number=6),
short_answer("3b_i", "3b", "3", ("b", "i"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
short_answer("3b_ii", "3b", "3", ("b", "ii"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
short_answer("3b_iii", "3b", "3", ("b", "iii"), 1.5, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("validation_reasoning",), page_number=6),
short_answer("3c", "3", "3", ("c",), 1.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("linearity_reasoning", "classification_decision"), page_number=6),
short_answer("4a_i", "4a", "4", ("a", "i"), 2.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("parameter_counting",), page_number=7),
short_answer("4a_ii", "4a", "4", ("a", "ii"), 2.5, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("model_selection",), page_number=7),
short_answer("4b", "4", "4", ("b",), 1, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("concept_explanation",), page_number=7),
short_answer("4c", "4", "4", ("c",), 2, analytics_topic="Perceptron and MLP", topic_primary="Perceptron and MLP", topic_tags=("Perceptron and MLP",), skill_tags=("activation_reasoning", "optimization_reasoning"), page_number=7),
ChildSpec("4d_i", "4d", "4", ("d", "i"), 1.5, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("forward_pass", "activation_reasoning"), page_number=8),
ChildSpec("4d_ii", "4d", "4", ("d", "ii"), 1.5, "long_question", "long_answer", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("backpropagation", "weight_update"), page_number=8),
ChildSpec("5a", "5", "5", ("a",), 4.5, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("histogram_reasoning", "image_transform"), page_number=9),
ChildSpec("5b", "5", "5", ("b",), 3, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("thresholding", "manual_computation"), page_number=10),
ChildSpec("5c", "5", "5", ("c",), 2, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("padding", "manual_construction"), page_number=10),
short_answer("5d_i", "5d", "5", ("d", "i"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
short_answer("5d_ii", "5d", "5", ("d", "ii"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
short_answer("5d_iii", "5d", "5", ("d", "iii"), 0.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("filter_effect_reasoning",), page_number=11),
short_answer("5e", "5", "5", ("e",), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("concept_explanation", "local_vs_global"), page_number=11),
ChildSpec("6a", "6", "6", ("a",), 10, "long_question", "coding", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("implementation", "convolution", "debugging"), page_number=12),
ChildSpec("6b", "6", "6", ("b",), 3, "long_question", "coding", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("implementation", "regularization"), page_number=15),
short_answer("7a_i", "7a", "7", ("a", "i"), 1, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("cnn_architecture",), page_number=16),
short_answer("7a_ii", "7a", "7", ("a", "ii"), 4, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("shape_reasoning", "parameter_counting"), page_number=16),
short_answer("7a_iii", "7a", "7", ("a", "iii"), 3, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("overfitting", "regularization"), page_number=16),
ChildSpec("7b", "7", "7", ("b",), 5, "long_question", "long_answer", "Vision and CNN", "Vision and CNN", ("Vision and CNN",), ("manual_computation", "cnn_forward_pass"), page_number=17),
short_answer("7c_i", "7c", "7", ("c", "i"), 2, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("shape_reasoning", "3d_convolution"), page_number=17),
short_answer("7c_ii", "7c", "7", ("c", "ii"), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("parameter_counting", "3d_convolution"), page_number=17),
short_answer("7c_iii", "7c", "7", ("c", "iii"), 1.5, analytics_topic="Vision and CNN", topic_primary="Vision and CNN", topic_tags=("Vision and CNN",), skill_tags=("parameter_counting", "3d_convolution"), page_number=17),
short_answer("8a_i", "8a", "8", ("a", "i"), 1, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("tree_search", "manual_tracing"), page_number=18),
short_answer("8a_ii", "8a", "8", ("a", "ii"), 3, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning", "manual_tracing"), page_number=18),
short_answer("8a_iii", "8a", "8", ("a", "iii"), 1, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("game_reasoning",), page_number=18),
short_answer("8b_i", "8b", "8", ("b", "i"), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("utility_reasoning",), page_number=18),
short_answer("8b_ii", "8b", "8", ("b", "ii"), 2.5, analytics_topic="Search and Games", topic_primary="Search and Games", topic_tags=("Search and Games",), skill_tags=("pruning_reasoning", "concept_explanation"), page_number=18),
short_answer("9", "9", "9", (), 3, analytics_topic="Ethics of AI", topic_primary="Ethics of AI", topic_tags=("Ethics of AI",), skill_tags=("concept_explanation", "governance"), page_number=19),
]
# Matches letter markers such as (a) as well as roman-numeral markers such as (ii).
MARKER_RE = re.compile(r"(?m)^\(([a-z]+)\)\s*")
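def split_sections(text: str) -> tuple[str, dict[str, str]]:
    """Split text on line-leading markers such as "(a)" and return (intro, sections by marker)."""
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}
    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        # A repeated marker overwrites the earlier section stored under the same key.
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    """Walk path one marker at a time and return the top-level intro plus the final section."""
    if not path:
        return text.strip()
    current = text.strip()
    carried_intro: list[str] = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        # NOTE: a section ends at the next line-leading marker of any kind, so once
        # path[0] has been found `current` holds no nested markers and the lookup at
        # depth 1 falls back to `current` unchanged.
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


def extract_true_false_answers(answer_text: str) -> dict[str, str]:
    """Map part letters a-j to answers read from the answer text.

    Values are "T"/"F" (TRUE_FALSE_OPTIONS uses the labels "True"/"False").
    """
    answers: dict[str, str] = {}
    # This pattern is pinned to one exact ten-answer row following the word "Answer".
    table_match = re.search(r"Answer\s+(T\s+F\s+T\s+F\s+F\s+T\s+F\s+F\s+F\s+T)", answer_text, re.S)
    if table_match:
        seq = re.findall(r"[TF]", table_match.group(1))
        if len(seq) == 10:
            for idx, val in enumerate(seq):
                answers[chr(ord("a") + idx)] = val
            return answers
    # Fallback: take the first ten standalone "T"/"F" tokens anywhere in the text, in order.
    seq = re.findall(r"\b([TF])\b", answer_text)
    if len(seq) >= 10:
        for idx, val in enumerate(seq[:10]):
            answers[chr(ord("a") + idx)] = val
    return answers


def load_seed_rows() -> dict[str, dict]:
    """Load this exam's top-level rows from the seed file, keyed by question number."""
    data = json.loads(PROBLEM_SEED_PATH.read_text(encoding="utf-8"))
    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}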
def main() -> None:
sb = get_supabase()
paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
paper_id = paper["id"]
current_rows = (
sb.table("paper_questions")
.select("*")
.eq("paper_id", paper_id)
.order("display_order")
.execute()
.data
)
existing_by_number = {row["question_number"]: row for row in current_rows}
parent_rows = load_seed_rows()
tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
inserts = []
for display_order, child in enumerate(CHILDREN, start=1):
parent = parent_rows[child.top_level_number]
existing = existing_by_number.get(child.question_number, {})
question_text = extract_segment(parent["question_text"] or "", child.path)
raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
options = None
correct_option = child.correct_option
if child.question_type == "true_false":
options = TRUE_FALSE_OPTIONS
correct_option = tf_answers.get(child.path[0])
elif child.options:
options = [{"label": label, "text": text} for label, text in child.options]
inserts.append(
{
"paper_id": paper_id,
"question_number": child.question_number,
"parent_question": child.parent_question,
"display_order": display_order,
"question_type": child.question_type,
"question_format": child.question_format,
"question_text": question_text,
"score": child.score,
"page_number": child.page_number,
"page_y_ratio": existing.get("page_y_ratio"),
"options": options,
"correct_option": correct_option,
"correct_answer": child.correct_answer,
"raw_answer_text": raw_answer_text,
"topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
"topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
"analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
"topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
"skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
"difficulty": existing.get("difficulty") or parent.get("difficulty"),
"knowledge_reminder": existing.get("knowledge_reminder", ""),
"ai_hint": existing.get("ai_hint", ""),
"solution": existing.get("solution", ""),
}
)
sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
sb.table("paper_questions").insert(inserts).execute()
sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")
if __name__ == "__main__":
main()


@@ -0,0 +1,291 @@
"""Rebuild COMP2211 Spring 2024 midterm into subquestions."""
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
import fitz
from app.services.supabase_client import get_supabase
EXAM_KEY = "COMP2211-2024-spring-midterm"
ROOT = Path(__file__).resolve().parent.parent
QUESTION_PDF = ROOT / "pastpaper-scraper" / "papers" / "COMP2211" / "(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf"
ANSWER_PDF = ROOT / "pastpaper-scraper" / "papers" / "COMP2211" / "(COMP2211)[2024](s)midterm~=ubrzkjmz^_90406.pdf"
PROBLEM_SEED_PATH = ROOT / "pastpaper-scraper" / "reviews" / "COMP2211" / "problem_seed.json"
TRUE_FALSE_OPTIONS = [{"label": "True", "text": "True"}, {"label": "False", "text": "False"}]


@dataclass(frozen=True)
class ChildSpec:
    question_number: str
    parent_question: str
    top_level_number: str
    path: tuple[str, ...]
    score: float
    question_type: str
    question_format: str | None = None
    analytics_topic: str | None = None
    topic_primary: str | None = None
    topic_tags: tuple[str, ...] | None = None
    skill_tags: tuple[str, ...] | None = None
    page_number: int = 1


def short_answer(
    question_number: str,
    parent_question: str,
    top_level_number: str,
    path: tuple[str, ...],
    score: float,
    *,
    analytics_topic: str | None = None,
    topic_primary: str | None = None,
    topic_tags: tuple[str, ...] | None = None,
    skill_tags: tuple[str, ...] | None = None,
    page_number: int,
) -> ChildSpec:
    return ChildSpec(
        question_number=question_number,
        parent_question=parent_question,
        top_level_number=top_level_number,
        path=path,
        score=score,
        question_type="long_question",
        question_format="short_answer",
        analytics_topic=analytics_topic,
        topic_primary=topic_primary,
        topic_tags=topic_tags,
        skill_tags=skill_tags,
        page_number=page_number,
    )


CHILDREN: list[ChildSpec] = [
ChildSpec("1a", "1", "1", ("a",), 0.5, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "code_tracing"), page_number=3),
ChildSpec("1b", "1", "1", ("b",), 0.5, "true_false", "true_false", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("concept_check", "broadcasting"), page_number=3),
ChildSpec("1c", "1", "1", ("c",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "algorithm_property"), page_number=3),
ChildSpec("1d", "1", "1", ("d",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "tie_reasoning"), page_number=3),
ChildSpec("1e", "1", "1", ("e",), 0.5, "true_false", "true_false", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("concept_check", "cross_validation"), page_number=3),
ChildSpec("1f", "1", "1", ("f",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "clustering_property"), page_number=3),
ChildSpec("1g", "1", "1", ("g",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "robustness_reasoning"), page_number=3),
ChildSpec("1h", "1", "1", ("h",), 0.5, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "decision_boundary"), page_number=3),
ChildSpec("1i", "1", "1", ("i",), 0.5, "true_false", "true_false", "Perceptron and MLP", "Perceptron and MLP", ("Perceptron and MLP",), ("concept_check", "optimization_reasoning"), page_number=3),
ChildSpec("1j", "1", "1", ("j",), 0.5, "true_false", "true_false", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("concept_check", "clustering_property"), page_number=3),
short_answer("2a_i", "2a", "2", ("a", "i"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_ii", "2a", "2", ("a", "ii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("code_tracing",), page_number=4),
short_answer("2a_iii", "2a", "2", ("a", "iii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("array_manipulation",), page_number=5),
short_answer("2a_iv", "2a", "2", ("a", "iv"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("array_construction",), page_number=5),
short_answer("2a_v", "2a", "2", ("a", "v"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("aggregation",), page_number=5),
short_answer("2a_vi", "2a", "2", ("a", "vi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("transpose",), page_number=6),
short_answer("2a_vii", "2a", "2", ("a", "vii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("matrix_multiplication",), page_number=6),
short_answer("2a_viii", "2a", "2", ("a", "viii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("dot_product",), page_number=6),
short_answer("2a_ix", "2a", "2", ("a", "ix"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting",), page_number=6),
short_answer("2a_x", "2a", "2", ("a", "x"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("error_reasoning",), page_number=7),
short_answer("2a_xi", "2a", "2", ("a", "xi"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("broadcasting",), page_number=7),
short_answer("2a_xii", "2a", "2", ("a", "xii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("slicing",), page_number=7),
short_answer("2a_xiii", "2a", "2", ("a", "xiii"), 1, analytics_topic="Python Fundamentals", topic_primary="Python Fundamentals", topic_tags=("Python Fundamentals",), skill_tags=("views_vs_copies",), page_number=7),
ChildSpec("2b", "2", "2", ("b",), 6, "long_question", "coding", "Python Fundamentals", "Python Fundamentals", ("Python Fundamentals",), ("implementation", "vectorization", "similarity_computation"), page_number=8),
ChildSpec("3a", "3", "3", ("a",), 5.5, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("manual_computation", "metric_reasoning"), page_number=10),
short_answer("3b", "3", "3", ("b",), 1, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("metric_reasoning",), page_number=11),
ChildSpec("3c", "3", "3", ("c",), 2.5, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("manual_computation", "metric_reasoning"), page_number=11),
short_answer("3d", "3", "3", ("d",), 1, analytics_topic="Evaluation and Validation", topic_primary="Evaluation and Validation", topic_tags=("Evaluation and Validation",), skill_tags=("metric_reasoning",), page_number=12),
ChildSpec("3e", "3", "3", ("e",), 6, "long_question", "coding", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("implementation", "metrics", "vectorization"), page_number=12),
ChildSpec("4a", "4", "4", ("a",), 4, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "gaussian_nb"), page_number=15),
ChildSpec("4b", "4", "4", ("b",), 3, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("manual_computation", "likelihood_reasoning"), page_number=15),
ChildSpec("4c", "4", "4", ("c",), 4, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("laplace_smoothing", "likelihood_reasoning"), page_number=16),
short_answer("4d", "4", "4", ("d",), 2, analytics_topic="Probabilistic Models", topic_primary="Probabilistic Models", topic_tags=("Probabilistic Models",), skill_tags=("prior_reasoning",), page_number=17),
ChildSpec("4e", "4", "4", ("e",), 3, "long_question", "long_answer", "Probabilistic Models", "Probabilistic Models", ("Probabilistic Models",), ("posterior_reasoning", "classification_decision"), page_number=17),
ChildSpec("5a", "5", "5", ("a",), 3, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("distance_calculation", "weighted_knn"), page_number=18),
ChildSpec("5b", "5", "5", ("b",), 13, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("cross_validation", "manual_tracing", "model_selection"), page_number=18),
short_answer("5c", "5", "5", ("c",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("test_error", "model_selection"), page_number=20),
ChildSpec("6a", "6", "6", ("a",), 6, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("manual_computation", "clustering"), page_number=21),
ChildSpec("6b", "6", "6", ("b",), 6, "long_question", "long_answer", "KNN and Clustering", "KNN and Clustering", ("KNN and Clustering",), ("manual_computation", "clustering"), page_number=22),
short_answer("6c", "6", "6", ("c",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("outlier_reasoning",), page_number=22),
short_answer("6d", "6", "6", ("d",), 2, analytics_topic="KNN and Clustering", topic_primary="KNN and Clustering", topic_tags=("KNN and Clustering",), skill_tags=("model_selection", "threshold_reasoning"), page_number=22),
ChildSpec("7", "7", "7", (), 10, "long_question", "long_answer", "Evaluation and Validation", "Evaluation and Validation", ("Evaluation and Validation",), ("cross_validation", "data_leakage_reasoning"), page_number=23),
]
MARKER_RE = re.compile(r"(?m)^\(([a-z]+|[ivx]+)\)\s*")


def split_sections(text: str) -> tuple[str, dict[str, str]]:
    matches = list(MARKER_RE.finditer(text))
    if not matches:
        return text.strip(), {}
    intro = text[: matches[0].start()].strip()
    sections: dict[str, str] = {}
    for idx, match in enumerate(matches):
        marker = match.group(1)
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        sections[marker] = text[match.start() : end].strip()
    return intro, sections


def extract_segment(text: str, path: tuple[str, ...]) -> str:
    if not path:
        return text.strip()
    current = text.strip()
    carried_intro: list[str] = []
    for depth, marker in enumerate(path):
        intro, sections = split_sections(current)
        if depth == 0 and intro:
            carried_intro.append(intro)
        current = sections.get(marker, current)
    return "\n".join(part for part in [*carried_intro, current] if part).strip()


def extract_pages(pdf_path: Path, start: int, end: int) -> str:
    doc = fitz.open(pdf_path)
    try:
        return "\n".join(doc[i].get_text("text") for i in range(start - 1, end))
    finally:
        doc.close()


def load_seed_rows() -> dict[str, dict]:
    data = json.loads(PROBLEM_SEED_PATH.read_text())
    return {row["question_number"]: row for row in data if row["source_exam_key"] == EXAM_KEY}


def build_source_rows(existing_rows: dict[str, dict]) -> dict[str, dict]:
    # NOTE: existing_rows is currently unused; it is kept so the call in main() stays valid.
    rows = dict(load_seed_rows())
    # Q5 and Q7 are re-extracted from the PDFs over explicit 1-indexed page ranges.
    # rows.get(..., {}) tolerates a seed file that has no row for the question,
    # where indexing the seed directly would raise KeyError.
    rows["5"] = {
        **rows.get("5", {}),
        "question_text": extract_pages(QUESTION_PDF, 18, 20),
        "raw_answer_text": extract_pages(ANSWER_PDF, 21, 25),
        "page_number": 18,
        "analytics_topic": "KNN and Clustering",
        "topic_primary": "KNN and Clustering",
        "topic_tags": ["KNN and Clustering"],
        "skill_tags": ["manual_computation", "distance_calculation", "algorithm_tracing"],
        "difficulty": "medium",
    }
    rows["7"] = {
        **rows.get("7", {}),
        "question_text": extract_pages(QUESTION_PDF, 23, 24),
        "raw_answer_text": extract_pages(ANSWER_PDF, 31, 34),
        "page_number": 23,
        "analytics_topic": "Evaluation and Validation",
        "topic_primary": "Evaluation and Validation",
        "topic_tags": ["Evaluation and Validation"],
        "skill_tags": ["cross_validation", "data_leakage_reasoning"],
        "difficulty": "medium",
    }
    return rows


def extract_true_false_answers(answer_text: str) -> dict[str, str]:
    answers: dict[str, str] = {}
    table_match = re.search(r"Answer\s+([TF\s]+)", answer_text, re.S)
    if table_match:
        seq = re.findall(r"[TF]", table_match.group(1))
        if len(seq) >= 10:
            for idx, val in enumerate(seq[:10]):
                answers[chr(ord("a") + idx)] = val
            return answers
    lines = [line.strip() for line in answer_text.splitlines() if line.strip()]
    current_letter: str | None = None
    for line in lines:
        m = re.fullmatch(r"\(([a-j])\)", line)
        if m:
            current_letter = m.group(1)
            continue
        if current_letter and line in {"T", "F"}:
            answers[current_letter] = line
            current_letter = None
    if answers:
        return answers
    seq = re.findall(r"\b([TF])\b", answer_text)
    if len(seq) >= 10:
        for idx, val in enumerate(seq[:10]):
            answers[chr(ord("a") + idx)] = val
    return answers


def main() -> None:
    sb = get_supabase()
    paper = sb.table("papers").select("id").eq("source_exam_key", EXAM_KEY).execute().data[0]
    paper_id = paper["id"]
    current_rows = (
        sb.table("paper_questions")
        .select("*")
        .eq("paper_id", paper_id)
        .order("display_order")
        .execute()
        .data
    )
    existing_by_number = {row["question_number"]: row for row in current_rows}
    parent_rows = build_source_rows(existing_by_number)
    tf_answers = extract_true_false_answers(parent_rows["1"]["raw_answer_text"] or "")
    inserts = []
    for display_order, child in enumerate(CHILDREN, start=1):
        parent = parent_rows[child.top_level_number]
        existing = existing_by_number.get(child.question_number, {})
        question_text = extract_segment(parent["question_text"] or "", child.path)
        raw_answer_text = extract_segment(parent["raw_answer_text"] or "", child.path) if child.path else (parent["raw_answer_text"] or "")
        options = None
        correct_option = None
        if child.question_type == "true_false":
            options = TRUE_FALSE_OPTIONS
            correct_option = tf_answers.get(child.path[0])
        inserts.append(
            {
                "paper_id": paper_id,
                "question_number": child.question_number,
                "parent_question": child.parent_question,
                "display_order": display_order,
                "question_type": child.question_type,
                "question_format": child.question_format,
                "question_text": question_text,
                "score": child.score,
                "page_number": child.page_number,
                "page_y_ratio": existing.get("page_y_ratio"),
                "options": options,
                "correct_option": correct_option,
                "correct_answer": None,
                "raw_answer_text": raw_answer_text,
                "topics": existing.get("topics") or (list(child.topic_tags) if child.topic_tags else parent.get("topics")),
                "topic_primary": existing.get("topic_primary") or child.topic_primary or parent.get("topic_primary"),
                "analytics_topic": existing.get("analytics_topic") or child.analytics_topic or parent.get("analytics_topic"),
                "topic_tags": existing.get("topic_tags") or (list(child.topic_tags) if child.topic_tags else parent.get("topic_tags")),
                "skill_tags": existing.get("skill_tags") or (list(child.skill_tags) if child.skill_tags else parent.get("skill_tags")),
                "difficulty": existing.get("difficulty") or parent.get("difficulty"),
                "knowledge_reminder": existing.get("knowledge_reminder", ""),
                "ai_hint": existing.get("ai_hint", ""),
                "solution": existing.get("solution", ""),
            }
        )
    sb.table("paper_questions").delete().eq("paper_id", paper_id).execute()
    sb.table("paper_questions").insert(inserts).execute()
    sb.table("papers").update({"question_count": len(inserts), "status": "processing"}).eq("id", paper_id).execute()
    print(f"Inserted {len(inserts)} rows for {EXAM_KEY}.")


if __name__ == "__main__":
    main()


@@ -0,0 +1,121 @@
"""Upload COMP2211 course-library PDFs to Supabase Storage.

Run from the backend directory:

    uv run python upload_course_library_pdfs.py

Each entry maps a storage path (inside the `papers` bucket) to the local
source file under pastpaper-scraper/papers/COMP2211/.
"""
from __future__ import annotations
import sys
from pathlib import Path
# ---------------------------------------------------------------------------
# Manifest: (storage_path, local_filename)
# storage_path is relative inside the `papers` bucket.
# local_filename is relative to PAPERS_DIR below.
# ---------------------------------------------------------------------------
MANIFEST: list[tuple[str, str]] = [
(
"course-library/COMP2211/COMP2211-2022-fall-midterm/paper.pdf",
"(COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-fall-midterm/answer.pdf",
"(COMP2211)[2022](f)midterm~=yjz8dxdd^_18747.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-midterm/paper.pdf",
"(COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-midterm/answer.pdf",
"(COMP2211)[2022](s)midterm~=6ma030^_89587.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-final-part-a/paper.pdf",
"(COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-final-part-a/answer.pdf",
"(COMP2211)[2022](s)final~=ajou6^_82011.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-final-part-b/paper.pdf",
"(COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf",
),
(
"course-library/COMP2211/COMP2211-2022-spring-final-part-b/answer.pdf",
"(COMP2211)[2022](s)final~=ajou6^_51199.pdf",
),
(
"course-library/COMP2211/COMP2211-2023-spring-midterm/paper.pdf",
"(COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf",
),
(
"course-library/COMP2211/COMP2211-2023-spring-midterm/answer.pdf",
"(COMP2211)[2023](s)midterm~clchanbg^_17297.pdf",
),
(
"course-library/COMP2211/COMP2211-2024-spring-midterm/paper.pdf",
"(COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf",
),
(
"course-library/COMP2211/COMP2211-2024-spring-midterm/answer.pdf",
"(COMP2211)[2024](s)midterm~=ubrzkjmz^_90406.pdf",
),
(
"course-library/COMP2211/COMP2211-2024-spring-final/paper.pdf",
"(COMP2211)[2024](s)final~=igk5mmg^_90365.pdf",
),
(
"course-library/COMP2211/COMP2211-2024-spring-final/answer.pdf",
"(COMP2211)[2024](s)final~=igk5mmg^_58857.pdf",
),
]
PAPERS_DIR = (
    Path(__file__).parent.parent
    / "pastpaper-scraper"
    / "papers"
    / "COMP2211"
)


def main() -> None:
    from app.services.supabase_client import get_supabase

    sb = get_supabase()
    bucket = sb.storage.from_("papers")
    ok = 0
    skipped = 0
    failed = 0
    for storage_path, local_name in MANIFEST:
        local_file = PAPERS_DIR / local_name
        if not local_file.exists():
            print(f" MISSING local file: {local_name}")
            failed += 1
            continue
        data = local_file.read_bytes()
        try:
            bucket.upload(
                storage_path,
                data,
                file_options={"content-type": "application/pdf", "upsert": "true"},
            )
            print(f" OK {storage_path}")
            ok += 1
        except Exception as exc:
            print(f" ERR {storage_path}: {exc}")
            failed += 1
    print(f"\nDone: {ok} uploaded, {skipped} skipped, {failed} failed.")


if __name__ == "__main__":
    main()

1969
backend/uv.lock generated Normal file

File diff suppressed because it is too large

92
deploy.md Normal file

@@ -0,0 +1,92 @@
# Deploying to Tencent Cloud
## 1. Server preparation
```bash
# After logging in over SSH, install Docker
curl -fsSL https://get.docker.com | sh
sudo systemctl enable docker && sudo systemctl start docker
# Install docker-compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
```
## 2. Upload the code
```bash
# Package locally (excluding node_modules and .venv)
cd "/Users/soda/Desktop/PastPaper Master"
tar --exclude='node_modules' --exclude='.venv' --exclude='__pycache__' --exclude='.git' \
  -czf pastpaper.tar.gz .
# Upload to the server
scp pastpaper.tar.gz root@<SERVER_IP>:/opt/pastpaper/
# Extract on the server
ssh root@<SERVER_IP>
cd /opt/pastpaper && tar xzf pastpaper.tar.gz
```
## 3. Configure environment variables
```bash
# Edit .env and confirm that every key is correct
vi /opt/pastpaper/.env
```
Required variables:
- `SUPABASE_URL`, `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_ROLE_KEY`
- `DASHSCOPE_BASE_URL`, `DASHSCOPE_API_KEY`
- `DEEPSEEK_BASE_URL`, `DEEPSEEK_API_KEY`
- `LAOZHANG_BASE_URL`, `LAOZHANG_API_KEY` (fallback)
- `GOOGLE_GEMINI_API_KEY` (if the server's region supports it)
## 4. Build and start
```bash
cd /opt/pastpaper
docker-compose up -d --build
```
## 5. Verify
```bash
# Check container status
docker-compose ps
# Check backend health
curl http://localhost/health
# Follow the logs
docker-compose logs -f backend
docker-compose logs -f frontend
```
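The health check can also be scripted so that a deploy waits until the backend answers. The sketch below is illustrative only: `wait_for_health`, `http_status`, and the injectable `fetch` and `sleep` parameters are hypothetical names, and the default URL simply reuses `http://localhost/health` from the `curl` command.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 502 or 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_for_health(
    url: str = "http://localhost/health",
    attempts: int = 10,
    delay_seconds: float = 2.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns HTTP 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

Calling `wait_for_health()` right after `docker-compose up -d --build` would give the containers up to roughly twenty seconds to come up before reporting failure.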
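## 6. Domain + HTTPS (optional)
If you have a domain, point its DNS record at the server IP in the Tencent Cloud console, then:
```bash
# Install certbot
apt install -y certbot python3-certbot-nginx
# Obtain a certificate (first change server_name in nginx.conf to your domain)
certbot --nginx -d your-domain.com
```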
## Common operations commands
```bash
# Restart
docker-compose restart
# Rebuild after updating the code
docker-compose up -d --build
# Follow the backend logs
docker-compose logs -f backend
# Open a shell in the backend container
docker-compose exec backend bash
```

10
docker-compose.yml Normal file

@@ -0,0 +1,10 @@
services:
  backend:
    build: ./backend
    env_file: .env
    ports:
      - "8001:8000"
    restart: unless-stopped
    dns:
      - 8.8.8.8
      - 1.1.1.1


@@ -0,0 +1,152 @@
# Sub-question Page Number Backfill — Requirements
## Problem
All six `split_comp2211_*.py` scripts create sub-questions by inheriting `page_number`
from their parent question:
```python
"page_number": parent.get("page_number"),
```
This is wrong for sub-questions that span multiple pages. For example, Q1 True/False
has 10 statements, (a) through (j); if (a) through (f) are on page 1 and (g) through (j)
are on page 2, all ten inherit page 1 from the parent. Clicking Q1h in the UI scrolls to
page 1 instead of page 2.
## Goal
Every `ChildSpec` in every split script should carry its own correct `page_number`.
When the script runs, it writes that page number to the database instead of inheriting
from the parent.
## Files to modify
```
backend/split_comp2211_2022_fall_midterm.py ← does not exist yet; parent is seed SQL
backend/split_comp2211_2022_spring_midterm.py
backend/split_comp2211_2022_spring_final_part_a.py
backend/split_comp2211_2022_spring_final_part_b.py
backend/split_comp2211_2023_spring_midterm.py
backend/split_comp2211_2024_spring_midterm.py
backend/split_comp2211_2024_spring_final.py
```
Note: `2022-fall-midterm` sub-questions were inserted directly via the seed SQL
(`supabase/seeds/comp2211_problem_level_questions.sql`), not via a split script.
Their page numbers must be fixed directly in that SQL file or via a separate UPDATE.
## How to determine page numbers
Use PyMuPDF (already in the venv; the split scripts import it as `fitz`, and recent
releases also accept `import pymupdf`) to search for question markers
in the local PDF files. The PDFs are at:
```
../pastpaper-scraper/papers/COMP2211/<filename>
```
Filename mapping (from `upload_course_library_pdfs.py`):
| Exam key | Local paper PDF |
|----------|----------------|
| COMP2211-2022-fall-midterm | (COMP2211)[2022](f)midterm~=yjz8dxdd^_27002.pdf |
| COMP2211-2022-spring-midterm | (COMP2211)[2022](s)midterm~=b8bidkgs^_14629.pdf |
| COMP2211-2022-spring-final-part-a | (COMP2211)[2022](s)final~=b8bidkgs^_33018.pdf |
| COMP2211-2022-spring-final-part-b | (COMP2211)[2022](s)final~=b8bidkgs^_40627.pdf |
| COMP2211-2023-spring-midterm | (COMP2211)[2023](s)midterm~=bxbidkmj^_26587.pdf |
| COMP2211-2024-spring-midterm | (COMP2211)[2024](s)midterm~=rcidkjgf^_82003.pdf |
| COMP2211-2024-spring-final | (COMP2211)[2024](s)final~=igk5mmg^_90365.pdf |
### Suggested search strategy
```python
import pymupdf
doc = pymupdf.open("path/to/paper.pdf")
for page_num, page in enumerate(doc, start=1):
    text = page.get_text()
    print(f"--- Page {page_num} ---")
    print(text[:500])
```
Search for markers like:
- `"(a)"`, `"(b)"`, ... for True/False sub-statements
- `"Q2(a)"`, `"2(a)"`, `"Question 2"` for major sub-questions
- `"(i)"`, `"(ii)"` for nested sub-questions
Page numbers are 1-indexed (matching the `page_number` field in the database).
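
To turn a page dump into page numbers, the per-page text can be scanned for the first line that starts with each marker. This is a minimal sketch, assuming markers sit at the start of a line (the same assumption the split scripts' `MARKER_RE` makes). `first_page_for_markers` is a hypothetical helper, and it takes plain page strings so it can be tried without a PDF.

```python
import re


def first_page_for_markers(page_texts: list[str], markers: list[str]) -> dict[str, int]:
    """Map each marker, e.g. "a" for "(a)", to the first 1-indexed page where it starts a line."""
    found: dict[str, int] = {}
    for page_num, text in enumerate(page_texts, start=1):
        for marker in markers:
            if marker in found:
                continue  # keep only the first page on which the marker appears
            pattern = rf"(?m)^\s*\({re.escape(marker)}\)"
            if re.search(pattern, text):
                found[marker] = page_num
    return found


pages = [
    "Question 1\n(a) First statement\n(b) Second statement",
    "(c) Third statement\nQuestion 2\n(a) Trace the code",
]
print(first_page_for_markers(pages, ["a", "b", "c"]))  # {'a': 1, 'b': 1, 'c': 2}
```

With a real paper, `page_texts` could be built as `[page.get_text() for page in doc]`. Because markers such as `(a)` repeat under every question, it is safer to pass only the pages that belong to one parent question at a time.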
## Code changes per split script
### Step 1 — Add `page_number` field to `ChildSpec`
Each script has its own `ChildSpec` dataclass. Add the field with a default so
existing call sites don't break immediately:
```python
@dataclass(frozen=True)
class ChildSpec:
    ...
    page_number: int = 1  # add this field
```
### Step 2 — Set correct page numbers in each `ChildSpec` instance
Fill in the actual page after inspecting the PDF:
```python
ChildSpec("1a", "1", "1", ("a",), 1.5, "true_false", page_number=1),
ChildSpec("1b", "1", "1", ("b",), 1.5, "true_false", page_number=1),
...
ChildSpec("1h", "1", "1", ("h",), 1.5, "true_false", page_number=2),
```
### Step 3 — Write `page_number` in the upsert payload
Find where the script builds the INSERT/upsert dict and replace the inherited value:
```python
# Before:
"page_number": parent.get("page_number"),
# After:
"page_number": child.page_number,
```
### Step 4 — Update existing rows in the database
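After modifying the scripts, run each script once. As written, the split scripts
(the 2024 spring midterm one, for example) delete every `paper_questions` row for the
paper and re-insert the full set rather than upserting, so re-running replaces the old
(inherited) page numbers with the correct ones.
If a script only inserts and leaves existing rows in place, add a separate UPDATE pass: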
```python
sb.table("paper_questions").update({"page_number": child.page_number}) \
.eq("paper_id", paper_id) \
.eq("question_number", child.question_number) \
.execute()
```
## 2022-fall-midterm (seed SQL)
Sub-questions for this paper are in:
`supabase/seeds/comp2211_problem_level_questions.sql`
The seed has a `page_number` column in the VALUES rows. Find all rows for
`COMP2211-2022-fall-midterm` and correct the values. Then run a direct UPDATE
against the live database:
```sql
-- Example — adjust actual page numbers after inspecting the PDF
UPDATE paper_questions
SET page_number = 2
WHERE paper_id = (SELECT id FROM papers WHERE source_exam_key = 'COMP2211-2022-fall-midterm')
AND question_number IN ('1g', '1h', '1i', '1j');
```
## Definition of Done
- [ ] Every `ChildSpec` in every split script has an explicit `page_number`
- [ ] No script uses `parent.get("page_number")` for the upsert payload
- [ ] All six scripts have been re-run against the live database
- [ ] 2022-fall-midterm sub-questions updated via SQL
- [ ] Spot-check: clicking Q1h in a paper where Q1 spans 2 pages scrolls to page 2 in the UI
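
Before re-running a script, a cheap sanity check can catch typos in the hand-entered page numbers. The sketch below assumes sub-questions appear in the paper in the same order as in `CHILDREN`, so page numbers should never decrease. `find_page_regressions` is a hypothetical helper, and the tuples stand in for `(question_number, page_number)` pairs read from each `ChildSpec`.

```python
def find_page_regressions(specs: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """Return (question_number, page, previous_page) for every spec whose page goes backwards."""
    problems: list[tuple[str, int, int]] = []
    previous_page = 0
    for question_number, page in specs:
        if page < previous_page:
            problems.append((question_number, page, previous_page))
        # Track the highest page seen so one bad entry does not hide later ones.
        previous_page = max(previous_page, page)
    return problems


specs = [("1a", 3), ("1b", 3), ("2a_i", 4), ("2a_ii", 1), ("2b", 8)]
print(find_page_regressions(specs))  # [('2a_ii', 1, 4)]
```

Inside a split script this could be called as `find_page_regressions([(c.question_number, c.page_number) for c in CHILDREN])`. A stray page 1 is worth a second look in particular, since 1 is also the dataclass default.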


@@ -0,0 +1,243 @@
# Tag Schema & Similar Question Retrieval — Requirements
## Background
Current state of `paper_questions` tagging for COMP2211:
- `analytics_topic`: 8 coarse buckets (e.g. "KNN and Clustering" covers both KNN and K-Means)
- `topic_tags`: redundant copy of `analytics_topic`, adds no information
- `skill_tags`: fine-grained snake_case labels (e.g. `centroid_update`, `distance_calculation`), not shown to users
- `question_text`: at subquestion level, but currently stores **parent problem header text**, not the actual subquestion statement
The result is that similar question retrieval conflates KNN and K-Means, cannot distinguish "write code" from "trace algorithm", and produces low-precision recommendations.
---
## Goal
Every subquestion should carry enough structured metadata that the retrieval system can return **topically and skill-wise identical questions across different exam years**, rather than just questions from the same broad topic bucket.
Precision target: a question on K-Means centroid update should retrieve other K-Means centroid update questions, not KNN distance questions.
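
One way to read the precision target is as a two-stage ranking: filter on `analytics_topic`, then order by `skill_tags` overlap. The sketch below only illustrates that reading and is not the retrieval system's actual implementation. `rank_similar`, the Jaccard score, and the sample rows are all assumptions made for the example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(query: dict, candidates: list[dict], limit: int = 5) -> list[dict]:
    """Keep candidates from other papers with the same analytics_topic, ranked by skill_tags overlap."""
    query_skills = set(query["skill_tags"])
    same_topic = [
        row
        for row in candidates
        if row["analytics_topic"] == query["analytics_topic"]
        and row["paper_id"] != query["paper_id"]
    ]
    ranked = sorted(
        same_topic,
        key=lambda row: jaccard(query_skills, set(row["skill_tags"])),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical rows; only the fields used for ranking are included.
query = {"paper_id": "p1", "analytics_topic": "K-Means", "skill_tags": ["centroid_update"]}
candidates = [
    {"paper_id": "p2", "question_number": "6b", "analytics_topic": "K-Means",
     "skill_tags": ["centroid_update", "manual_computation"]},
    {"paper_id": "p3", "question_number": "5a", "analytics_topic": "KNN",
     "skill_tags": ["distance_calculation"]},
    {"paper_id": "p4", "question_number": "6a", "analytics_topic": "K-Means",
     "skill_tags": ["centroid_update"]},
]
print([row["question_number"] for row in rank_similar(query, candidates)])  # ['6a', '6b']
```

The KNN distance question never reaches the ranking step, which is the behaviour the precision target asks for.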
---
## Field Definitions (revised)
### `analytics_topic` — single string, primary retrieval bucket
Granularity: **algorithm or concept level**, not course-section level.
Allowed values for COMP2211 (replace current 8-bucket system):
| New value | Replaces / splits |
|-----------|-------------------|
| `Naive Bayes` | Probabilistic Models (partial) |
| `Bayesian Inference` | Probabilistic Models (partial) |
| `KNN` | KNN and Clustering (partial) |
| `K-Means` | KNN and Clustering (partial) |
| `Perceptron` | Perceptron and MLP (partial) |
| `MLP` | Perceptron and MLP (partial) |
| `CNN` | Vision and CNN |
| `Evaluation Metrics` | Evaluation and Validation (partial) |
| `Cross Validation` | Evaluation and Validation (partial) |
| `Python and NumPy` | Python Fundamentals |
| `Search Algorithms` | Search and Games (partial) |
| `Game Trees` | Search and Games (partial) |
| `Ethics of AI` | Ethics of AI (unchanged) |
Rules:
- One value per question — pick the **most specific** algorithm being tested
- If a subquestion genuinely spans two algorithms, pick the one the student is asked to compute or demonstrate
- `True/False` is **not** a valid analytics_topic (it is a format, not a topic)
---
### `topic_tags` — string array, secondary topic labels
Granularity: **concept and variant level** within the algorithm.
Purpose: catch cross-topic overlaps and concept aliases.
Examples:
```
analytics_topic = "K-Means"
topic_tags = ["K-Means", "Centroid Update", "Convergence"]
analytics_topic = "KNN"
topic_tags = ["KNN", "Euclidean Distance", "Classification"]
analytics_topic = "Naive Bayes"
topic_tags = ["Naive Bayes", "Prior", "Likelihood", "Posterior"]
analytics_topic = "Evaluation Metrics"
topic_tags = ["Evaluation Metrics", "Precision", "Recall", "F1 Score"]
analytics_topic = "MLP"
topic_tags = ["MLP", "Backpropagation", "Activation Function", "Hidden Layer"]
analytics_topic = "Python and NumPy"
topic_tags = ["NumPy", "Broadcasting", "Array Indexing", "Vectorization"]
```
Rules:
- First element should match or alias `analytics_topic`
- Include concept names a student would search for ("F1 Score", not "metric_reasoning")
- 2-5 tags per question; avoid over-tagging
- Human-readable, title-case, no underscores
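
A rough sketch of how the `topic_tags` rules could be checked automatically during backfill. The function name `validate_topic_tags` is hypothetical, the tag-count bounds are parameters that assume the intended range is 2 to 5, and treating "alias" as "the first tag is contained in `analytics_topic`" is only a heuristic assumption:

```python
def validate_topic_tags(analytics_topic, topic_tags, min_tags=2, max_tags=5):
    """Return a list of rule violations for one question (empty list = OK).

    Hypothetical helper; the bounds and the alias heuristic are assumptions.
    """
    problems = []
    tags = list(topic_tags or [])

    # Rule: a small number of tags per question, no over-tagging.
    if not (min_tags <= len(tags) <= max_tags):
        problems.append(f"expected {min_tags}-{max_tags} tags, got {len(tags)}")

    # Rule: human-readable labels, no underscores.
    for tag in tags:
        if "_" in tag:
            problems.append(f"underscore in tag: {tag!r}")

    # Rule: first element should match or alias analytics_topic.
    # Heuristic: accept an exact match or a substring of the topic name.
    if tags:
        first = tags[0].lower()
        topic = (analytics_topic or "").lower()
        if first != topic and first not in topic:
            problems.append(f"first tag {tags[0]!r} does not match {analytics_topic!r}")

    return problems
```

Running this over every row before the upsert would surface leftover snake_case labels and single-tag rows without blocking the backfill itself.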
---
### `skill_tags` — string array, task type labels
Granularity: **what the student must do**, not what the topic is.
Current values are acceptable in meaning but must be converted to human-readable form.
Rename convention: `snake_case` → `Title Case with spaces`
| Old | New |
|-----|-----|
| `concept_check` | `Concept Check` |
| `code_tracing` | `Code Tracing` |
| `algorithm_tracing` | `Algorithm Tracing` |
| `distance_calculation` | `Distance Calculation` |
| `centroid_update` | `Centroid Update` |
| `weight_update` | `Weight Update` |
| `decision_boundary` | `Decision Boundary` |
| `implementation` | `Implementation` |
| `debugging` | `Debugging` |
| `model_selection` | `Model Selection` |
| `concept_explanation` | `Concept Explanation` |
| `architecture_reasoning` | `Architecture Reasoning` |
| `convergence_reasoning` | `Convergence Reasoning` |
| `generalization_reasoning` | `Generalization Reasoning` |
| `classification_decision` | `Classification Decision` |
Rules:
- 1-3 tags per question
- Describes the **task type**, not the subject matter
- These are used for retrieval ranking, not primary display
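
The rename convention is mechanical, so it can probably be done with a small helper rather than a hand-maintained mapping. The sketch below is illustrative only: `to_title_case` and `rename_skill_tags` are hypothetical names, and the helper is written to be idempotent so that re-running it on already converted tags leaves them unchanged:

```python
def to_title_case(tag: str) -> str:
    """Convert a snake_case tag to Title Case with spaces.

    Only the first letter of each word is upper-cased; the rest is left
    as-is, so "Concept Check" stays "Concept Check" on a second pass.
    """
    words = [w for w in tag.strip().split("_") if w]
    return " ".join(w[:1].upper() + w[1:] for w in words)


def rename_skill_tags(tags):
    """Rename a list of skill tags, dropping duplicates but keeping order."""
    seen = set()
    renamed = []
    for tag in tags or []:
        new = to_title_case(tag)
        if new and new not in seen:
            seen.add(new)
            renamed.append(new)
    return renamed
```

For example, `rename_skill_tags(["centroid_update", "algorithm_tracing"])` gives `["Centroid Update", "Algorithm Tracing"]`, matching the table above.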
---
### `question_text` — the actual subquestion statement
Current problem: subquestions store the **parent problem header** as `question_text`, not the individual statement.
Required fix per subquestion type:
| Type | What `question_text` should contain |
|------|-------------------------------------|
| True/False subquestion (Q1a-Q1j) | The specific T/F statement being judged |
| Code output (Q2a_i-Q2a_v) | The specific code snippet + "What is the output?" |
| Calculation subquestion (Q4a, Q5a) | The specific sub-task, e.g. "Compute the Euclidean distance between..." |
| Written explanation (Q3, Q5c) | The full question prompt for that part |
This is a **data extraction quality issue**. The backfill script must extract the correct per-subquestion text from the source PDF or from `raw_answer_text`.
---
## Backfill Requirements
### Script: `backfill_comp2211_tags.py`
Target: all `paper_questions` where `paper_id` in the COMP2211 course library.
For each question:
1. **Re-classify `analytics_topic`** using the new value list above
   - Use `question_text` + existing `topic_tags` + `skill_tags` as signals
   - If `analytics_topic` is currently `"KNN and Clustering"`:
     - Look at `skill_tags` and `question_text`
     - If `centroid_update`, `algorithm_tracing`, or text contains "K-Means" / "centroid" → set `"K-Means"`
     - Otherwise → set `"KNN"`
   - If `analytics_topic` is currently `"Perceptron and MLP"`:
     - If `question_text` or `skill_tags` references hidden layer, backprop, activation function → `"MLP"`
     - Otherwise → `"Perceptron"`
   - If `analytics_topic` is currently `"Probabilistic Models"`:
     - If Naive Bayes in text → `"Naive Bayes"`
     - Otherwise → `"Bayesian Inference"`
   - If `analytics_topic` is currently `"Evaluation and Validation"`:
     - If cross-validation, train/val split in text → `"Cross Validation"`
     - Otherwise → `"Evaluation Metrics"`
   - If `analytics_topic` is currently `"Search and Games"`:
     - If minimax, alpha-beta, game tree in text → `"Game Trees"`
     - Otherwise → `"Search Algorithms"`
2. **Rebuild `topic_tags`**: do not copy `analytics_topic`; derive from question content
3. **Rename `skill_tags`**: convert all snake_case values to Title Case per the mapping table above
4. **Do not overwrite `question_text`** in this pass (separate task)
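
The re-classification rules in step 1 could be sketched as a small lookup plus keyword check. This is a minimal illustration, not the actual `backfill_comp2211_tags.py`: the names `SPLIT_RULES`, `SKILL_SIGNALS`, `RENAMES` and `reclassify_topic` are hypothetical, the keyword spellings are assumptions, and for simplicity every rule checks both `question_text` and `skill_tags`:

```python
# Legacy bucket -> (keywords, topic if any keyword matches, fallback topic).
# Keyword lists are assumptions derived from the rules above.
SPLIT_RULES = {
    "KNN and Clustering": (("k-means", "kmeans", "k means", "centroid"), "K-Means", "KNN"),
    "Perceptron and MLP": (("hidden layer", "backprop", "activation function"), "MLP", "Perceptron"),
    "Probabilistic Models": (("naive bayes",), "Naive Bayes", "Bayesian Inference"),
    "Evaluation and Validation": (
        ("cross-validation", "cross validation", "train/val"),
        "Cross Validation",
        "Evaluation Metrics",
    ),
    "Search and Games": (("minimax", "alpha-beta", "game tree"), "Game Trees", "Search Algorithms"),
}

# Skill tags that by themselves select the "matched" topic.
SKILL_SIGNALS = {"KNN and Clustering": {"centroid_update", "algorithm_tracing"}}

# Buckets that are renamed one-to-one.
RENAMES = {"Vision and CNN": "CNN", "Python Fundamentals": "Python and NumPy"}


def reclassify_topic(question: dict) -> str:
    """Map a legacy analytics_topic to the new value list."""
    current = question.get("analytics_topic") or ""
    if current in RENAMES:
        return RENAMES[current]
    rule = SPLIT_RULES.get(current)
    if rule is None:
        return current  # already a new-style value, or unchanged (e.g. Ethics of AI)

    keywords, matched, fallback = rule
    skills = {s.lower() for s in question.get("skill_tags") or []}
    if skills & SKILL_SIGNALS.get(current, set()):
        return matched

    # Search the question text plus skill tags (underscores read as spaces).
    text = (question.get("question_text") or "").lower()
    haystack = text + " " + " ".join(sorted(skills)).replace("_", " ")
    return matched if any(k in haystack for k in keywords) else fallback
```

Because `question_text` currently holds the parent problem header, keyword matches on text alone may be unreliable until that field is fixed; the skill-tag signals are likely the stronger evidence in this pass.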
---
## Retrieval Algorithm Changes (backend `questions.py`)
### Separate topic and skill contributions
Current `similarity_score()` merges `analytics_topic`, `topic_tags`, and `skill_tags` into one set. This causes skill tags like `centroid_update` to appear as "Shared topic: centroid_update" in the UI.
Required split:
```python
def similarity_score(target, candidate):
    score = 0
    reasons = []

    # 1. analytics_topic exact match: 40 pts
    if target.get("analytics_topic") and target["analytics_topic"] == candidate.get("analytics_topic"):
        score += 40
        reasons.append(f"Same topic: {target['analytics_topic']}")

    # 2. topic_tags overlap: up to 20 pts (10 per shared tag, max 2)
    target_tt = set(t.lower() for t in (target.get("topic_tags") or []))
    candidate_tt = set(t.lower() for t in (candidate.get("topic_tags") or []))
    shared_tt = target_tt & candidate_tt
    tt_pts = min(len(shared_tt) * 10, 20)
    if tt_pts:
        score += tt_pts
        reasons.append(f"Shared concept: {', '.join(sorted(shared_tt)[:2])}")

    # 3. skill_tags overlap: up to 20 pts (10 per shared tag, max 2)
    target_st = set(t.lower() for t in (target.get("skill_tags") or []))
    candidate_st = set(t.lower() for t in (candidate.get("skill_tags") or []))
    shared_st = target_st & candidate_st
    st_pts = min(len(shared_st) * 10, 20)
    if st_pts:
        score += st_pts
        reasons.append(f"Shared skill: {', '.join(sorted(shared_st)[:2])}")

    # 4. Same question format: 10 pts
    if question_family(candidate) == question_family(target):
        score += 10
        reasons.append("Same format")

    # 5. Same difficulty: 5 pts
    if candidate.get("difficulty") and candidate["difficulty"] == target.get("difficulty"):
        score += 5
        reasons.append("Same difficulty")

    # 6. Full-text similarity: up to 20 pts (from tsvector RPC)
    #    (injected externally, not computed here)

    return min(score, 99), reasons
```
### Threshold and display
- Filter: drop candidates with `match_percent < 20` (threshold raised from 10; ensures `analytics_topic` at least partially matches)
- UI display: show `match_reasons` chips, but replace snake_case with Title Case before display
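
A minimal sketch of the threshold and display steps, assuming each scored candidate is a dict with `match_percent` and `match_reasons` keys and that the filter means "drop anything below 20". The names `MIN_MATCH_PERCENT`, `humanize_reason` and `prepare_results` are hypothetical, and the Title Case conversion is shown on the backend only for illustration; it could equally live in the frontend:

```python
MIN_MATCH_PERCENT = 20  # assumed cut-off: candidates below this are dropped


def humanize_reason(reason: str) -> str:
    """Turn 'Shared skill: centroid_update' into 'Shared skill: Centroid Update'."""
    prefix, sep, detail = reason.partition(": ")
    if not sep:
        return reason  # e.g. "Same format" has no detail part
    words = detail.replace("_", " ").split(" ")
    return f"{prefix}: {' '.join(w[:1].upper() + w[1:] for w in words)}"


def prepare_results(scored, min_percent=MIN_MATCH_PERCENT):
    """Filter, sort by score (highest first), and clean up reason chips."""
    kept = [r for r in scored if r["match_percent"] >= min_percent]
    kept.sort(key=lambda r: r["match_percent"], reverse=True)
    return [
        {**r, "match_reasons": [humanize_reason(x) for x in r["match_reasons"]]}
        for r in kept
    ]
```

Note that a threshold of 20 on its own does not guarantee an `analytics_topic` match, since tag overlap alone can reach 40 points; a hard requirement would need an explicit topic check.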
---
## Definition of Done
- [ ] All COMP2211 questions have `analytics_topic` from the new value list
- [ ] No `analytics_topic` value of `"KNN and Clustering"`, `"Perceptron and MLP"`, `"Probabilistic Models"`, `"Evaluation and Validation"`, `"Search and Games"` remains
- [ ] `topic_tags` contains 2-5 human-readable concept names, not a copy of `analytics_topic`
- [ ] `skill_tags` values are Title Case with spaces
- [ ] Similar question retrieval returns 0 cross-algorithm false positives between KNN and K-Means
- [ ] `match_reasons` chips in the UI show no underscores
- [ ] Retrieval threshold enforces `analytics_topic` match as a hard or near-hard requirement

12
frontend/Dockerfile Normal file

@@ -0,0 +1,12 @@
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80

13
frontend/index.html Normal file

@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="icon" type="image/jpeg" href="/favicon.jpg" />
<title>PastPaper Master</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

27
frontend/nginx.conf Normal file

@@ -0,0 +1,27 @@
server {
listen 80;
server_name pastpaper.knowit.top;
root /usr/share/nginx/html;
index index.html;
# SPA fallback
location / {
try_files $uri $uri/ /index.html;
}
# API proxy to backend
location /api/ {
proxy_pass http://backend:8000/api/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 300s;
client_max_body_size 50M;
}
# Health check proxy
location /health {
proxy_pass http://backend:8000/health;
}
}

3058
frontend/package-lock.json generated Normal file

File diff suppressed because it is too large

30
frontend/package.json Normal file

@@ -0,0 +1,30 @@
{
"name": "frontend",
"version": "1.0.0",
"description": "",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc && vite build",
"preview": "vite preview"
},
"dependencies": {
"@supabase/supabase-js": "^2.103.0",
"katex": "^0.16.38",
"pdfjs-dist": "^5.5.207",
"react": "^19.2.4",
"react-dom": "^19.2.4",
"react-pdf": "^10.4.1",
"react-router-dom": "^7.13.1"
},
"devDependencies": {
"@tailwindcss/vite": "^4.2.1",
"@types/katex": "^0.16.8",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^4.7.0",
"tailwindcss": "^4.2.1",
"typescript": "^5.9.3",
"vite": "^7.3.1"
}
}

BIN
frontend/public/favicon.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 103 KiB

30
frontend/src/App.tsx Normal file

@@ -0,0 +1,30 @@
import { Navigate, Routes, Route } from "react-router-dom";
import { useAuth } from "./contexts/AuthContext";
import ProcessingBanner from "./components/layout/ProcessingBanner";
import LoginPage from "./pages/LoginPage";
import HomePage from "./pages/HomePage";
import UploadPage from "./pages/UploadPage";
import WorkbenchPage from "./pages/WorkbenchPage";
import ErrorBookPage from "./pages/ErrorBookPage";
import AnalyticsPage from "./pages/AnalyticsPage";
export default function App() {
const { session, loading } = useAuth();
if (loading) return <div className="min-h-screen bg-gray-50 flex items-center justify-center"><div className="text-gray-400 text-sm">Loading...</div></div>;
return (
<>
<ProcessingBanner />
<Routes>
<Route path="/login" element={session ? <Navigate to="/" replace /> : <LoginPage />} />
<Route path="/" element={<HomePage />} />
<Route path="/upload" element={<UploadPage />} />
<Route path="/paper/:id" element={<WorkbenchPage />} />
<Route path="/error-book" element={<ErrorBookPage />} />
<Route path="/analytics" element={<AnalyticsPage />} />
<Route path="/analytics/:courseCode" element={<AnalyticsPage />} />
</Routes>
</>
);
}


@@ -0,0 +1,69 @@
import { Link } from "react-router-dom";
import { useAuth } from "@/contexts/AuthContext";
export default function Header({
courseCode,
paperTitle,
}: {
courseCode?: string;
paperTitle?: string;
}) {
const { user, signOut } = useAuth();
return (
<header className="h-14 border-b border-gray-200 bg-white flex items-center px-6 shrink-0">
<Link to="/" className="text-lg font-bold text-blue-600 mr-6">
PastPaper Master
</Link>
{courseCode && (
<div className="flex items-center gap-2 text-sm text-gray-600">
<span className="bg-blue-50 text-blue-700 px-2 py-0.5 rounded font-medium">
{courseCode}
</span>
{paperTitle && <span>{paperTitle}</span>}
<Link
to={`/analytics/${courseCode}`}
className="ml-2 flex items-center gap-1 px-2.5 py-1 text-xs font-medium text-indigo-600 bg-indigo-50 rounded hover:bg-indigo-100 transition-colors"
>
<svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
</svg>
AI Analytics
</Link>
</div>
)}
<div className="ml-auto flex items-center gap-4 text-sm">
<Link to="/" className="text-gray-500 hover:text-gray-800">
My Papers
</Link>
<Link to="/error-book" className="text-gray-500 hover:text-gray-800">
Error Book
</Link>
<Link to="/analytics" className="text-gray-500 hover:text-gray-800">
Analytics
</Link>
<Link to="/upload" className="text-blue-600 hover:text-blue-800 font-medium">
Upload
</Link>
{user ? (
<div className="flex items-center gap-3 pl-4 border-l border-gray-200">
<span className="text-xs text-gray-400">{user.email}</span>
<button
onClick={signOut}
className="text-xs text-gray-500 hover:text-gray-800 px-2 py-1 rounded hover:bg-gray-100"
>
Sign out
</button>
</div>
) : (
<Link
to="/login"
className="text-sm text-blue-600 hover:text-blue-800 font-medium pl-4 border-l border-gray-200"
>
Sign in
</Link>
)}
</div>
</header>
);
}


@@ -0,0 +1,183 @@
import { useEffect, useRef, useState } from "react";
import { Link } from "react-router-dom";
import { myPapers } from "@/lib/api";
import { useAuth } from "@/contexts/AuthContext";
import type { Paper } from "@/types/api";
interface Notification {
paperId: string;
label: string;
}
const POLL_MS = 4000;
export default function ProcessingBanner() {
const { user } = useAuth();
const [processing, setProcessing] = useState<Paper[]>([]);
const [doneNotifs, setDoneNotifs] = useState<Notification[]>([]);
const [expanded, setExpanded] = useState(false);
const knownIds = useRef<Set<string>>(new Set());
// Drag state
const [pos, setPos] = useState({ x: window.innerWidth - 220, y: 24 });
const dragging = useRef(false);
const dragOffset = useRef({ x: 0, y: 0 });
const widgetRef = useRef<HTMLDivElement>(null);
useEffect(() => {
if (!user) return;
let cancelled = false;
const poll = async () => {
try {
const papers = await myPapers();
if (cancelled) return;
const inProgress = papers.filter((p) => p.status === "processing" || p.status === "uploaded");
setProcessing(inProgress);
papers
.filter((p) => p.status === "ready" && knownIds.current.has(p.id))
.forEach((p) => {
knownIds.current.delete(p.id);
const label = `${p.course_code} ${p.year} ${p.term} ${p.exam_type}`;
setDoneNotifs((prev) => [...prev, { paperId: p.id, label }]);
setTimeout(() => {
setDoneNotifs((prev) => prev.filter((n) => n.paperId !== p.id));
}, 8000);
});
inProgress.forEach((p) => knownIds.current.add(p.id));
} catch {
// silent
}
};
poll();
const interval = setInterval(poll, POLL_MS);
return () => { cancelled = true; clearInterval(interval); };
}, [user]);
// Drag handlers
const onMouseDown = (e: React.MouseEvent) => {
// Only drag on the header bar
dragging.current = true;
dragOffset.current = {
x: e.clientX - pos.x,
y: e.clientY - pos.y,
};
e.preventDefault();
};
useEffect(() => {
const onMouseMove = (e: MouseEvent) => {
if (!dragging.current) return;
setPos({
x: Math.max(0, Math.min(window.innerWidth - 200, e.clientX - dragOffset.current.x)),
y: Math.max(0, Math.min(window.innerHeight - 60, e.clientY - dragOffset.current.y)),
});
};
const onMouseUp = () => { dragging.current = false; };
window.addEventListener("mousemove", onMouseMove);
window.addEventListener("mouseup", onMouseUp);
return () => {
window.removeEventListener("mousemove", onMouseMove);
window.removeEventListener("mouseup", onMouseUp);
};
}, []);
if (!user || (processing.length === 0 && doneNotifs.length === 0)) return null;
const total = processing.length + doneNotifs.length;
return (
<div
ref={widgetRef}
className="fixed z-50 select-none"
style={{ left: pos.x, top: pos.y }}
>
{/* ── Header / collapsed pill ── */}
<div
onMouseDown={onMouseDown}
onClick={() => setExpanded((v) => !v)}
className="flex items-center gap-2 bg-gray-900 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg cursor-grab active:cursor-grabbing"
style={{ minWidth: 180 }}
>
<span className="w-3 h-3 border-2 border-white border-t-transparent rounded-full animate-spin shrink-0" />
<span className="flex-1 font-medium">
{processing.length > 0
? `${processing.length} processing…`
: `${doneNotifs.length} ready`}
</span>
{doneNotifs.length > 0 && (
<span className="w-4 h-4 flex items-center justify-center bg-green-500 rounded-full text-[10px] font-bold shrink-0">
{doneNotifs.length}
</span>
)}
<span className="text-gray-400 text-[10px] shrink-0">{expanded ? "▲" : "▼"}</span>
</div>
{/* ── Expanded panel ── */}
{expanded && (
<div className="mt-1.5 flex flex-col gap-1.5" style={{ minWidth: 240 }}>
{processing.map((p) => {
const step = p.processing_step;
const progress = p.processing_progress || 0;
const total = p.processing_total || 0;
const pct = total > 0 ? Math.round((progress / total) * 100) : 0;
return (
<div
key={p.id}
className="bg-gray-900 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg"
>
<div className="flex items-center gap-2.5 mb-1.5">
<span className="w-3 h-3 border-2 border-white border-t-transparent rounded-full animate-spin shrink-0" />
<span className="truncate">
<span className="font-semibold">{p.course_code}</span>{" "}
{p.year} {p.term} {p.exam_type}
</span>
</div>
{step && (
<>
<div className="text-[10px] text-gray-400 mb-1 truncate">{step}</div>
{total > 0 && (
<div className="h-1.5 bg-gray-700 rounded-full overflow-hidden">
<div className="h-full bg-blue-400 rounded-full transition-all duration-500" style={{ width: `${pct}%` }} />
</div>
)}
</>
)}
</div>
);
})}
{doneNotifs.map((n) => (
<div
key={n.paperId}
className="flex items-center gap-2.5 bg-green-600 text-white text-xs px-3.5 py-2.5 rounded-xl shadow-lg"
>
<span className="text-sm leading-none">✓</span>
<span className="flex-1 truncate font-semibold">{n.label}</span>
<Link
to={`/paper/${n.paperId}`}
className="shrink-0 underline font-semibold hover:text-green-100"
onClick={(e) => e.stopPropagation()}
>
Open
</Link>
<button
onClick={(e) => {
e.stopPropagation();
setDoneNotifs((prev) => prev.filter((x) => x.paperId !== n.paperId));
}}
className="shrink-0 text-green-200 hover:text-white"
>
×
</button>
</div>
))}
</div>
)}
</div>
);
}


@@ -0,0 +1,65 @@
import { useState } from "react";
const schemes = {
blue: {
border: "border-blue-200",
bg: "bg-blue-50",
text: "text-blue-800",
icon: "text-blue-500",
},
amber: {
border: "border-amber-200",
bg: "bg-amber-50",
text: "text-amber-800",
icon: "text-amber-500",
},
green: {
border: "border-green-200",
bg: "bg-green-50",
text: "text-green-800",
icon: "text-green-500",
},
} as const;
export default function CollapsibleSection({
title,
colorScheme,
defaultOpen = false,
children,
}: {
title: string;
colorScheme: keyof typeof schemes;
defaultOpen?: boolean;
children: React.ReactNode;
}) {
const [isOpen, setIsOpen] = useState(defaultOpen);
const s = schemes[colorScheme];
return (
<div className={`rounded-lg border ${s.border} mb-3`}>
<button
onClick={() => setIsOpen(!isOpen)}
className={`w-full flex items-center justify-between p-3 rounded-t-lg ${s.bg} cursor-pointer`}
>
<span className={`font-semibold text-sm ${s.text}`}>{title}</span>
<svg
className={`w-4 h-4 ${s.icon} transition-transform duration-200 ${isOpen ? "rotate-180" : ""}`}
fill="none"
viewBox="0 0 24 24"
stroke="currentColor"
strokeWidth={2}
>
<path strokeLinecap="round" strokeLinejoin="round" d="M19 9l-7 7-7-7" />
</svg>
</button>
<div
className="grid transition-[grid-template-rows] duration-300 ease-in-out"
style={{ gridTemplateRows: isOpen ? "1fr" : "0fr" }}
>
<div className="overflow-hidden">
<div className="p-3">{children}</div>
</div>
</div>
</div>
);
}


@@ -0,0 +1,86 @@
import { useMemo } from "react";
import katex from "katex";
/**
* Pre-render all LaTeX in an HTML string at the string level,
* then set innerHTML. This avoids DOM-based auto-render issues
* where delimiters get split across text nodes or special chars
* like # cause silent failures.
*/
function renderLatexInString(html: string): string {
// Strip <code class="latex"> and <pre class="latex"> wrappers
let s = html
.replace(/<code[^>]*class="latex"[^>]*>(.*?)<\/code>/gs, "$1")
.replace(/<pre[^>]*class="latex"[^>]*>(.*?)<\/pre>/gs, "$1");
// 1) Render display math: $$...$$ and \[...\]
s = s.replace(/\$\$([\s\S]+?)\$\$/g, (_match, tex: string) => {
return renderTex(tex.trim(), true);
});
s = s.replace(/\\\[([\s\S]+?)\\\]/g, (_match, tex: string) => {
return renderTex(tex.trim(), true);
});
// 2) Render inline math: $...$ and \(...\)
// Negative lookbehind for \ to avoid matching \$ escapes
// Also avoid matching $$ (already handled above)
s = s.replace(/(?<![\\$])\$(?!\$)((?:[^$\\]|\\.)+?)\$/g, (_match, tex: string) => {
return renderTex(tex, false);
});
s = s.replace(/\\\(([\s\S]+?)\\\)/g, (_match, tex: string) => {
return renderTex(tex, false);
});
return s;
}
function decodeHtmlEntities(s: string): string {
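// Decode &amp; last so that double-encoded input such as "&amp;lt;"
// yields "&lt;" rather than being decoded twice into "<".
return s
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&nbsp;/g, " ")
.replace(/&amp;/g, "&");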
}
function renderTex(tex: string, displayMode: boolean): string {
// Decode HTML entities that might appear in DB-sourced HTML
let cleaned = decodeHtmlEntities(tex);
// Sanitize common issues that cause KaTeX to fail:
// 1) # and % inside \text{} — escape them
cleaned = cleaned.replace(/\\text\{([^}]*)\}/g, (_m, inner: string) => {
return "\\text{" + inner.replace(/#/g, "\\#").replace(/%/g, "\\%") + "}";
});
// 2) Standalone # outside \text{} in math — escape it
cleaned = cleaned.replace(/(?<!\\)#(?!\\)/g, "\\#");
try {
return katex.renderToString(cleaned, {
displayMode,
throwOnError: false,
trust: true,
strict: false,
});
} catch {
// Fallback: show the raw LaTeX in a styled span
return `<span class="katex-error" style="color:#E11D48;font-size:0.85em">${tex}</span>`;
}
}
export default function KaTeXRenderer({
html,
className,
}: {
html: string;
className?: string;
}) {
const rendered = useMemo(() => renderLatexInString(html), [html]);
return (
<div
className={`kb-html-content text-sm ${className ?? ""}`}
dangerouslySetInnerHTML={{ __html: rendered }}
/>
);
}


@@ -0,0 +1,15 @@
const statusConfig = {
uploaded: { label: "Uploaded", bg: "bg-gray-100", text: "text-gray-600" },
processing: { label: "Processing...", bg: "bg-blue-100", text: "text-blue-700" },
ready: { label: "Ready", bg: "bg-green-100", text: "text-green-700" },
error: { label: "Error", bg: "bg-red-100", text: "text-red-700" },
} as const;
export default function StatusBadge({ status }: { status: string }) {
const config = statusConfig[status as keyof typeof statusConfig] ?? statusConfig.uploaded;
return (
<span className={`inline-block px-2 py-0.5 rounded-full text-xs font-medium ${config.bg} ${config.text}`}>
{config.label}
</span>
);
}


@@ -0,0 +1,63 @@
import { useRef, useState } from "react";
export default function FilePickerField({
label,
required,
file,
onFileChange,
}: {
label: string;
required?: boolean;
file: File | null;
onFileChange: (file: File | null) => void;
}) {
const inputRef = useRef<HTMLInputElement>(null);
const [isDragging, setIsDragging] = useState(false);
const handleDrop = (e: React.DragEvent) => {
e.preventDefault();
setIsDragging(false);
const f = e.dataTransfer.files[0];
if (f?.type === "application/pdf") onFileChange(f);
};
return (
<div>
<label className="block text-sm font-medium text-gray-700 mb-1">
{label} {required && <span className="text-red-500">*</span>}
</label>
<div
className={`border-2 border-dashed rounded-lg p-6 text-center cursor-pointer transition-colors
${isDragging ? "border-blue-400 bg-blue-50" : "border-gray-300 hover:border-gray-400"}`}
onClick={() => inputRef.current?.click()}
onDragOver={(e) => { e.preventDefault(); setIsDragging(true); }}
onDragLeave={() => setIsDragging(false)}
onDrop={handleDrop}
>
<input
ref={inputRef}
type="file"
accept=".pdf"
className="hidden"
onChange={(e) => onFileChange(e.target.files?.[0] ?? null)}
/>
{file ? (
<div className="flex items-center justify-center gap-2">
<span className="text-blue-600 font-medium text-sm">{file.name}</span>
<button
type="button"
onClick={(e) => { e.stopPropagation(); onFileChange(null); }}
className="text-gray-400 hover:text-red-500 text-xs"
>
Remove
</button>
</div>
) : (
<div className="text-gray-400 text-sm">
Click or drag PDF file here
</div>
)}
</div>
</div>
);
}


@@ -0,0 +1,184 @@
import { useState, useCallback } from "react";
import { useNavigate } from "react-router-dom";
import { uploadPaper } from "@/lib/api";
import FilePickerField from "./FilePickerField";
/** Try to extract course code, year, term, exam type from filename */
function parseFilename(name: string): {
courseCode?: string;
year?: number;
term?: string;
examType?: string;
} {
const result: ReturnType<typeof parseFilename> = {};
// Remove extension
const base = name.replace(/\.[^.]+$/, "").replace(/[_\-]+/g, " ");
// Course code: 2-4 letters (any case) + 4 digits + optional letter (e.g. COMP2211, MATH1014H)
const courseMatch = base.match(/([A-Za-z]{2,4}\s*\d{4}[A-Za-z]?)/i);
if (courseMatch) {
result.courseCode = courseMatch[1].replace(/\s/g, "").toUpperCase();
}
// Year: 4-digit (2010-2029) or 2-digit (15-29, read as 2015-2029)
const year4 = base.match(/\b(20[1-2]\d)\b/);
if (year4) {
result.year = Number(year4[1]);
} else {
const year2 = base.match(/\b(\d{2})\b/);
if (year2) {
const y = Number(year2[1]);
if (y >= 15 && y <= 29) result.year = 2000 + y;
}
}
// Term
const lower = base.toLowerCase();
if (/spring|spr/i.test(lower)) result.term = "spring";
else if (/fall|aut/i.test(lower)) result.term = "fall";
else if (/summer|sum/i.test(lower)) result.term = "summer";
// Exam type
if (/mid/i.test(lower)) result.examType = "midterm";
else if (/final|fin/i.test(lower)) result.examType = "final";
else if (/quiz/i.test(lower)) result.examType = "quiz";
return result;
}
export default function UploadForm() {
const navigate = useNavigate();
const [paperFile, setPaperFile] = useState<File | null>(null);
const [answerFile, setAnswerFile] = useState<File | null>(null);
const [courseCode, setCourseCode] = useState("");
const [year, setYear] = useState(new Date().getFullYear());
const [term, setTerm] = useState("fall");
const [examType, setExamType] = useState("midterm");
const [submitting, setSubmitting] = useState(false);
const [error, setError] = useState<string | null>(null);
const [autoFilled, setAutoFilled] = useState(false);
const handlePaperFile = useCallback((file: File | null) => {
setPaperFile(file);
if (!file) { setAutoFilled(false); return; }
const parsed = parseFilename(file.name);
const filled: string[] = [];
if (parsed.courseCode) { setCourseCode(parsed.courseCode); filled.push("course"); }
if (parsed.year) { setYear(parsed.year); filled.push("year"); }
if (parsed.term) { setTerm(parsed.term); filled.push("term"); }
if (parsed.examType) { setExamType(parsed.examType); filled.push("type"); }
setAutoFilled(filled.length > 0);
}, []);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
if (!paperFile || !courseCode) return;
setSubmitting(true);
setError(null);
try {
const fd = new FormData();
fd.append("paper_file", paperFile);
if (answerFile) fd.append("answer_file", answerFile);
fd.append("course_code", courseCode);
fd.append("year", String(year));
fd.append("term", term);
fd.append("exam_type", examType);
const result = await uploadPaper(fd);
navigate(`/paper/${result.paper_id}`);
} catch (err) {
setError(err instanceof Error ? err.message : "Upload failed");
setSubmitting(false);
}
};
return (
<form onSubmit={handleSubmit} className="max-w-lg mx-auto space-y-5">
<FilePickerField
label="Paper PDF"
required
file={paperFile}
onFileChange={handlePaperFile}
/>
{autoFilled && (
<div className="text-xs text-green-600 bg-green-50 px-3 py-1.5 rounded-lg -mt-3">
Auto-filled from filename. Please verify below.
</div>
)}
<FilePickerField
label="Answer / Solution PDF (optional)"
file={answerFile}
onFileChange={setAnswerFile}
/>
<div>
<label className="block text-sm font-medium text-gray-700 mb-1">
Course Code <span className="text-red-500">*</span>
</label>
<input
type="text"
value={courseCode}
onChange={(e) => setCourseCode(e.target.value.toUpperCase())}
placeholder="e.g. COMP2011"
className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
required
/>
</div>
<div className="grid grid-cols-3 gap-3">
<div>
<label className="block text-sm font-medium text-gray-700 mb-1">Year</label>
<input
type="number"
value={year}
onChange={(e) => setYear(Number(e.target.value))}
className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
/>
</div>
<div>
<label className="block text-sm font-medium text-gray-700 mb-1">Term</label>
<select
value={term}
onChange={(e) => setTerm(e.target.value)}
className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
>
<option value="fall">Fall</option>
<option value="spring">Spring</option>
<option value="summer">Summer</option>
</select>
</div>
<div>
<label className="block text-sm font-medium text-gray-700 mb-1">Exam Type</label>
<select
value={examType}
onChange={(e) => setExamType(e.target.value)}
className="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
>
<option value="midterm">Midterm</option>
<option value="final">Final</option>
<option value="quiz">Quiz</option>
</select>
</div>
</div>
{error && (
<div className="text-red-600 text-sm bg-red-50 p-3 rounded-lg">{error}</div>
)}
<button
type="submit"
disabled={!paperFile || !courseCode || submitting}
className="w-full bg-blue-600 text-white py-2.5 rounded-lg font-medium text-sm
hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
{submitting ? "Uploading..." : "Upload & Analyze"}
</button>
</form>
);
}


@@ -0,0 +1,58 @@
import type { Question } from "@/types/api";
export default function ActionBar({
question,
onGenerateVariant,
isGenerating,
onPhotoOpen,
answerState,
}: {
question: Question | null;
onGenerateVariant: () => void;
isGenerating: boolean;
onPhotoOpen: () => void;
answerState?: "correct" | "wrong" | null;
}) {
if (!question) return null;
const isLong = question.question_type === "long_question" || question.question_type === "long_answer" || question.question_type === "coding";
return (
<div className="border-t border-gray-200 bg-white px-4 py-3 shrink-0 space-y-2">
{/* Answer state feedback (for non-long questions, driven by QuestionDetail) */}
{answerState && (
<div className={`text-center text-sm font-medium py-1.5 rounded-lg ${
answerState === "correct"
? "bg-green-50 text-green-600"
: "bg-red-50 text-red-600"
}`}>
{answerState === "correct" ? "Correct!" : "Added to error book"}
</div>
)}
{/* Long question: Upload handwritten answer */}
{isLong && (
<button
onClick={onPhotoOpen}
className="w-full py-2.5 rounded-lg text-sm font-medium bg-blue-600 text-white hover:bg-blue-700 transition-colors"
>
Upload handwritten answer
</button>
)}
{/* Generate variant — always available */}
<button
onClick={onGenerateVariant}
disabled={isGenerating}
className="w-full py-2 rounded-lg text-sm font-medium bg-purple-50 text-purple-700 border border-purple-200 hover:bg-purple-100 disabled:opacity-50 transition-colors"
>
{isGenerating ? (
<span className="flex items-center justify-center gap-2">
<span className="w-3 h-3 border-2 border-purple-600 border-t-transparent rounded-full animate-spin" />
Generating...
</span>
) : "Generate Variant"}
</button>
</div>
);
}


@@ -0,0 +1,21 @@
import type { Question } from "@/types/api";
import CollapsibleSection from "@/components/shared/CollapsibleSection";
import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
export default function AiTrioPanel({ question }: { question: Question }) {
return (
<div>
<CollapsibleSection title="Knowledge Reminder" colorScheme="blue" defaultOpen>
<KaTeXRenderer html={question.knowledge_reminder} />
</CollapsibleSection>
<CollapsibleSection title="AI Hint" colorScheme="amber">
<KaTeXRenderer html={question.ai_hint} />
</CollapsibleSection>
<CollapsibleSection title="Solution" colorScheme="green">
<KaTeXRenderer html={question.solution} />
</CollapsibleSection>
</div>
);
}


@@ -0,0 +1,170 @@
import { useState, useRef, useEffect, useCallback } from "react";
import { Document, Page, pdfjs } from "react-pdf";
import "react-pdf/dist/Page/AnnotationLayer.css";
import "react-pdf/dist/Page/TextLayer.css";
pdfjs.GlobalWorkerOptions.workerSrc = `https://unpkg.com/pdfjs-dist@${pdfjs.version}/build/pdf.worker.min.mjs`;
export default function PdfViewer({
fileUrl,
currentPage,
onPageChange,
}: {
fileUrl: string;
currentPage?: number;
onPageChange?: (page: number) => void;
}) {
const [numPages, setNumPages] = useState(0);
const [containerWidth, setContainerWidth] = useState(0);
const containerRef = useRef<HTMLDivElement>(null);
const scrollRef = useRef<HTMLDivElement>(null);
const pageRefs = useRef<Map<number, HTMLDivElement>>(new Map());
const [jumpPage, setJumpPage] = useState("");
const programmaticScroll = useRef(false);
// Resize observer for container width
useEffect(() => {
if (!containerRef.current) return;
const observer = new ResizeObserver((entries) => {
setContainerWidth(entries[0].contentRect.width);
});
observer.observe(containerRef.current);
return () => observer.disconnect();
}, []);
// Scroll to page when currentPage changes (programmatic)
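useEffect(() => {
if (!currentPage || currentPage < 1) return;
const el = pageRefs.current.get(currentPage);
if (!el) return;
programmaticScroll.current = true;
el.scrollIntoView({ behavior: "smooth", block: "start" });
// Suppress onPageChange while the smooth scroll settles
const timer = window.setTimeout(() => { programmaticScroll.current = false; }, 2000);
// Clear the pending timer so an earlier jump cannot reset the flag mid-scroll
return () => {
window.clearTimeout(timer);
programmaticScroll.current = false;
};
}, [currentPage]);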
// IntersectionObserver to detect visible page on user scroll
useEffect(() => {
if (numPages === 0 || !scrollRef.current) return;
const visiblePages = new Map<number, number>();
const observer = new IntersectionObserver(
(entries) => {
for (const entry of entries) {
const pageNum = Number(entry.target.getAttribute("data-page"));
if (entry.isIntersecting) {
visiblePages.set(pageNum, entry.intersectionRatio);
} else {
visiblePages.delete(pageNum);
}
}
// Don't fire callback during programmatic scroll
if (programmaticScroll.current) return;
// Find the page with the highest visibility ratio
let bestPage = 0;
let bestRatio = 0;
for (const [page, ratio] of visiblePages) {
if (ratio > bestRatio) {
bestRatio = ratio;
bestPage = page;
}
}
if (bestPage > 0) {
onPageChange?.(bestPage);
}
},
{
root: scrollRef.current,
threshold: [0, 0.25, 0.5, 0.75, 1],
},
);
for (const [, el] of pageRefs.current) {
observer.observe(el);
}
return () => observer.disconnect();
}, [numPages, onPageChange]);
const setPageRef = useCallback((pageNum: number, el: HTMLDivElement | null) => {
if (el) {
el.setAttribute("data-page", String(pageNum));
pageRefs.current.set(pageNum, el);
} else {
pageRefs.current.delete(pageNum);
}
}, []);
const handleJump = () => {
const p = parseInt(jumpPage, 10);
if (p >= 1 && p <= numPages) {
const el = pageRefs.current.get(p);
el?.scrollIntoView({ behavior: "smooth", block: "start" });
}
setJumpPage("");
};
return (
<div ref={containerRef} className="h-full flex flex-col bg-gray-100">
{/* Page controls */}
<div className="flex items-center justify-center gap-3 py-2 bg-white border-b border-gray-200 text-sm shrink-0">
<span className="text-gray-600">
{numPages} pages
</span>
<span className="text-gray-300">|</span>
<span className="text-gray-600">
Go to{" "}
<input
type="number"
value={jumpPage}
onChange={(e) => setJumpPage(e.target.value)}
onKeyDown={(e) => { if (e.key === "Enter") handleJump(); }}
placeholder="#"
className="w-12 text-center border border-gray-300 rounded px-1 py-0.5 text-sm"
min={1}
max={numPages}
/>
</span>
</div>
{/* All pages scrollable */}
<div ref={scrollRef} className="flex-1 overflow-auto">
<Document
file={fileUrl}
onLoadSuccess={({ numPages: n }) => setNumPages(n)}
loading={
<div className="flex items-center justify-center h-64 text-gray-400">
Loading PDF...
</div>
}
error={
<div className="flex items-center justify-center h-64 text-red-400">
Failed to load PDF
</div>
}
>
{numPages > 0 &&
Array.from({ length: numPages }, (_, i) => i + 1).map((pageNum) => (
<div
key={pageNum}
ref={(el) => setPageRef(pageNum, el)}
className="flex justify-center mb-2"
>
<div className="bg-white shadow-sm">
<Page
pageNumber={pageNum}
width={containerWidth > 0 ? containerWidth - 48 : undefined}
renderAnnotationLayer
renderTextLayer
/>
</div>
</div>
))}
</Document>
</div>
</div>
);
}
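
The page-tracking callback in the IntersectionObserver above reduces to picking the entry with the largest intersection ratio. A minimal standalone sketch of that selection step, assuming a hypothetical helper name `mostVisiblePage` that is not part of this commit; it returns 0 when no page qualifies, mirroring the `bestPage > 0` guard:

```typescript
// Hypothetical helper (not in the commit): given a map of page number ->
// intersection ratio, return the page with the highest ratio, or 0 if none.
function mostVisiblePage(visiblePages: Map<number, number>): number {
  let bestPage = 0;
  let bestRatio = 0;
  visiblePages.forEach((ratio, page) => {
    // Strict comparison: a page reported with ratio 0 is never selected.
    if (ratio > bestRatio) {
      bestRatio = ratio;
      bestPage = page;
    }
  });
  return bestPage;
}
```

Extracting the loop this way would let the selection rule be unit-tested without a DOM; whether that is worth the extra function is a design choice for the component's author.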


@@ -0,0 +1,90 @@
import { useState, useRef, useEffect } from "react";
import { uploadPhoto } from "@/lib/api";
import type { UserAttempt } from "@/types/api";
export default function PhotoUpload({
questionId,
onClose,
onSubmitted,
}: {
questionId: string;
onClose: () => void;
onSubmitted: (promise: Promise<{ attempt: UserAttempt; ocr_text: string; grade: { is_correct: boolean; score_given?: number; feedback: string } }>) => void;
}) {
const [file, setFile] = useState<File | null>(null);
const [preview, setPreview] = useState<string | null>(null);
const [submitting, setSubmitting] = useState(false);
const [error, setError] = useState<string | null>(null);
const inputRef = useRef<HTMLInputElement>(null);
// Revoke the preview's object URL when it is replaced or the modal unmounts
useEffect(() => {
if (!preview) return;
return () => URL.revokeObjectURL(preview);
}, [preview]);
const handleFile = (f: File) => {
setFile(f);
setPreview(URL.createObjectURL(f));
setError(null);
};
const handleSubmit = () => {
if (!file || submitting) return;
setSubmitting(true);
const promise = uploadPhoto(questionId, file);
// Close modal immediately, let parent handle the async result
onSubmitted(promise);
onClose();
};
return (
<div className="fixed inset-0 bg-black/40 flex items-center justify-center z-50 p-4">
<div className="bg-white rounded-xl shadow-xl max-w-lg w-full max-h-[90vh] overflow-y-auto">
<div className="p-5">
<div className="flex items-center justify-between mb-4">
<h3 className="text-lg font-semibold text-gray-900">Upload Answer Photo</h3>
<button onClick={onClose} className="text-gray-400 hover:text-gray-600 text-xl">&times;</button>
</div>
{!preview ? (
<div
onClick={() => inputRef.current?.click()}
className="border-2 border-dashed border-gray-300 rounded-lg p-8 text-center cursor-pointer hover:border-blue-400 transition-colors"
>
<div className="text-3xl mb-2">📷</div>
<p className="text-sm text-gray-600">Click to take photo or select image</p>
<input
ref={inputRef}
type="file"
accept="image/*"
capture="environment"
className="hidden"
onChange={(e) => {
const f = e.target.files?.[0];
if (f) handleFile(f);
}}
/>
</div>
) : (
<div className="space-y-3">
<img src={preview} alt="Preview" className="w-full rounded-lg border" />
{error && (
<div className="text-sm text-red-600 bg-red-50 rounded-lg p-3">{error}</div>
)}
<div className="flex gap-2">
<button
onClick={() => { setFile(null); setPreview(null); }}
className="flex-1 py-2 rounded-lg text-sm border border-gray-200 text-gray-600 hover:bg-gray-50"
>
Retake
</button>
<button
onClick={handleSubmit}
disabled={submitting}
className="flex-1 py-2 rounded-lg text-sm bg-blue-600 text-white font-medium hover:bg-blue-700 disabled:opacity-50"
>
Submit for Grading
</button>
</div>
</div>
)}
</div>
</div>
</div>
);
}


@@ -0,0 +1,260 @@
import { useState, useEffect } from "react";
import type { Question } from "@/types/api";
import { subquestionLabel } from "@/lib/questionGroups";
const typeLabels: Record<string, string> = {
mc: "Multiple Choice",
true_false: "True / False",
fill_blank: "Fill in Blank",
long_question: "Long Question",
long_answer: "Long Answer",
short_answer: "Short Answer",
coding: "Coding",
};
const difficultyColors: Record<string, string> = {
easy: "bg-green-100 text-green-700",
medium: "bg-yellow-100 text-yellow-700",
hard: "bg-red-100 text-red-700",
};
export default function QuestionDetail({
question,
onAnswerResult,
}: {
question: Question;
onAnswerResult?: (isCorrect: boolean, userAnswer: string) => void;
}) {
const [selectedOption, setSelectedOption] = useState<string | null>(null);
const [checked, setChecked] = useState(false);
const [fillAnswer, setFillAnswer] = useState("");
const [fillChecked, setFillChecked] = useState(false);
// True/False: the single selected answer for the current question ("True" or "False")
const [tfAnswer, setTfAnswer] = useState<"True" | "False" | null>(null);
const [tfChecked, setTfChecked] = useState(false);
// Reset state when question changes
useEffect(() => {
setSelectedOption(null);
setChecked(false);
setFillAnswer("");
setFillChecked(false);
setTfAnswer(null);
setTfChecked(false);
}, [question.id]);
const isCorrectMc = checked && selectedOption === question.correct_option;
const isCorrectFill =
fillChecked &&
question.correct_answer != null &&
fillAnswer.trim().toLowerCase() === question.correct_answer.trim().toLowerCase();
const handleMcCheck = () => {
if (!selectedOption) return;
setChecked(true);
const correct = selectedOption === question.correct_option;
onAnswerResult?.(correct, selectedOption);
};
const handleFillCheck = () => {
if (!fillAnswer.trim()) return;
setFillChecked(true);
const correct =
question.correct_answer != null &&
fillAnswer.trim().toLowerCase() === question.correct_answer.trim().toLowerCase();
onAnswerResult?.(correct, fillAnswer.trim());
};
const getOptionStyle = (label: string) => {
if (!checked) {
return label === selectedOption
? "border-blue-400 bg-blue-50"
: "border-gray-200 hover:bg-gray-50";
}
if (label === question.correct_option) return "border-green-400 bg-green-50";
if (label === selectedOption) return "border-red-400 bg-red-50";
return "border-gray-200 opacity-50";
};
return (
<div className="mb-4">
{/* Header row */}
<div className="flex items-center gap-2 mb-2 flex-wrap">
<span className="text-base font-bold text-gray-900">
Q{question.question_number.match(/^\d+/)?.[0] ?? question.question_number}
</span>
{question.question_number.replace(/^\d+/, "") && (
<span className="text-xs px-2 py-0.5 rounded bg-gray-100 text-gray-600">
{subquestionLabel(question)}
</span>
)}
<span className="text-xs px-2 py-0.5 rounded bg-blue-100 text-blue-700">
{typeLabels[question.question_type] ?? question.question_type}
</span>
{question.score != null && (
<span className="text-xs text-gray-500">{question.score} pts</span>
)}
{question.difficulty && (
<span
className={`text-xs px-2 py-0.5 rounded ${difficultyColors[question.difficulty] ?? ""}`}
>
{question.difficulty}
</span>
)}
</div>
{/* Topics */}
{question.topics && question.topics.length > 0 && (
<div className="flex gap-1 mb-3 flex-wrap">
{question.topics.map((t) => (
<span key={t} className="text-xs bg-gray-100 text-gray-600 px-2 py-0.5 rounded-full">
{t}
</span>
))}
</div>
)}
{/* MC options */}
{question.question_type === "mc" && question.options && (
<>
<div className="mt-3 space-y-1.5">
{question.options.map((opt) => (
<button
key={opt.label}
onClick={() => { if (!checked) setSelectedOption(opt.label); }}
className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
disabled={checked}
>
<span className={`font-semibold shrink-0 w-6 ${
checked && opt.label === question.correct_option ? "text-green-600" :
checked && opt.label === selectedOption ? "text-red-600" :
"text-blue-600"
}`}>
{opt.label}.
</span>
<span className="text-gray-700">{opt.text}</span>
{checked && opt.label === question.correct_option && (
<span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
)}
</button>
))}
</div>
{!checked && selectedOption && (
<button
onClick={handleMcCheck}
className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 transition-colors"
>
Check Answer
</button>
)}
{checked && (
<div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
{isCorrectMc ? "Correct!" : `Wrong — the answer is ${question.correct_option}`}
</div>
)}
</>
)}
{/* True/False */}
{question.question_type === "true_false" && (() => {
// Normalize T/F/True/False to "true"/"false"
const normTF = (v: string | null | undefined): string => {
if (!v) return "";
const l = v.trim().toLowerCase();
if (l === "t" || l === "true") return "true";
if (l === "f" || l === "false") return "false";
return l;
};
const correctNorm = normTF(question.correct_option ?? question.correct_answer);
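// Fall back to "N/A" when the stored answer is missing or not a recognizable T/F value
const correctDisplay = correctNorm === "true" ? "True" : correctNorm === "false" ? "False" : "N/A";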
return (
<>
<div className="mt-3 flex gap-2">
{(["True", "False"] as const).map((val) => {
const isSelected = tfAnswer === val;
const isCorrectVal = tfChecked && normTF(val) === correctNorm;
const isWrongVal = tfChecked && isSelected && !isCorrectVal;
return (
<button
key={val}
onClick={() => { if (!tfChecked) setTfAnswer(val); }}
disabled={tfChecked}
className={`flex-1 py-2 rounded-lg border text-sm font-semibold transition-colors ${
isCorrectVal
? "border-green-400 bg-green-50 text-green-700"
: isWrongVal
? "border-red-400 bg-red-50 text-red-700"
: isSelected
? "border-blue-400 bg-blue-50 text-blue-700"
: "border-gray-200 text-gray-600 hover:bg-gray-50"
}`}
>
{val === "True" ? "T — True" : "F — False"}
</button>
);
})}
</div>
{!tfChecked && tfAnswer && (
<button
onClick={() => {
setTfChecked(true);
const isCorrect = normTF(tfAnswer) === correctNorm;
onAnswerResult?.(isCorrect, tfAnswer);
}}
className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 transition-colors"
>
Check Answer
</button>
)}
{tfChecked && (
<div className={`mt-2 text-sm font-medium ${
normTF(tfAnswer) === correctNorm ? "text-green-600" : "text-red-600"
}`}>
{normTF(tfAnswer) === correctNorm
? "Correct!"
: `Wrong — the answer is ${correctDisplay}`}
</div>
)}
</>
);
})()}
{/* Fill-blank input */}
{question.question_type === "fill_blank" && (
<div className="mt-3">
<div className="flex gap-2">
<input
type="text"
value={fillAnswer}
onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
placeholder="Type your answer..."
disabled={fillChecked}
className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
fillChecked
? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
: "border-gray-300"
}`}
onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
/>
{!fillChecked && (
<button
onClick={handleFillCheck}
disabled={!fillAnswer.trim()}
className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50 transition-colors"
>
Check
</button>
)}
</div>
{fillChecked && (
<div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
{isCorrectFill
? "Correct!"
: `Wrong — the answer is: ${question.correct_answer ?? "N/A"}`}
</div>
)}
</div>
)}
</div>
);
}
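
The True/False and fill-in-the-blank checks above both rest on normalizing strings before comparing them. A minimal sketch of those two rules as pure functions; the names `normalizeTrueFalse` and `isFillAnswerCorrect` are hypothetical and only mirror the inline `normTF` helper and the `isCorrectFill` expression:

```typescript
// Mirrors normTF: maps T/F/True/False (any case, surrounding whitespace
// ignored) to "true"/"false"; other values are returned lowercased.
function normalizeTrueFalse(value: string | null | undefined): string {
  if (!value) return "";
  const lowered = value.trim().toLowerCase();
  if (lowered === "t" || lowered === "true") return "true";
  if (lowered === "f" || lowered === "false") return "false";
  return lowered;
}

// Mirrors the fill-in-the-blank check: trimmed, case-insensitive equality.
// A missing correct answer never counts as a match.
function isFillAnswerCorrect(
  userAnswer: string,
  correctAnswer: string | null | undefined,
): boolean {
  if (correctAnswer == null) return false;
  return userAnswer.trim().toLowerCase() === correctAnswer.trim().toLowerCase();
}
```

Exact string equality is strict for free-text answers (for instance, `0.5` and `1/2` would not match), so this is a sketch of the current behavior rather than a recommendation.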


@@ -0,0 +1,56 @@
import type { Question } from "@/types/api";
import type { QuestionGroup } from "@/lib/questionGroups";
import { subquestionLabel } from "@/lib/questionGroups";
export default function QuestionNav({
groups,
currentGroupKey,
currentQuestionId,
onSelectGroup,
onSelectQuestion,
}: {
groups: QuestionGroup[];
currentGroupKey: string | null;
currentQuestionId: string | null;
onSelectGroup: (groupKey: string) => void;
onSelectQuestion: (questionId: string) => void;
}) {
const activeGroup = groups.find((group) => group.key === currentGroupKey) ?? null;
return (
<div className="border-b border-gray-200 bg-white px-4 py-2 shrink-0">
<div className="flex gap-1.5 overflow-x-auto hide-scrollbar">
{groups.map((group) => (
<button
key={group.key}
onClick={() => onSelectGroup(group.key)}
className={`px-3 py-1.5 rounded-lg text-xs font-medium whitespace-nowrap transition-colors
${group.key === currentGroupKey
? "bg-blue-600 text-white"
: "bg-gray-100 text-gray-600 hover:bg-gray-200"
}`}
>
{group.label}
</button>
))}
</div>
{activeGroup && activeGroup.questions.length > 1 && (
<div className="flex gap-1.5 overflow-x-auto hide-scrollbar mt-2">
{activeGroup.questions.map((question) => (
<button
key={question.id}
onClick={() => onSelectQuestion(question.id)}
className={`px-2.5 py-1 rounded-md text-[11px] font-medium whitespace-nowrap transition-colors
${question.id === currentQuestionId
? "bg-blue-50 text-blue-700 border border-blue-200"
: "bg-gray-50 text-gray-500 border border-gray-200 hover:bg-gray-100"
}`}
>
{subquestionLabel(question)}
</button>
))}
</div>
)}
</div>
);
}


@@ -0,0 +1,130 @@
import { useEffect, useState } from "react";
import { Link } from "react-router-dom";
import { getSimilarQuestions } from "@/lib/api";
import type { Question, SimilarQuestion } from "@/types/api";
const typeLabel: Record<string, string> = {
mc: "MC",
true_false: "T/F",
fill_blank: "Fill",
long_question: "Long",
long_answer: "Long",
short_answer: "Short",
coding: "Code",
};
function matchColor(percent: number): string {
if (percent >= 80) return "bg-green-100 text-green-700";
if (percent >= 60) return "bg-amber-100 text-amber-700";
return "bg-gray-100 text-gray-600";
}
function cleanReason(reason: string): string {
// "Shared topic: foo_bar, baz_qux" → "Shared topic: Foo Bar, Baz Qux"
return reason.replace(/_/g, " ").replace(/:\s*(.+)$/, (_, rest) =>
": " + rest.split(",").map((s: string) =>
s.trim().replace(/\b\w/g, (c: string) => c.toUpperCase())
).join(", ")
);
}
export default function SimilarHistoryPanel({ question }: { question: Question }) {
const [items, setItems] = useState<SimilarQuestion[]>([]);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [isOpen, setIsOpen] = useState(true);
useEffect(() => {
let cancelled = false;
setLoading(true);
setError(null);
setItems([]);
getSimilarQuestions(question.id)
.then((data) => {
if (cancelled) return;
setItems(data);
setLoading(false);
})
.catch((err: unknown) => {
if (cancelled) return;
setError(err instanceof Error ? err.message : "Failed to load.");
setLoading(false);
});
return () => { cancelled = true; };
}, [question.id]);
return (
<div className="rounded-lg border border-blue-200 mb-3 overflow-hidden">
<button
onClick={() => setIsOpen((open) => !open)}
className="w-full flex items-center justify-between p-3 bg-blue-50"
>
<div className="flex items-center gap-2">
<span className="w-5 h-5 flex items-center justify-center rounded bg-blue-600 text-white text-xs font-bold">S</span>
<span className="font-semibold text-sm text-blue-800">Similar Questions</span>
</div>
<span className="text-xs text-blue-600">{loading ? "…" : items.length}</span>
</button>
{isOpen && (
<div className="p-2 space-y-1.5 bg-white">
{loading && <div className="text-xs text-gray-400 px-1 py-2">Loading</div>}
{!loading && error && (
<div className="text-xs text-red-600 bg-red-50 border border-red-200 rounded px-3 py-2">{error}</div>
)}
{!loading && !error && items.length === 0 && (
<div className="text-xs text-gray-400 px-1 py-2">No similar questions found.</div>
)}
{items.map((item) => (
<Link
key={item.id}
to={`/paper/${item.paper_id}`}
className="flex items-center gap-2 px-2.5 py-2 rounded-lg border border-gray-100 hover:border-blue-200 hover:bg-blue-50/40 transition-colors"
>
{/* Match % badge */}
<span className={`shrink-0 text-[11px] font-bold px-1.5 py-0.5 rounded ${matchColor(item.match_percent)}`}>
{item.match_percent}%
</span>
{/* Main info */}
<div className="flex-1 min-w-0">
<div className="flex items-center gap-1.5 flex-wrap">
<span className="text-xs font-semibold text-gray-700">{item.source}</span>
<span className="text-xs text-gray-400">·</span>
<span className="text-xs text-gray-500">Q{item.question_number}</span>
{item.question_type && (
<>
<span className="text-xs text-gray-400">·</span>
<span className="text-xs text-gray-500">{typeLabel[item.question_type] ?? item.question_type}</span>
</>
)}
</div>
{/* Topics + reasons in one row */}
<div className="flex gap-1 flex-wrap mt-1">
{item.topics.slice(0, 2).map((topic) => (
<span key={topic} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500">
{topic}
</span>
))}
{item.match_reasons
?.filter((r) => !r.startsWith("Same format") && !r.startsWith("Same difficulty"))
.slice(0, 2)
.map((reason) => (
<span key={reason} className="text-[10px] px-1.5 py-0.5 rounded bg-blue-50 text-blue-500">
{cleanReason(reason)}
</span>
))}
</div>
</div>
<span className="text-gray-300 text-xs shrink-0"></span>
</Link>
))}
</div>
)}
</div>
);
}
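
The `cleanReason` helper in this file turns snake_case match reasons into display text. A standalone copy with explicit callback types, shown here only to illustrate its behavior on a sample input; the sample strings are made up for illustration:

```typescript
// Replaces underscores with spaces, then title-cases each comma-separated
// item that follows the first colon. Text before the colon is left as-is.
function cleanReason(reason: string): string {
  return reason
    .replace(/_/g, " ")
    .replace(/:\s*(.+)$/, (_match: string, rest: string) =>
      ": " +
      rest
        .split(",")
        .map((s) => s.trim().replace(/\b\w/g, (c) => c.toUpperCase()))
        .join(", "),
    );
}

// "Shared topic: foo_bar, baz_qux" -> "Shared topic: Foo Bar, Baz Qux"
const cleaned = cleanReason("Shared topic: foo_bar, baz_qux");
```

A reason without a colon only has its underscores replaced, so `"same_course"` becomes `"same course"` with no capitalization.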


@@ -0,0 +1,148 @@
import { useState } from "react";
import type { VariantQuestion } from "@/types/api";
import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
import CollapsibleSection from "@/components/shared/CollapsibleSection";
export default function VariantDetail({
variant,
}: {
variant: VariantQuestion;
}) {
const [selectedOption, setSelectedOption] = useState<string | null>(null);
const [checked, setChecked] = useState(false);
const [fillAnswer, setFillAnswer] = useState("");
const [fillChecked, setFillChecked] = useState(false);
const isMc = (variant.question_type === "mc" || variant.question_type === "true_false") && variant.options;
const handleMcCheck = () => {
if (!selectedOption) return;
setChecked(true);
};
const handleFillCheck = () => {
if (!fillAnswer.trim()) return;
setFillChecked(true);
};
const isCorrectMc = checked && selectedOption === variant.correct_answer;
const isCorrectFill =
fillChecked &&
fillAnswer.trim().toLowerCase() === variant.correct_answer.trim().toLowerCase();
const getOptionStyle = (label: string) => {
if (!checked) {
return label === selectedOption
? "border-blue-400 bg-blue-50"
: "border-gray-200 hover:bg-gray-50";
}
if (label === variant.correct_answer) return "border-green-400 bg-green-50";
if (label === selectedOption) return "border-red-400 bg-red-50";
return "border-gray-200 opacity-50";
};
return (
<div>
{/* Header */}
<div className="flex items-center gap-2 mb-3">
<span className="w-5 h-5 flex items-center justify-center bg-purple-600 text-white text-xs font-bold rounded-full">V</span>
<span className="text-sm font-semibold text-gray-900">Similar Question</span>
<span className="text-xs px-2 py-0.5 rounded bg-purple-100 text-purple-700">
{variant.question_type}
</span>
</div>
{/* Question text */}
<div className="text-sm text-gray-800 leading-relaxed bg-purple-50 rounded-lg p-3 border border-purple-200 mb-4">
<KaTeXRenderer html={variant.question_text} />
</div>
{/* MC options */}
{isMc && variant.options && (
<>
<div className="space-y-1.5">
{variant.options.map((opt) => (
<button
key={opt.label}
onClick={() => { if (!checked) setSelectedOption(opt.label); }}
disabled={checked}
className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
>
<span className="font-semibold shrink-0 w-6 text-blue-600">{opt.label}.</span>
<span className="text-gray-700">{opt.text}</span>
{checked && opt.label === variant.correct_answer && (
<span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
)}
</button>
))}
</div>
{!checked && selectedOption && (
<button
onClick={handleMcCheck}
className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700"
>
Check Answer
</button>
)}
{checked && (
<div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
{isCorrectMc ? "Correct!" : `Wrong — the answer is ${variant.correct_answer}`}
</div>
)}
</>
)}
{/* Non-MC input */}
{!isMc && (
<div className="mb-3">
<div className="flex gap-2">
<input
type="text"
value={fillAnswer}
onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
placeholder="Type your answer..."
disabled={fillChecked}
className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
fillChecked
? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
: "border-gray-300"
}`}
onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
/>
{!fillChecked && (
<button
onClick={handleFillCheck}
disabled={!fillAnswer.trim()}
className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50"
>
Check
</button>
)}
</div>
{fillChecked && (
<div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
{isCorrectFill ? "Correct!" : `Answer: ${variant.correct_answer}`}
</div>
)}
</div>
)}
{/* AI Trio */}
<div className="mt-4 space-y-2">
{variant.knowledge_reminder && (
<CollapsibleSection title="Knowledge Reminder" colorScheme="blue">
<KaTeXRenderer html={variant.knowledge_reminder} />
</CollapsibleSection>
)}
{variant.ai_hint && (
<CollapsibleSection title="AI Hint" colorScheme="amber">
<KaTeXRenderer html={variant.ai_hint} />
</CollapsibleSection>
)}
<CollapsibleSection title="Solution" colorScheme="green">
<KaTeXRenderer html={variant.solution} />
</CollapsibleSection>
</div>
</div>
);
}


@@ -0,0 +1,189 @@
import { useState } from "react";
import type { VariantQuestion } from "@/types/api";
import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
export default function VariantModal({
variant,
onClose,
}: {
variant: VariantQuestion;
onClose: () => void;
}) {
const [selectedOption, setSelectedOption] = useState<string | null>(null);
const [checked, setChecked] = useState(false);
const [fillAnswer, setFillAnswer] = useState("");
const [fillChecked, setFillChecked] = useState(false);
const [showKnowledge, setShowKnowledge] = useState(false);
const [showHint, setShowHint] = useState(false);
const [showSolution, setShowSolution] = useState(false);
const isMc = (variant.question_type === "mc" || variant.question_type === "true_false") && variant.options;
const handleMcCheck = () => {
if (!selectedOption) return;
setChecked(true);
};
const handleFillCheck = () => {
if (!fillAnswer.trim()) return;
setFillChecked(true);
};
const isCorrectMc = checked && selectedOption === variant.correct_answer;
const isCorrectFill =
fillChecked &&
fillAnswer.trim().toLowerCase() === variant.correct_answer.trim().toLowerCase();
const getOptionStyle = (label: string) => {
if (!checked) {
return label === selectedOption
? "border-blue-400 bg-blue-50"
: "border-gray-200 hover:bg-gray-50";
}
if (label === variant.correct_answer) return "border-green-400 bg-green-50";
if (label === selectedOption) return "border-red-400 bg-red-50";
return "border-gray-200 opacity-50";
};
return (
<div className="fixed inset-0 bg-black/40 flex items-center justify-center z-50 p-4">
<div className="bg-white rounded-xl shadow-xl max-w-lg w-full max-h-[90vh] overflow-y-auto">
<div className="p-5">
<div className="flex items-center justify-between mb-4">
<h3 className="text-lg font-semibold text-gray-900">Similar Question</h3>
<button onClick={onClose} className="text-gray-400 hover:text-gray-600 text-xl">&times;</button>
</div>
{/* Question text */}
<div className="text-sm text-gray-800 leading-relaxed bg-gray-50 rounded-lg p-3 border border-gray-200 mb-3">
<KaTeXRenderer html={variant.question_text} />
</div>
{/* MC options */}
{isMc && variant.options && (
<>
<div className="space-y-1.5">
{variant.options.map((opt) => (
<button
key={opt.label}
onClick={() => { if (!checked) setSelectedOption(opt.label); }}
disabled={checked}
className={`w-full flex items-start gap-2 p-2 rounded-lg border text-sm text-left transition-colors ${getOptionStyle(opt.label)}`}
>
<span className="font-semibold shrink-0 w-6 text-blue-600">{opt.label}.</span>
<span className="text-gray-700">{opt.text}</span>
{checked && opt.label === variant.correct_answer && (
<span className="ml-auto text-green-600 text-xs font-medium shrink-0">Correct</span>
)}
</button>
))}
</div>
{!checked && selectedOption && (
<button
onClick={handleMcCheck}
className="mt-2 px-4 py-1.5 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700"
>
Check Answer
</button>
)}
{checked && (
<div className={`mt-2 text-sm font-medium ${isCorrectMc ? "text-green-600" : "text-red-600"}`}>
{isCorrectMc ? "Correct!" : `Wrong — the answer is ${variant.correct_answer}`}
</div>
)}
</>
)}
{/* Non-MC input */}
{!isMc && (
<div className="mt-1">
<div className="flex gap-2">
<input
type="text"
value={fillAnswer}
onChange={(e) => { if (!fillChecked) setFillAnswer(e.target.value); }}
placeholder="Type your answer..."
disabled={fillChecked}
className={`flex-1 border rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
fillChecked
? isCorrectFill ? "border-green-400 bg-green-50" : "border-red-400 bg-red-50"
: "border-gray-300"
}`}
onKeyDown={(e) => { if (e.key === "Enter") handleFillCheck(); }}
/>
{!fillChecked && (
<button
onClick={handleFillCheck}
disabled={!fillAnswer.trim()}
className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm font-medium hover:bg-blue-700 disabled:opacity-50"
>
Check
</button>
)}
</div>
{fillChecked && (
<div className={`mt-2 text-sm font-medium ${isCorrectFill ? "text-green-600" : "text-red-600"}`}>
{isCorrectFill ? "Correct!" : `Answer: ${variant.correct_answer}`}
</div>
)}
</div>
)}
{/* AI Trio: Knowledge / Hint / Solution */}
<div className="mt-4 border-t pt-3 space-y-2">
{variant.knowledge_reminder && (
<div>
<button
onClick={() => setShowKnowledge(!showKnowledge)}
className="text-sm text-blue-600 hover:text-blue-800 font-medium"
>
{showKnowledge ? "▾ Hide Knowledge" : "▸ Knowledge Reminder"}
</button>
{showKnowledge && (
<div className="mt-2 bg-blue-50 rounded-lg p-3 text-sm border border-blue-200">
<KaTeXRenderer html={variant.knowledge_reminder} />
</div>
)}
</div>
)}
{variant.ai_hint && (
<div>
<button
onClick={() => setShowHint(!showHint)}
className="text-sm text-amber-600 hover:text-amber-800 font-medium"
>
{showHint ? "▾ Hide Hint" : "▸ AI Hint"}
</button>
{showHint && (
<div className="mt-2 bg-amber-50 rounded-lg p-3 text-sm border border-amber-200">
<KaTeXRenderer html={variant.ai_hint} />
</div>
)}
</div>
)}
<div>
<button
onClick={() => setShowSolution(!showSolution)}
className="text-sm text-green-600 hover:text-green-800 font-medium"
>
{showSolution ? "▾ Hide Solution" : "▸ Solution"}
</button>
{showSolution && (
<div className="mt-2 bg-green-50 rounded-lg p-3 text-sm border border-green-200">
<KaTeXRenderer html={variant.solution} />
</div>
)}
</div>
</div>
<button
onClick={onClose}
className="mt-4 w-full py-2 rounded-lg text-sm bg-gray-100 text-gray-700 font-medium hover:bg-gray-200"
>
Close
</button>
</div>
</div>
</div>
);
}


@@ -0,0 +1,49 @@
import { createContext, useContext, useEffect, useState } from "react";
import type { Session, User } from "@supabase/supabase-js";
import { supabase } from "@/lib/supabase";
interface AuthContextValue {
session: Session | null;
user: User | null;
loading: boolean;
signOut: () => Promise<void>;
}
const AuthContext = createContext<AuthContextValue>({
session: null,
user: null,
loading: true,
signOut: async () => {},
});
export function AuthProvider({ children }: { children: React.ReactNode }) {
const [session, setSession] = useState<Session | null>(null);
const [loading, setLoading] = useState(true);
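useEffect(() => {
// Load any persisted session once on mount; clear `loading` even if the lookup fails,
// so consumers are never stuck in the loading state.
supabase.auth.getSession()
.then(({ data }) => setSession(data.session))
.catch(() => setSession(null))
.finally(() => setLoading(false));
// Keep the session in sync with later sign-in, sign-out, and token refresh events.
const { data: { subscription } } = supabase.auth.onAuthStateChange((_event, session) => {
setSession(session);
});
return () => subscription.unsubscribe();
}, []);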
const signOut = async () => {
await supabase.auth.signOut();
};
return (
<AuthContext.Provider value={{ session, user: session?.user ?? null, loading, signOut }}>
{children}
</AuthContext.Provider>
);
}
export function useAuth() {
return useContext(AuthContext);
}


@@ -0,0 +1,43 @@
import { useEffect, useState } from "react";
import { getPaper } from "@/lib/api";
import type { Paper } from "@/types/api";
const POLL_INTERVAL = 3000;
export function usePaper(paperId: string) {
const [paper, setPaper] = useState<Paper | null>(null);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
let intervalId: number | null = null;
let cancelled = false;
const fetchPaper = async () => {
try {
const data = await getPaper(paperId);
if (cancelled) return;
setPaper(data);
setLoading(false);
if (data.status === "ready" || data.status === "error") {
if (intervalId !== null) clearInterval(intervalId);
}
} catch (err) {
if (cancelled) return;
setError(err instanceof Error ? err.message : "Unknown error");
setLoading(false);
if (intervalId !== null) clearInterval(intervalId);
}
};
fetchPaper();
intervalId = window.setInterval(fetchPaper, POLL_INTERVAL);
return () => {
cancelled = true;
if (intervalId !== null) clearInterval(intervalId);
};
}, [paperId]);
return { paper, loading, error };
}
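
The polling rule in `usePaper` (keep fetching until the paper reports `ready` or `error`) can also be sketched outside React. This is a minimal illustration, not the project's implementation: `pollUntilTerminal`, `isTerminal`, `sleep`, and the `maxAttempts` cap are hypothetical names introduced here, and any status other than `ready` or `error` is assumed to mean "still processing".

```typescript
// Statuses that usePaper treats as final; everything else is assumed to be in progress.
const TERMINAL_STATUSES = ["ready", "error"];

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.indexOf(status) !== -1;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: awaits each fetch before scheduling the next one,
// so slow responses never overlap the way fixed-interval timers can.
async function pollUntilTerminal<T extends { status: string }>(
  fetchOnce: () => Promise<T>,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchOnce();
    if (isTerminal(result.status)) return result;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Still processing after ${maxAttempts} attempts`);
}
```

A caller could pass `() => getPaper(paperId)` as `fetchOnce`. The attempt cap is a design choice of this sketch; the hook above polls without one.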


@@ -0,0 +1,33 @@
import { useEffect, useState } from "react";
import { getQuestions } from "@/lib/api";
import type { Question } from "@/types/api";
export function useQuestions(paperId: string, enabled: boolean) {
const [questions, setQuestions] = useState<Question[]>([]);
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
if (!enabled) return;
let cancelled = false;
setLoading(true);
getQuestions(paperId)
.then((data) => {
if (!cancelled) {
setQuestions(data);
setLoading(false);
}
})
.catch((err) => {
if (!cancelled) {
setError(err instanceof Error ? err.message : "Unknown error");
setLoading(false);
}
});
return () => { cancelled = true; };
}, [paperId, enabled]);
return { questions, loading, error };
}

190
frontend/src/lib/api.ts Normal file

@@ -0,0 +1,190 @@
import type {
CourseAnalytics,
Paper,
Question,
QuestionVariant,
SimilarQuestion,
UploadResponse,
UserAttempt,
} from "@/types/api";
import { supabase } from "@/lib/supabase";
const API_BASE = "/api";
async function authHeaders(): Promise<Record<string, string>> {
const { data } = await supabase.auth.getSession();
const token = data.session?.access_token;
if (!token) return {};
return { Authorization: `Bearer ${token}` };
}
export async function uploadPaper(formData: FormData): Promise<UploadResponse> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/papers/upload`, {
method: "POST",
headers,
body: formData,
});
if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
return res.json();
}
export async function getPaper(paperId: string): Promise<Paper> {
const res = await fetch(`${API_BASE}/papers/${paperId}`);
if (!res.ok) throw new Error(`Paper not found: ${res.status}`);
return res.json();
}
export async function getQuestions(paperId: string): Promise<Question[]> {
const res = await fetch(`${API_BASE}/papers/${paperId}/questions`);
if (!res.ok) throw new Error(`Questions fetch failed: ${res.status}`);
return res.json();
}
export async function myPapers(): Promise<Paper[]> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/papers/mine`, { headers });
if (!res.ok) throw new Error(`My papers fetch failed: ${res.status}`);
return res.json();
}
export async function listPapers(): Promise<Paper[]> {
const res = await fetch(`${API_BASE}/papers/`);
if (!res.ok) throw new Error(`List papers failed: ${res.status}`);
return res.json();
}
export async function recordAttempt(
questionId: string,
attemptType: string,
userAnswer: string | null,
isCorrect: boolean | null,
): Promise<UserAttempt> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/attempts/`, {
method: "POST",
headers: { "Content-Type": "application/json", ...headers },
body: JSON.stringify({
question_id: questionId,
attempt_type: attemptType,
user_answer: userAnswer,
is_correct: isCorrect,
}),
});
if (!res.ok) throw new Error(`Attempt save failed: ${res.status}`);
return res.json();
}
export async function uploadPhoto(
questionId: string,
photo: File,
): Promise<{ attempt: UserAttempt; ocr_text: string; grade: { is_correct: boolean; score_given?: number; feedback: string; error_at_step: number | null } }> {
const headers = await authHeaders();
const fd = new FormData();
fd.append("question_id", questionId);
fd.append("photo", photo);
const res = await fetch(`${API_BASE}/attempts/photo`, {
method: "POST",
headers,
body: fd,
});
if (!res.ok) throw new Error(`Photo upload failed: ${res.status}`);
return res.json();
}
export async function getPaperAttempts(paperId: string): Promise<{
question_id: string;
is_correct: boolean;
feedback: string | null;
photo_ocr_text: string | null;
}[]> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/attempts/by-paper/${paperId}`, { headers });
if (!res.ok) return [];
return res.json();
}
export async function generateVariant(questionId: string): Promise<QuestionVariant> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/questions/${questionId}/variant`, {
method: "POST",
headers,
});
if (!res.ok) throw new Error(`Variant generation failed: ${res.status}`);
return res.json();
}
export async function getVariants(questionId: string): Promise<QuestionVariant[]> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/questions/${questionId}/variants`, { headers });
if (!res.ok) throw new Error(`Variants fetch failed: ${res.status}`);
return res.json();
}
export async function updateVariant(variantId: string, data: { favorited?: boolean }): Promise<QuestionVariant> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/questions/variant/${variantId}`, {
method: "PATCH",
headers: { "Content-Type": "application/json", ...headers },
body: JSON.stringify(data),
});
if (!res.ok) throw new Error(`Variant update failed: ${res.status}`);
return res.json();
}
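export async function deleteVariant(variantId: string): Promise<void> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/questions/variant/${variantId}`, { method: "DELETE", headers });
// Surface failures like the other mutating calls in this module instead of ignoring them.
if (!res.ok) throw new Error(`Variant delete failed: ${res.status}`);
}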
export async function getFavoriteVariants(): Promise<QuestionVariant[]> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/questions/variants/favorited`, { headers });
if (!res.ok) throw new Error(`Favorited variants fetch failed: ${res.status}`);
return res.json();
}
export async function getErrorBook(courseCode?: string): Promise<UserAttempt[]> {
const headers = await authHeaders();
const params = new URLSearchParams();
if (courseCode) params.set("course_code", courseCode);
const query = params.toString() ? `?${params.toString()}` : "";
const res = await fetch(`${API_BASE}/attempts/error-book${query}`, { headers });
if (!res.ok) throw new Error(`Error book fetch failed: ${res.status}`);
return res.json();
}
export async function updateAttempt(
attemptId: string,
data: { in_error_book?: boolean; mastered?: boolean },
): Promise<UserAttempt> {
const headers = await authHeaders();
const res = await fetch(`${API_BASE}/attempts/${attemptId}`, {
method: "PATCH",
headers: { "Content-Type": "application/json", ...headers },
body: JSON.stringify(data),
});
if (!res.ok) throw new Error(`Attempt update failed: ${res.status}`);
return res.json();
}
export async function listCourses(): Promise<string[]> {
const res = await fetch(`${API_BASE}/analytics/courses`);
if (!res.ok) throw new Error(`Courses fetch failed: ${res.status}`);
return res.json();
}
export async function getCourseAnalytics(courseCode: string): Promise<CourseAnalytics> {
// Encode the course code so spaces or slashes cannot alter the request path.
const res = await fetch(`${API_BASE}/analytics/course/${encodeURIComponent(courseCode)}`);
if (!res.ok) throw new Error(`Analytics fetch failed: ${res.status}`);
return res.json();
}
export async function getSimilarQuestions(
questionId: string,
limit = 6,
): Promise<SimilarQuestion[]> {
const res = await fetch(`${API_BASE}/questions/${questionId}/similar?limit=${limit}`);
if (!res.ok) throw new Error(`Similar question fetch failed: ${res.status}`);
return res.json();
}
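
Several functions in this module assemble URLs by hand from path segments and optional query parameters. A small helper along the following lines could centralize the encoding. It is only a sketch: `buildUrl` and `QueryValue` are hypothetical names that do not exist in the module, and the example paths simply mirror the routes used above.

```typescript
type QueryValue = string | number | undefined;

// Hypothetical helper: encodes each path segment and skips undefined query values.
function buildUrl(
  base: string,
  segments: string[],
  query: { [key: string]: QueryValue } = {},
): string {
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  const pairs: string[] = [];
  for (const key of Object.keys(query)) {
    const value = query[key];
    if (value !== undefined) {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return `${base}/${path}${pairs.length ? `?${pairs.join("&")}` : ""}`;
}

const similarUrl = buildUrl("/api", ["questions", "q1", "similar"], { limit: 6 });
const analyticsUrl = buildUrl("/api", ["analytics", "course", "COMP 2211"]);
const errorBookUrl = buildUrl("/api", ["attempts", "error-book"], { course_code: undefined });
```

With this shape, an optional filter such as `course_code` can be passed through unconditionally, because undefined values are dropped rather than serialized.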


@@ -0,0 +1,45 @@
import type { Question } from "@/types/api";
export interface QuestionGroup {
key: string;
label: string;
questions: Question[];
startPage: number;
}
function topLevelKey(questionNumber: string): string {
const match = questionNumber.match(/^\d+/);
return match?.[0] ?? questionNumber;
}
export function groupQuestions(questions: Question[]): QuestionGroup[] {
const groups = new Map<string, QuestionGroup>();
for (const question of questions) {
const key = topLevelKey(question.question_number);
const existing = groups.get(key);
if (existing) {
existing.questions.push(question);
existing.startPage = Math.min(existing.startPage, question.page_number ?? existing.startPage);
continue;
}
groups.set(key, {
key,
label: `Q${key}`,
questions: [question],
startPage: question.page_number ?? 1,
});
}
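// Sort numerically. Keys without a leading number convert to NaN, which would make
// the comparator inconsistent, so they fall back to string order and sort last.
return Array.from(groups.values()).sort((a, b) => {
const na = Number(a.key);
const nb = Number(b.key);
if (Number.isNaN(na) || Number.isNaN(nb)) {
if (Number.isNaN(na) && Number.isNaN(nb)) return a.key.localeCompare(b.key);
return Number.isNaN(na) ? 1 : -1;
}
return na - nb;
});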
}
export function subquestionLabel(question: Question): string {
const remainder = question.question_number.replace(/^\d+/, "");
if (!remainder) return "Main";
return remainder
.replace(/^_+/, "")
.split("_")
.filter(Boolean)
.join(".");
}
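
To make the grouping and labelling rules above concrete, here is a condensed, standalone copy of the two string helpers run over sample question numbers. The underscore-separated format (`2_a_ii`) is an assumption inferred from the splitting logic, not something the source states, and `subLabel` is a renamed copy for illustration only.

```typescript
// Condensed copies of the helpers above, operating on plain strings.
function topLevelKey(questionNumber: string): string {
  const match = questionNumber.match(/^\d+/);
  return match ? match[0] : questionNumber;
}

function subLabel(questionNumber: string): string {
  const remainder = questionNumber.replace(/^\d+/, "");
  if (!remainder) return "Main";
  return remainder.replace(/^_+/, "").split("_").filter(Boolean).join(".");
}

// Assumed sample data: underscore-separated subquestion identifiers.
const sample = ["2_b", "1", "2_a_ii", "10_a"];

const keys = sample.map(topLevelKey);  // "2", "1", "2", "10"
const labels = sample.map(subLabel);   // "b", "Main", "a.ii", "a"

// Unique keys sorted numerically, so "10" comes after "2" rather than before it.
const sortedKeys = keys
  .filter((key, index) => keys.indexOf(key) === index)
  .sort((a, b) => Number(a) - Number(b));
```

The numeric comparison matters here: a plain string sort would order the groups as 1, 10, 2.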


@@ -0,0 +1,6 @@
import { createClient } from "@supabase/supabase-js";
const supabaseUrl = import.meta.env.VITE_SUPABASE_URL as string;
const supabaseAnonKey = import.meta.env.VITE_SUPABASE_ANON_KEY as string;
export const supabase = createClient(supabaseUrl, supabaseAnonKey);

16
frontend/src/main.tsx Normal file

@@ -0,0 +1,16 @@
import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import { BrowserRouter } from "react-router-dom";
import App from "./App";
import { AuthProvider } from "./contexts/AuthContext";
import "./styles/globals.css";
createRoot(document.getElementById("root")!).render(
<StrictMode>
<BrowserRouter>
<AuthProvider>
<App />
</AuthProvider>
</BrowserRouter>
</StrictMode>,
);


@@ -0,0 +1,521 @@
import { useEffect, useMemo, useState } from "react";
import { Link, useNavigate, useParams } from "react-router-dom";
import Header from "@/components/layout/Header";
import { getCourseAnalytics, listCourses } from "@/lib/api";
import type { CourseAnalytics, AnalyticsTopicQuestion } from "@/types/api";
const typeLabel: Record<string, string> = {
mc: "Multiple Choice",
true_false: "True / False",
fill_blank: "Fill in Blank",
long_question: "Long Question",
short_answer: "Short Answer",
coding: "Coding",
};
const TYPE_COLORS: Record<string, string> = {
mc: "bg-violet-50 text-violet-700 border-violet-200",
true_false: "bg-amber-50 text-amber-700 border-amber-200",
fill_blank: "bg-teal-50 text-teal-700 border-teal-200",
long_question: "bg-sky-50 text-sky-700 border-sky-200",
short_answer: "bg-rose-50 text-rose-700 border-rose-200",
coding: "bg-emerald-50 text-emerald-700 border-emerald-200",
};
const DIFF_COLORS: Record<string, string> = {
hard: "text-red-600 bg-red-50 border-red-200",
medium: "text-amber-600 bg-amber-50 border-amber-200",
easy: "text-green-600 bg-green-50 border-green-200",
};
type QItem = AnalyticsTopicQuestion;
type Analytics = CourseAnalytics;
const PAGE_SIZE = 8;
export default function AnalyticsPage() {
const { courseCode } = useParams<{ courseCode?: string }>();
const navigate = useNavigate();
const [courses, setCourses] = useState<string[]>([]);
const [search, setSearch] = useState("");
useEffect(() => { listCourses().then(setCourses).catch(() => {}); }, []);
const filtered = useMemo(() => {
const q = search.trim().toUpperCase();
return q ? courses.filter((c) => c.includes(q)) : courses;
}, [courses, search]);
const normalizedCourse = courseCode?.toUpperCase();
const [analytics, setAnalytics] = useState<Analytics | null>(null);
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
if (!normalizedCourse) return;
let cancelled = false;
setLoading(true);
setAnalytics(null);
setError(null);
getCourseAnalytics(normalizedCourse)
.then((data) => { if (!cancelled) { setAnalytics(data); setLoading(false); } })
.catch((err) => { if (!cancelled) { setError(err instanceof Error ? err.message : "Failed"); setLoading(false); } });
return () => { cancelled = true; };
}, [normalizedCourse]);
// ── Course picker ──
if (!normalizedCourse) {
return (
<div className="min-h-screen bg-gray-50">
<Header />
<main className="max-w-2xl mx-auto px-6 py-12">
<h1 className="text-2xl font-bold text-gray-900 mb-1">Analytics</h1>
<p className="text-sm text-gray-500 mb-6">Select a course to view statistics.</p>
<input
type="text"
placeholder="Search course code..."
value={search}
onChange={(e) => setSearch(e.target.value)}
className="w-full px-4 py-2.5 border border-gray-300 rounded-xl text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 mb-4"
/>
{filtered.length === 0 ? (
<p className="text-sm text-gray-400">No courses found.</p>
) : (
<div className="grid grid-cols-2 gap-3">
{filtered.map((code) => (
<button key={code} onClick={() => navigate(`/analytics/${code}`)}
className="text-left px-4 py-3 bg-white border border-gray-200 rounded-xl hover:border-blue-400 hover:bg-blue-50 transition-colors">
<span className="font-semibold text-gray-900">{code}</span>
</button>
))}
</div>
)}
</main>
</div>
);
}
// ── Dashboard ──
return (
<div className="min-h-screen bg-gray-50">
<Header />
<main className="max-w-7xl mx-auto px-6 py-8">
<div className="mb-6 flex items-center gap-3">
<button onClick={() => navigate("/analytics")} className="text-sm text-gray-400 hover:text-gray-600">← All courses</button>
<span className="text-gray-300">/</span>
<h1 className="text-2xl font-bold text-gray-900">{normalizedCourse}</h1>
</div>
{loading && <div className="text-sm text-gray-400">Loading analytics...</div>}
{error && <div className="text-sm text-red-600">{error}</div>}
{!loading && !error && analytics && (
<>
{/* KPI row */}
<section className="grid grid-cols-4 gap-4 mb-6">
<KpiCard label="Papers" value={analytics.kpi.papers} />
<KpiCard label="Questions" value={analytics.kpi.questions} />
<KpiCard label="Topics" value={analytics.kpi.topics} />
<KpiCard label="Avg Difficulty" value={analytics.kpi.difficulty} />
</section>
{/* Main area: left = search, right = charts */}
<section className="grid grid-cols-[5fr_2fr] gap-6">
{/* Left: Global search */}
<GlobalSearch questions={analytics.all_questions} topics={analytics.topic_frequency.map((t) => t.label)} />
{/* Right: Interactive charts + stats */}
<div className="space-y-5">
<InteractiveChart
topicData={analytics.topic_frequency.slice(0, 8).map((t) => ({ label: t.label, value: t.count }))}
typeData={analytics.question_types.map((t) => ({ label: typeLabel[t.label] ?? t.label, value: t.count }))}
diffData={[
{ label: "Easy", value: analytics.difficulty_distribution.easy },
{ label: "Medium", value: analytics.difficulty_distribution.medium },
{ label: "Hard", value: analytics.difficulty_distribution.hard },
].filter((d) => d.value > 0)}
/>
<Panel title="High Yield Topics">
{analytics.high_yield_topics.length === 0 ? (
<div className="text-sm text-gray-400">No data yet.</div>
) : (
<ul className="space-y-2">
{analytics.high_yield_topics.map((t, i) => (
<li key={t} className="flex items-center gap-3 text-sm text-gray-700">
<span className="w-6 h-6 rounded-full bg-red-50 text-red-600 flex items-center justify-center text-xs font-semibold">{i + 1}</span>
<span>{t}</span>
</li>
))}
</ul>
)}
</Panel>
</div>
</section>
</>
)}
</main>
</div>
);
}
// ── Global Search Engine ──
function GlobalSearch({ questions, topics }: { questions: QItem[]; topics: string[] }) {
const [search, setSearch] = useState("");
const [topicFilter, setTopicFilter] = useState<string | null>(null);
const [typeFilter, setTypeFilter] = useState<string | null>(null);
const [yearFilter, setYearFilter] = useState<number | null>(null);
const [termFilter, setTermFilter] = useState<string | null>(null);
const [diffFilter, setDiffFilter] = useState<string | null>(null);
const [visibleCount, setVisibleCount] = useState(PAGE_SIZE);
const types = useMemo(() => [...new Set(questions.map((q) => q.question_type))].sort(), [questions]);
const years = useMemo(() => [...new Set(questions.map((q) => q.year).filter(Boolean))].sort((a, b) => (b ?? 0) - (a ?? 0)) as number[], [questions]);
const terms = useMemo(() => {
const order = ["spring", "summer", "fall", "winter"];
return [...new Set(questions.map((q) => q.term).filter(Boolean))].sort((a, b) => order.indexOf(a!) - order.indexOf(b!)) as string[];
}, [questions]);
const diffs = useMemo(() => [...new Set(questions.map((q) => q.difficulty).filter(Boolean))] as string[], [questions]);
const filtered = useMemo(() => {
const q = search.toLowerCase();
return questions.filter((item) => {
if (topicFilter && !item.topics?.includes(topicFilter)) return false;
if (typeFilter && item.question_type !== typeFilter) return false;
if (yearFilter && item.year !== yearFilter) return false;
if (termFilter && item.term !== termFilter) return false;
if (diffFilter && item.difficulty !== diffFilter) return false;
if (q) {
// Free-text search matches the preview, source paper, question number, or any topic.
const haystacks = [item.preview, item.source, item.question_number, ...(item.topics ?? [])];
if (!haystacks.some((text) => text.toLowerCase().includes(q))) return false;
}
return true;
});
}, [questions, search, topicFilter, typeFilter, yearFilter, termFilter, diffFilter]);
const activeCount = [topicFilter, typeFilter, yearFilter, termFilter, diffFilter].filter(Boolean).length;
useEffect(() => setVisibleCount(PAGE_SIZE), [search, topicFilter, typeFilter, yearFilter, termFilter, diffFilter]);
const visible = filtered.slice(0, visibleCount);
const hasMore = visibleCount < filtered.length;
return (
<div className="bg-white border border-gray-200 rounded-2xl p-6">
<h2 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-4">Question Search</h2>
{/* Search bar */}
<div className="relative mb-3">
<input
type="text"
value={search}
onChange={(e) => setSearch(e.target.value)}
placeholder="Search questions, topics, papers..."
className="w-full pl-9 pr-3 py-2.5 text-sm border border-gray-200 rounded-xl bg-gray-50 focus:bg-white focus:outline-none focus:ring-2 focus:ring-blue-400"
/>
<span className="absolute left-3 top-1/2 -translate-y-1/2 text-gray-400">🔍</span>
</div>
{/* Filter rows */}
<div className="space-y-2 mb-3">
{/* Topic */}
<FilterRow label="Topic">
<TopicCombobox topics={topics} value={topicFilter} onChange={setTopicFilter} />
</FilterRow>
{/* Type + Year + Term + Difficulty in one row */}
<div className="flex items-center gap-3 flex-wrap">
<FilterRow label="Type">
<div className="flex gap-1 flex-wrap">
{types.map((t) => (
<Pill key={t} label={typeLabel[t] ?? t} active={typeFilter === t}
color={TYPE_COLORS[t]} onClick={() => setTypeFilter(typeFilter === t ? null : t)} />
))}
</div>
</FilterRow>
<FilterRow label="Year">
<div className="flex gap-1 flex-wrap">
{years.map((y) => (
<Pill key={y} label={String(y)} active={yearFilter === y}
onClick={() => setYearFilter(yearFilter === y ? null : y)} />
))}
</div>
</FilterRow>
<FilterRow label="Term">
<div className="flex gap-1 flex-wrap">
{terms.map((t) => (
<Pill key={t} label={t.charAt(0).toUpperCase() + t.slice(1)} active={termFilter === t}
onClick={() => setTermFilter(termFilter === t ? null : t)} />
))}
</div>
</FilterRow>
<FilterRow label="Diff">
<div className="flex gap-1">
{(["easy", "medium", "hard"] as const).filter((d) => diffs.includes(d)).map((d) => (
<Pill key={d} label={d.charAt(0).toUpperCase() + d.slice(1)} active={diffFilter === d}
color={DIFF_COLORS[d]} onClick={() => setDiffFilter(diffFilter === d ? null : d)} />
))}
</div>
</FilterRow>
</div>
</div>
{/* Results count + clear */}
<div className="flex items-center justify-between mb-3 pb-3 border-b border-gray-100">
<span className="text-xs text-gray-400">
{filtered.length} question{filtered.length !== 1 ? "s" : ""}
{activeCount > 0 || search ? " matched" : ""}
</span>
{(activeCount > 0 || search) && (
<button onClick={() => { setTopicFilter(null); setTypeFilter(null); setYearFilter(null); setTermFilter(null); setDiffFilter(null); setSearch(""); }}
className="text-xs text-blue-500 hover:text-blue-700">Clear all</button>
)}
</div>
{/* Results */}
<div className="space-y-2">
{visible.map((q, i) => (
<QuestionCard key={`${q.paper_id}-${q.question_number}-${i}`} question={q} />
))}
</div>
{hasMore && (
<button onClick={() => setVisibleCount((v) => v + PAGE_SIZE)}
className="w-full mt-3 py-2 text-xs text-blue-600 hover:text-blue-700 bg-blue-50 rounded-xl font-medium">
Show more ({filtered.length - visibleCount} remaining)
</button>
)}
{filtered.length === 0 && (
<div className="text-center py-6 text-sm text-gray-400">No questions match your search.</div>
)}
</div>
);
}
// ── Interactive Pie Chart ──
const PIE_PALETTE = [
"#3B82F6", "#8B5CF6", "#F59E0B", "#10B981", "#EF4444",
"#EC4899", "#06B6D4", "#F97316", "#6366F1", "#14B8A6",
];
function InteractiveChart({ topicData, typeData, diffData }: {
topicData: { label: string; value: number }[];
typeData: { label: string; value: number }[];
diffData: { label: string; value: number }[];
}) {
const [view, setView] = useState<"topic" | "type" | "difficulty">("topic");
const [hovered, setHovered] = useState<number | null>(null);
const data = view === "topic" ? topicData : view === "type" ? typeData : diffData;
const colors = view === "difficulty"
? ["#10B981", "#F59E0B", "#EF4444"]
: PIE_PALETTE;
const total = data.reduce((s, d) => s + d.value, 0);
// Build conic-gradient
let cumPct = 0;
const segments = data.map((d, i) => {
const pct = total ? (d.value / total) * 100 : 0;
const start = cumPct;
cumPct += pct;
return { ...d, pct, start, end: cumPct, color: colors[i % colors.length] };
});
const gradient = segments
.map((s) => `${s.color} ${s.start}% ${s.end}%`)
.join(", ");
return (
<section className="bg-white border border-gray-200 rounded-2xl p-5">
{/* Tab switcher */}
<div className="flex gap-1 mb-4">
{(["topic", "type", "difficulty"] as const).map((t) => (
<button key={t} onClick={() => { setView(t); setHovered(null); }}
className={`text-xs px-3 py-1.5 rounded-lg font-medium transition-colors ${
view === t ? "bg-gray-900 text-white" : "bg-gray-100 text-gray-500 hover:text-gray-700"
}`}>
{t === "topic" ? "Topics" : t === "type" ? "Types" : "Difficulty"}
</button>
))}
</div>
{/* Pie */}
<div className="flex items-center gap-4">
<div className="relative w-36 h-36 shrink-0">
<div
className="w-full h-full rounded-full"
style={{ background: `conic-gradient(${gradient})` }}
/>
<div className="absolute inset-3 bg-white rounded-full flex items-center justify-center">
{hovered !== null ? (
<div className="text-center">
<div className="text-lg font-bold text-gray-900">{segments[hovered].value}</div>
<div className="text-[9px] text-gray-400">{segments[hovered].pct.toFixed(0)}%</div>
</div>
) : (
<div className="text-center">
<div className="text-lg font-bold text-gray-900">{total}</div>
<div className="text-[9px] text-gray-400">total</div>
</div>
)}
</div>
</div>
{/* Legend */}
<div className="flex-1 space-y-1 max-h-36 overflow-y-auto">
{segments.map((s, i) => (
<div
key={s.label}
onMouseEnter={() => setHovered(i)}
onMouseLeave={() => setHovered(null)}
className={`flex items-center gap-2 px-2 py-1 rounded-lg cursor-default transition-colors ${
hovered === i ? "bg-gray-50" : ""
}`}
>
<span className="w-2.5 h-2.5 rounded-full shrink-0" style={{ backgroundColor: s.color }} />
<span className="text-xs text-gray-700 flex-1 truncate">{s.label}</span>
<span className="text-xs text-gray-400 tabular-nums">{s.value}</span>
</div>
))}
</div>
</div>
</section>
);
}
// ── Shared components ──
function QuestionCard({ question: q }: { question: QItem }) {
const typeColor = TYPE_COLORS[q.question_type] ?? "bg-gray-50 text-gray-600 border-gray-200";
const cleanPreview = (q.preview || "")
.replace(/^Problem\s+\d+\s*\[.*?\]\s*/i, "")
.replace(/^(True\/False Questions?\s*)?Indicate whether.*?(answer\.\s*)/i, "")
.trim();
return (
<Link to={`/paper/${q.paper_id}`}
className="flex items-start gap-3 bg-gray-50 border border-gray-200 rounded-xl px-3.5 py-2.5 hover:border-blue-300 hover:bg-white hover:shadow-sm transition-all group">
<span className="shrink-0 inline-flex items-center justify-center w-8 h-8 rounded-lg bg-blue-600 text-white text-xs font-bold mt-0.5">
{q.question_number}
</span>
<div className="flex-1 min-w-0">
<div className="flex items-center gap-1.5 mb-1 flex-wrap">
<span className="text-xs font-medium text-blue-600">{q.source}</span>
<span className="text-gray-300">·</span>
<span className={`text-[10px] px-1.5 py-0.5 rounded border font-medium ${typeColor}`}>
{typeLabel[q.question_type] ?? q.question_type}
</span>
{q.difficulty && (
<>
<span className="text-gray-300">·</span>
<span className={`text-[10px] px-1.5 py-0.5 rounded border font-medium ${DIFF_COLORS[q.difficulty] ?? ""}`}>
{q.difficulty}
</span>
</>
)}
{q.topics?.slice(0, 2).map((t) => (
<span key={t} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500 border border-gray-200">{t}</span>
))}
</div>
<p className="text-xs text-gray-600 line-clamp-2 leading-relaxed">{cleanPreview || q.preview}</p>
</div>
<span className="shrink-0 text-gray-300 group-hover:text-blue-500 text-sm pt-1">→</span>
</Link>
);
}
function FilterRow({ label, children }: { label: string; children: React.ReactNode }) {
return (
<div className="flex items-center gap-1.5">
<span className="text-[10px] text-gray-400 w-10 shrink-0">{label}</span>
{children}
</div>
);
}
function Pill({ label, active, color, onClick }: { label: string; active: boolean; color?: string; onClick: () => void }) {
return (
<button onClick={onClick}
className={`text-[10px] px-2 py-1 rounded-full border font-medium transition-colors whitespace-nowrap ${
active ? (color ?? "bg-blue-50 text-blue-700 border-blue-200") : "bg-white text-gray-400 border-gray-200 hover:text-gray-600"
}`}>
{label}
</button>
);
}
function KpiCard({ label, value }: { label: string; value: string | number }) {
return (
<div className="bg-white border border-gray-200 rounded-2xl p-5">
<div className="text-2xl font-semibold text-gray-900">{value}</div>
<div className="text-xs uppercase tracking-wide text-gray-400 mt-2">{label}</div>
</div>
);
}
function Panel({ title, children }: { title: string; children: React.ReactNode }) {
return (
<section className="bg-white border border-gray-200 rounded-2xl p-5">
<h2 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-4">{title}</h2>
{children}
</section>
);
}
function TopicCombobox({ topics, value, onChange }: { topics: string[]; value: string | null; onChange: (v: string | null) => void }) {
const [input, setInput] = useState("");
const [open, setOpen] = useState(false);
const filtered = useMemo(() => {
const q = input.toLowerCase();
return q ? topics.filter((t) => t.toLowerCase().includes(q)) : topics;
}, [topics, input]);
const handleSelect = (t: string | null) => {
onChange(t);
setInput(t ?? "");
setOpen(false);
};
return (
<div className="relative">
<div className="flex items-center gap-1">
<input
type="text"
value={value ? (input || value) : input}
onChange={(e) => { setInput(e.target.value); setOpen(true); if (!e.target.value) onChange(null); }}
onFocus={() => setOpen(true)}
placeholder="All Topics"
className="text-xs border border-gray-200 rounded-lg px-2 py-1.5 bg-white focus:outline-none focus:ring-1 focus:ring-blue-400 w-48"
/>
{value && (
<button onClick={() => { onChange(null); setInput(""); }} aria-label="Clear topic filter" className="text-gray-400 hover:text-gray-600 text-xs">×</button>
)}
</div>
{open && filtered.length > 0 && (
<div className="absolute z-20 top-full mt-1 w-56 max-h-48 overflow-y-auto bg-white border border-gray-200 rounded-lg shadow-lg">
{filtered.map((t) => (
<button
key={t}
onClick={() => handleSelect(t)}
className={`w-full text-left px-3 py-1.5 text-xs hover:bg-blue-50 transition-colors ${value === t ? "bg-blue-50 text-blue-700 font-medium" : "text-gray-700"}`}
>
{t}
</button>
))}
</div>
)}
{open && <div className="fixed inset-0 z-10" onClick={() => setOpen(false)} />}
</div>
);
}
function DiffStat({ label, value }: { label: string; value: number }) {
return (
<div className="bg-gray-50 rounded-xl px-3 py-4">
<div className="text-xl font-semibold text-gray-900">{value}</div>
<div className="text-xs uppercase tracking-wide text-gray-400 mt-1">{label}</div>
</div>
);
}
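
The pie chart in `InteractiveChart` is drawn with a CSS `conic-gradient` whose stops come from cumulative percentages. The arithmetic can be isolated into pure functions, sketched below. `toSegments` and `toConicGradient` are hypothetical names, and the flat fallback color for empty data is an assumption of this sketch; the component above does not special-case empty data.

```typescript
interface Slice { label: string; value: number }
interface Segment extends Slice { pct: number; start: number; end: number; color: string }

// Convert raw counts into percentage ranges, cycling through the palette.
function toSegments(data: Slice[], colors: string[]): Segment[] {
  const total = data.reduce((sum, d) => sum + d.value, 0);
  let cursor = 0;
  return data.map((d, i) => {
    const pct = total ? (d.value / total) * 100 : 0;
    const start = cursor;
    cursor += pct;
    return { ...d, pct, start, end: cursor, color: colors[i % colors.length] };
  });
}

// Build the CSS background value. The fallback color is an assumption, used
// because `conic-gradient()` with no stops is not valid CSS.
function toConicGradient(segments: Segment[], fallback: string = "#E5E7EB"): string {
  if (segments.length === 0) return fallback;
  const stops = segments.map((s) => `${s.color} ${s.start}% ${s.end}%`).join(", ");
  return `conic-gradient(${stops})`;
}

const demoSegments = toSegments(
  [{ label: "Easy", value: 1 }, { label: "Hard", value: 3 }],
  ["#10B981", "#EF4444"],
);
const demoGradient = toConicGradient(demoSegments);
// demoGradient is "conic-gradient(#10B981 0% 25%, #EF4444 25% 100%)"
```

Each stop pair `start% end%` paints one hard-edged wedge, which is why the segments need cumulative rather than individual percentages.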


@@ -0,0 +1,296 @@
import { useEffect, useMemo, useState } from "react";
import { Link } from "react-router-dom";
import Header from "@/components/layout/Header";
import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
import { getErrorBook, updateAttempt, getFavoriteVariants, updateVariant } from "@/lib/api";
import { useAuth } from "@/contexts/AuthContext";
import type { UserAttempt, QuestionVariant } from "@/types/api";
const typeLabel: Record<string, string> = {
mc: "Multiple Choice",
true_false: "True / False",
fill_blank: "Fill in Blank",
long_question: "Long Question",
short_answer: "Short Answer",
coding: "Coding",
};
const TYPE_COLORS: Record<string, string> = {
mc: "bg-violet-50 text-violet-700",
true_false: "bg-amber-50 text-amber-700",
fill_blank: "bg-teal-50 text-teal-700",
long_question: "bg-sky-50 text-sky-700",
short_answer: "bg-rose-50 text-rose-700",
coding: "bg-emerald-50 text-emerald-700",
};
const DIFF_COLORS: Record<string, string> = {
easy: "text-green-600",
medium: "text-amber-600",
hard: "text-red-600",
};
export default function ErrorBookPage() {
const { user } = useAuth();
const [entries, setEntries] = useState<UserAttempt[]>([]);
const [favoriteVariants, setFavoriteVariants] = useState<QuestionVariant[]>([]);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [courseFilter, setCourseFilter] = useState<string>("all");
useEffect(() => {
if (!user) { setLoading(false); return; }
let cancelled = false;
setLoading(true);
Promise.all([getErrorBook(), getFavoriteVariants()])
.then(([attempts, variants]) => {
if (cancelled) return;
setEntries(attempts);
setFavoriteVariants(variants);
setLoading(false);
})
.catch((err) => {
if (cancelled) return;
setError(err instanceof Error ? err.message : "Failed to load error book");
setLoading(false);
});
return () => { cancelled = true; };
}, [user]);
const courses = useMemo(
() => Array.from(new Set(
entries.map((e) => e.paper_questions?.paper?.course_code).filter((v): v is string => Boolean(v)),
)).sort(),
[entries],
);
const filteredEntries = useMemo(() => {
if (courseFilter === "all") return entries;
return entries.filter((e) => e.paper_questions?.paper?.course_code === courseFilter);
}, [courseFilter, entries]);
async function handleMarkMastered(attemptId: string) {
await updateAttempt(attemptId, { mastered: true });
setEntries((prev) => prev.filter((e) => e.id !== attemptId));
}
async function handleRemove(attemptId: string) {
await updateAttempt(attemptId, { in_error_book: false });
setEntries((prev) => prev.filter((e) => e.id !== attemptId));
}
async function handleUnfavoriteVariant(variantId: string) {
await updateVariant(variantId, { favorited: false });
setFavoriteVariants((prev) => prev.filter((v) => v.id !== variantId));
}
return (
<div className="min-h-screen bg-gray-50">
<Header />
<main className="max-w-4xl mx-auto px-6 py-8">
{/* Header */}
<div className="flex items-end justify-between gap-4 mb-6">
<div>
<h1 className="text-2xl font-bold text-gray-900">Error Book</h1>
<p className="text-sm text-gray-500 mt-1">Review your mistakes and track progress.</p>
</div>
<div className="flex gap-3 text-sm">
<StatCard label="To Review" value={filteredEntries.length} color="red" />
<StatCard label="Courses" value={courses.length} color="blue" />
</div>
</div>
{/* Course filter */}
<div className="flex gap-2 mb-6 flex-wrap">
<Pill active={courseFilter === "all"} onClick={() => setCourseFilter("all")} label="All" />
{courses.map((c) => (
<Pill key={c} active={courseFilter === c} onClick={() => setCourseFilter(c)} label={c} />
))}
</div>
{!user && (
<div className="bg-white border border-gray-200 rounded-xl p-12 text-center">
<div className="text-3xl mb-3">🔒</div>
<p className="text-gray-500 mb-4">Sign in to unlock your Error Book</p>
<Link to="/login" className="inline-block px-5 py-2 bg-indigo-600 text-white text-sm font-medium rounded-lg hover:bg-indigo-700 transition-colors">
Sign in
</Link>
</div>
)}
{user && loading && <div className="text-sm text-gray-400">Loading...</div>}
{user && error && <div className="text-sm text-red-600">{error}</div>}
{user && !loading && !error && filteredEntries.length === 0 && favoriteVariants.length === 0 && (
<div className="bg-white border border-gray-200 rounded-xl p-12 text-center">
<div className="text-3xl mb-3">🎉</div>
<p className="text-gray-500">No mistakes yet. Keep practicing!</p>
</div>
)}
{/* Saved variants */}
{favoriteVariants.length > 0 && (
<div className="mb-8">
<h2 className="text-xs font-semibold text-gray-400 uppercase tracking-wide mb-3">
Saved Variants ({favoriteVariants.length})
</h2>
<div className="space-y-2">
{favoriteVariants.map((v) => (
<div key={v.id} className="flex items-center gap-3 bg-white border border-yellow-200 rounded-xl px-4 py-3">
<span className="text-yellow-400"></span>
<div className="flex-1 min-w-0">
<span className="text-sm font-medium text-gray-700">Variant of Q{v.source_question_number}</span>
<p className="text-xs text-gray-500 truncate">{v.variant_data.question_text?.replace(/<[^>]*>/g, "").slice(0, 100)}</p>
</div>
<button onClick={() => void handleUnfavoriteVariant(v.id)} className="text-xs text-gray-400 hover:text-red-500">Remove</button>
</div>
))}
</div>
</div>
)}
{/* Error entries */}
<div className="space-y-4">
{filteredEntries.map((entry) => (
<ErrorCard
key={entry.id}
entry={entry}
onMastered={() => void handleMarkMastered(entry.id)}
onRemove={() => void handleRemove(entry.id)}
/>
))}
</div>
</main>
</div>
);
}
function ErrorCard({ entry, onMastered, onRemove }: { entry: UserAttempt; onMastered: () => void; onRemove: () => void }) {
const [showFeedback, setShowFeedback] = useState(true);
const question = entry.paper_questions;
if (!question) return null;
const paper = question.paper;
const courseCode = paper?.course_code;
const paperId = paper?.id;
const paperInfo = paper ? `${paper.year} ${paper.term} ${paper.exam_type}` : "";
const typeColor = TYPE_COLORS[question.question_type] ?? "bg-gray-100 text-gray-600";
const diffColor = DIFF_COLORS[question.difficulty ?? ""] ?? "";
// Preview text: drop a leading "Problem N [...]" label, then cap at 200 characters
const preview = (question.question_text || "")
.replace(/^Problem\s+\d+\s*\[.*?\]\s*/i, "")
.slice(0, 200);
return (
<article className="bg-white border border-gray-200 rounded-xl overflow-hidden">
{/* Header */}
<div className="px-5 pt-4 pb-3">
<div className="flex items-start justify-between gap-3">
<div className="flex items-center gap-2 flex-wrap">
<span className="inline-flex items-center justify-center w-9 h-9 rounded-lg bg-red-600 text-white text-sm font-bold">
{question.question_number}
</span>
<div>
<div className="flex items-center gap-1.5">
<span className={`text-[11px] px-2 py-0.5 rounded-full font-medium ${typeColor}`}>
{typeLabel[question.question_type] ?? question.question_type}
</span>
{question.difficulty && (
<span className={`text-[11px] font-medium ${diffColor}`}>{question.difficulty}</span>
)}
{courseCode && (
<Link to={`/analytics/${courseCode}`} className="text-[11px] px-2 py-0.5 rounded-full bg-blue-50 text-blue-700 hover:bg-blue-100">
{courseCode}
</Link>
)}
</div>
<div className="text-[11px] text-gray-400 mt-0.5">
{paperId ? <Link to={`/paper/${paperId}`} className="hover:text-blue-600">{paperInfo}</Link> : paperInfo}
{" · "}
{new Date(entry.created_at).toLocaleDateString("en-CA")}
</div>
</div>
</div>
{/* Incorrect badge, shown once AI feedback is available */}
{entry.feedback && (
<div className="flex items-center gap-1 bg-red-50 border border-red-200 rounded-lg px-2.5 py-1">
<span className="text-red-600 text-sm font-bold"></span>
<span className="text-xs text-red-600 font-medium">Incorrect</span>
</div>
)}
</div>
{/* Question preview */}
<p className="text-sm text-gray-600 mt-3 line-clamp-2">{preview}</p>
{/* Topics */}
{question.topics && question.topics.length > 0 && (
<div className="flex gap-1 mt-2 flex-wrap">
{question.topics.slice(0, 4).map((t) => (
<span key={t} className="text-[10px] px-1.5 py-0.5 rounded bg-gray-100 text-gray-500">{t}</span>
))}
</div>
)}
</div>
{/* AI Feedback section */}
{entry.feedback && (
<div className="border-t border-gray-100">
<button
onClick={() => setShowFeedback((v) => !v)}
className="w-full flex items-center justify-between px-5 py-2.5 text-xs font-medium text-blue-700 bg-blue-50/50 hover:bg-blue-50"
>
<span>AI Feedback</span>
<span>{showFeedback ? "▲" : "▼"}</span>
</button>
{showFeedback && (
<div className="px-5 py-4 bg-white">
<KaTeXRenderer html={entry.feedback} className="text-sm text-gray-700 leading-relaxed" />
</div>
)}
</div>
)}
{/* Actions */}
<div className="border-t border-gray-100 px-5 py-2.5 flex items-center gap-4 bg-gray-50/50">
{paperId && (
<Link to={`/paper/${paperId}`} className="text-xs font-medium text-blue-600 hover:text-blue-700">
Open paper
</Link>
)}
<button onClick={onMastered} className="text-xs font-medium text-green-600 hover:text-green-700">
Mark mastered
</button>
<button onClick={onRemove} className="text-xs font-medium text-gray-400 hover:text-gray-600">
Remove
</button>
</div>
</article>
);
}
function StatCard({ label, value, color }: { label: string; value: number; color: string }) {
const bg = color === "red" ? "bg-red-50 border-red-200" : "bg-blue-50 border-blue-200";
const text = color === "red" ? "text-red-700" : "text-blue-700";
return (
<div className={`border rounded-xl px-4 py-2.5 ${bg}`}>
<div className={`text-xl font-bold ${text}`}>{value}</div>
<div className="text-[10px] uppercase tracking-wide text-gray-400 mt-0.5">{label}</div>
</div>
);
}
function Pill({ active, onClick, label }: { active: boolean; onClick: () => void; label: string }) {
return (
<button
onClick={onClick}
className={`px-3 py-1.5 text-xs font-medium rounded-full border transition-colors ${
active ? "bg-gray-900 text-white border-gray-900" : "bg-white text-gray-600 border-gray-200 hover:border-gray-300"
}`}
>
{label}
</button>
);
}
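
Note on the preview text in the file above: `ErrorCard` strips a leading "Problem N [...]" label, and the saved-variant row strips HTML tags, each inline and with its own length cap. A minimal sketch of how the two could share one helper follows. The name `toPreview` and its `maxLen` parameter are hypothetical and not part of this commit; the two regular expressions are the ones already used in the component.

```typescript
// Hypothetical helper (not in the commit): builds a short plain-text preview.
function toPreview(text: string | null | undefined, maxLen = 200): string {
  return (text ?? "")
    // Drop a leading label such as "Problem 3 [10 points] ".
    .replace(/^Problem\s+\d+\s*\[.*?\]\s*/i, "")
    // Drop anything that looks like an HTML tag.
    .replace(/<[^>]*>/g, "")
    // Cap the length; callers can still apply CSS truncation on top.
    .slice(0, maxLen);
}

// Usage: one call for the error card, one for the saved-variant row.
const cardPreview = toPreview("Problem 3 [10 points] Explain <b>k-NN</b>.");
const variantPreview = toPreview("<p>Compute the gradient.</p>", 100);
console.log(cardPreview, "|", variantPreview);
```

The label is removed before the tags, so a label wrapped in markup (for example `<p>Problem 3 [...]`) would keep its label; whether that case occurs in the stored data is not known from this commit.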

@@ -0,0 +1,705 @@
import { useEffect, useRef, useState } from "react";
import { Link, useNavigate } from "react-router-dom";
import { listPapers, myPapers } from "@/lib/api";
import { useAuth } from "@/contexts/AuthContext";
import type { Paper } from "@/types/api";
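// Read the ids of the papers this user has worked on from localStorage.
// Returns [] when the key is missing, the JSON is malformed, or the value is not an array.
function getWorkedIds(userId: string): string[] {
  try {
    const raw = localStorage.getItem(`worked_papers_${userId}`);
    const parsed: unknown = raw ? JSON.parse(raw) : [];
    return Array.isArray(parsed) ? parsed.filter((id): id is string => typeof id === "string") : [];
  } catch {
    return [];
  }
}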
const fontSora = { fontFamily: "'Sora', sans-serif" };
const fontMono = { fontFamily: "'IBM Plex Mono', monospace" };
/* ── Feature cards data ── */
const FEATURES = [
{
icon: (
<svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M9.813 15.904L9 18.75l-.813-2.846a4.5 4.5 0 00-3.09-3.09L2.25 12l2.846-.813a4.5 4.5 0 003.09-3.09L9 5.25l.813 2.846a4.5 4.5 0 003.09 3.09L15.75 12l-2.846.813a4.5 4.5 0 00-3.09 3.09zM18.259 8.715L18 9.75l-.259-1.035a3.375 3.375 0 00-2.455-2.456L14.25 6l1.036-.259a3.375 3.375 0 002.455-2.456L18 2.25l.259 1.035a3.375 3.375 0 002.455 2.456L21.75 6l-1.036.259a3.375 3.375 0 00-2.455 2.456z" />
</svg>
),
title: "AI Analysis",
desc: "Every question gets knowledge reminders, hints, and step-by-step solutions.",
color: "#6366F1",
},
{
icon: (
<svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 6.042A8.967 8.967 0 006 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 016 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 016-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0018 18a8.967 8.967 0 00-6 2.292m0-14.25v14.25" />
</svg>
),
title: "Smart Error Book",
desc: "Auto-collect mistakes with AI feedback. Review, understand, and master.",
color: "#E11D48",
},
{
icon: (
<svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
</svg>
),
title: "Course Analytics",
desc: "Topic frequency, difficulty distribution, and high-yield focus areas.",
color: "#0D9488",
},
{
icon: (
<svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19.5 12c0-1.232-.046-2.453-.138-3.662a4.006 4.006 0 00-3.7-3.7 48.678 48.678 0 00-7.324 0 4.006 4.006 0 00-3.7 3.7c-.017.22-.032.441-.046.662M19.5 12l3-3m-3 3l-3-3m-12 3c0 1.232.046 2.453.138 3.662a4.006 4.006 0 003.7 3.7 48.656 48.656 0 007.324 0 4.006 4.006 0 003.7-3.7c.017-.22.032-.441.046-.662M4.5 12l3 3m-3-3l-3 3" />
</svg>
),
title: "Variant Generation",
desc: "Generate unlimited similar questions for extra practice on weak topics.",
color: "#7C3AED",
},
];
/* ── Filter options ── */
const COURSE_OPTIONS = ["COMP2011", "COMP2211", "MATH1014", "PHYS1112", "MATH2023", "ELEC2100"];
const TERM_OPTIONS = ["spring", "fall"];
const TYPE_OPTIONS = ["midterm", "final"];
/* ── Chevron SVG ── */
function ChevronDown({ className = "" }: { className?: string }) {
return (
<svg className={className} fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19 9l-7 7-7-7" />
</svg>
);
}
/* ── Dropdown select component ── */
function Dropdown({
label,
value,
options,
onChange,
}: {
label: string;
value: string | null;
options: { value: string; label: string }[];
onChange: (v: string | null) => void;
}) {
const [open, setOpen] = useState(false);
const ref = useRef<HTMLDivElement>(null);
useEffect(() => {
const handler = (e: MouseEvent) => {
if (ref.current && !ref.current.contains(e.target as Node)) setOpen(false);
};
document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, []);
const selected = options.find((o) => o.value === value);
return (
<div ref={ref} className="relative" style={{ minWidth: 150 }}>
<div className="text-[11px] font-semibold text-indigo-300 uppercase tracking-wider mb-1.5" style={fontSora}>
{label}
</div>
<button
onClick={() => setOpen(!open)}
className="w-full flex items-center justify-between bg-white px-3.5 py-2.5 text-sm cursor-pointer whitespace-nowrap"
style={{ borderRadius: 0, ...fontMono }}
>
<span className={`${selected ? "text-slate-800 font-semibold" : "text-slate-400"} mr-2`}>
{selected ? selected.label : `All ${label}s`}
</span>
<ChevronDown className={`w-4 h-4 text-slate-400 transition-transform ${open ? "rotate-180" : ""}`} />
</button>
{open && (
<div
className="absolute top-full left-0 right-0 mt-1 bg-white shadow-lg z-50 overflow-hidden"
style={{ borderRadius: 0, border: "1px solid #E2E8F0" }}
>
<button
onClick={() => { onChange(null); setOpen(false); }}
className={`w-full text-left px-3.5 py-2 text-sm hover:bg-indigo-50 transition-colors ${
!value ? "text-indigo-600 font-semibold bg-indigo-50/50" : "text-slate-500"
}`}
style={fontMono}
>
All {label}s
</button>
{options.map((o) => (
<button
key={o.value}
onClick={() => { onChange(o.value); setOpen(false); }}
className={`w-full text-left px-3.5 py-2 text-sm hover:bg-indigo-50 transition-colors ${
value === o.value ? "text-indigo-600 font-semibold bg-indigo-50/50" : "text-slate-600"
}`}
style={fontMono}
>
{o.label}
</button>
))}
</div>
)}
</div>
);
}
export default function HomePage() {
const navigate = useNavigate();
const { user, signOut } = useAuth();
const [papers, setPapers] = useState<Paper[]>([]);
const [papersLoading, setPapersLoading] = useState(false);
const [myUploadedPapers, setMyUploadedPapers] = useState<Paper[]>([]);
const [workedPapers, setWorkedPapers] = useState<Paper[]>([]);
const [courseInput, setCourseInput] = useState("");
const [courseFilter, setCourseFilter] = useState<string | null>(null);
const [showSuggestions, setShowSuggestions] = useState(false);
const [termFilter, setTermFilter] = useState<string | null>(null);
const [typeFilter, setTypeFilter] = useState<string | null>(null);
const [analyzing, setAnalyzing] = useState(false);
const inputRef = useRef<HTMLDivElement>(null);
// Autocomplete suggestions
const suggestions = courseInput.trim()
? COURSE_OPTIONS.filter((c) =>
c.toLowerCase().includes(courseInput.trim().toLowerCase())
)
: [];
// Close suggestions on outside click
useEffect(() => {
const handler = (e: MouseEvent) => {
if (inputRef.current && !inputRef.current.contains(e.target as Node)) setShowSuggestions(false);
};
document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, []);
useEffect(() => {
let cancelled = false;
setPapersLoading(true);
listPapers()
.then((data) => {
if (cancelled) return;
setPapers(
// Sort a copy so the array returned by listPapers() is not mutated.
[...data].sort((a, b) => {
if (a.course_code !== b.course_code) return a.course_code.localeCompare(b.course_code);
if (a.year !== b.year) return b.year - a.year;
if (a.term !== b.term) return a.term.localeCompare(b.term);
return a.exam_type.localeCompare(b.exam_type);
}),
);
})
.catch(() => {
if (!cancelled) setPapers([]);
})
.finally(() => {
if (!cancelled) setPapersLoading(false);
});
return () => {
cancelled = true;
};
}, []);
// My Papers
useEffect(() => {
if (!user) return;
let cancelled = false;
myPapers().then((data) => {
if (cancelled) return;
setMyUploadedPapers(data.filter((p) => p.status !== "error"));
}).catch(() => {});
return () => { cancelled = true; };
}, [user]);
useEffect(() => {
if (!user || papers.length === 0) return;
const workedIds = new Set(getWorkedIds(user.id));
setWorkedPapers(papers.filter((p) => workedIds.has(p.id)));
}, [user, papers]);
// Filter papers
const hasFilter = courseFilter || termFilter || typeFilter;
const filteredPapers = papers.filter((p) => {
if (courseFilter && p.course_code !== courseFilter) return false;
if (termFilter && p.term !== termFilter) return false;
if (typeFilter && p.exam_type !== typeFilter) return false;
return true;
});
const selectCourse = (code: string) => {
setCourseInput(code);
setCourseFilter(code);
setShowSuggestions(false);
};
return (
<div className="min-h-screen" style={{ background: "#FAFAFA" }}>
{/* ══════ Nav ══════ */}
<nav className="bg-white border-b border-slate-200">
<div className="max-w-[1200px] mx-auto px-6 h-14 flex items-center justify-between">
<div className="flex items-center gap-2">
<div
className="w-8 h-8 flex items-center justify-center text-white text-sm font-bold"
style={{ background: "#6366F1", borderRadius: 0 }}
>
PM
</div>
<span className="text-lg font-bold text-slate-800" style={fontSora}>
PastPaper Master
</span>
</div>
<div className="flex items-center gap-5 text-sm" style={fontSora}>
<Link to="/" className="text-indigo-600 font-semibold">
Home
</Link>
<Link to="/analytics" className="text-slate-500 hover:text-slate-800 transition-colors">
Analytics
</Link>
<Link to="/error-book" className="text-slate-500 hover:text-slate-800 transition-colors">
Error Book
</Link>
<Link
to="/upload"
className="px-4 py-1.5 text-white text-xs font-semibold"
style={{ background: "#6366F1", borderRadius: 0 }}
>
Upload Paper
</Link>
{user ? (
<div className="flex items-center gap-3 pl-3 border-l border-slate-200">
<span className="text-xs text-slate-400 max-w-[140px] truncate" style={fontMono}>{user.email}</span>
<button
onClick={() => void signOut()}
className="text-xs text-slate-400 hover:text-red-500 transition-colors"
>
Sign out
</button>
</div>
) : (
<Link
to="/login"
className="text-sm text-indigo-600 font-semibold pl-3 border-l border-slate-200 hover:text-indigo-800 transition-colors"
>
Sign in
</Link>
)}
</div>
</div>
</nav>
{/* ══════ Hero + Filter ══════ */}
<section
className="relative overflow-hidden"
style={{ background: "linear-gradient(135deg, #1E1B4B 0%, #312E81 50%, #4338CA 100%)" }}
>
<div className="max-w-[1200px] mx-auto px-6 pt-16 pb-10 text-center relative z-10">
<h1
className="text-4xl font-bold text-white mb-4 leading-tight"
style={fontSora}
>
The Smartest Way to<br />
<span style={{ color: "#A5B4FC" }}>Master Past Papers</span>
</h1>
<p className="text-indigo-200 text-base mb-10 max-w-xl mx-auto" style={fontSora}>
Upload any HKUST past paper. AI breaks down every question with analysis,
hints, and solutions so you study smarter, not harder.
</p>
{/* ── Filter row: Course input + Term dropdown + Type dropdown ── */}
<div className="max-w-[680px] mx-auto">
<div className="flex gap-3 items-end">
{/* Course code input with autocomplete */}
<div ref={inputRef} className="relative flex-1">
<div className="text-[11px] font-semibold text-indigo-300 uppercase tracking-wider mb-1.5 text-left" style={fontSora}>
Course Code
</div>
<div className="flex bg-white" style={{ borderRadius: 0 }}>
<input
type="text"
value={courseInput}
onChange={(e) => {
const v = e.target.value.toUpperCase();
setCourseInput(v);
setCourseFilter(COURSE_OPTIONS.includes(v) ? v : null);
setShowSuggestions(true);
}}
onFocus={() => setShowSuggestions(true)}
placeholder="e.g. COMP2011"
className="flex-1 px-3.5 py-2.5 text-sm text-slate-800 outline-none bg-transparent font-semibold"
style={fontMono}
/>
{courseInput && (
<button
onClick={() => { setCourseInput(""); setCourseFilter(null); }}
className="px-2 text-slate-300 hover:text-slate-500 transition-colors"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
)}
</div>
{/* Autocomplete dropdown */}
{showSuggestions && suggestions.length > 0 && !courseFilter && (
<div
className="absolute top-full left-0 right-0 mt-1 bg-white shadow-lg z-50 overflow-hidden"
style={{ borderRadius: 0, border: "1px solid #E2E8F0" }}
>
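{suggestions.map((c) => {
  // Suggestions match anywhere in the code (`includes`), so bold the matched
  // substring rather than assuming it is a prefix.
  const query = courseInput.trim().toLowerCase();
  const start = c.toLowerCase().indexOf(query);
  const end = start + query.length;
  return (
    <button
      key={c}
      onClick={() => selectCourse(c)}
      className="w-full text-left px-3.5 py-2.5 text-sm text-slate-700 hover:bg-indigo-50 hover:text-indigo-600 transition-colors"
      style={fontMono}
    >
      {c.slice(0, start)}
      <span className="font-semibold">{c.slice(start, end)}</span>
      {c.slice(end)}
    </button>
  );
})}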
</div>
)}
</div>
{/* Term dropdown */}
<Dropdown
label="Term"
value={termFilter}
options={[
{ value: "spring", label: "Spring" },
{ value: "fall", label: "Fall" },
]}
onChange={setTermFilter}
/>
{/* Exam Type dropdown */}
<Dropdown
label="Exam Type"
value={typeFilter}
options={[
{ value: "midterm", label: "Midterm" },
{ value: "final", label: "Final" },
]}
onChange={setTypeFilter}
/>
{/* Buttons */}
<div className="flex gap-2 items-end">
<div>
<div className="mb-1.5" />
<button
className="px-6 py-2.5 text-white text-sm font-semibold shrink-0"
style={{ background: "#6366F1", borderRadius: 0, ...fontSora }}
>
Search
</button>
</div>
<div>
<div className="mb-1.5" />
<button
onClick={() => {
setAnalyzing(true);
setTimeout(() => {
if (courseFilter) navigate(`/analytics/${courseFilter}`);
else navigate("/analytics");
}, 1200);
}}
disabled={analyzing}
className="px-5 py-2.5 text-sm font-semibold shrink-0 border transition-all flex items-center gap-2"
style={{
borderRadius: 0,
background: analyzing ? "#BE123C" : courseFilter ? "#E11D48" : "transparent",
color: courseFilter || analyzing ? "#fff" : "rgba(165,180,252,0.7)",
borderColor: analyzing ? "#BE123C" : courseFilter ? "#E11D48" : "rgba(165,180,252,0.3)",
...fontSora,
}}
>
{analyzing && (
<svg className="w-4 h-4 animate-spin" viewBox="0 0 24 24" fill="none">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="3" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
</svg>
)}
{analyzing ? "Analyzing..." : "Analyze"}
</button>
</div>
</div>
</div>
{/* ── Results panel ── */}
{hasFilter && (
<div
className="mt-3 text-left max-h-[300px] overflow-y-auto"
style={{ background: "rgba(255,255,255,0.06)", backdropFilter: "blur(8px)", border: "1px solid rgba(255,255,255,0.1)" }}
>
{papersLoading ? (
<div className="p-6 text-center">
<p className="text-indigo-300 text-sm" style={fontSora}>Loading papers...</p>
</div>
) : filteredPapers.length === 0 ? (
<div className="p-6 text-center">
<p className="text-indigo-300 text-sm" style={fontSora}>No papers match these filters</p>
</div>
) : (
<>
<div className="px-4 pt-3 pb-1 flex items-center justify-between">
<span className="text-[11px] font-semibold text-indigo-400 uppercase tracking-wider" style={fontSora}>
{filteredPapers.length} paper{filteredPapers.length > 1 ? "s" : ""} found
</span>
{courseFilter && (
<Link
to={`/analytics/${courseFilter}`}
className="flex items-center gap-1.5 px-3 py-1 text-[11px] font-bold text-white hover:opacity-90 transition-opacity"
style={{ background: "#6366F1", borderRadius: 0, ...fontMono }}
>
<svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M3 13.125C3 12.504 3.504 12 4.125 12h2.25c.621 0 1.125.504 1.125 1.125v6.75C7.5 20.496 6.996 21 6.375 21h-2.25A1.125 1.125 0 013 19.875v-6.75zM9.75 8.625c0-.621.504-1.125 1.125-1.125h2.25c.621 0 1.125.504 1.125 1.125v11.25c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V8.625zM16.5 4.125c0-.621.504-1.125 1.125-1.125h2.25C20.496 3 21 3.504 21 4.125v15.75c0 .621-.504 1.125-1.125 1.125h-2.25a1.125 1.125 0 01-1.125-1.125V4.125z" />
</svg>
AI Analytics · {courseFilter}
</Link>
)}
</div>
{filteredPapers.map((p) => (
<button
key={p.id}
onClick={() => { navigate(`/paper/${p.id}`); }}
className="w-full flex items-center justify-between px-4 py-3 text-left transition-colors hover:bg-white/10 cursor-pointer"
style={{ borderBottom: "1px solid rgba(255,255,255,0.06)" }}
>
<div className="flex items-center gap-3">
<div className="w-8 h-8 flex items-center justify-center shrink-0" style={{ background: "rgba(255,255,255,0.1)" }}>
<svg className="w-4 h-4 text-indigo-300" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19.5 14.25v-2.625a3.375 3.375 0 00-3.375-3.375h-1.5A1.125 1.125 0 0113.5 7.125v-1.5a3.375 3.375 0 00-3.375-3.375H8.25m2.25 0H5.625c-.621 0-1.125.504-1.125 1.125v17.25c0 .621.504 1.125 1.125 1.125h12.75c.621 0 1.125-.504 1.125-1.125V11.25a9 9 0 00-9-9z" />
</svg>
</div>
<div>
<span className="text-sm font-bold text-white" style={fontMono}>{p.course_code}</span>
<span className="text-sm text-indigo-300 capitalize ml-2" style={fontSora}>
{p.year} {p.term} {p.exam_type}
</span>
<div className="flex gap-3 mt-0.5">
{p.question_count != null && (
<span className="text-[11px] text-indigo-400" style={fontMono}>{p.question_count} Qs</span>
)}
{p.difficulty_level && (
<span className="text-[11px] text-indigo-400 capitalize" style={fontMono}>{p.difficulty_level}</span>
)}
</div>
</div>
</div>
<div className="flex items-center gap-2">
<span
className={`px-2 py-0.5 text-[10px] font-bold border ${
p.status === "ready"
? "text-emerald-400 border-emerald-400/40"
: p.status === "processing"
? "text-amber-300 border-amber-300/40"
: "text-indigo-400/60 border-indigo-400/20"
}`}
style={{ borderRadius: 0, ...fontMono }}
>
{p.status.toUpperCase()}
</span>
<svg className="w-4 h-4 text-indigo-400" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M8.25 4.5l7.5 7.5-7.5 7.5" />
</svg>
</div>
</button>
))}
</>
)}
</div>
)}
</div>
{/* Quick stats — real data */}
<div className="flex justify-center gap-8 mt-10">
{[
[String(papers.filter(p => p.status === "ready").length), "Past Papers"],
[String(papers.filter(p => p.status === "ready").reduce((s, p) => s + (p.question_count || 0), 0)), "Questions Analyzed"],
[String(new Set(papers.filter(p => p.status === "ready").map(p => p.course_code)).size), "Courses"],
].map(([num, label]) => (
<div key={label} className="text-center">
<div className="text-2xl font-bold text-white" style={fontMono}>{num}</div>
<div className="text-xs text-indigo-300" style={fontSora}>{label}</div>
</div>
))}
</div>
</div>
{/* Decorative grid */}
<div
className="absolute inset-0 opacity-[0.04]"
style={{
backgroundImage: "linear-gradient(#fff 1px, transparent 1px), linear-gradient(90deg, #fff 1px, transparent 1px)",
backgroundSize: "40px 40px",
}}
/>
</section>
<main className="max-w-[1200px] mx-auto px-6">
{/* ══════ Features ══════ */}
<section className="py-12">
<h2
className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-6"
style={fontSora}
>
Platform Features
</h2>
<div className="grid grid-cols-4 gap-4">
{FEATURES.map((f) => (
<div
key={f.title}
className="bg-white border border-slate-200 p-5 hover:border-slate-300 transition-colors group"
style={{ borderRadius: 0 }}
>
<div
className="w-10 h-10 flex items-center justify-center text-white mb-4"
style={{ background: f.color, borderRadius: 0 }}
>
{f.icon}
</div>
<h3
className="text-sm font-bold text-slate-800 mb-1.5"
style={fontSora}
>
{f.title}
</h3>
<p className="text-xs text-slate-400 leading-relaxed" style={fontSora}>
{f.desc}
</p>
</div>
))}
</div>
</section>
{/* ══════ My Papers ══════ */}
{user && (
<section className="pb-12">
<h2 className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-6" style={fontSora}>
My Papers
</h2>
{myUploadedPapers.length === 0 && workedPapers.length === 0 ? (
<div className="bg-white border border-slate-200 px-6 py-8 text-center" style={{ borderRadius: 0 }}>
<p className="text-sm text-slate-400" style={fontSora}>No papers yet. Upload a past paper or open one to get started.</p>
</div>
) : (
<div className="grid grid-cols-2 gap-6">
{/* Uploaded */}
{myUploadedPapers.length > 0 && (
<div>
<div className="text-xs font-semibold text-slate-500 uppercase tracking-wider mb-3" style={fontSora}>
Uploaded
</div>
<div className="space-y-2">
{myUploadedPapers.map((p) => (
<Link
key={p.id}
to={p.status === "ready" ? `/paper/${p.id}` : "#"}
className="flex items-center justify-between bg-white border border-slate-200 px-4 py-3 hover:border-indigo-300 transition-colors"
style={{ borderRadius: 0 }}
>
<div>
<span className="text-sm font-bold text-slate-800" style={fontMono}>{p.course_code}</span>
<span className="text-sm text-slate-500 capitalize ml-2" style={fontSora}>{p.year} {p.term} {p.exam_type}</span>
</div>
<span className={`text-[10px] font-bold px-2 py-0.5 border ${
p.status === "ready" ? "text-emerald-600 border-emerald-300 bg-emerald-50"
: p.status === "processing" ? "text-amber-600 border-amber-300 bg-amber-50"
: "text-slate-400 border-slate-200"
}`} style={{ borderRadius: 0, ...fontMono }}>
{p.status === "processing" ? (
<span className="flex items-center gap-1">
<span className="w-2 h-2 border border-amber-500 border-t-transparent rounded-full animate-spin inline-block" />
PROCESSING
</span>
) : p.status.toUpperCase()}
</span>
</Link>
))}
</div>
</div>
)}
{/* Worked on */}
{workedPapers.length > 0 && (
<div>
<div className="text-xs font-semibold text-slate-500 uppercase tracking-wider mb-3" style={fontSora}>
Recently Worked
</div>
<div className="space-y-2">
{workedPapers.map((p) => (
<Link
key={p.id}
to={`/paper/${p.id}`}
className="flex items-center justify-between bg-white border border-slate-200 px-4 py-3 hover:border-indigo-300 transition-colors"
style={{ borderRadius: 0 }}
>
<div>
<span className="text-sm font-bold text-slate-800" style={fontMono}>{p.course_code}</span>
<span className="text-sm text-slate-500 capitalize ml-2" style={fontSora}>{p.year} {p.term} {p.exam_type}</span>
</div>
<svg className="w-4 h-4 text-slate-300" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M8.25 4.5l7.5 7.5-7.5 7.5" />
</svg>
</Link>
))}
</div>
</div>
)}
</div>
)}
</section>
)}
{/* ══════ CTA Banner ══════ */}
<section className="pb-16">
<div
className="p-8 flex items-center justify-between"
style={{ background: "linear-gradient(135deg, #1E1B4B, #312E81)", borderRadius: 0 }}
>
<div>
<h3 className="text-lg font-bold text-white mb-1" style={fontSora}>
Ready to ace your exams?
</h3>
<p className="text-sm text-indigo-300" style={fontSora}>
Upload a past paper and let AI do the heavy lifting.
</p>
</div>
<div className="flex gap-3">
<Link
to="/upload"
className="px-5 py-2.5 text-sm font-semibold text-white"
style={{ background: "#6366F1", borderRadius: 0, ...fontSora }}
>
Upload Paper
</Link>
<Link
to="/analytics"
className="px-5 py-2.5 text-sm font-semibold text-indigo-200 border border-indigo-400 hover:bg-indigo-900/30 transition-colors"
style={{ borderRadius: 0, ...fontSora }}
>
View Analytics
</Link>
</div>
</div>
</section>
</main>
{/* ══════ Footer ══════ */}
<footer className="border-t border-slate-200 bg-white">
<div className="max-w-[1200px] mx-auto px-6 py-6 flex items-center justify-between">
<span className="text-xs text-slate-400" style={fontSora}>
PastPaper Master &middot; HKUST &middot; 2025
</span>
<div className="flex gap-4 text-xs text-slate-400" style={fontSora}>
<span>About</span>
<span>Contact</span>
<span>Privacy</span>
</div>
</div>
</footer>
</div>
);
}
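
Note on the paper ordering in the file above: `HomePage` sorts the result of `listPapers()` with an inline comparator. A minimal sketch of the same ordering as a standalone, non-mutating function follows. `PaperSortKey`, `comparePapers`, and `sortPapers` are hypothetical names; the interface lists only the four fields the original comparator reads, and the real `Paper` type may differ.

```typescript
// Hypothetical minimal shape: only the fields the comparator reads.
interface PaperSortKey {
  course_code: string;
  year: number;
  term: string;
  exam_type: string;
}

// Course code ascending, then year descending (newest first),
// then term and exam type ascending. `||` falls through on a 0 (tie).
function comparePapers(a: PaperSortKey, b: PaperSortKey): number {
  return (
    a.course_code.localeCompare(b.course_code) ||
    b.year - a.year ||
    a.term.localeCompare(b.term) ||
    a.exam_type.localeCompare(b.exam_type)
  );
}

// Returns a sorted copy and leaves the input array untouched.
function sortPapers<T extends PaperSortKey>(papers: readonly T[]): T[] {
  return [...papers].sort(comparePapers);
}

// Usage with made-up rows.
const samplePapers: PaperSortKey[] = [
  { course_code: "COMP2211", year: 2022, term: "spring", exam_type: "midterm" },
  { course_code: "COMP2211", year: 2024, term: "spring", exam_type: "final" },
  { course_code: "COMP2011", year: 2023, term: "fall", exam_type: "final" },
  { course_code: "COMP2211", year: 2022, term: "fall", exam_type: "midterm" },
];
const sortedSample = sortPapers(samplePapers);
console.log(sortedSample.map((p) => `${p.course_code} ${p.year} ${p.term}`));
```

Keeping the comparator separate from the component would make the ordering testable without rendering anything; whether that is worth an extra module here is a judgment call.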

@@ -0,0 +1,90 @@
import { useState } from "react";
import { supabase } from "@/lib/supabase";
export default function LoginPage() {
const [email, setEmail] = useState("");
const [password, setPassword] = useState("");
const [mode, setMode] = useState<"signin" | "signup">("signin");
const [error, setError] = useState<string | null>(null);
const [loading, setLoading] = useState(false);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setError(null);
setLoading(true);
try {
if (mode === "signin") {
const { error } = await supabase.auth.signInWithPassword({ email, password });
if (error) throw error;
} else {
const { data, error } = await supabase.auth.signUp({ email, password });
if (error) throw error;
// With email confirmation disabled in the Supabase dashboard, signUp already returns a session.
// Otherwise fall back to an explicit sign-in so any failure is shown in the form.
if (!data.session) {
  const { error: signInError } = await supabase.auth.signInWithPassword({ email, password });
  if (signInError) throw signInError;
}
}
} catch (err: unknown) {
setError(err instanceof Error ? err.message : "Something went wrong");
} finally {
setLoading(false);
}
};
return (
<div className="min-h-screen bg-gray-50 flex items-center justify-center">
<div className="bg-white rounded-2xl shadow-sm border border-gray-200 p-8 w-full max-w-sm">
<div className="mb-6">
<h1 className="text-xl font-bold text-gray-900">PastPaper Master</h1>
<p className="text-sm text-gray-500 mt-1">{mode === "signin" ? "Sign in to continue" : "Create your account"}</p>
</div>
<form onSubmit={handleSubmit} className="space-y-4">
<div>
<label className="block text-xs font-medium text-gray-700 mb-1">Email</label>
<input
type="email"
value={email}
onChange={(e) => setEmail(e.target.value)}
required
className="w-full px-3 py-2 border border-gray-300 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"
placeholder="you@example.com"
/>
</div>
<div>
<label className="block text-xs font-medium text-gray-700 mb-1">Password</label>
<input
type="password"
value={password}
onChange={(e) => setPassword(e.target.value)}
required
minLength={6}
className="w-full px-3 py-2 border border-gray-300 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"
placeholder="••••••"
/>
</div>
{error && (
<p className="text-xs text-red-600 bg-red-50 border border-red-200 rounded-lg px-3 py-2">{error}</p>
)}
<button
type="submit"
disabled={loading}
className="w-full py-2.5 bg-blue-600 text-white text-sm font-medium rounded-lg hover:bg-blue-700 disabled:opacity-50 transition-colors"
>
{loading ? "..." : mode === "signin" ? "Sign in" : "Create account"}
</button>
</form>
<p className="text-center text-xs text-gray-500 mt-4">
{mode === "signin" ? "No account? " : "Already have one? "}
<button
onClick={() => { setMode(mode === "signin" ? "signup" : "signin"); setError(null); }}
className="text-blue-600 hover:underline font-medium"
>
{mode === "signin" ? "Sign up" : "Sign in"}
</button>
</p>
</div>
</div>
);
}


@@ -0,0 +1,16 @@
import Header from "@/components/layout/Header";
import UploadForm from "@/components/upload/UploadForm";
export default function UploadPage() {
return (
<div className="min-h-screen bg-gray-50">
<Header />
<main className="py-10 px-6">
<h1 className="text-xl font-bold text-center mb-8 text-gray-800">
Upload Past Paper
</h1>
<UploadForm />
</main>
</div>
);
}


@@ -0,0 +1,524 @@
import { useState, useEffect, useCallback, useRef } from "react";
import { useParams } from "react-router-dom";
import Header from "@/components/layout/Header";
import PdfViewer from "@/components/workbench/PdfViewer";
import QuestionNav from "@/components/workbench/QuestionNav";
import QuestionDetail from "@/components/workbench/QuestionDetail";
import AiTrioPanel from "@/components/workbench/AiTrioPanel";
import SimilarHistoryPanel from "@/components/workbench/SimilarHistoryPanel";
import ActionBar from "@/components/workbench/ActionBar";
import PhotoUpload from "@/components/workbench/PhotoUpload";
import VariantDetail from "@/components/workbench/VariantDetail";
import KaTeXRenderer from "@/components/shared/KaTeXRenderer";
import { usePaper } from "@/hooks/usePaper";
import { useQuestions } from "@/hooks/useQuestions";
import { generateVariant, getVariants, updateVariant, deleteVariant, recordAttempt, getPaperAttempts } from "@/lib/api";
import { groupQuestions } from "@/lib/questionGroups";
import { useAuth } from "@/contexts/AuthContext";
import type { QuestionVariant } from "@/types/api";
const WORKED_KEY = (userId: string) => `worked_papers_${userId}`;
const WORKED_THRESHOLD_MS = 3 * 60 * 1000; // 3 minutes
function markWorked(userId: string, paperId: string) {
try {
const raw = localStorage.getItem(WORKED_KEY(userId));
const ids: string[] = raw ? JSON.parse(raw) : [];
if (!ids.includes(paperId)) {
localStorage.setItem(WORKED_KEY(userId), JSON.stringify([...ids, paperId]));
}
} catch { /* silent */ }
}
export default function WorkbenchPage() {
const { id } = useParams<{ id: string }>();
const { user } = useAuth();
const { paper, loading: paperLoading, error: paperError } = usePaper(id!);
const isReady = paper?.status === "ready";
const { questions, loading: questionsLoading } = useQuestions(id!, isReady);
const [currentQuestionId, setCurrentQuestionId] = useState<string | null>(null);
const [showPhoto, setShowPhoto] = useState(false);
// Grading result per question
const [gradingResults, setGradingResults] = useState<Map<string, {
isCorrect: boolean;
feedback: string;
ocrText: string;
scoreGiven?: number;
loading?: boolean;
}>>(new Map());
// Track which grading panels are expanded
const [gradingExpanded, setGradingExpanded] = useState<Set<string>>(new Set());
// Tab state
const [activeTab, setActiveTab] = useState<"questions" | "variants">("questions");
// variants per question: questionId → QuestionVariant[]
const [variantMap, setVariantMap] = useState<Map<string, QuestionVariant[]>>(new Map());
// which question IDs have been fetched from server
const loadedRef = useRef<Set<string>>(new Set());
// generating state
const [isGenerating, setIsGenerating] = useState(false);
// Currently viewing variant (full detail view)
const [activeVariantId, setActiveVariantId] = useState<string | null>(null);
// Cooldown: ignore scroll-based updates for 2s after user clicks a question
const lastUserSelectTime = useRef(0);
const handleQuestionSelect = useCallback((questionId: string) => {
lastUserSelectTime.current = Date.now();
setCurrentQuestionId(questionId);
}, []);
const groups = groupQuestions(questions);
const currentQuestion =
questions.find((question) => question.id === currentQuestionId)
?? questions[0]
?? null;
const currentGroupKey = currentQuestion?.question_number.match(/^\d+/)?.[0] ?? null;
const paperTitle = paper
? `${paper.year} ${paper.term} ${paper.exam_type}`
: undefined;
const currentVariants = variantMap.get(currentQuestion?.id ?? "") ?? [];
const activeVariant = currentVariants.find((v) => v.id === activeVariantId) ?? null;
const handleGroupSelect = useCallback((groupKey: string) => {
lastUserSelectTime.current = Date.now();
const group = groups.find((item) => item.key === groupKey);
if (group?.questions[0]) {
setCurrentQuestionId(group.questions[0].id);
}
}, [groups]);
useEffect(() => {
if (questions.length === 0) {
setCurrentQuestionId(null);
return;
}
setCurrentQuestionId((prev) =>
prev && questions.some((question) => question.id === prev) ? prev : questions[0].id,
);
}, [questions]);
// 3-minute worked tracking
useEffect(() => {
if (!id || !user) return;
const timer = setTimeout(() => markWorked(user.id, id), WORKED_THRESHOLD_MS);
return () => clearTimeout(timer);
}, [id, user]);
// Load historical grading results
useEffect(() => {
if (!id || !user || !isReady) return;
getPaperAttempts(id).then((attempts) => {
const map = new Map<string, { isCorrect: boolean; feedback: string; ocrText: string; scoreGiven?: number }>();
for (const a of attempts) {
map.set(a.question_id, {
isCorrect: a.is_correct ?? false, // is_correct is nullable for ungraded attempts
feedback: a.feedback || "",
ocrText: a.photo_ocr_text || "",
scoreGiven: a.score_given ?? undefined,
});
}
if (map.size > 0) {
setGradingResults((prev) => {
const next = new Map(prev);
for (const [k, v] of map) {
if (!next.has(k)) next.set(k, v); // don't overwrite current session
}
return next;
});
setGradingExpanded(new Set(map.keys()));
}
}).catch(() => {});
}, [id, user, isReady]);
// Load variants for current question (once per question ID)
useEffect(() => {
if (!currentQuestionId || loadedRef.current.has(currentQuestionId)) return;
loadedRef.current.add(currentQuestionId);
getVariants(currentQuestionId)
.then((data) => {
setVariantMap((prev) => new Map(prev).set(currentQuestionId, data));
})
.catch(() => {});
}, [currentQuestionId]);
// When user scrolls PDF, find the question closest to that page
// But ignore if user just clicked a question (2s cooldown)
const handlePdfPageChange = useCallback(
(page: number) => {
if (questions.length === 0) return;
if (Date.now() - lastUserSelectTime.current < 2000) return;
let best = questions[0];
for (let i = 0; i < questions.length; i++) {
if ((questions[i].page_number ?? 1) <= page) best = questions[i];
}
setCurrentQuestionId(best.id);
},
[questions],
);
// Track answer state per question for ActionBar feedback
const [answerStates, setAnswerStates] = useState<Map<string, "correct" | "wrong">>(new Map());
const handleAnswerResult = async (isCorrect: boolean, userAnswer: string) => {
if (!currentQuestion) return;
const state = isCorrect ? "correct" : "wrong";
setAnswerStates((prev) => new Map(prev).set(currentQuestion.id, state));
try {
const type = currentQuestion.question_type === "mc" ? "select" : "input";
await recordAttempt(currentQuestion.id, type, userAnswer, isCorrect);
// Wrong answer → auto generate variant
if (!isCorrect) {
void handleGenerateVariant();
}
} catch {
// silent
}
};
const handleGenerateVariant = async () => {
if (!currentQuestion || isGenerating) return;
setIsGenerating(true);
setActiveTab("variants");
try {
const saved = await generateVariant(currentQuestion.id);
setVariantMap((prev) => {
const existing = prev.get(currentQuestion.id) ?? [];
return new Map(prev).set(currentQuestion.id, [saved, ...existing]);
});
} catch {
// silent
} finally {
setIsGenerating(false);
}
};
const handleToggleFavorite = async (v: QuestionVariant) => {
try {
const updated = await updateVariant(v.id, { favorited: !v.favorited });
setVariantMap((prev) => {
const existing = prev.get(v.source_question_id) ?? [];
return new Map(prev).set(
v.source_question_id,
existing.map((item) => (item.id === v.id ? updated : item)),
);
});
} catch {
// silent: keep the list unchanged if the update fails
}
};
const handleDeleteVariant = async (v: QuestionVariant) => {
try {
await deleteVariant(v.id);
if (activeVariantId === v.id) setActiveVariantId(null);
setVariantMap((prev) => {
const existing = prev.get(v.source_question_id) ?? [];
return new Map(prev).set(
v.source_question_id,
existing.filter((item) => item.id !== v.id),
);
});
} catch {
// silent: keep the variant in the list if the delete fails
}
};
if (paperLoading) {
return (
<div className="min-h-screen bg-gray-50 flex items-center justify-center">
<div className="text-gray-400 text-sm">Loading...</div>
</div>
);
}
if (paperError || !paper) {
return (
<div className="min-h-screen bg-gray-50 flex items-center justify-center">
<div className="text-red-500 text-sm">{paperError ?? "Paper not found"}</div>
</div>
);
}
return (
<div className="h-screen flex flex-col">
<Header courseCode={paper.course_code} paperTitle={paperTitle} />
{/* Processing overlay */}
{paper.status === "processing" && (
<div className="flex-1 flex items-center justify-center bg-gray-50">
<div className="text-center">
<div className="inline-block w-8 h-8 border-3 border-blue-600 border-t-transparent rounded-full animate-spin mb-4" />
<p className="text-gray-600 text-sm">AI is analyzing the paper...</p>
<p className="text-gray-400 text-xs mt-1">
{paper.question_count
? `${paper.question_count} questions found, generating analysis...`
: "Extracting and structuring questions..."}
</p>
</div>
</div>
)}
{/* Error state */}
{paper.status === "error" && (
<div className="flex-1 flex items-center justify-center bg-gray-50">
<div className="text-center max-w-md">
<p className="text-red-600 font-medium mb-2">Processing Failed</p>
<p className="text-gray-500 text-sm">{paper.error_message}</p>
</div>
</div>
)}
{/* Ready — workbench */}
{paper.status === "ready" && (
<div className="flex-1 flex overflow-hidden">
{/* Left: PDF viewer */}
<div className="w-[60%] border-r border-gray-200">
<PdfViewer
fileUrl={paper.paper_file_url}
currentPage={currentQuestion?.page_number ?? 1}
onPageChange={handlePdfPageChange}
/>
</div>
{/* Right: analysis panel */}
<div className="w-[40%] flex flex-col overflow-hidden">
{questionsLoading ? (
<div className="flex-1 flex items-center justify-center text-gray-400 text-sm">
Loading questions...
</div>
) : activeVariantId && activeVariant ? (
/* ===== Variant Detail View ===== */
<>
<button
onClick={() => setActiveVariantId(null)}
className="flex items-center gap-2 px-4 py-2.5 text-sm font-medium text-blue-600 bg-gray-50 border-b border-gray-200 hover:bg-gray-100 shrink-0"
>
<span>←</span>
<span>Back to Questions</span>
<span className="ml-2 px-2 py-0.5 bg-purple-100 text-purple-700 text-xs rounded-full font-medium">
Variant Q{activeVariant.source_question_number}
</span>
</button>
<div className="flex-1 overflow-y-auto p-4">
<VariantDetail variant={activeVariant.variant_data} />
</div>
</>
) : (
/* ===== Normal Tab View ===== */
<>
{/* Tab bar */}
<div className="flex border-b border-gray-200 shrink-0">
<button
onClick={() => setActiveTab("questions")}
className={`flex-1 py-2.5 text-sm font-medium text-center transition-colors ${
activeTab === "questions"
? "text-gray-900 border-b-2 border-blue-600"
: "text-gray-400 hover:text-gray-600"
}`}
>
Questions
</button>
<button
onClick={() => setActiveTab("variants")}
className={`flex-1 py-2.5 text-sm font-medium text-center transition-colors flex items-center justify-center gap-1.5 ${
activeTab === "variants"
? "text-gray-900 border-b-2 border-blue-600"
: "text-gray-400 hover:text-gray-600"
}`}
>
Variants
{currentVariants.length > 0 && (
<span className="w-5 h-5 flex items-center justify-center bg-purple-500 text-white text-xs font-bold rounded-full">
{currentVariants.length}
</span>
)}
</button>
</div>
{/* Question nav — always visible */}
<QuestionNav
groups={groups}
currentGroupKey={currentGroupKey}
currentQuestionId={currentQuestion?.id ?? null}
onSelectGroup={handleGroupSelect}
onSelectQuestion={handleQuestionSelect}
/>
{/* Questions tab content */}
{activeTab === "questions" && (
<>
<div className="flex-1 overflow-y-auto p-4">
{currentQuestion && (
<>
<QuestionDetail
question={currentQuestion}
onAnswerResult={handleAnswerResult}
/>
{/* Grading result panel */}
{gradingResults.has(currentQuestion.id) && (() => {
const gr = gradingResults.get(currentQuestion.id)!;
const expanded = gradingExpanded.has(currentQuestion.id);
const toggleExpand = () => setGradingExpanded((prev) => {
const next = new Set(prev);
if (next.has(currentQuestion.id)) next.delete(currentQuestion.id);
else next.add(currentQuestion.id);
return next;
});
if (gr.loading) {
return (
<div className="mb-4 rounded-lg border border-blue-200 bg-blue-50 p-3">
<div className="flex items-center gap-2">
<span className="w-4 h-4 border-2 border-blue-600 border-t-transparent rounded-full animate-spin" />
<span className="text-sm font-medium text-blue-700">Grading your answer...</span>
</div>
</div>
);
}
return (
<div className={`mb-4 rounded-lg border ${gr.isCorrect ? "border-green-200" : "border-red-200"}`}>
<button
onClick={toggleExpand}
className={`w-full flex items-center justify-between px-3 py-2.5 rounded-t-lg ${gr.isCorrect ? "bg-green-50" : "bg-red-50"}`}
>
<div className="flex items-center gap-2">
<span className="text-lg">{gr.isCorrect ? "✓" : "✗"}</span>
<span className={`font-semibold text-sm ${gr.isCorrect ? "text-green-700" : "text-red-700"}`}>
AI Grading: {gr.isCorrect ? "Correct" : "Incorrect"}
{gr.scoreGiven != null && ` (${gr.scoreGiven} pts)`}
</span>
</div>
<span className="text-gray-400 text-xs">{expanded ? "▲" : "▼"}</span>
</button>
{expanded && (
<div className="p-3 border-t border-gray-100 bg-white rounded-b-lg">
{gr.ocrText && (
<details className="mb-3 bg-gray-50 rounded-lg border border-gray-200">
<summary className="px-3 py-2 text-xs font-medium text-gray-500 cursor-pointer">Your Answer (OCR)</summary>
<div className="px-3 pb-3">
<KaTeXRenderer html={gr.ocrText.replace(/\n/g, "<br/>")} className="text-xs text-gray-700" />
</div>
</details>
)}
<KaTeXRenderer html={gr.feedback} className="text-gray-700 text-sm" />
</div>
)}
</div>
);
})()}
<AiTrioPanel question={currentQuestion} />
<SimilarHistoryPanel question={currentQuestion} />
</>
)}
</div>
<ActionBar
question={currentQuestion}
onGenerateVariant={handleGenerateVariant}
isGenerating={isGenerating}
onPhotoOpen={() => setShowPhoto(true)}
answerState={currentQuestion ? answerStates.get(currentQuestion.id) ?? null : null}
/>
</>
)}
{/* Variants tab content */}
{activeTab === "variants" && (
<div className="flex-1 overflow-y-auto p-4">
<div className="mb-3">
<button
onClick={handleGenerateVariant}
disabled={!currentQuestion || isGenerating}
className="w-full py-2 rounded-lg text-sm font-medium bg-purple-50 text-purple-700 border border-purple-200 hover:bg-purple-100 disabled:opacity-50 transition-colors"
>
{isGenerating ? (
<span className="flex items-center justify-center gap-2">
<span className="w-3 h-3 border-2 border-purple-600 border-t-transparent rounded-full animate-spin" />
Generating...
</span>
) : "+ Generate Variant"}
</button>
</div>
{currentVariants.length === 0 && !isGenerating ? (
<div className="text-center py-12">
<p className="text-gray-400 text-sm">No variants yet for this question.</p>
</div>
) : (
<div className="space-y-3">
{currentVariants.map((v) => (
<div key={v.id} className="bg-gray-50 rounded-lg border border-gray-200 p-4">
<div className="flex items-center justify-between mb-2">
<span className="text-xs text-gray-400">
{new Date(v.created_at).toLocaleDateString("en-CA")}
</span>
<div className="flex items-center gap-2">
<button
onClick={() => void handleToggleFavorite(v)}
title={v.favorited ? "Unfavorite" : "Save to Error Book"}
className={`text-lg leading-none ${v.favorited ? "text-yellow-400" : "text-gray-300 hover:text-yellow-400"}`}
>
★
</button>
<button
onClick={() => void handleDeleteVariant(v)}
className="text-gray-300 hover:text-red-400 text-sm leading-none"
title="Delete"
>
×
</button>
</div>
</div>
<p className="text-xs text-gray-600 line-clamp-2 mb-3">
{v.variant_data.question_text?.replace(/<[^>]*>/g, "").slice(0, 140)}
</p>
<button
onClick={() => setActiveVariantId(v.id)}
className="px-3 py-1.5 bg-blue-600 text-white text-xs font-medium rounded-lg hover:bg-blue-700"
>
Practice
</button>
</div>
))}
</div>
)}
</div>
)}
</>
)}
</div>
</div>
)}
{/* Photo upload modal */}
{showPhoto && currentQuestion && (() => {
const qid = currentQuestion.id;
return (
<PhotoUpload
questionId={qid}
onClose={() => setShowPhoto(false)}
onSubmitted={async (promise) => {
// Set loading state
setGradingResults((prev) => new Map(prev).set(qid, { isCorrect: false, feedback: "", ocrText: "", loading: true }));
setGradingExpanded((prev) => new Set(prev).add(qid));
try {
const res = await promise;
const { is_correct, feedback, score_given } = res.grade;
setGradingResults((prev) => new Map(prev).set(qid, {
isCorrect: is_correct,
feedback,
ocrText: res.ocr_text,
scoreGiven: score_given,
loading: false,
}));
// Wrong → auto generate variant
if (!is_correct) {
void handleGenerateVariant();
}
} catch {
setGradingResults((prev) => new Map(prev).set(qid, {
isCorrect: false,
feedback: "Grading failed. Please try again.",
ocrText: "",
loading: false,
}));
}
}}
/>
);
})()}
</div>
);
}


@@ -0,0 +1,79 @@
/* ── Google Fonts: Sora (headings) + IBM Plex Mono (data) ── */
/* Kept first: CSS requires @import rules to precede all other rules, and "tailwindcss" is inlined at build time. */
@import url("https://fonts.googleapis.com/css2?family=Sora:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500;600&display=swap");
@import "tailwindcss";
@import "katex/dist/katex.min.css";
/* Hide scrollbar on horizontal tab rows */
.hide-scrollbar { -ms-overflow-style: none; scrollbar-width: none; }
.hide-scrollbar::-webkit-scrollbar { display: none; }
/* ── Knowledge Base HTML content styling (from SOS project) ── */
.kb-html-content h1 { font-size: 1.25rem; font-weight: 700; margin: 0.75rem 0 0.5rem; line-height: 1.3; }
.kb-html-content h2 { font-size: 1.1rem; font-weight: 600; margin: 0.75rem 0 0.4rem; color: #1e40af; border-bottom: 1px solid #e5e7eb; padding-bottom: 0.25rem; }
.kb-html-content h3 { font-size: 0.95rem; font-weight: 600; margin: 0.6rem 0 0.3rem; color: #374151; }
.kb-html-content h4 { font-size: 0.875rem; font-weight: 600; margin: 0.5rem 0 0.25rem; color: #6b7280; }
.kb-html-content p { margin: 0.3rem 0; line-height: 1.6; }
.kb-html-content p.summary { background: #eff6ff; border-left: 3px solid #3b82f6; padding: 0.5rem 0.75rem; border-radius: 0 0.25rem 0.25rem 0; color: #1e3a5f; margin-bottom: 0.75rem; }
.kb-html-content ul, .kb-html-content ol { margin: 0.3rem 0 0.3rem 1.25rem; line-height: 1.6; }
.kb-html-content ul { list-style: disc; }
.kb-html-content ol { list-style: decimal; }
.kb-html-content li { margin: 0.15rem 0; }
.kb-html-content strong { font-weight: 600; color: #1e293b; }
.kb-html-content blockquote { border-left: 3px solid #d1d5db; padding: 0.4rem 0.75rem; margin: 0.4rem 0; background: #f9fafb; color: #4b5563; font-style: italic; border-radius: 0 0.25rem 0.25rem 0; }
.kb-html-content pre { background: #1e293b; color: #e2e8f0; padding: 0.75rem; border-radius: 0.375rem; overflow-x: auto; margin: 0.4rem 0; font-size: 0.8rem; }
.kb-html-content code { font-family: ui-monospace, monospace; font-size: 0.85em; }
.kb-html-content :not(pre) > code { background: #f1f5f9; padding: 0.1rem 0.3rem; border-radius: 0.2rem; color: #be185d; }
.kb-html-content table { border-collapse: collapse; width: 100%; margin: 0.4rem 0; font-size: 0.8rem; }
.kb-html-content th, .kb-html-content td { border: 1px solid #e5e7eb; padding: 0.35rem 0.5rem; text-align: left; }
.kb-html-content th { background: #f3f4f6; font-weight: 600; }
.kb-html-content section { margin: 0.5rem 0; }
.kb-html-content .tag { display: inline-block; background: #dbeafe; color: #1e40af; padding: 0.1rem 0.5rem; border-radius: 9999px; font-size: 0.75rem; margin: 0.15rem 0.15rem; }
.kb-html-content hr { border: none; border-top: 1px solid #e5e7eb; margin: 0.75rem 0; }
/* ── Example blocks ── */
.kb-html-content .example { background: #fffbeb; border: 1px solid #fbbf24; border-radius: 0.375rem; padding: 0.75rem; margin: 0.6rem 0; }
.kb-html-content .example-title { font-weight: 700; color: #92400e; margin-bottom: 0.4rem; font-size: 0.9rem; }
.kb-html-content .example-solution { border-top: 1px dashed #d97706; padding-top: 0.4rem; }
/* ── LaTeX blocks ── */
.kb-html-content pre.latex { background: #f8fafc; color: #1e293b; border: 1px solid #e2e8f0; text-align: center; font-size: 0.9rem; padding: 0.6rem; }
.kb-html-content code.latex { background: #f1f5f9; padding: 0.1rem 0.3rem; border-radius: 0.2rem; color: #4338ca; font-size: 0.85em; }
/* ── Common error block (used in solution) ── */
.kb-html-content .common-error {
background: #fef2f2;
border: 1px solid #fca5a5;
border-left: 3px solid #ef4444;
border-radius: 0.375rem;
padding: 0.6rem 0.75rem;
margin: 0.5rem 0;
}
.kb-html-content .common-error::before {
content: "⚠ Common Mistake";
font-weight: 700;
color: #dc2626;
display: block;
margin-bottom: 0.3rem;
font-size: 0.85rem;
}
/* ── Figure description blocks ── */
.kb-html-content .figure-desc {
background: #faf5ff;
border: 1px solid #d8b4fe;
border-left: 3px solid #a855f7;
border-radius: 0.375rem;
padding: 0.6rem 0.75rem;
margin: 0.5rem 0;
}
/* ── AI Supplement blocks ── */
.kb-html-content .ai-supplement {
background: #f0fdf4;
border: 1px solid #86efac;
border-left: 3px solid #22c55e;
border-radius: 0.375rem;
padding: 0.6rem 0.75rem;
margin: 0.5rem 0;
}

169
frontend/src/types/api.ts Normal file

@@ -0,0 +1,169 @@
export interface Paper {
id: string;
user_id: string | null;
course_code: string;
year: number;
term: string;
exam_type: string;
paper_file_url: string;
answer_file_url: string | null;
status: "uploaded" | "processing" | "ready" | "error";
error_message: string | null;
total_score: number | null;
question_count: number | null;
topics_summary: Record<string, number> | null;
difficulty_level: string | null;
processing_step: string | null;
processing_progress: number;
processing_total: number;
created_at: string;
updated_at: string;
}
export interface PaperSummary {
id: string;
course_code: string;
year: number;
term: string;
exam_type: string;
part_label: string | null;
}
export interface Question {
id: string;
paper_id: string;
question_number: string;
parent_question: string | null;
display_order: number;
question_type: string;
question_format?: string | null;
question_text: string;
score: number | null;
page_number: number | null;
page_y_ratio?: number | null;
options: { label: string; text: string }[] | null;
correct_option: string | null;
correct_answer: string | null;
raw_answer_text: string | null;
topics: string[] | null;
topic_primary?: string | null;
analytics_topic?: string | null;
topic_tags?: string[] | null;
skill_tags?: string[] | null;
difficulty: string | null;
knowledge_reminder: string;
ai_hint: string;
solution: string;
created_at: string;
updated_at: string;
paper?: PaperSummary;
}
export interface UploadResponse {
paper_id: string;
status: string;
message: string;
}
export interface UserAttempt {
id: string;
user_id: string;
question_id: string;
attempt_type: string;
user_answer: string | null;
photo_url: string | null;
photo_ocr_text: string | null;
is_correct: boolean | null;
feedback: string | null;
error_at_step: number | null;
in_error_book: boolean;
mastered: boolean;
created_at: string;
paper_questions?: Question;
score_given?: number | null;
}
export interface VariantQuestion {
question_text: string;
question_type: string;
options: { label: string; text: string }[] | null;
correct_answer: string;
ai_hint: string;
knowledge_reminder: string;
solution: string;
}
export interface QuestionVariant {
id: string;
user_id: string;
source_question_id: string;
source_question_number: string;
variant_data: VariantQuestion;
favorited: boolean;
created_at: string;
}
export interface GradeResult {
is_correct: boolean;
feedback: string;
error_at_step: number | null;
score_given?: number;
}
export interface SimilarQuestion {
id: string;
paper_id: string;
source: string;
question_number: string;
match_percent: number;
match_reasons?: string[];
question_type: Question["question_type"];
question_text: string;
topics: string[];
difficulty: string | null;
knowledge_reminder: string;
ai_hint: string;
solution: string;
}
export interface AnalyticsTopicQuestion {
paper_id: string;
source: string;
question_number: string;
preview: string;
difficulty: string | null;
question_type: string;
year?: number | null;
term?: string | null;
exam_type?: string | null;
topics?: string[];
}
export interface AnalyticsTopicEntry {
label: string;
count: number;
pct: number;
questions: AnalyticsTopicQuestion[];
}
export interface CourseAnalytics {
course_code: string;
kpi: {
papers: number;
questions: number;
topics: number;
difficulty: string;
};
topic_frequency: AnalyticsTopicEntry[];
question_types: Array<{
label: string;
count: number;
pct: number;
}>;
difficulty_distribution: {
easy: number;
medium: number;
hard: number;
};
high_yield_topics: string[];
all_questions: AnalyticsTopicQuestion[];
}

1
frontend/src/vite-env.d.ts vendored Normal file

@@ -0,0 +1 @@
/// <reference types="vite/client" />

21
frontend/tsconfig.json Normal file

@@ -0,0 +1,21 @@
{
"compilerOptions": {
"target": "ES2020",
"useDefineForClassFields": true,
"lib": ["ES2020", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"isolatedModules": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"baseUrl": ".",
"paths": {
"@/*": ["src/*"]
}
},
"include": ["src"]
}

22
frontend/vite.config.ts Normal file

@@ -0,0 +1,22 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import tailwindcss from "@tailwindcss/vite";
import { resolve } from "path";
export default defineConfig({
plugins: [react(), tailwindcss()],
resolve: {
alias: {
"@": resolve(__dirname, "src"),
},
},
server: {
port: 5173,
proxy: {
"/api": {
target: "http://localhost:8000",
changeOrigin: true,
},
},
},
});

22
index 2.html Normal file

File diff suppressed because one or more lines are too long

3
memory/MEMORY.md Normal file

@@ -0,0 +1,3 @@
# Memory Index
- [project_pastpaper_master.md](project_pastpaper_master.md): PastPaper Master project overview and current development progress


@@ -0,0 +1,37 @@
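---
name: PastPaper Master project overview
description: Tech stack, current development status, completed workstreams, and next priorities
type: project
---
An AI-assisted study platform that supports practice on COMP2211 past papers. Core features: question workbench, AI trio (knowledge reminder, hint, solution), similar-question recommendations, error book, and variant question generation.
## Tech stack
- Frontend: React 19 + TypeScript + Vite 7 + Tailwind v4
- Backend: FastAPI + Python 3.12 + uv
- DB: Supabase PostgreSQL (RLS is reserved for later; a temp user id is used for now)
- LLM: GPT-4o (laozhang proxy) + Qwen-plus fallback
## Current DB state (2026-04-10)
COMP2211 has 7 papers with status=ready and 250 subquestion-level questions, all with knowledge_reminder / ai_hint / solution / analytics_topic / topic_tags / skill_tags.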
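The 250 total can be cross-checked against the per-paper question counts recorded in HANDOFF_COMP2211.md. The snippet below is only a sanity check; the `questionCounts` name is illustrative and not part of the codebase.

```typescript
// Per-paper subquestion counts, copied from HANDOFF_COMP2211.md.
// `questionCounts` is an illustrative name, not something defined in the repo.
const questionCounts: Record<string, number> = {
  "COMP2211-2022-fall-midterm": 43,
  "COMP2211-2022-spring-midterm": 38,
  "COMP2211-2022-spring-final-part-a": 24,
  "COMP2211-2022-spring-final-part-b": 19,
  "COMP2211-2023-spring-midterm": 36,
  "COMP2211-2024-spring-midterm": 42,
  "COMP2211-2024-spring-final": 48,
};

const paperCount = Object.keys(questionCounts).length;
const totalQuestions = Object.values(questionCounts).reduce((sum, n) => sum + n, 0);

console.log(`${paperCount} papers, ${totalQuestions} questions`); // 7 papers, 250 questions
```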
## Completed work (this session)
**Workstream A: similar-question retrieval + demo fallback removal**
- `backend/app/routers/questions.py`:
  - Added `skill_tags` to the SELECT and to the `question_topics()` computation
  - Fixed `isinstance(target_score, int)` → `(int, float)` to support fractional NUMERIC scores
  - `similarity_score()` now returns a `(score, reasons)` tuple
  - Filter threshold changed from `<= 0` to `< 10`
  - Response gains a `match_reasons` field
- `frontend/src/types/api.ts`: `SimilarQuestion` gains `match_reasons?: string[]`
- `frontend/src/components/workbench/SimilarHistoryPanel.tsx`: removed all demo fallbacks in favor of real empty/error states; shows match_reasons chips
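
To make the `(score, reasons)` return shape and the `< 10` cut-off above concrete, here is a minimal TypeScript sketch. The real logic lives in Python in `backend/app/routers/questions.py` and is not reproduced in this note, so the field names, weights, and reason strings below are assumptions chosen only for illustration.

```typescript
// Hypothetical shape; the backend's actual fields and weights may differ.
interface ScoredQuestion {
  id: string;
  topics: string[];
  skillTags: string[];
  score: number | null; // may be fractional (NUMERIC in the DB)
}

const MIN_MATCH_SCORE = 10; // candidates scoring below this are dropped

// Returns a [score, reasons] pair, mirroring the (score, reasons) tuple.
function similarityScore(target: ScoredQuestion, candidate: ScoredQuestion): [number, string[]] {
  let score = 0;
  const reasons: string[] = [];

  const sharedTopics = candidate.topics.filter((t) => target.topics.includes(t));
  if (sharedTopics.length > 0) {
    score += 20 * sharedTopics.length; // assumed weight
    reasons.push(`shared topics: ${sharedTopics.join(", ")}`);
  }

  const sharedSkills = candidate.skillTags.filter((s) => target.skillTags.includes(s));
  if (sharedSkills.length > 0) {
    score += 10 * sharedSkills.length; // assumed weight
    reasons.push(`shared skills: ${sharedSkills.join(", ")}`);
  }

  // typeof "number" accepts integers and fractions alike, like (int, float) in Python.
  if (typeof target.score === "number" && target.score === candidate.score) {
    score += 5; // assumed weight
    reasons.push("same point value");
  }

  return [score, reasons];
}

// Scores every other question, drops weak matches, and sorts best-first.
function rankSimilar(target: ScoredQuestion, candidates: ScoredQuestion[]) {
  return candidates
    .filter((c) => c.id !== target.id)
    .map((c) => {
      const [matchScore, matchReasons] = similarityScore(target, c);
      return { id: c.id, matchScore, matchReasons };
    })
    .filter((r) => r.matchScore >= MIN_MATCH_SCORE)
    .sort((a, b) => b.matchScore - a.matchScore);
}
```

With these assumed weights, a candidate that matches only on point value scores 5 and is filtered out, while one that shares a topic clears the threshold and carries its reasons along for display as chips.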
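## Next priorities (from HANDOFF_COMP2211.md)
1. ✅ Workstream A: similar-question retrieval + demo fallback removal (done)
2. Workstream B: deeper analytics (per-paper drill-down, topic frequency over time, high-frequency topics)
3. Workstream C: LaTeX/KaTeX rendering quality (centralized normalization, removal of OCR noise)
4. Workstream D: user-upload deduplication (compare against papers already in course_library)
5. Workstream E: UI/UX pass (QuestionNav, status badges, workbench hierarchy)
**Why:** This is the development order recommended in the HANDOFF document, putting data stability first.
**How to apply:** Start the next session from Workstream B (deeper analytics).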

1
pastpaper-scraper Submodule

Submodule pastpaper-scraper added at 36d4a450cd

25
pitch_script.md Normal file

@@ -0,0 +1,25 @@
# KnowIt Pitch — Product Demo (Pages 5-6, ~45s)
## Transition In
> Now let me show you the product.
## Page 5 — Product Demo
> This is PastPaper Master. Search any course, download past papers, and hit "AI Analyze" — our system reads every page, extracts each question, and generates knowledge reminders, hints, and full solutions automatically.
>
> It's powered by Gemini vision and DeepSeek, with a RAG pipeline connecting papers, recordings, and courseware.
## Page 6 — Workflow
> Here's the full student workflow.
>
> **Download** papers. **AI analysis** breaks down topics and difficulty. **Upload your answers** — AI grades them instantly with detailed feedback.
>
> Wrong answers go into your **mistake book**. AI generates **variant questions** on the same topic, plus retrieves **similar questions** from other exams.
>
> And **smart flashcards** auto-generated for quick revision — already live for pharmacology students.
## Transition Out
> One closed loop — find, practice, grade, review, master. Over to [name] on the market.


@@ -0,0 +1,207 @@
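-- ============================================
-- PastPaper Master: initial database schema
-- Version: 001
-- Date: 2025-03-11
-- ============================================
-- Enable required extensions.
-- Note: gen_random_uuid(), used for the defaults below, is built into PostgreSQL 13+; uuid-ossp provides uuid_generate_v4() and is not needed for these defaults.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";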
-- ============================================
-- Table 1: papers (uploaded exam papers)
-- ============================================
CREATE TABLE papers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
-- Metadata (entered by the user at upload time)
course_code TEXT NOT NULL, -- "COMP2011"
year INTEGER NOT NULL, -- 2024
term TEXT NOT NULL CHECK (term IN ('fall', 'spring', 'summer')),
exam_type TEXT NOT NULL CHECK (exam_type IN ('midterm', 'final', 'quiz')),
-- Files (Supabase Storage)
paper_file_url TEXT NOT NULL, -- exam paper PDF
answer_file_url TEXT, -- answer PDF (optional)
-- Processing status
status TEXT NOT NULL DEFAULT 'uploaded'
CHECK (status IN ('uploaded', 'processing', 'ready', 'error')),
error_message TEXT, -- error details when processing fails
-- Extracted raw text (cache)
paper_extracted_text TEXT,
answer_extracted_text TEXT,
-- Whole-paper overview (AI-generated)
total_score INTEGER,
question_count INTEGER,
topics_summary JSONB, -- {"Linked List": 40, "Recursion": 30}
difficulty_level TEXT CHECK (difficulty_level IN ('easy', 'medium', 'hard')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ============================================
-- Table 2: paper_questions — 逐题数据
-- ============================================
CREATE TABLE paper_questions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
paper_id UUID NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
-- 题目标识
question_number TEXT NOT NULL, -- "1", "1a", "2b"
parent_question TEXT, -- 子题的父题号: "1a" → "1"
display_order INTEGER NOT NULL, -- 显示顺序
-- 题目内容
question_type TEXT NOT NULL
CHECK (question_type IN ('mc', 'fill_blank', 'long_question')),
question_text TEXT NOT NULL, -- 题目原文
score INTEGER, -- 分值
page_number INTEGER, -- PDF 页码(左右联动)
-- 选择题专用
options JSONB, -- [{"label":"A","text":"..."},...]
correct_option TEXT, -- "B"
-- Fill-in-the-blank only
correct_answer TEXT, -- correct answer
accept_variants TEXT[], -- equivalent forms, e.g. ["O(nlogn)","O(n log n)"]
-- Raw answer extracted from the answer PDF (all question types)
raw_answer_text TEXT,
-- Topic tags
topics TEXT[], -- ["Linked List","Pointer"]
difficulty TEXT CHECK (difficulty IN ('easy', 'medium', 'hard')),
-- Three AI-generated fields (HTML + KaTeX)
knowledge_reminder TEXT, -- knowledge-point reminder
ai_hint TEXT, -- AI hint
solution TEXT, -- solution (step-by-step derivation)
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ============================================
-- Table 3: user_attempts (user answer records)
-- Implemented in Phase 4; the table structure is created in advance
-- ============================================
CREATE TABLE user_attempts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
question_id UUID NOT NULL REFERENCES paper_questions(id) ON DELETE CASCADE,
-- User's response
attempt_type TEXT NOT NULL
CHECK (attempt_type IN ('select', 'input', 'photo')),
user_answer TEXT, -- selected option / typed answer
photo_url TEXT, -- uploaded photo
photo_ocr_text TEXT, -- OCR result
-- AI grading
is_correct BOOLEAN,
feedback TEXT, -- HTML: step-by-step error analysis
error_at_step INTEGER, -- step at which the first error occurs
-- Error book
in_error_book BOOLEAN NOT NULL DEFAULT false,
mastered BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ============================================
-- Indexes
-- ============================================
CREATE INDEX idx_papers_user ON papers(user_id);
CREATE INDEX idx_papers_course ON papers(course_code);
CREATE INDEX idx_papers_status ON papers(status);
CREATE INDEX idx_questions_paper ON paper_questions(paper_id);
CREATE INDEX idx_questions_type ON paper_questions(question_type);
CREATE INDEX idx_questions_topics ON paper_questions USING GIN(topics);
CREATE INDEX idx_attempts_user ON user_attempts(user_id);
CREATE INDEX idx_attempts_question ON user_attempts(question_id);
CREATE INDEX idx_attempts_errorbook ON user_attempts(user_id)
WHERE in_error_book = true;
-- ============================================
-- RLS policies
-- ============================================
ALTER TABLE papers ENABLE ROW LEVEL SECURITY;
ALTER TABLE paper_questions ENABLE ROW LEVEL SECURITY;
ALTER TABLE user_attempts ENABLE ROW LEVEL SECURITY;
-- papers: users can only view papers they uploaded (to be adjusted when a public library is added)
CREATE POLICY "Users can view own papers"
ON papers FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own papers"
ON papers FOR INSERT
WITH CHECK (auth.uid() = user_id);
CREATE POLICY "Users can update own papers"
ON papers FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can delete own papers"
ON papers FOR DELETE
USING (auth.uid() = user_id);
-- paper_questions: follows the permissions of the parent paper
CREATE POLICY "Users can view questions of own papers"
ON paper_questions FOR SELECT
USING (
EXISTS (
SELECT 1 FROM papers
WHERE papers.id = paper_questions.paper_id
AND papers.user_id = auth.uid()
)
);
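-- service_role is used by the backend to write questions (for the processing pipeline)
-- The frontend does not write questions directly; it triggers backend processing via the API
-- user_attempts: users can only read/write their own rows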
CREATE POLICY "Users can view own attempts"
ON user_attempts FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own attempts"
ON user_attempts FOR INSERT
WITH CHECK (auth.uid() = user_id);
CREATE POLICY "Users can update own attempts"
ON user_attempts FOR UPDATE
USING (auth.uid() = user_id);
-- ============================================
-- updated_at auto-update trigger
-- ============================================
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER papers_updated_at
BEFORE UPDATE ON papers
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER questions_updated_at
BEFORE UPDATE ON paper_questions
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
-- ============================================
-- Storage bucket
-- ============================================
-- Create the "papers" bucket manually in the Supabase Dashboard,
-- or create it via the API (handled during backend initialization)

@@ -0,0 +1,38 @@
-- ============================================
-- PastPaper Master — Shared course library fields
-- Version: 002
-- Date: 2026-03-24
-- ============================================
-- Shared library / canonical import metadata on papers
ALTER TABLE papers
ADD COLUMN IF NOT EXISTS source_kind TEXT NOT NULL DEFAULT 'user_upload'
CHECK (source_kind IN ('user_upload', 'course_library')),
ADD COLUMN IF NOT EXISTS source_exam_key TEXT,
ADD COLUMN IF NOT EXISTS part_label TEXT
CHECK (part_label IN ('A', 'B')),
ADD COLUMN IF NOT EXISTS source_question_filename TEXT,
ADD COLUMN IF NOT EXISTS source_answer_filename TEXT;
CREATE UNIQUE INDEX IF NOT EXISTS idx_papers_course_library_exam_key
ON papers(source_exam_key)
WHERE source_kind = 'course_library' AND source_exam_key IS NOT NULL;
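-- Illustrative lookup (a sketch, not part of the original migration): fetch a
-- course-library paper by its exam key. The predicate repeats the partial
-- index condition on source_kind so the planner can consider
-- idx_papers_course_library_exam_key. The key value below is an assumed
-- example of the key format, not a value confirmed by this migration.
SELECT id, course_code, year, term, exam_type, part_label, status
FROM papers
WHERE source_kind = 'course_library'
  AND source_exam_key = 'COMP2211-2024-spring-final';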
CREATE INDEX IF NOT EXISTS idx_papers_course_lookup
ON papers(course_code, year, term, exam_type, part_label);
-- Grading results should persist awarded score
ALTER TABLE user_attempts
ADD COLUMN IF NOT EXISTS score_given INTEGER;
CREATE INDEX IF NOT EXISTS idx_attempts_errorbook_active
ON user_attempts(user_id, created_at DESC)
WHERE in_error_book = true AND mastered = false;
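-- Illustrative query (a sketch, not part of the original migration): list the
-- current user's unmastered error-book entries, newest first. The filter and
-- sort order are chosen to match idx_attempts_errorbook_active. auth.uid() is
-- the Supabase helper already used by the RLS policies in 001; the LIMIT of 20
-- is an arbitrary assumption.
SELECT id, question_id, is_correct, score_given, created_at
FROM user_attempts
WHERE user_id = auth.uid()
  AND in_error_book = true
  AND mastered = false
ORDER BY created_at DESC
LIMIT 20;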
-- The backend and frontend already support true_false; schema must match.
ALTER TABLE paper_questions
DROP CONSTRAINT IF EXISTS paper_questions_question_type_check;
ALTER TABLE paper_questions
ADD CONSTRAINT paper_questions_question_type_check
CHECK (question_type IN ('mc', 'true_false', 'fill_blank', 'long_question'));

@@ -0,0 +1,41 @@
-- ============================================
-- PastPaper Master — Question taxonomy fields
-- Version: 003
-- Date: 2026-03-24
-- ============================================
-- A question needs multiple classification layers:
-- 1) question_format: how the student interacts with it
-- 2) topic_tags / topic_primary / analytics_topic: course knowledge taxonomy
-- 3) skill_tags: what kind of thinking task the question requires
ALTER TABLE paper_questions
ADD COLUMN IF NOT EXISTS question_format TEXT
CHECK (
question_format IN (
'mc',
'true_false',
'fill_blank',
'short_answer',
'long_answer',
'coding'
)
),
ADD COLUMN IF NOT EXISTS topic_primary TEXT,
ADD COLUMN IF NOT EXISTS analytics_topic TEXT,
ADD COLUMN IF NOT EXISTS topic_tags TEXT[],
ADD COLUMN IF NOT EXISTS skill_tags TEXT[];
-- Keep the legacy topics column for backward compatibility for now.
-- New analytics and retrieval code should gradually move to analytics_topic/topic_tags.
CREATE INDEX IF NOT EXISTS idx_questions_question_format
ON paper_questions(question_format);
CREATE INDEX IF NOT EXISTS idx_questions_analytics_topic
ON paper_questions(analytics_topic);
CREATE INDEX IF NOT EXISTS idx_questions_topic_tags
ON paper_questions USING GIN(topic_tags);
CREATE INDEX IF NOT EXISTS idx_questions_skill_tags
ON paper_questions USING GIN(skill_tags);
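-- Illustrative query (a sketch, not part of the original migration): find
-- multiple-choice questions carrying a given topic tag. The array containment
-- operator @> on a TEXT[] column is one of the operators a GIN index can
-- serve. The tag value 'knn' is hypothetical; real tag values depend on the
-- tagging pipeline.
SELECT id, paper_id, question_number, topic_primary
FROM paper_questions
WHERE topic_tags @> ARRAY['knn']::text[]
  AND question_format = 'mc'
ORDER BY display_order;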

@@ -0,0 +1,30 @@
-- ============================================
-- PastPaper Master — Decouple course library papers from auth users
-- Version: 004
-- Date: 2026-03-24
-- ============================================
-- Course-library papers should not depend on a concrete auth.users row.
-- User-uploaded papers still keep user_id populated.
ALTER TABLE papers
ALTER COLUMN user_id DROP NOT NULL;
-- Keep existing FK so user-owned papers can still reference auth.users,
-- while course-library rows simply use NULL.
-- Tighten the intended invariant with a check constraint:
-- - user_upload rows must have user_id
-- - course_library rows must not have user_id
ALTER TABLE papers
DROP CONSTRAINT IF EXISTS papers_source_kind_user_id_check;
ALTER TABLE papers
ADD CONSTRAINT papers_source_kind_user_id_check
CHECK (
(source_kind = 'user_upload' AND user_id IS NOT NULL)
OR
(source_kind = 'course_library' AND user_id IS NULL)
);
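-- Illustrative sanity check (a sketch, not part of the original migration):
-- count papers per source_kind, split by whether user_id is set. With the
-- constraint above in place, only two combinations should appear:
-- ('user_upload', false) and ('course_library', true).
SELECT source_kind, (user_id IS NULL) AS user_id_is_null, count(*) AS paper_count
FROM papers
GROUP BY source_kind, (user_id IS NULL)
ORDER BY source_kind;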
-- Existing RLS policies continue to apply to user-owned rows.
-- Course-library rows are accessed through the backend service role.

@@ -0,0 +1,27 @@
-- ============================================
-- PastPaper Master — Allow legacy long_question format alias
-- Version: 005
-- Date: 2026-03-24
-- ============================================
--
-- Some existing seeds and older generated SQL used `long_question` in the
-- `question_format` column, while the 003 taxonomy migration introduced
-- `long_answer` as the canonical value. Allow both temporarily so historical
-- inserts do not fail. New generators should continue emitting `long_answer`.
ALTER TABLE paper_questions
DROP CONSTRAINT IF EXISTS paper_questions_question_format_check;
ALTER TABLE paper_questions
ADD CONSTRAINT paper_questions_question_format_check
CHECK (
question_format IN (
'mc',
'true_false',
'fill_blank',
'short_answer',
'long_answer',
'long_question',
'coding'
)
);

@@ -0,0 +1,17 @@
-- ============================================
-- PastPaper Master — Make score fields numeric
-- Version: 006
-- Date: 2026-04-10
-- ============================================
ALTER TABLE paper_questions
ALTER COLUMN score TYPE NUMERIC
USING score::NUMERIC;
ALTER TABLE papers
ALTER COLUMN total_score TYPE NUMERIC
USING total_score::NUMERIC;
ALTER TABLE user_attempts
ALTER COLUMN score_given TYPE NUMERIC
USING score_given::NUMERIC;

@@ -0,0 +1,36 @@
-- 007: Full-text search on paper_questions.question_text
--
-- Adds a tsvector generated column (auto-maintained by PostgreSQL on every
-- INSERT/UPDATE), a GIN index for fast @@ queries, and a batch-scoring RPC
-- used by the similar-question retrieval endpoint.
ALTER TABLE paper_questions
ADD COLUMN IF NOT EXISTS search_text tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(question_text, ''))
) STORED;
CREATE INDEX IF NOT EXISTS idx_pq_search_text
ON paper_questions USING gin(search_text);
-- text_similarity_scores(query_text, candidate_ids)
-- Returns one row per candidate ID with a ts_rank_cd score divided by
-- 1 + log(document length) (normalization flag = 1). Questions that share no
-- lexemes with the query still appear in the result with score = 0 so the
-- caller always gets a complete score map for every candidate.
CREATE OR REPLACE FUNCTION text_similarity_scores(
query_text text,
candidate_ids uuid[]
)
RETURNS TABLE (question_id uuid, text_score float4)
LANGUAGE sql STABLE AS $$
SELECT
id,
ts_rank_cd(
search_text,
plainto_tsquery('english', query_text),
1 -- divide the rank by 1 + log(document length)
)::float4
FROM paper_questions
WHERE id = ANY(candidate_ids);
$$;
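-- Illustrative call (a sketch, not part of the original migration): score up
-- to 50 candidate questions against a free-text query and list the best
-- matches first. The query string and the candidate selection are assumptions
-- made for illustration; the real endpoint presumably passes its own
-- pre-filtered candidate IDs.
SELECT s.question_id, s.text_score
FROM text_similarity_scores(
  'gradient descent learning rate',
  ARRAY(SELECT id FROM paper_questions ORDER BY display_order LIMIT 50)
) AS s
ORDER BY s.text_score DESC;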

@@ -0,0 +1,2 @@
ALTER TABLE paper_questions
ADD COLUMN IF NOT EXISTS page_y_ratio NUMERIC;

@@ -0,0 +1,27 @@
-- 008: Replace __SUPABASE_STORAGE_PUBLIC_BASE_URL__ placeholder in paper URLs
--
-- The course-library seed (comp2211_course_library_papers.sql) was inserted
-- without substituting the placeholder. This migration replaces it with the
-- real Supabase Storage public base URL for the `papers` bucket.
UPDATE papers
SET paper_file_url = REPLACE(
paper_file_url,
'__SUPABASE_STORAGE_PUBLIC_BASE_URL__',
'https://pvcxipwovpwrurebouwg.supabase.co/storage/v1/object/public/papers'
)
WHERE paper_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';
UPDATE papers
SET answer_file_url = REPLACE(
answer_file_url,
'__SUPABASE_STORAGE_PUBLIC_BASE_URL__',
'https://pvcxipwovpwrurebouwg.supabase.co/storage/v1/object/public/papers'
)
WHERE answer_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';
-- Verify: should return 0 rows
SELECT id, course_code, year, term, exam_type, paper_file_url, answer_file_url
FROM papers
WHERE paper_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%'
OR answer_file_url LIKE '%__SUPABASE_STORAGE_PUBLIC_BASE_URL__%';

Some files were not shown because too many files have changed in this diff.