docs: bootstrap tutor planning

user
2026-04-26 15:35:26 +09:00
commit 9edaddd092
25 changed files with 2205 additions and 0 deletions


@@ -0,0 +1,157 @@
# Tutor Platform Architecture
## System Shape
The platform is a web service built around workflow-driven tutoring and
structured learner memory.
```text
Web App
  Student interview practice
  Review plan
  Readiness map
  Challenge ladder
  Material ingestion
  Asset review
API Backend
  Go service
  Auth and accounts
  Learning sessions
  Interview questions
  Learner memory
  Ontology and source evidence
  Asset generation jobs
Workflow Runtime
  internalized agent-farm-go workflow substrate
  YAML/config-authored workflow definitions
    diagnostic interview
    answer grading
    memory extraction
    ontology analysis
    review-plan generation
    asset prompt generation
    progression and challenge selection
LLM Kernel
  third-one
  default model_key: deepseek-v4-flash
Memory and Knowledge
  learner memory tables
  ontology graph tables
  source evidence ledger
  generated asset lineage
```
## Workflow Responsibilities
Use a Go backend as the product service boundary and internalize
`agent-farm-go` workflow patterns there. Workflow behavior should still be
configuration-first: prefer YAML/config composition for agent behavior and only
add code when a capability cannot be expressed through existing workflow or
runtime-loadable node patterns.
Implementation should follow the engineering rules in
`docs/planning/ENGINEERING.md`: no manually authored source file over 600 lines,
SOLID responsibility boundaries, KISS implementation choices, and YAGNI for
future-only abstractions.
Initial workflow set:
- `diagnose_job_seeker`
- `generate_interview_question`
- `grade_interview_answer`
- `ask_followup_question`
- `extract_learning_memory`
- `build_review_plan`
- `select_next_challenge`
- `update_readiness_map`
- `award_learning_progress`
- `ingest_learning_material`
- `build_learning_ontology`
- `detect_ontology_gaps`
- `generate_teaching_asset_prompt`
- `verify_generated_learning_asset`
## Gamification Strategy
Game-inspired engagement should live on top of learner memory and evidence, not
beside it. The product should not award progress just for time spent. Progress
is earned through answer quality, misconception repair, review completion, and
successful transfer to harder interview scenarios.
Core progression surfaces:
- readiness map by role and concept
- challenge ladder per concept
- short daily interview loops
- boss questions for integrated concept clusters
- strong-answer portfolio
- interview-date campaign plan
Progression decisions should read from learner memory and grading evidence.
They should be exposed as workflow outputs so the service can explain why a
question, reward, or unlock appeared.
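A minimal Go sketch of an explainable progression output, assuming a hypothetical `ProgressionDecision` shape (the type, its fields, and the kind values are illustrative, not an existing `agent-farm-go` contract). It refuses to render a decision that carries no evidence, which keeps rewards tied to graded answers rather than time spent.

```go
package main

import (
	"fmt"
	"strings"
)

// ProgressionDecision is a hypothetical workflow output: every unlock or
// reward carries a reason and the evidence IDs that justify it.
type ProgressionDecision struct {
	Kind        string   // illustrative values: "unlock", "reward", "next_question"
	Target      string   // concept or challenge identifier
	Reason      string   // explanation shown to the learner
	EvidenceIDs []string // grading or memory evidence behind the decision
}

// Explain renders a decision for the UI and rejects decisions without
// supporting evidence.
func Explain(d ProgressionDecision) (string, error) {
	if len(d.EvidenceIDs) == 0 {
		return "", fmt.Errorf("decision %q for %q has no supporting evidence", d.Kind, d.Target)
	}
	return fmt.Sprintf("%s %s: %s (evidence: %s)",
		d.Kind, d.Target, d.Reason, strings.Join(d.EvidenceIDs, ", ")), nil
}

func main() {
	msg, err := Explain(ProgressionDecision{
		Kind:        "unlock",
		Target:      "transactions/debug-a-scenario",
		Reason:      "two strong answers on isolation levels",
		EvidenceIDs: []string{"ans-101", "ans-107"},
	})
	fmt.Println(msg, err)
}
```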
## LLM Runtime
Use `third-one` as the bounded model execution kernel. The default target is
`deepseek-v4-flash` through runtime configuration. Product workflows should pass
explicit task contracts and consume typed outputs rather than relying on freeform
assistant prose.
The Go backend should call the workflow/runtime layer through narrow typed
interfaces. Product domain code should not shell out ad hoc from handlers or
parse arbitrary assistant text to mutate learner state.
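One way to enforce typed outputs at the service boundary is strict JSON decoding. This sketch assumes a hypothetical `GradedAnswer` shape with a score in the range 0 to 1; it relies only on the standard `encoding/json` decoder, rejecting unknown fields and any trailing prose after the JSON value before a caller touches learner state.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// GradedAnswer is an illustrative contract; field names are assumptions.
type GradedAnswer struct {
	QuestionID string  `json:"question_id"`
	Score      float64 `json:"score"` // assumed range 0..1
	Feedback   string  `json:"feedback"`
}

// Validate checks domain rules that JSON decoding alone cannot express.
func (g GradedAnswer) Validate() error {
	if g.QuestionID == "" {
		return errors.New("missing question_id")
	}
	if g.Score < 0 || g.Score > 1 {
		return fmt.Errorf("score %v outside [0,1]", g.Score)
	}
	return nil
}

// DecodeStrict parses workflow output into T, rejecting unknown fields and
// anything after the first JSON value (for example, freeform assistant prose).
func DecodeStrict[T any](raw []byte) (T, error) {
	var out T
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		return out, fmt.Errorf("decode workflow output: %w", err)
	}
	// A second Decode must hit EOF; anything else means trailing data.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return out, errors.New("decode workflow output: unexpected data after JSON value")
	}
	return out, nil
}

func main() {
	raw := []byte(`{"question_id":"q-42","score":0.75,"feedback":"Name the isolation level."}`)
	g, err := DecodeStrict[GradedAnswer](raw)
	if err == nil {
		err = g.Validate()
	}
	fmt.Println(g, err)
}
```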
## Memory Strategy
Do not make RAG the product center. Retrieval can support evidence lookup, but
the durable product memory should be structured:
- learner profile
- concept mastery
- misconceptions
- practice evidence
- intervention history
- spaced review schedule
- readiness progression
- challenge history
MemPalace can inform temporal, scoped, evidence-preserving memory design.
Graphify can inform ontology extraction from mixed source materials. The service
should own its privacy, review, tenant, and deletion semantics directly.
## Ontology Strategy
Uploaded materials should produce a learning graph:
- concepts
- prerequisites
- examples
- interview questions
- rubrics
- source evidence
- missing areas
- generated candidate assets
Every inferred or generated node should carry provenance and review state.
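A minimal sketch of provenance and review state on ontology nodes. The `Origin` and `ReviewState` values, the field names, and the policy that only approved nodes are canonical are all assumptions for illustration; the document leaves the required amount of review as an open question.

```go
package main

import "fmt"

type ReviewState string

const (
	ReviewPending  ReviewState = "pending"
	ReviewApproved ReviewState = "approved"
	ReviewRejected ReviewState = "rejected"
)

type Origin string

const (
	OriginSource    Origin = "source"    // extracted from an uploaded material
	OriginInferred  Origin = "inferred"  // derived by an analysis workflow
	OriginGenerated Origin = "generated" // produced by a generation workflow
)

// Provenance records where a node came from.
type Provenance struct {
	Origin     Origin
	SourceRefs []string // evidence passage IDs backing the node
	Workflow   string   // producing workflow, e.g. "build_learning_ontology"
}

type OntologyNode struct {
	ID     string
	Kind   string // concept, prerequisite, example, question, rubric, ...
	Label  string
	Prov   Provenance
	Review ReviewState
}

// Unsupported flags inferred or generated nodes that have no source evidence.
func (n OntologyNode) Unsupported() bool {
	return n.Prov.Origin != OriginSource && len(n.Prov.SourceRefs) == 0
}

// IsCanonical assumes the strictest policy: only approved nodes are canonical.
func (n OntologyNode) IsCanonical() bool {
	return n.Review == ReviewApproved
}

func main() {
	gap := OntologyNode{
		ID: "n-17", Kind: "concept", Label: "MVCC",
		Prov:   Provenance{Origin: OriginGenerated, Workflow: "detect_ontology_gaps"},
		Review: ReviewPending,
	}
	fmt.Println(gap.Unsupported(), gap.IsCanonical()) // true false
}
```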
## Visual Asset Strategy
Use image generation behind a provider abstraction. Product language may call
the desired provider key `gpt-image-v2`, but implementation must confirm the
current OpenAI model identifier and API surface before production wiring.
Generated asset types:
- concept diagrams
- slide-like lesson slices
- interview explanation cards
- worksheet visuals
- analogy images
Each asset should store prompt, source concept, source evidence, model config,
generation time, review state, and usage context.


@@ -0,0 +1,121 @@
# Engineering Principles
## Purpose
This document defines how the tutor platform should be implemented once coding
begins. The product is expected to grow across web UI, API backend, workflow
orchestration, learner memory, ontology processing, and generated learning
assets. Without explicit constraints, those areas can easily collapse into large
files and overbuilt abstractions.
## Core Rules
### 0. Backend language
The backend is Go.
This decision aligns the service with `agent-farm-go` so workflow orchestration
can become an internal product capability rather than a loosely attached
automation script. Keep the Go service modular and avoid turning the backend
into one large workflow coordinator.
### 1. File size limit
No manually authored source file should exceed 600 lines.
When a file approaches the limit, split it by responsibility. Good split points:
- route registration vs handler logic
- handler logic vs service logic
- service orchestration vs domain rules
- domain rules vs persistence adapter
- workflow contract definitions vs workflow execution
- UI page shell vs reusable components
Exemptions:
- generated files
- lockfiles
- vendored files
- external source snapshots
- large fixture data
### 2. SOLID
Apply SOLID pragmatically:
- Single responsibility: learner memory, ontology, grading, progression, and
asset generation should not live in one service.
- Open/closed: add new interview tracks or asset types through data/config and
narrow extension points where practical.
- Liskov substitution: adapters should honor shared contracts without hidden
behavior changes.
- Interface segregation: avoid giant service interfaces.
- Dependency inversion: domain logic should not depend directly on provider SDKs
or database details.
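A sketch of dependency inversion and interface segregation together, assuming hypothetical `MasteryReader`/`MasteryWriter` ports owned by the domain. The moving-average update rule and its `alpha` are illustrative assumptions, not a specified mastery model. A SQL adapter would satisfy the same interfaces as the in-memory one shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned by stores that have no record yet.
var ErrNotFound = errors.New("not found")

// Small, role-specific ports defined by the domain, not by the database.
type MasteryReader interface {
	Mastery(learnerID, conceptID string) (float64, error)
}

type MasteryWriter interface {
	SetMastery(learnerID, conceptID string, value float64) error
}

type MasteryStore interface {
	MasteryReader
	MasteryWriter
}

// RecordScore moves mastery toward the latest graded score using an
// exponential moving average (illustrative rule, alpha assumed).
func RecordScore(s MasteryStore, learnerID, conceptID string, score float64) (float64, error) {
	const alpha = 0.5
	prev, err := s.Mastery(learnerID, conceptID)
	if errors.Is(err, ErrNotFound) {
		prev = 0
	} else if err != nil {
		return 0, err
	}
	next := prev + alpha*(score-prev)
	return next, s.SetMastery(learnerID, conceptID, next)
}

// memoryStore is an in-memory adapter for tests and local runs.
type memoryStore map[string]float64

func (m memoryStore) Mastery(learnerID, conceptID string) (float64, error) {
	v, ok := m[learnerID+"/"+conceptID]
	if !ok {
		return 0, ErrNotFound
	}
	return v, nil
}

func (m memoryStore) SetMastery(learnerID, conceptID string, value float64) error {
	m[learnerID+"/"+conceptID] = value
	return nil
}

func main() {
	store := memoryStore{}
	v, err := RecordScore(store, "u1", "transactions", 1.0)
	fmt.Println(v, err) // 0.5 <nil>
}
```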
### 3. KISS
Prefer the simplest implementation that proves the current product loop:
```text
question -> answer -> grading -> memory update -> next challenge
```
Do not introduce queues, distributed workers, plugin systems, or complex
multi-agent orchestration until the MVP loop needs them.
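In KISS terms, the whole loop can be one synchronous function call. The sketch below uses hypothetical function types (`Grader`, `Memory`, `Selector`) standing in for workflow calls such as `grade_interview_answer` and `select_next_challenge`; there is no queue or worker between steps, and any failure stops the loop with a wrapped error.

```go
package main

import "fmt"

// Hypothetical ports for each loop step.
type Grader func(question, answer string) (score float64, err error)
type Memory func(concept string, score float64) error
type Selector func(concept string, score float64) (next string, err error)

// RunLoop executes question -> answer -> grading -> memory update ->
// next challenge in one synchronous call.
func RunLoop(concept, question, answer string, grade Grader, remember Memory, next Selector) (string, error) {
	score, err := grade(question, answer)
	if err != nil {
		return "", fmt.Errorf("grading: %w", err)
	}
	if err := remember(concept, score); err != nil {
		return "", fmt.Errorf("memory update: %w", err)
	}
	return next(concept, score)
}

func main() {
	grade := func(q, a string) (float64, error) { return 0.6, nil }
	remember := func(c string, s float64) error { return nil }
	pick := func(c string, s float64) (string, error) {
		if s >= 0.8 {
			return c + ":next-rung", nil
		}
		return c + ":review-card", nil
	}
	next, err := RunLoop("http", "What makes PUT idempotent?", "...", grade, remember, pick)
	fmt.Println(next, err) // http:review-card <nil>
}
```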
### 4. YAGNI
Do not build features only because they may be useful later.
Examples to defer until proven necessary:
- multi-school LMS administration
- marketplace course publishing
- company-specific interview packs
- generalized ontology editor
- multiple image providers
- complex economy systems
- social leaderboards
## Product Module Boundaries
Initial implementation should keep these responsibilities distinct:
- `auth`: users, sessions, identity providers.
- `interview`: questions, rubrics, answers, grading records.
- `learner_memory`: profiles, concept mastery, misconceptions, evidence.
- `ontology`: concepts, prerequisites, source evidence, generated candidates.
- `progression`: readiness maps, challenge ladders, boss questions, streaks.
- `workflows`: typed contracts and calls into `agent-farm-go` / `third-one`.
- `assets`: generated diagrams, lesson slices, prompt lineage, review state.
The package names can change to fit Go naming conventions, but the
responsibilities should stay separate.
## Workflow Contracts
LLM workflow outputs should be typed and inspectable. Avoid relying on freeform
assistant prose for product state changes.
First contracts to define:
- `DiagnosticResult`
- `GradedAnswer`
- `MemoryUpdateCandidate`
- `NextChallenge`
- `ReadinessUpdate`
- `OntologyGap`
- `TeachingAssetPrompt`
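A sketch of two of these contracts as Go types, assuming illustrative field names and a shared hypothetical `Contract` interface with a `Validate` method. `Audit` validates a contract and serializes it to JSON so each workflow output can be logged and inspected. The rung range 1 to 5 mirrors the five-step challenge ladder in the gamification notes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Contract is implemented by every workflow output (hypothetical interface).
type Contract interface {
	Validate() error
}

type NextChallenge struct {
	ConceptID  string `json:"concept_id"`
	Rung       int    `json:"rung"` // challenge ladder step, 1..5
	Difficulty string `json:"difficulty"`
	Reason     string `json:"reason"`
}

func (c NextChallenge) Validate() error {
	if c.ConceptID == "" {
		return errors.New("next challenge: missing concept_id")
	}
	if c.Rung < 1 || c.Rung > 5 {
		return fmt.Errorf("next challenge: rung %d outside 1..5", c.Rung)
	}
	return nil
}

type OntologyGap struct {
	MissingConcept string   `json:"missing_concept"`
	RequiredBy     []string `json:"required_by"`
	Severity       string   `json:"severity"` // e.g. "blocking", "weak_evidence"
}

func (g OntologyGap) Validate() error {
	if g.MissingConcept == "" || len(g.RequiredBy) == 0 {
		return errors.New("ontology gap: missing concept or dependents")
	}
	return nil
}

// Audit validates a contract and renders it as an inspectable log line.
func Audit(name string, c Contract) (string, error) {
	if err := c.Validate(); err != nil {
		return "", err
	}
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return name + " " + string(b), nil
}

func main() {
	line, err := Audit("NextChallenge", NextChallenge{ConceptID: "caching", Rung: 3, Difficulty: "medium", Reason: "vague answer on eviction"})
	fmt.Println(line, err)
}
```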
## Review Checklist
Before considering an implementation slice complete:
- No manually authored source file exceeds 600 lines.
- New behavior maps to an OpenSpec requirement or updates OpenSpec.
- The implementation keeps responsibilities separated.
- The simplest useful design was chosen.
- No future-only abstraction was added.
- Tests or smoke checks prove the touched behavior.


@@ -0,0 +1,166 @@
# Learning Gamification Design
## Source Reference
This note adapts ideas from a game-design markdown summary provided by the user.
The product should use game design to create healthy learning persistence, not
exploitative compulsion. The goal is a strong achievement loop that makes users
want to return because they feel measurable progress toward interview readiness.
## Design Translation
### Experience before content volume
The source material emphasizes that content is the surface and experience is
the core. For this product, a large question bank is not enough. The core
experience must be:
- "I know what I am weak at."
- "The next question is exactly the right challenge."
- "I can feel myself becoming interview-ready."
- "Every session ends with a clear win and a next step."
### Flow and adaptive difficulty
Learning sessions should target a flow band:
- too easy: boredom and low trust
- too hard: shame, avoidance, and churn
- just above current ability: useful struggle and pride
The tutor should adjust difficulty using learner memory:
- lower difficulty after repeated failure
- increase specificity after vague but correct answers
- add time pressure only after concept mastery is stable
- switch from recall to applied scenario questions as mastery rises
- insert recovery questions after a hard miss
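The adjustment rules above can be sketched as one pure function over learner state. All thresholds, the 1 to 5 difficulty scale, and the type names are assumptions for illustration; the state passed in is assumed to already reflect the latest answer. Recovery questions suppress time pressure so a hard miss is never followed by a timed question.

```go
package main

import "fmt"

// Outcome of the most recent graded answer.
type Outcome struct {
	Correct  bool
	Vague    bool // correct but missing specifics
	HardMiss bool // confidently wrong on a core point
}

type LearnerState struct {
	Difficulty        int     // 1 (easiest) .. 5 (hardest), assumed scale
	ConsecutiveMisses int     // includes the latest answer
	Mastery           float64 // 0..1
	StableSessions    int     // consecutive sessions at or above stableMastery
}

type NextQuestionPlan struct {
	Difficulty   int
	Style        string // "recall" or "applied_scenario"
	Probe        bool   // ask for more specific detail
	TimePressure bool
	Recovery     bool
}

// Illustrative thresholds, not tuned values.
const (
	missesBeforeEasing = 2
	appliedMastery     = 0.6
	stableMastery      = 0.8
	stableSessionsNeed = 3
)

func PlanNext(s LearnerState, last Outcome) NextQuestionPlan {
	p := NextQuestionPlan{Difficulty: s.Difficulty, Style: "recall"}
	switch {
	case last.HardMiss:
		p.Recovery = true
	case !last.Correct && s.ConsecutiveMisses >= missesBeforeEasing:
		if s.Difficulty > 1 {
			p.Difficulty = s.Difficulty - 1
		}
	case last.Correct && last.Vague:
		p.Probe = true
	}
	if s.Mastery >= appliedMastery {
		p.Style = "applied_scenario"
	}
	p.TimePressure = !p.Recovery && s.Mastery >= stableMastery && s.StableSessions >= stableSessionsNeed
	return p
}

func main() {
	fmt.Printf("%+v\n", PlanNext(LearnerState{Difficulty: 3, ConsecutiveMisses: 2, Mastery: 0.3}, Outcome{}))
}
```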
### Learning main loop
Use a repeated loop similar to challenge, action, reward:
```text
Readiness goal
-> interview question
-> user answer
-> rubric feedback
-> follow-up or correction
-> memory update
-> visible progress
-> next best challenge
```
This loop should be short enough to complete in 5-10 minutes, with optional
longer sessions composed from multiple loops.
### Expectation curve
Each session needs a visible promise:
- today's target concept
- expected time
- reward or unlock
- interview readiness impact
- next milestone preview
The product should maintain open loops carefully:
- show what the next unlock or milestone is
- avoid creating many unfinished tasks at once
- close each session with a concrete result
### Growth lines
Use two growth lines:
1. Permanent mastery growth
- concept mastery
- misconception resolved
- interview skill badges
- portfolio of strong answers
2. Seasonal or campaign growth
- weekly interview sprint
- target-company prep campaign
- stack-specific challenge ladder
- mock interview streak
Permanent growth provides long-term identity. Campaign growth provides freshness
without erasing real learning progress.
## Product Systems
### Readiness Map
A role-specific map that shows concept readiness:
- unknown
- fragile
- improving
- interview-ready
- strong signal
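The five readiness levels map naturally onto an ordered Go enum. In this sketch the score thresholds and the rule that at least three graded answers are required before a concept counts as interview-ready are assumptions, chosen to show that readiness should follow graded evidence rather than activity.

```go
package main

import "fmt"

type Readiness int

const (
	Unknown Readiness = iota
	Fragile
	Improving
	InterviewReady
	StrongSignal
)

func (r Readiness) String() string {
	names := [...]string{"unknown", "fragile", "improving", "interview-ready", "strong signal"}
	if r < 0 || int(r) >= len(names) {
		return "invalid"
	}
	return names[r]
}

// LevelFor maps graded evidence to a readiness level (illustrative thresholds).
func LevelFor(gradedAnswers int, avgScore float64) Readiness {
	switch {
	case gradedAnswers == 0:
		return Unknown
	case avgScore < 0.4:
		return Fragile
	case avgScore < 0.7 || gradedAnswers < 3:
		return Improving // too little evidence to claim readiness
	case avgScore < 0.9:
		return InterviewReady
	default:
		return StrongSignal
	}
}

func main() {
	fmt.Println(LevelFor(4, 0.8)) // interview-ready
}
```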
### Challenge Ladder
Each concept gets a ladder:
1. define
2. explain tradeoffs
3. debug a scenario
4. design under constraints
5. answer under interview pressure
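A minimal ladder sketch, assuming a learner climbs one rung after two consecutive passing answers (score of at least 0.7). Both numbers are placeholders, and a failed answer only resets the pass count rather than moving the learner down, in line with the no-punishment rules below.

```go
package main

import "fmt"

type Rung int

const (
	Define Rung = iota + 1
	ExplainTradeoffs
	DebugScenario
	DesignUnderConstraints
	InterviewPressure
)

// Assumed thresholds for climbing.
const (
	passScore     = 0.7
	passesToClimb = 2
)

type Ladder struct {
	ConceptID string
	Current   Rung
	passes    int // consecutive passing answers on the current rung
}

// Record applies one graded answer and reports whether the learner climbed.
func (l *Ladder) Record(score float64) bool {
	if score < passScore {
		l.passes = 0 // reset progress on this rung, never drop a rung
		return false
	}
	l.passes++
	if l.passes < passesToClimb || l.Current == InterviewPressure {
		return false
	}
	l.Current++
	l.passes = 0
	return true
}

func main() {
	l := Ladder{ConceptID: "transactions", Current: Define}
	l.Record(0.9)
	fmt.Println(l.Record(0.8), l.Current) // true 2
}
```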
### Boss Questions
After a cluster of concepts is stable, the user gets a boss-style integrated
question. Example:
"Design a rate-limited API endpoint with database transactions, cache behavior,
failure handling, and test strategy."
### Reward Types
Prefer meaningful rewards:
- readiness percentage
- concept unlocks
- strong answer saved to portfolio
- mock interview token
- new scenario type unlocked
- visual certificate for a completed track
- generated review card or diagram
Avoid rewards that are disconnected from learning value.
### Session Ending
Every session should end with a strong closure:
- one thing improved
- one misconception discovered or resolved
- one recommended next step
- one visible progress change
## Safety Rules
- Do not use gambling-like random rewards as the primary motivator.
- Do not punish users for missing a day.
- Do not hide progress behind manipulative scarcity.
- Do not optimize only for time-on-site.
- Do not create shame-based leaderboards.
- Prefer mastery, autonomy, competence, and readiness over compulsion.
## MVP Gamification Features
- Daily 10-minute interview loop.
- Role readiness map.
- Concept challenge ladder.
- Streak with grace days, not punishment.
- Boss question after each concept cluster.
- Strong-answer portfolio.
- Session-end progress summary.
- Review campaign for interview date countdown.

docs/planning/PRD.md

@@ -0,0 +1,269 @@
# Tutor Platform PRD
## Product Thesis
Build a web service that helps software job seekers prepare for technical
interviews through adaptive tutoring, interview-question practice, and a
student-specific learning memory. The first market is developers preparing for
employment or career transition. The platform should later expand to general
students by reusing the same curriculum, ontology, assessment, and tutoring
workflow foundations.
This is not a classic RAG chatbot. Source materials are ingested as evidence,
analyzed into a learning ontology, checked for missing or weak areas, verified,
and then transformed into structured learning material, practice questions, and
teaching aids.
## Target Users
### Primary: software job seekers
- Bootcamp graduates preparing for interviews.
- Junior developers preparing for first jobs.
- Developers changing stacks.
- Experienced developers preparing for system design, backend, frontend, data,
AI, DevOps, or language-specific interviews.
### Secondary: general students, later
- Students who need adaptive study plans.
- Teachers or parents who need progress summaries.
- Institutions that want a private learning memory and curriculum engine.
## Problem
Job seekers have abundant content but weak feedback loops:
- They do not know which concepts they truly understand.
- They memorize interview answers without building transferable understanding.
- Existing tools give generic questions, not diagnosis-based practice.
- Study materials are fragmented across notes, PDFs, slides, videos, blogs, and
repositories.
- Progress is hard to measure across concepts, mistakes, and repeated sessions.
## Product Goals
- Provide interview-first tutoring for software job seekers.
- Build a durable learner model from every practice answer and tutor session.
- Convert uploaded materials into a concept ontology and verified study assets.
- Detect missing, weak, outdated, or unverified parts of a learning corpus.
- Generate practice questions, explanations, review plans, and visual teaching
aids from the verified ontology.
- Use game-inspired progress loops to make learning feel rewarding, repeatable,
and visibly connected to interview readiness.
- Keep the architecture reusable for future general-student learning flows.
## Technology Direction
- Backend: Go.
- Workflow substrate: internalize `agent-farm-go` patterns and execution
contracts into the backend boundary instead of treating workflow execution as
a loose external script.
- LLM kernel: `third-one`, defaulting to `deepseek-v4-flash` through runtime
configuration.
- Frontend: web-first. The exact frontend stack remains open until the first UI
implementation slice, but it should stay lightweight and product-focused.
## Non-Goals for MVP
- Full school LMS replacement.
- Marketplace for courses.
- Automatic certification or hiring decisions.
- Broad multi-subject K-12 coverage.
- Unverified autonomous content publishing.
## MVP Scope
The first MVP should prove one loop:
1. User chooses a target role and stack.
2. Platform runs a diagnostic technical interview.
3. Tutor asks follow-up questions based on weak answers.
4. System extracts concept mastery, misconceptions, and evidence.
5. User receives a focused review plan.
6. User repeats practice with generated interview questions.
7. User sees visible readiness progress, next unlocks, and a recommended next
challenge.
Recommended first track:
- Backend developer interview preparation.
- Topics: HTTP, REST, databases, transactions, caching, concurrency, testing,
system design basics, Go or JavaScript/TypeScript depending on the first
content corpus.
## Core User Flows
### Diagnostic interview
The user selects role, stack, target company type, and interview date. The
system asks a short series of adaptive questions, grades answers, identifies
weak concepts, and creates an initial study map.
### Practice session
The tutor asks one interview question at a time, requests the user's answer,
grades it against a rubric, asks follow-ups, and records learning evidence.
### Review plan
After each session, the system creates a concise plan:
- concepts to review
- mistakes to fix
- next practice questions
- suggested study order
- estimated readiness
### Gamified learning routine
The user follows a short loop:
1. choose or accept today's target
2. answer one interview question
3. receive rubric feedback
4. handle one follow-up or correction
5. see memory and readiness progress
6. unlock the next challenge or review card
The loop should feel like a game challenge ladder, but its rewards must be tied
to real learning evidence. The product should favor mastery, autonomy, and
readiness over empty points or exploitative streak pressure.
### Material ingestion
The user or operator uploads PDFs, notes, slides, docs, links, code snippets, or
existing interview-question sets. The system analyzes them into a concept graph,
detects missing prerequisites, flags weak evidence, and proposes generated
study assets.
### Teaching-aid generation
For concepts that need visual explanation, the system generates images,
slide-like lesson slices, diagrams, and worksheet-style teaching aids through
the configured image generation provider.
## Functional Requirements
### Interview question engine
- The system SHALL generate role-specific technical interview questions.
- The system SHALL support difficulty levels and follow-up questions.
- The system SHALL grade answers with rubric-backed evidence.
- The system SHALL score factual correctness, depth, communication clarity,
and production judgment as separate rubric dimensions.
- The system SHALL keep original user answers as evidence for later review.
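A sketch of a grading record that keeps the four rubric dimensions separate and stores the original answer verbatim. The type names, the 0 to 4 scale, and the `Weakest` helper (which a follow-up workflow could target) are illustrative assumptions.

```go
package main

import "fmt"

// RubricScores keeps each dimension separate; the 0..4 scale is assumed.
type RubricScores struct {
	FactualCorrectness   int
	Depth                int
	CommunicationClarity int
	ProductionJudgment   int
}

type GradingRecord struct {
	QuestionID     string
	OriginalAnswer string            // stored verbatim as evidence
	Scores         RubricScores
	Evidence       map[string]string // dimension -> rubric criterion or quoted span
}

// Weakest returns the lowest-scoring dimension; ties resolve in field order.
func (s RubricScores) Weakest() string {
	dims := []struct {
		name string
		v    int
	}{
		{"factual_correctness", s.FactualCorrectness},
		{"depth", s.Depth},
		{"communication_clarity", s.CommunicationClarity},
		{"production_judgment", s.ProductionJudgment},
	}
	weakest := dims[0]
	for _, d := range dims[1:] {
		if d.v < weakest.v {
			weakest = d
		}
	}
	return weakest.name
}

func main() {
	rec := GradingRecord{
		QuestionID:     "q-cache-1",
		OriginalAnswer: "Put a cache in front of the database...",
		Scores:         RubricScores{FactualCorrectness: 3, Depth: 1, CommunicationClarity: 3, ProductionJudgment: 2},
	}
	fmt.Println(rec.Scores.Weakest()) // depth
}
```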
### Learner memory
- The system SHALL maintain a per-user learner profile.
- The system SHALL track concept mastery over time.
- The system SHALL track recurring misconceptions and weak reasoning patterns.
- The system SHALL store evidence for every memory update.
- The system SHALL distinguish durable memory from temporary session context.
### Ontology builder
- The system SHALL ingest source materials into a learning ontology.
- The system SHALL represent concepts, prerequisites, examples, questions,
rubrics, and source evidence as separate entities.
- The system SHALL detect missing prerequisite concepts.
- The system SHALL flag generated or inferred content that lacks source support.
- The system SHALL support human review before promoted learning assets become
canonical.
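The mechanical part of prerequisite detection is finding prerequisites that are referenced but never defined. This sketch assumes the ontology is available as a map from concept ID to its prerequisite IDs. It does not infer prerequisites that no material mentions; that judgment would remain the job of an LLM workflow such as `detect_ontology_gaps`.

```go
package main

import (
	"fmt"
	"sort"
)

// MissingPrerequisites returns prerequisite IDs referenced by some concept
// but absent from the ontology, mapped to the concepts that need them.
func MissingPrerequisites(concepts map[string][]string) map[string][]string {
	missing := map[string][]string{}
	for concept, prereqs := range concepts {
		for _, p := range prereqs {
			if _, ok := concepts[p]; !ok {
				missing[p] = append(missing[p], concept)
			}
		}
	}
	for p := range missing {
		sort.Strings(missing[p]) // deterministic order despite map iteration
	}
	return missing
}

func main() {
	graph := map[string][]string{
		"transactions": {"acid", "isolation_levels"},
		"acid":         nil,
		"deadlocks":    {"isolation_levels", "locking"},
	}
	fmt.Println(MissingPrerequisites(graph))
}
```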
### Tutor workflows
- The system SHALL run tutoring behavior through configurable LLM workflows.
- The system SHALL use `agent-farm-go` as the workflow orchestration substrate.
- The backend SHALL be implemented in Go so the service can internalize
`agent-farm-go` workflow patterns and contracts directly.
- The system SHALL use `third-one` as the LLM execution kernel.
- The default LLM runtime SHALL target `deepseek-v4-flash` unless changed by
configuration.
- Workflow outputs SHALL prefer typed JSON contracts for grading, memory
extraction, ontology updates, and review-plan generation.
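A sketch of the "default unless changed by configuration" rule. `LLMConfig` and the environment variable name `TUTOR_LLM_MODEL_KEY` are hypothetical; how `third-one` actually receives its model key is not specified here. The lookup function is injected so the default can be tested without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// LLMConfig holds runtime settings passed to the LLM kernel (hypothetical shape).
type LLMConfig struct {
	ModelKey string
}

const defaultModelKey = "deepseek-v4-flash"

// LoadLLMConfig applies the documented default unless configuration overrides it.
func LoadLLMConfig(getenv func(string) string) LLMConfig {
	cfg := LLMConfig{ModelKey: defaultModelKey}
	if v := strings.TrimSpace(getenv("TUTOR_LLM_MODEL_KEY")); v != "" {
		cfg.ModelKey = v
	}
	return cfg
}

func main() {
	fmt.Println(LoadLLMConfig(os.Getenv).ModelKey)
}
```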
### Visual teaching assets
- The system SHALL generate educational visual assets for selected concepts.
- The system SHALL support slide-like lesson slices, diagrams, worksheets, and
interview explanation cards.
- Image generation SHALL be behind a provider/model configuration key. The
initial product intent is `gpt-image-v2`; implementation must verify the
actual OpenAI model identifier before wiring production calls.
- Generated assets SHALL keep source links, prompt lineage, and review state.
### Engagement and progression
- The system SHALL expose a role-specific readiness map.
- The system SHALL organize concepts into challenge ladders from definition to
pressure-tested interview answers.
- The system SHALL provide short daily or session-based learning loops.
- The system SHALL use adaptive difficulty to keep questions near the user's
current ability.
- The system SHALL provide meaningful rewards such as concept readiness,
strong-answer portfolio entries, boss questions, review cards, or visual
completion assets.
- The system SHALL avoid gambling-like random rewards, shame-based leaderboards,
and punitive streak loss.
## Memory Model
The memory layer should store structured learning state, not just retrieved
text chunks.
Core memory objects:
- `LearnerProfile`: target role, stack, timeline, preferences.
- `ConceptMastery`: concept-level state such as unknown, fragile, improving, or
mastered.
- `Misconception`: recurring wrong model or reasoning pattern.
- `Evidence`: original answer, quiz result, source passage, or tutor note that
supports a memory update.
- `Intervention`: explanation, hint, visual, analogy, or practice type that was
tried and whether it helped.
- `ReviewSchedule`: when and why a concept should be revisited.
MemPalace is useful as a reference for scoped, temporal, evidence-preserving
memory. Graphify is useful as a reference for building queryable knowledge
graphs from mixed materials. The product memory should still be implemented as
an application-owned data model because learner privacy, tenant boundaries,
review states, and deletion policies are product requirements.
## Success Metrics
- A user can complete a diagnostic interview in under 15 minutes.
- The system produces concept weaknesses that match human reviewer judgment.
- Generated follow-up questions target the user's actual weak points.
- The user can see progress across repeated practice sessions.
- Users complete repeated 5-10 minute learning loops without needing manual
planning.
- Readiness progress corresponds to actual graded answer evidence.
- Every durable memory update has inspectable evidence.
- Uploaded material produces a useful concept graph and a list of missing or
weak areas.
- Generated learning assets are reviewable before becoming canonical.
## Risks
- The tutor may overstate correctness or readiness.
- Generated ontology edges may look plausible but lack evidence.
- Job seekers may want company-specific interview prep before the core learning
loop is reliable.
- Image generation model names and API capabilities may change.
- Learner data can become sensitive, especially when expanding to minors or
school contexts.
## Open Questions
- Which stack should be the first interview track: backend Go, backend
Java/Spring, frontend React, or full-stack TypeScript?
- Should users upload their resume first, or should the first session start
with role/stack selection only?
- How much human review is required before generated ontology content becomes
canonical?
- Should teacher/operator review exist in MVP, or only after the job-seeker loop
is proven?
- Which progression surface should ship first: readiness map, challenge ladder,
strong-answer portfolio, or interview-date campaign?