docs: bootstrap tutor planning

2026-04-26 15:35:26 +09:00
commit 9edaddd092
25 changed files with 2205 additions and 0 deletions
--- a/docs/planning/ARCHITECTURE.md
+++ b/docs/planning/ARCHITECTURE.md
@@ -0,0 +1,157 @@
+# Tutor Platform Architecture
+
+## System Shape
+
+The platform is a web service built around workflow-driven tutoring and
+structured learner memory.
+
+```text
+Web App
+  Student interview practice
+  Review plan
+  Readiness map
+  Challenge ladder
+  Material ingestion
+  Asset review
+
+API Backend
+  Go service
+  Auth and accounts
+  Learning sessions
+  Interview questions
+  Learner memory
+  Ontology and source evidence
+  Asset generation jobs
+
+Workflow Runtime
+  internalized agent-farm-go workflow substrate
+  YAML/config-authored workflow definitions
+  diagnostic interview
+  answer grading
+  memory extraction
+  ontology analysis
+  review-plan generation
+  asset prompt generation
+  progression and challenge selection
+
+LLM Kernel
+  third-one
+  default model_key: deepseek-v4-flash
+
+Memory and Knowledge
+  learner memory tables
+  ontology graph tables
+  source evidence ledger
+  generated asset lineage
+```
+
+## Workflow Responsibilities
+
+Use a Go backend as the product service boundary and internalize
+`agent-farm-go` workflow patterns there. Workflow behavior should still be
+configuration-first: prefer YAML/config composition for agent behavior and only
+add code when a capability cannot be expressed through existing workflow or
+runtime-loadable node patterns.
+
+Implementation should follow the engineering rules in
+`docs/planning/ENGINEERING.md`: no manually authored source file over 600 lines,
+SOLID responsibility boundaries, KISS implementation choices, and YAGNI for
+future-only abstractions.
+
+Initial workflow set:
+
+- `diagnose_job_seeker`
+- `generate_interview_question`
+- `grade_interview_answer`
+- `ask_followup_question`
+- `extract_learning_memory`
+- `build_review_plan`
+- `select_next_challenge`
+- `update_readiness_map`
+- `award_learning_progress`
+- `ingest_learning_material`
+- `build_learning_ontology`
+- `detect_ontology_gaps`
+- `generate_teaching_asset_prompt`
+- `verify_generated_learning_asset`
+
+## Gamification Strategy
+
+Game-inspired engagement should live on top of learner memory and evidence, not
+beside it. The product should not award progress just for time spent. Progress
+is earned through answer quality, misconception repair, review completion, and
+successful transfer to harder interview scenarios.
+
+Core progression surfaces:
+
+- readiness map by role and concept
+- challenge ladder per concept
+- short daily interview loops
+- boss questions for integrated concept clusters
+- strong-answer portfolio
+- interview-date campaign plan
+
+Progression decisions should read from learner memory and grading evidence.
+They should be exposed as workflow outputs so the service can explain why a
+question, reward, or unlock appeared.
+
+## LLM Runtime
+
+Use `third-one` as the bounded model execution kernel. The default target is
+`deepseek-v4-flash` through runtime configuration. Product workflows should pass
+explicit task contracts and consume typed outputs rather than relying on freeform
+assistant prose.
+
+The Go backend should call the workflow/runtime layer through narrow typed
+interfaces. Product domain code should not shell out ad hoc from handlers or
+parse arbitrary assistant text to mutate learner state.
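+
+A minimal sketch of what such a narrow typed interface could look like. The
+names `AnswerGrader`, `GradeRequest`, `GradedAnswer`, and `RecordGrade` are
+hypothetical and are not taken from `agent-farm-go` or `third-one`; the stub
+stands in for the real runtime.
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+)
+
+// GradeRequest and GradedAnswer are hypothetical contract types; the field
+// names are assumptions for illustration only.
+type GradeRequest struct {
+	QuestionID string
+	Answer     string
+}
+
+type GradedAnswer struct {
+	QuestionID string
+	Score      float64 // assumed range 0.0 to 1.0
+	Evidence   []string
+}
+
+// AnswerGrader is the narrow interface that product code depends on.
+type AnswerGrader interface {
+	GradeAnswer(ctx context.Context, req GradeRequest) (GradedAnswer, error)
+}
+
+// stubGrader stands in for the real workflow runtime in this sketch.
+type stubGrader struct{}
+
+func (stubGrader) GradeAnswer(_ context.Context, req GradeRequest) (GradedAnswer, error) {
+	if req.Answer == "" {
+		return GradedAnswer{}, errors.New("empty answer")
+	}
+	return GradedAnswer{QuestionID: req.QuestionID, Score: 0.5, Evidence: []string{req.Answer}}, nil
+}
+
+// RecordGrade is a caller that only sees typed values, never raw model text.
+func RecordGrade(ctx context.Context, g AnswerGrader, req GradeRequest) (string, error) {
+	graded, err := g.GradeAnswer(ctx, req)
+	if err != nil {
+		return "", fmt.Errorf("grade answer: %w", err)
+	}
+	return fmt.Sprintf("%s scored %.2f", graded.QuestionID, graded.Score), nil
+}
+
+func main() {
+	req := GradeRequest{QuestionID: "q1", Answer: "A mutex serializes access."}
+	out, err := RecordGrade(context.Background(), stubGrader{}, req)
+	fmt.Println(out, err)
+}
+```
+
+Because handlers depend only on the interface, the runtime behind it can be
+swapped for a stub in tests without touching product domain code.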
+
+## Memory Strategy
+
+Do not make RAG the product center. Retrieval can support evidence lookup, but
+the durable product memory should be structured:
+
+- learner profile
+- concept mastery
+- misconceptions
+- practice evidence
+- intervention history
+- spaced review schedule
+- readiness progression
+- challenge history
+
+MemPalace can inform temporal, scoped, evidence-preserving memory design.
+Graphify can inform ontology extraction from mixed source materials. The service
+should own its privacy, review, tenant, and deletion semantics directly.
+
+## Ontology Strategy
+
+Uploaded materials should produce a learning graph:
+
+- concepts
+- prerequisites
+- examples
+- interview questions
+- rubrics
+- source evidence
+- missing areas
+- generated candidate assets
+
+Every inferred or generated node should carry provenance and review state.
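+
+One way to make provenance and review state hard to forget is to put them on
+the node type itself. The sketch below is illustrative: `ConceptNode`,
+`Provenance`, the review state names, and `Canonical` are assumed names, not
+an existing schema.
+
+```go
+package main
+
+import "fmt"
+
+// ReviewState values are assumed names; the plan only requires that a
+// review state exists.
+type ReviewState string
+
+const (
+	ReviewPending  ReviewState = "pending"
+	ReviewApproved ReviewState = "approved"
+)
+
+// Provenance records where a node came from.
+type Provenance struct {
+	SourceIDs []string // evidence ledger entries backing the node
+	Inferred  bool     // true when a workflow proposed the node
+}
+
+type ConceptNode struct {
+	Name       string
+	Provenance Provenance
+	Review     ReviewState
+}
+
+// NewInferredNode creates a workflow-proposed node that starts unreviewed.
+func NewInferredNode(name string, sourceIDs ...string) ConceptNode {
+	return ConceptNode{
+		Name:       name,
+		Provenance: Provenance{SourceIDs: sourceIDs, Inferred: true},
+		Review:     ReviewPending,
+	}
+}
+
+// Canonical reports whether the node may be shown as verified material:
+// it must be approved and backed by at least one source.
+func (n ConceptNode) Canonical() bool {
+	return n.Review == ReviewApproved && len(n.Provenance.SourceIDs) > 0
+}
+
+func main() {
+	n := NewInferredNode("optimistic locking", "src-42")
+	fmt.Println(n.Canonical()) // false: still pending review
+	n.Review = ReviewApproved
+	fmt.Println(n.Canonical()) // true
+}
+```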
+
+## Visual Asset Strategy
+
+Use image generation behind a provider abstraction. Product language may call
+the desired provider key `gpt-image-v2`, but implementation must confirm the
+current OpenAI model identifier and API surface before production wiring.
+
+Generated asset types:
+
+- concept diagrams
+- slide-like lesson slices
+- interview explanation cards
+- worksheet visuals
+- analogy images
+
+Each asset should store prompt, source concept, source evidence, model config,
+generation time, review state, and usage context.
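+
+As a rough sketch, the provider abstraction and the stored lineage could be
+combined as follows. `ImageProvider`, `AssetRecord`, and `GenerateAsset` are
+hypothetical names, the method shape is not a real SDK signature, and the
+model key in `main` is a placeholder rather than a confirmed identifier.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"time"
+)
+
+// ImageProvider hides the concrete image API. The method shape is an
+// assumption for this sketch.
+type ImageProvider interface {
+	Generate(ctx context.Context, modelKey, prompt string) ([]byte, error)
+}
+
+// AssetRecord carries the lineage fields listed above.
+type AssetRecord struct {
+	Prompt       string
+	ConceptID    string
+	EvidenceIDs  []string
+	ModelKey     string
+	GeneratedAt  time.Time
+	ReviewState  string
+	UsageContext string
+	Image        []byte
+}
+
+// fakeProvider returns placeholder bytes so the sketch runs offline.
+type fakeProvider struct{}
+
+func (fakeProvider) Generate(_ context.Context, _, prompt string) ([]byte, error) {
+	return []byte("image-for:" + prompt), nil
+}
+
+// GenerateAsset calls the provider and stamps lineage on the result.
+func GenerateAsset(ctx context.Context, p ImageProvider, rec AssetRecord, now time.Time) (AssetRecord, error) {
+	img, err := p.Generate(ctx, rec.ModelKey, rec.Prompt)
+	if err != nil {
+		return AssetRecord{}, fmt.Errorf("generate asset: %w", err)
+	}
+	rec.Image = img
+	rec.GeneratedAt = now
+	rec.ReviewState = "pending" // new assets always await review
+	return rec, nil
+}
+
+func main() {
+	rec, err := GenerateAsset(context.Background(), fakeProvider{}, AssetRecord{
+		Prompt:       "diagram of a write-through cache",
+		ConceptID:    "caching",
+		EvidenceIDs:  []string{"src-7"},
+		ModelKey:     "configured-image-model", // placeholder key
+		UsageContext: "review card",
+	}, time.Now())
+	fmt.Println(rec.ReviewState, len(rec.Image) > 0, err)
+}
+```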
--- a/docs/planning/ENGINEERING.md
+++ b/docs/planning/ENGINEERING.md
@@ -0,0 +1,121 @@
+# Engineering Principles
+
+## Purpose
+
+This document defines how the tutor platform should be implemented once coding
+begins. The product is expected to grow across web UI, API backend, workflow
+orchestration, learner memory, ontology processing, and generated learning
+assets. Without explicit constraints, those areas can easily collapse into large
+files and overbuilt abstractions.
+
+## Core Rules
+
+### 0. Backend language
+
+The backend is Go.
+
+This decision aligns the service with `agent-farm-go` so workflow orchestration
+can become an internal product capability rather than a loosely attached
+automation script. Keep the Go service modular and avoid turning the backend
+into one large workflow coordinator.
+
+### 1. File size limit
+
+No manually authored source file should exceed 600 lines.
+
+When a file approaches the limit, split it by responsibility. Good split points:
+
+- route registration vs handler logic
+- handler logic vs service logic
+- service orchestration vs domain rules
+- domain rules vs persistence adapter
+- workflow contract definitions vs workflow execution
+- UI page shell vs reusable components
+
+Exemptions:
+
+- generated files
+- lockfiles
+- vendored files
+- external source snapshots
+- large fixture data
+
+### 2. SOLID
+
+Apply SOLID pragmatically:
+
+- Single responsibility: learner memory, ontology, grading, progression, and
+  asset generation should not live in one service.
+- Open/closed: add new interview tracks or asset types through data/config and
+  narrow extension points where practical.
+- Liskov substitution: adapters should honor shared contracts without hidden
+  behavior changes.
+- Interface segregation: avoid giant service interfaces.
+- Dependency inversion: domain logic should not depend directly on provider SDKs
+  or database details.
+
+### 3. KISS
+
+Prefer the simplest implementation that proves the current product loop:
+
+```text
+question -> answer -> grading -> memory update -> next challenge
+```
+
+Do not introduce queues, distributed workers, plugin systems, or complex
+multi-agent orchestration until the MVP loop needs them.
+
+### 4. YAGNI
+
+Do not build features only because they may be useful later.
+
+Examples to defer until proven necessary:
+
+- multi-school LMS administration
+- marketplace course publishing
+- company-specific interview packs
+- generalized ontology editor
+- multiple image providers
+- complex economy systems
+- social leaderboards
+
+## Product Module Boundaries
+
+Initial implementation should keep these responsibilities distinct:
+
+- `auth`: users, sessions, identity providers.
+- `interview`: questions, rubrics, answers, grading records.
+- `learner_memory`: profiles, concept mastery, misconceptions, evidence.
+- `ontology`: concepts, prerequisites, source evidence, generated candidates.
+- `progression`: readiness maps, challenge ladders, boss questions, streaks.
+- `workflows`: typed contracts and calls into `agent-farm-go` / `third-one`.
+- `assets`: generated diagrams, lesson slices, prompt lineage, review state.
+
+The names can change with the chosen stack, but the responsibilities should stay
+separate.
+
+## Workflow Contracts
+
+LLM workflow outputs should be typed and inspectable. Avoid relying on freeform
+assistant prose for product state changes.
+
+First contracts to define:
+
+- `DiagnosticResult`
+- `GradedAnswer`
+- `MemoryUpdateCandidate`
+- `NextChallenge`
+- `ReadinessUpdate`
+- `OntologyGap`
+- `TeachingAssetPrompt`
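+
+A hedged sketch of how one of these contracts could be enforced at the
+boundary. The field names and the 0-4 score scale are assumptions; the four
+dimensions mirror the grading dimensions named in the PRD. Strict decoding
+uses `Decoder.DisallowUnknownFields` from the standard `encoding/json`
+package.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// GradedAnswer is a hypothetical shape for the contract of the same name.
+// The 0-4 scale and JSON field names are assumptions.
+type GradedAnswer struct {
+	QuestionID  string `json:"question_id"`
+	Correctness int    `json:"correctness"`
+	Depth       int    `json:"depth"`
+	Clarity     int    `json:"clarity"`
+	Judgment    int    `json:"judgment"`
+}
+
+// DecodeGradedAnswer rejects unknown fields and out-of-range scores so that
+// malformed model output never reaches learner state.
+func DecodeGradedAnswer(raw []byte) (GradedAnswer, error) {
+	dec := json.NewDecoder(bytes.NewReader(raw))
+	dec.DisallowUnknownFields()
+	var g GradedAnswer
+	if err := dec.Decode(&g); err != nil {
+		return GradedAnswer{}, fmt.Errorf("decode graded answer: %w", err)
+	}
+	if g.QuestionID == "" {
+		return GradedAnswer{}, errors.New("missing question_id")
+	}
+	scores := map[string]int{
+		"correctness": g.Correctness, "depth": g.Depth,
+		"clarity": g.Clarity, "judgment": g.Judgment,
+	}
+	for name, v := range scores {
+		if v < 0 || v > 4 {
+			return GradedAnswer{}, fmt.Errorf("%s out of range: %d", name, v)
+		}
+	}
+	return g, nil
+}
+
+func main() {
+	good := []byte(`{"question_id":"q1","correctness":3,"depth":2,"clarity":4,"judgment":1}`)
+	g, err := DecodeGradedAnswer(good)
+	fmt.Println(g.Correctness, err)
+
+	// Freeform prose is not valid JSON, so it is rejected outright.
+	_, err = DecodeGradedAnswer([]byte(`The answer was mostly right.`))
+	fmt.Println(err != nil)
+}
+```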
+
+## Review Checklist
+
+Before considering an implementation slice complete:
+
+- No manually authored source file exceeds 600 lines.
+- New behavior maps to an OpenSpec requirement or updates OpenSpec.
+- The implementation keeps responsibilities separated.
+- The simplest useful design was chosen.
+- No future-only abstraction was added.
+- Tests or smoke checks prove the touched behavior.
--- a/docs/planning/GAMIFICATION.md
+++ b/docs/planning/GAMIFICATION.md
@@ -0,0 +1,166 @@
+# Learning Gamification Design
+
+## Source Reference
+
+This note adapts ideas from a game-design markdown summary provided by the
+user.
+
+The product should use game design to create healthy learning persistence, not
+exploitative compulsion. The goal is a strong achievement loop that makes users
+want to return because they feel measurable progress toward interview readiness.
+
+## Design Translation
+
+### Experience before content volume
+
+The source material emphasizes that content is surface and experience is the
+core. For this product, a large question bank is not enough. The core experience
+must be:
+
+- "I know what I am weak at."
+- "The next question is exactly the right challenge."
+- "I can feel myself becoming interview-ready."
+- "Every session ends with a clear win and a next step."
+
+### Flow and adaptive difficulty
+
+Learning sessions should target a flow band:
+
+- too easy: boredom and low trust
+- too hard: shame, avoidance, and churn
+- just above current ability: useful struggle and pride
+
+The tutor should adjust difficulty using learner memory:
+
+- lower difficulty after repeated failure
+- increase specificity after vague but correct answers
+- add time pressure only after concept mastery is stable
+- switch from recall to applied scenario questions as mastery rises
+- insert recovery questions after a hard miss
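+
+The adjustment rules above can be sketched as a small decision function. The
+`Signal` fields, the failure threshold of two, and the priority order of the
+cases are assumptions chosen for illustration, not settled product rules.
+
+```go
+package main
+
+import "fmt"
+
+// Signal summarizes recent learner memory for one concept (assumed fields).
+type Signal struct {
+	ConsecutiveFailures int
+	LastCorrect         bool
+	LastVague           bool
+	LastHardMiss        bool
+	MasteryStable       bool
+}
+
+type Adjustment string
+
+const (
+	Recovery            Adjustment = "recovery question"
+	LowerDifficulty     Adjustment = "lower difficulty"
+	IncreaseSpecificity Adjustment = "increase specificity"
+	AddTimePressure     Adjustment = "add time pressure"
+	AppliedScenario     Adjustment = "applied scenario"
+	KeepLevel           Adjustment = "keep level"
+)
+
+// NextAdjustment applies the rules in an assumed priority order:
+// recovery first, then easing off, then raising the bar.
+func NextAdjustment(s Signal) Adjustment {
+	switch {
+	case s.LastHardMiss:
+		return Recovery
+	case s.ConsecutiveFailures >= 2:
+		return LowerDifficulty
+	case s.LastCorrect && s.LastVague:
+		return IncreaseSpecificity
+	case s.LastCorrect && s.MasteryStable:
+		return AddTimePressure
+	case s.LastCorrect:
+		return AppliedScenario
+	default:
+		return KeepLevel
+	}
+}
+
+func main() {
+	fmt.Println(NextAdjustment(Signal{LastCorrect: true, LastVague: true}))
+	fmt.Println(NextAdjustment(Signal{ConsecutiveFailures: 3}))
+}
+```
+
+Keeping the rules in one pure function makes each difficulty change easy to
+explain to the learner and easy to test.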
+
+### Learning main loop
+
+Use a repeated loop modeled on the game loop of challenge, action, and reward:
+
+```text
+Readiness goal
+  -> interview question
+  -> user answer
+  -> rubric feedback
+  -> follow-up or correction
+  -> memory update
+  -> visible progress
+  -> next best challenge
+```
+
+This loop should be short enough to complete in 5-10 minutes, with optional
+longer sessions composed from multiple loops.
+
+### Expectation curve
+
+Each session needs a visible promise:
+
+- today's target concept
+- expected time
+- reward or unlock
+- interview readiness impact
+- next milestone preview
+
+The product should maintain open loops carefully:
+
+- show what the next unlock or milestone is
+- avoid creating many unfinished tasks at once
+- close each session with a concrete result
+
+### Growth lines
+
+Use two growth lines:
+
+1. Permanent mastery growth
+   - concept mastery
+   - misconception resolved
+   - interview skill badges
+   - portfolio of strong answers
+
+2. Seasonal or campaign growth
+   - weekly interview sprint
+   - target-company prep campaign
+   - stack-specific challenge ladder
+   - mock interview streak
+
+Permanent growth provides long-term identity. Campaign growth provides freshness
+without erasing real learning progress.
+
+## Product Systems
+
+### Readiness Map
+
+A role-specific map that shows concept readiness:
+
+- unknown
+- fragile
+- improving
+- interview-ready
+- strong signal
+
+### Challenge Ladder
+
+Each concept gets a ladder:
+
+1. define
+2. explain tradeoffs
+3. debug a scenario
+4. design under constraints
+5. answer under interview pressure
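+
+A minimal sketch of the ladder as an ordered type. `Rung` and `Advance` are
+hypothetical names, and advancing exactly one rung per passing answer is an
+assumed rule.
+
+```go
+package main
+
+import "fmt"
+
+// Rung is one step of the challenge ladder, in the order listed above.
+type Rung int
+
+const (
+	Define Rung = iota + 1
+	ExplainTradeoffs
+	DebugScenario
+	DesignUnderConstraints
+	InterviewPressure
+)
+
+var rungNames = map[Rung]string{
+	Define:                 "define",
+	ExplainTradeoffs:       "explain tradeoffs",
+	DebugScenario:          "debug a scenario",
+	DesignUnderConstraints: "design under constraints",
+	InterviewPressure:      "answer under interview pressure",
+}
+
+func (r Rung) String() string { return rungNames[r] }
+
+// Advance moves up one rung after a passing answer and stays put otherwise.
+// The top rung is never exceeded.
+func (r Rung) Advance(passed bool) Rung {
+	if passed && r < InterviewPressure {
+		return r + 1
+	}
+	return r
+}
+
+func main() {
+	r := Define
+	for _, passed := range []bool{true, false, true} {
+		r = r.Advance(passed)
+	}
+	fmt.Println(r) // debug a scenario
+}
+```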
+
+### Boss Questions
+
+After a cluster of concepts is stable, the user gets a boss-style integrated
+question. Example:
+
+"Design a rate-limited API endpoint with database transactions, cache behavior,
+failure handling, and test strategy."
+
+### Reward Types
+
+Prefer meaningful rewards:
+
+- readiness percentage
+- concept unlocks
+- strong answer saved to portfolio
+- mock interview token
+- new scenario type unlocked
+- visual certificate for a completed track
+- generated review card or diagram
+
+Avoid rewards that are disconnected from learning value.
+
+### Session Ending
+
+Every session should end with a strong closure:
+
+- one thing improved
+- one misconception discovered or resolved
+- one recommended next step
+- one visible progress change
+
+## Safety Rules
+
+- Do not use gambling-like random rewards as the primary motivator.
+- Do not punish users for missing a day.
+- Do not hide progress behind manipulative scarcity.
+- Do not optimize only for time-on-site.
+- Do not create shame-based leaderboards.
+- Prefer mastery, autonomy, competence, and readiness over compulsion.
+
+## MVP Gamification Features
+
+- Daily 10-minute interview loop.
+- Role readiness map.
+- Concept challenge ladder.
+- Streak with grace days, not punishment.
+- Boss question after each concept cluster.
+- Strong-answer portfolio.
+- Session-end progress summary.
+- Review campaign that counts down to the interview date.
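+
+To illustrate the streak-with-grace-days item, here is one possible counting
+rule. `StreakWithGrace` is a hypothetical helper, days are plain day indexes
+rather than calendar dates, and the choice to restart the count once grace is
+exhausted is an assumption. Nothing other than the counter is affected.
+
+```go
+package main
+
+import "fmt"
+
+// StreakWithGrace counts practiced days in the current streak. The input is
+// a list of day indexes in ascending order. Each missed day inside a gap
+// consumes one grace day; when a gap needs more grace than remains, the
+// count restarts at that day with full grace.
+func StreakWithGrace(days []int, grace int) int {
+	if len(days) == 0 {
+		return 0
+	}
+	streak, left := 1, grace
+	for i := 1; i < len(days); i++ {
+		missed := days[i] - days[i-1] - 1
+		switch {
+		case missed < 0:
+			// Same day logged twice: does not extend the streak.
+			continue
+		case missed <= left:
+			left -= missed
+			streak++
+		default:
+			streak, left = 1, grace
+		}
+	}
+	return streak
+}
+
+func main() {
+	fmt.Println(StreakWithGrace([]int{1, 2, 4, 5}, 1)) // 4: one missed day forgiven
+	fmt.Println(StreakWithGrace([]int{1, 2, 4, 6}, 1)) // 1: second gap exceeds grace
+}
+```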
--- a/docs/planning/PRD.md
+++ b/docs/planning/PRD.md
@@ -0,0 +1,269 @@
+# Tutor Platform PRD
+
+## Product Thesis
+
+Build a web service that helps software job seekers prepare for technical
+interviews through adaptive tutoring, interview-question practice, and a
+student-specific learning memory. The first market is developers preparing for
+employment or career transition. The platform should later expand to general
+students by reusing the same curriculum, ontology, assessment, and tutoring
+workflow foundations.
+
+This is not a classic RAG chatbot. Source materials are ingested as evidence,
+analyzed into a learning ontology, checked for missing or weak areas, verified,
+and then transformed into structured learning material, practice questions, and
+teaching aids.
+
+## Target Users
+
+### Primary: software job seekers
+
+- Bootcamp graduates preparing for interviews.
+- Junior developers preparing for first jobs.
+- Developers changing stacks.
+- Experienced developers preparing for system design, backend, frontend, data,
+  AI, DevOps, or language-specific interviews.
+
+### Secondary: general students, later
+
+- Students who need adaptive study plans.
+- Teachers or parents who need progress summaries.
+- Institutions that want a private learning memory and curriculum engine.
+
+## Problem
+
+Job seekers have abundant content but weak feedback loops:
+
+- They do not know which concepts they truly understand.
+- They memorize interview answers without building transferable understanding.
+- Existing tools give generic questions, not diagnosis-based practice.
+- Study materials are fragmented across notes, PDFs, slides, videos, blogs, and
+  repositories.
+- Progress is hard to measure across concepts, mistakes, and repeated sessions.
+
+## Product Goals
+
+- Provide interview-first tutoring for software job seekers.
+- Build a durable learner model from every practice answer and tutor session.
+- Convert uploaded materials into a concept ontology and verified study assets.
+- Detect missing, weak, outdated, or unverified parts of a learning corpus.
+- Generate practice questions, explanations, review plans, and visual teaching
+  aids from the verified ontology.
+- Use game-inspired progress loops to make learning feel rewarding, repeatable,
+  and visibly connected to interview readiness.
+- Keep the architecture reusable for future general-student learning flows.
+
+## Technology Direction
+
+- Backend: Go.
+- Workflow substrate: internalize `agent-farm-go` patterns and execution
+  contracts into the backend boundary instead of treating workflow execution as
+  a loose external script.
+- LLM kernel: `third-one`, defaulting to `deepseek-v4-flash` through runtime
+  configuration.
+- Frontend: web-first. The exact frontend stack remains open until the first UI
+  implementation slice, but it should stay lightweight and product-focused.
+
+## Non-Goals for MVP
+
+- Full school LMS replacement.
+- Marketplace for courses.
+- Automatic certification or hiring decisions.
+- Broad multi-subject K-12 coverage.
+- Unverified autonomous content publishing.
+
+## MVP Scope
+
+The first MVP should prove one loop:
+
+1. User chooses a target role and stack.
+2. Platform runs a diagnostic technical interview.
+3. Tutor asks follow-up questions based on weak answers.
+4. System extracts concept mastery, misconceptions, and evidence.
+5. User receives a focused review plan.
+6. User repeats practice with generated interview questions.
+7. User sees visible readiness progress, next unlocks, and a recommended next
+   challenge.
+
+Recommended first track:
+
+- Backend developer interview preparation.
+- Topics: HTTP, REST, databases, transactions, caching, concurrency, testing,
+  system design basics, Go or JavaScript/TypeScript depending on the first
+  content corpus.
+
+## Core User Flows
+
+### Diagnostic interview
+
+The user selects role, stack, target company type, and interview date. The
+system asks a short series of adaptive questions, grades answers, identifies
+weak concepts, and creates an initial study map.
+
+### Practice session
+
+The tutor asks one interview question at a time, requests the user's answer,
+grades it against a rubric, asks follow-ups, and records learning evidence.
+
+### Review plan
+
+After each session, the system creates a concise plan:
+
+- concepts to review
+- mistakes to fix
+- next practice questions
+- suggested study order
+- estimated readiness
+
+### Gamified learning routine
+
+The user follows a short loop:
+
+1. choose or accept today's target
+2. answer one interview question
+3. receive rubric feedback
+4. handle one follow-up or correction
+5. see memory and readiness progress
+6. unlock the next challenge or review card
+
+The loop should feel like a game challenge ladder, but its rewards must be tied
+to real learning evidence. The product should favor mastery, autonomy, and
+readiness over empty points or exploitative streak pressure.
+
+### Material ingestion
+
+The user or operator uploads PDFs, notes, slides, docs, links, code snippets, or
+existing interview-question sets. The system analyzes them into a concept graph,
+detects missing prerequisites, flags weak evidence, and proposes generated
+study assets.
+
+### Teaching-aid generation
+
+For concepts that need visual explanation, the system generates images,
+slide-like lesson slices, diagrams, and worksheet-style teaching aids through
+the configured image generation provider.
+
+## Functional Requirements
+
+### Interview question engine
+
+- The system SHALL generate role-specific technical interview questions.
+- The system SHALL support difficulty levels and follow-up questions.
+- The system SHALL grade answers with rubric-backed evidence.
+- The system SHALL separate factual correctness, depth, communication clarity,
+  and production judgment.
+- The system SHALL keep original user answers as evidence for later review.
+
+### Learner memory
+
+- The system SHALL maintain a per-user learner profile.
+- The system SHALL track concept mastery over time.
+- The system SHALL track recurring misconceptions and weak reasoning patterns.
+- The system SHALL store evidence for every memory update.
+- The system SHALL distinguish durable memory from temporary session context.
+
+### Ontology builder
+
+- The system SHALL ingest source materials into a learning ontology.
+- The system SHALL represent concepts, prerequisites, examples, questions,
+  rubrics, and source evidence as separate entities.
+- The system SHALL detect missing prerequisite concepts.
+- The system SHALL flag generated or inferred content that lacks source support.
+- The system SHALL support human review before learning assets are promoted to
+  canonical.
+
+### Tutor workflows
+
+- The system SHALL run tutoring behavior through configurable LLM workflows.
+- The system SHALL use `agent-farm-go` as the workflow orchestration substrate.
+- The backend SHALL be implemented in Go so the service can internalize
+  `agent-farm-go` workflow patterns and contracts directly.
+- The system SHALL use `third-one` as the LLM execution kernel.
+- The default LLM runtime SHALL target `deepseek-v4-flash` unless changed by
+  configuration.
+- Workflow outputs SHALL prefer typed JSON contracts for grading, memory
+  extraction, ontology updates, and review-plan generation.
+
+### Visual teaching assets
+
+- The system SHALL generate educational visual assets for selected concepts.
+- The system SHALL support slide-like lesson slices, diagrams, worksheets, and
+  interview explanation cards.
+- Image generation SHALL be behind a provider/model configuration key. The
+  initial product intent is `gpt-image-v2`; implementation must verify the
+  actual OpenAI model identifier before wiring production calls.
+- Generated assets SHALL keep source links, prompt lineage, and review state.
+
+### Engagement and progression
+
+- The system SHALL expose a role-specific readiness map.
+- The system SHALL organize concepts into challenge ladders from definition to
+  pressure-tested interview answers.
+- The system SHALL provide short daily or session-based learning loops.
+- The system SHALL use adaptive difficulty to keep questions near the user's
+  current ability.
+- The system SHALL provide meaningful rewards such as concept readiness,
+  strong-answer portfolio entries, boss questions, review cards, or visual
+  completion assets.
+- The system SHALL avoid gambling-like random rewards, shame-based leaderboards,
+  and punitive streak loss.
+
+## Memory Model
+
+The memory layer should store structured learning state, not just retrieved
+text chunks.
+
+Core memory objects:
+
+- `LearnerProfile`: target role, stack, timeline, preferences.
+- `ConceptMastery`: concept-level state such as unknown, fragile, improving, or
+  mastered.
+- `Misconception`: recurring wrong model or reasoning pattern.
+- `Evidence`: original answer, quiz result, source passage, or tutor note that
+  supports a memory update.
+- `Intervention`: explanation, hint, visual, analogy, or practice type that was
+  tried and whether it helped.
+- `ReviewSchedule`: when and why a concept should be revisited.
+
+MemPalace is useful as a reference for scoped, temporal, evidence-preserving
+memory. Graphify is useful as a reference for building queryable knowledge
+graphs from mixed materials. The product memory should still be implemented as
+an application-owned data model because learner privacy, tenant boundaries,
+review states, and deletion policies are product requirements.
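+
+As a sketch of how the evidence requirement could be enforced in the data
+model, the update path below refuses any mastery change that arrives without
+evidence. Type and field names beyond the memory objects listed above
+(`MasteryUpdate`, `LearnerMemory`, `Apply`, `ErrNoEvidence`) are assumptions.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// MasteryState uses the example states named for ConceptMastery.
+type MasteryState string
+
+const (
+	Unknown   MasteryState = "unknown"
+	Fragile   MasteryState = "fragile"
+	Improving MasteryState = "improving"
+	Mastered  MasteryState = "mastered"
+)
+
+var ErrNoEvidence = errors.New("memory update has no evidence")
+
+// Evidence points at the stored original that justifies an update.
+type Evidence struct {
+	Kind string // e.g. "answer", "quiz_result", "source_passage", "tutor_note"
+	Ref  string // reference to the stored original
+}
+
+type MasteryUpdate struct {
+	ConceptID string
+	NewState  MasteryState
+	Evidence  []Evidence
+}
+
+// LearnerMemory is durable state; session context lives elsewhere.
+type LearnerMemory struct {
+	Mastery map[string]MasteryState
+	History []MasteryUpdate
+}
+
+// Apply records the update only when it carries at least one evidence item.
+func (m *LearnerMemory) Apply(u MasteryUpdate) error {
+	if len(u.Evidence) == 0 {
+		return ErrNoEvidence
+	}
+	if m.Mastery == nil {
+		m.Mastery = make(map[string]MasteryState)
+	}
+	m.Mastery[u.ConceptID] = u.NewState
+	m.History = append(m.History, u)
+	return nil
+}
+
+func main() {
+	var mem LearnerMemory
+	err := mem.Apply(MasteryUpdate{ConceptID: "transactions", NewState: Improving})
+	fmt.Println(err) // memory update has no evidence
+
+	err = mem.Apply(MasteryUpdate{
+		ConceptID: "transactions",
+		NewState:  Improving,
+		Evidence:  []Evidence{{Kind: "answer", Ref: "answer-17"}},
+	})
+	fmt.Println(mem.Mastery["transactions"], err)
+}
+```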
+
+## Success Metrics
+
+- A user can complete a diagnostic interview in under 15 minutes.
+- The system produces concept weaknesses that match human reviewer judgment.
+- Generated follow-up questions target the user's actual weak points.
+- The user can see progress across repeated practice sessions.
+- Users complete repeated 5-10 minute learning loops without needing manual
+  planning.
+- Readiness progress corresponds to actual graded answer evidence.
+- Every durable memory update has inspectable evidence.
+- Uploaded material produces a useful concept graph and a list of missing or
+  weak areas.
+- Generated learning assets are reviewable before becoming canonical.
+
+## Risks
+
+- The tutor may overstate correctness or readiness.
+- Generated ontology edges may look plausible but lack evidence.
+- Job seekers may want company-specific interview prep before the core learning
+  loop is reliable.
+- Image generation model names and API capabilities may change.
+- Learner data can become sensitive, especially when expanding to minors or
+  school contexts.
+
+## Open Questions
+
+- Which stack should be the first interview track: backend Go, backend
+  Java/Spring, frontend React, or full-stack TypeScript?
+- Should users upload their resume first, or should the first session start
+  with role/stack selection only?
+- How much human review is required before generated ontology content becomes
+  canonical?
+- Should teacher/operator review exist in MVP, or only after the job-seeker loop
+  is proven?
+- Which progression surface should ship first: readiness map, challenge ladder,
+  strong-answer portfolio, or interview-date campaign?