feat: add ontology material ingestion

user
2026-04-26 17:49:35 +09:00
parent a413f1ef15
commit 4936cdf4c9
19 changed files with 766 additions and 13 deletions

@@ -49,12 +49,12 @@ interview-ready after each short practice loop.
### Ontology and Learning Materials
- [ ] **ONTO-01**: User or operator can upload learning materials.
- [ ] **ONTO-02**: System creates source-backed ontology candidate nodes and
- [x] **ONTO-01**: User or operator can upload learning materials.
- [x] **ONTO-02**: System creates source-backed ontology candidate nodes and
edges.
- [ ] **ONTO-03**: System detects missing prerequisites and weakly supported
- [x] **ONTO-03**: System detects missing prerequisites and weakly supported
concepts.
- [ ] **ONTO-04**: Generated or inferred content is marked as candidate until
- [x] **ONTO-04**: Generated or inferred content is marked as candidate until
reviewed.
### Teaching Assets
@@ -98,7 +98,7 @@ interview-ready after each short practice loop.
| INT-01..INT-06 | Phase 2 | Complete |
| MEM-01..MEM-05 | Phase 3 | Complete |
| PROG-01..PROG-05 | Phase 4 | Complete |
| ONTO-01..ONTO-04 | Phase 5 | Pending |
| ONTO-01..ONTO-04 | Phase 5 | Complete |
| ASSET-01..ASSET-03 | Phase 6 | Pending |
**Coverage:**
@@ -108,4 +108,4 @@ interview-ready after each short practice loop.
---
*Requirements defined: 2026-04-26*
*Last updated: 2026-04-26 after Phase 4 execution.*
*Last updated: 2026-04-26 after Phase 5 execution.*

@@ -7,7 +7,7 @@ See: `.planning/PROJECT.md` (updated 2026-04-26)
**Core value:** The user should feel and prove that they are becoming more
interview-ready after each short practice loop.
**Current focus:** Phase 5 planning: Ontology and Learning Materials.
**Current focus:** Phase 6 planning: Teaching Assets.
## Current Decisions
@@ -29,14 +29,16 @@ interview-ready after each short practice loop.
schedules.
- Phase 4 progression is implemented and verified with readiness map and next
challenge APIs derived from learner memory evidence.
- Phase 5 ontology material ingestion is implemented and verified with
source-backed candidate concepts, prerequisite edges, and candidate gaps.
## Next Actions
1. Plan Phase 5 ontology and learning material ingestion with GSD.
1. Plan Phase 6 teaching asset prompt generation with GSD.
2. Keep `docs/planning/WORKFLOW_CONTRACTS.md` aligned with Go structs during
future workflow implementation.
3. Decide the MVP ontology storage boundary before accepting uploaded source
materials.
3. Verify the production OpenAI image model identifier before real asset
generation calls.
## Validation Log
@@ -56,6 +58,9 @@ interview-ready after each short practice loop.
- 2026-04-26: Phase 4 implementation verified with `go test ./...`,
`openspec validate bootstrap-job-tutor-platform --strict`, live readiness and
next-challenge smoke, and Go source line-count check.
- 2026-04-26: Phase 5 implementation verified with `go test ./...`,
`openspec validate bootstrap-job-tutor-platform --strict`, live material
ingestion and ontology snapshot smoke, and Go source line-count check.
---
*State initialized: 2026-04-26.*

@@ -0,0 +1,37 @@
# Phase 5 Context: Ontology and Learning Materials
**Status:** Ready for execution
**Started:** 2026-04-26
## Goal
Accept learning material input and produce source-backed ontology candidates.
## Inputs
- OpenSpec `learning-ontology` requirements.
- Existing workflow contracts for `OntologyGap`.
- Backend Developer Interview seed concepts.
## Decisions
- Use an in-memory ontology store for MVP proof.
- Accept JSON material ingestion before multipart file upload.
- Mark all generated nodes, edges, and gaps as `candidate`.
- Preserve source evidence for every supported ontology candidate.
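A minimal Go sketch of the two candidate-related decisions above. It assumes hypothetical `ReviewState`, `Evidence`, and `ConceptCandidate` types (not taken from the repository) and shows one way to enforce that generated concepts stay in `candidate` state and cite at least one source excerpt.

```go
package main

import (
	"errors"
	"fmt"
)

// ReviewState is a hypothetical name for the review lifecycle of ontology records.
type ReviewState string

const (
	StateCandidate ReviewState = "candidate"
	StateCanonical ReviewState = "canonical" // reserved for a future review workflow
)

// Evidence points back to the material text that supports a candidate.
type Evidence struct {
	MaterialID string
	Excerpt    string
}

// ConceptCandidate is a hypothetical shape for a generated ontology node.
type ConceptCandidate struct {
	ID       string
	Label    string
	State    ReviewState
	Evidence []Evidence
}

var (
	errNotCandidate = errors.New("generated concept must stay in candidate state")
	errNoEvidence   = errors.New("supported concept must cite source evidence")
)

// validateGenerated checks both decisions: generated records are candidates,
// and supported records carry at least one source evidence reference.
func validateGenerated(c ConceptCandidate) error {
	if c.State != StateCandidate {
		return errNotCandidate
	}
	if len(c.Evidence) == 0 {
		return errNoEvidence
	}
	return nil
}

func main() {
	ok := ConceptCandidate{ID: "c1", Label: "database index", State: StateCandidate,
		Evidence: []Evidence{{MaterialID: "m1", Excerpt: "A database index speeds reads."}}}
	promoted := ok
	promoted.State = StateCanonical
	fmt.Println(validateGenerated(ok))       // <nil>
	fmt.Println(validateGenerated(promoted)) // generated concept must stay in candidate state
}
```

Canonical promotion is out of scope for this phase, so in this sketch any non-candidate state is simply rejected.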
## Boundaries
In scope:
- Material ingestion API.
- Source-backed ontology candidate nodes and edges.
- Gap detection for missing prerequisites and weak evidence.
- Ontology snapshot API.
Out of scope:
- File storage.
- PDF/PPT parsing.
- Human review UI.
- Canonical promotion workflow.

@@ -0,0 +1,42 @@
# Phase 5 Plan: Ontology and Learning Materials
**Status:** Ready for execution
**Phase Goal:** Ingest learning materials into source-backed ontology candidates.
## Requirements Covered
- ONTO-01: User or operator can upload learning materials.
- ONTO-02: System creates source-backed ontology candidate nodes and edges.
- ONTO-03: System detects missing prerequisites and weakly supported concepts.
- ONTO-04: Generated or inferred content is marked as candidate until reviewed.
## Tasks
### 1. Add ontology package
- Define material, concept candidate, edge candidate, gap, and snapshot types.
- Add in-memory store and service.
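One possible shape for the in-memory store in Task 1, sketched in Go. `MemoryStore`, `Material`, and `Snapshot` are hypothetical names. The mutex assumes the store is shared by concurrent HTTP handlers, and the snapshot returns copies so readers cannot mutate stored state.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Material is a hypothetical record for operator-provided learning material.
type Material struct {
	ID     string
	Title  string
	Source string // raw text; file storage is out of scope for Phase 5
}

// Snapshot is a read-only view of the store at one point in time.
type Snapshot struct {
	Materials []Material
	Concepts  []string
}

// MemoryStore keeps everything in process memory, guarded by a RWMutex.
type MemoryStore struct {
	mu        sync.RWMutex
	materials []Material
	concepts  map[string]bool
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{concepts: make(map[string]bool)}
}

// AddMaterial stores the material and the concept labels extracted from it.
func (s *MemoryStore) AddMaterial(m Material, concepts []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.materials = append(s.materials, m)
	for _, c := range concepts {
		s.concepts[c] = true
	}
}

// Snapshot copies internal state and sorts concepts for deterministic output.
func (s *MemoryStore) Snapshot() Snapshot {
	s.mu.RLock()
	defer s.mu.RUnlock()
	snap := Snapshot{Materials: append([]Material(nil), s.materials...)}
	for c := range s.concepts {
		snap.Concepts = append(snap.Concepts, c)
	}
	sort.Strings(snap.Concepts)
	return snap
}

func main() {
	store := NewMemoryStore()
	store.AddMaterial(Material{ID: "m1", Title: "DB notes"}, []string{"transactions", "indexing"})
	snap := store.Snapshot()
	fmt.Println(len(snap.Materials), snap.Concepts) // 1 [indexing transactions]
}
```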
### 2. Implement deterministic MVP analyzer
- Extract known backend interview concept candidates from material text.
- Create prerequisite edges for supported concept pairs.
- Create gap candidates for missing prerequisites and weak evidence.
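A rough Go sketch of the deterministic analyzer in Task 2. The concept list, the prerequisite rules, and the `minMentions` threshold are hypothetical placeholders rather than the project's seed data, and evidence attachment is omitted for brevity. Concepts are matched by lowercase substring count; a prerequisite that also appears in the text becomes an edge, an absent prerequisite becomes a missing-prerequisite gap, and a concept mentioned fewer than `minMentions` times becomes a weak-evidence gap.

```go
package main

import (
	"fmt"
	"strings"
)

// knownConcepts stands in for the backend interview seed concepts (hypothetical).
var knownConcepts = []string{"http", "rest api", "sql", "database index", "transaction", "isolation level"}

// prerequisites maps a concept to the concepts it assumes (hypothetical rules).
var prerequisites = map[string][]string{
	"rest api":        {"http"},
	"database index":  {"sql"},
	"isolation level": {"transaction"},
}

// minMentions is an assumed threshold below which support counts as weak.
const minMentions = 2

type Edge struct{ From, To string } // From is a prerequisite of To

type Gap struct {
	Concept string
	Kind    string // "missing_prerequisite" or "weak_evidence"
	State   string // always "candidate"
}

type Analysis struct {
	Concepts []string
	Edges    []Edge
	Gaps     []Gap
}

// Analyze extracts known concepts, prerequisite edges, and candidate gaps.
func Analyze(text string) Analysis {
	lower := strings.ToLower(text)
	counts := map[string]int{}
	var a Analysis
	for _, c := range knownConcepts {
		if n := strings.Count(lower, c); n > 0 {
			counts[c] = n
			a.Concepts = append(a.Concepts, c)
		}
	}
	for _, c := range a.Concepts {
		for _, pre := range prerequisites[c] {
			if counts[pre] > 0 {
				a.Edges = append(a.Edges, Edge{From: pre, To: c})
			} else {
				a.Gaps = append(a.Gaps, Gap{Concept: pre, Kind: "missing_prerequisite", State: "candidate"})
			}
		}
		if counts[c] < minMentions {
			a.Gaps = append(a.Gaps, Gap{Concept: c, Kind: "weak_evidence", State: "candidate"})
		}
	}
	return a
}

func main() {
	text := "REST API design relies on HTTP verbs. HTTP status codes matter. " +
		"A REST API should be idempotent where possible. A database index speeds reads."
	a := Analyze(text)
	fmt.Println(a.Edges) // [{http rest api}]
	fmt.Println(a.Gaps)  // [{sql missing_prerequisite candidate} {database index weak_evidence candidate}]
}
```

Substring matching is crude (for example, `sql` would also match `mysql`), which fits the research note that a future parser can replace the analyzer behind the same service boundary.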
### 3. Add HTTP endpoints
- `POST /api/v1/materials`
- `GET /api/v1/ontology`
### 4. Add tests and verification
- Test material ingestion creates source-backed candidates.
- Test gaps are candidate-only.
- Test HTTP ingestion and ontology snapshot flow.
- Run Go tests, OpenSpec validation, line-count check, and smoke.
## Out of Scope
- Multipart upload.
- Real document parsers.
- Human review promotion.

@@ -0,0 +1,28 @@
# Phase 5 Research: Ontology and Learning Materials
## Findings
The first useful ontology proof does not need heavy parsing. It needs a clean
boundary that proves uploaded material can become inspectable candidate
knowledge with provenance.
The MVP should:
- store material metadata and source text
- extract concept candidates from known backend interview concepts
- create prerequisite edges from a small deterministic rule set
- identify weak concepts when source support is thin
- never mark generated or inferred content as canonical
## Recommended Shape
- `internal/ontology` owns material ingestion, candidate storage, and snapshot.
- HTTP exposes JSON ingestion first.
- Evidence references use the existing workflow shared type.
- Gap records distinguish source-backed weakness from generated inference.
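A small Go sketch of how a gap record might carry that distinction. The `GapKind` and `GapOrigin` names and the mapping are assumptions: a weak-evidence gap is treated as observed in the material, and a missing-prerequisite gap as inferred from a rule rather than read from the text.

```go
package main

import "fmt"

type GapKind string
type GapOrigin string

const (
	KindWeakEvidence        GapKind = "weak_evidence"
	KindMissingPrerequisite GapKind = "missing_prerequisite"

	OriginSourceBacked GapOrigin = "source_backed"
	OriginInferred     GapOrigin = "generated_inference"
)

// originFor labels weak-evidence gaps as source-backed; every other kind,
// including unknown ones, defaults to the more conservative inferred label.
func originFor(k GapKind) GapOrigin {
	if k == KindWeakEvidence {
		return OriginSourceBacked
	}
	return OriginInferred
}

func main() {
	fmt.Println(originFor(KindWeakEvidence))        // source_backed
	fmt.Println(originFor(KindMissingPrerequisite)) // generated_inference
}
```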
## Risks
- Overbuilding parsers too early would violate YAGNI.
- Treating keyword extraction as canonical knowledge would violate the OpenSpec
  `learning-ontology` requirements.
- Mitigation: keep the analyzer behind the `internal/ontology` service boundary
  so a future parser can replace it.

@@ -0,0 +1,36 @@
# Phase 5 Summary
**Status:** Complete
**Completed:** 2026-04-26
## Delivered
- Added `internal/ontology` for materials, concept candidates, edge candidates,
gaps, and snapshots.
- Added deterministic MVP analyzer for known backend interview concepts.
- Added source evidence to every supported concept and edge candidate.
- Added candidate-only gap records for missing prerequisites and weak evidence.
- Added HTTP endpoints:
- `POST /api/v1/materials`
- `GET /api/v1/ontology`
- Added ontology unit tests and HTTP flow tests.
## Verification
```powershell
gofmt -w cmd internal
go test ./...
openspec validate bootstrap-job-tutor-platform --strict
```
Additional smoke check:
- Material ingestion followed by ontology snapshot returned candidate concepts,
edges, and gaps.
## Deferred
- Multipart uploads.
- PPT/PDF/document parsing.
- Human review and canonical promotion.
- Graph database persistence.

@@ -0,0 +1,29 @@
# Phase 5 Verification
## Verdict
PASS
## Requirement Coverage
- ONTO-01: PASS. JSON material ingestion API accepts operator-provided learning
material.
- ONTO-02: PASS. Ingestion creates source-backed candidate concepts and
prerequisite edges.
- ONTO-03: PASS. The analyzer creates candidate gaps for missing prerequisites
and weak source evidence.
- ONTO-04: PASS. All generated ontology candidates and gaps use `candidate`
review state.
## Evidence
- `go test ./...` passed.
- `openspec validate bootstrap-job-tutor-platform --strict` passed.
- Live material ingestion and ontology snapshot smoke passed.
- Go source line-count check passed.
## Residual Risk
The analyzer is deterministic and intentionally shallow. It proves the product
boundary but should later be replaced or supplemented with parser-backed and
LLM-assisted extraction.