Retrain ML model on real feedback (replace synthetic bootstrap) #92

@ringo380

Context

The currently-ACTIVE model (query_grader, version 20260521_021306) was trained as a bootstrap on synthetic/seed TrainingData to give the issue-#5 monitoring pipeline an ACTIVE target. Its metrics reflect that:

  • Training accuracy: 0.804
  • Validation accuracy: 0.712 (just over the 0.7 auto-deploy threshold)
  • Test accuracy: 0.095, i.e. effectively no predictive power on held-out data.

It satisfies "an ACTIVE model exists so monitoring evaluates," but it is not a quality predictor. The low-confidence / low-user-agreement alerts it generates reflect exactly this.

Goal

Replace the synthetic bootstrap with a model trained on accumulated real user feedback once enough has been collected.

Tasks

  • Define a minimum real-feedback sample threshold before retraining. The current ML_MIN_TRAINING_SAMPLES=50 also counts synthetic seed rows, so a separate "validated real feedback" gate may be needed.
  • Confirm the process_ml_feedback → TrainingData flow is converting real QueryFeedback into training samples.
  • Establish a retrain cadence (the ML_AUTO_RETRAIN / beat machinery exists) and a quality gate higher than the bootstrap's 0.7 val / 0.095 test.
  • Re-evaluate model_type: bootstrap is QUERY_GRADER; the grading path expects HYBRID_SCORER (per CLAUDE.md) once HybridQueryGrader is wired in.
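
The first and third tasks could be sketched roughly as below. This is a minimal illustration, not the project's code: ModelMetrics, has_enough_real_feedback, passes_quality_gate, the "real"/"synthetic" source labels, and the threshold values in the usage lines are all hypothetical placeholders. Only the bootstrap metrics (0.804 / 0.712 / 0.095) and the 0.7 validation threshold come from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetrics:
    """Accuracy figures for one trained model version (hypothetical container)."""
    train_accuracy: float
    val_accuracy: float
    test_accuracy: float


def has_enough_real_feedback(sources, min_real_samples):
    """Count only rows labeled "real"; synthetic seed rows do not count.

    `sources` is an iterable of per-row origin labels. The label values
    are an assumption about how TrainingData rows could be tagged.
    """
    real_count = sum(1 for source in sources if source == "real")
    return real_count >= min_real_samples


def passes_quality_gate(metrics, min_val_accuracy, min_test_accuracy):
    """Require both validation and test accuracy to clear their thresholds."""
    return (
        metrics.val_accuracy >= min_val_accuracy
        and metrics.test_accuracy >= min_test_accuracy
    )


# Bootstrap metrics as reported in this issue.
bootstrap = ModelMetrics(train_accuracy=0.804, val_accuracy=0.712, test_accuracy=0.095)

# A validation-only check at 0.7 lets the bootstrap through.
val_only_pass = bootstrap.val_accuracy >= 0.7

# Placeholder thresholds (0.75 / 0.70) chosen only for illustration.
strict_pass = passes_quality_gate(bootstrap, min_val_accuracy=0.75, min_test_accuracy=0.70)

# 50 synthetic seed rows plus 12 real rows: 62 rows total, but only 12 are real.
example_sources = ["synthetic"] * 50 + ["real"] * 12
enough_real = has_enough_real_feedback(example_sources, min_real_samples=50)
```

With these placeholder values, the bootstrap clears the validation-only check but fails the combined gate, and the mixed dataset fails the real-feedback gate even though its total row count exceeds 50. The actual thresholds still need to be decided as part of this issue.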

Notes

  • Training runs in-container: railway ssh -s querygrade-worker "python manage.py train_ml_model …" (local railway run can't reach postgres.railway.internal).
  • Related: model-artifact durability (Move ML model artifacts to object storage (model registry) #91) becomes relevant once predictions actually load the model.
