Problem
The hf_deploy/ directory is a near-copy of trainer/ and workers/. Any bug fix or new feature applied to the local training stack must be manually replicated in the HF deploy copy. This has already caused drift and will worsen over time.
Affected paths
hf_deploy/trainer/ — duplicates trainer/
hf_deploy/workers/ — duplicates workers/
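To see how far the copies have already drifted, the two trees can be compared file by file. The sketch below is only an illustration using the standard-library `filecmp` module. The helper name `drift_report` is hypothetical and not part of the repository, and the directory layout is assumed to match the paths listed above.

```python
import filecmp
from pathlib import Path


def drift_report(original: Path, copy: Path) -> dict[str, list[str]]:
    """Recursively compare two directory trees and collect their differences.

    Hypothetical helper for illustration. Paths in the result are relative
    to the compared roots and use "/" as the separator.
    """
    report: dict[str, list[str]] = {
        "differing": [],         # present in both trees, contents differ
        "only_in_original": [],  # e.g. a new module never copied over
        "only_in_copy": [],      # e.g. a deploy-only addition
    }

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # dircmp compares files shallowly by default (type, size, mtime),
        # and reads contents only when those signatures differ.
        report["differing"] += [prefix + name for name in cmp.diff_files]
        report["only_in_original"] += [prefix + name for name in cmp.left_only]
        report["only_in_copy"] += [prefix + name for name in cmp.right_only]
        for name, sub in cmp.subdirs.items():
            walk(sub, f"{prefix}{name}/")

    walk(filecmp.dircmp(str(original), str(copy)), "")
    for entries in report.values():
        entries.sort()
    return report
```

Run from the repository root, `drift_report(Path("trainer"), Path("hf_deploy/trainer"))` and the equivalent call for `workers/` would list the files that need reconciling before the copies can be deleted.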
Proposed fix
Extract a shared, pip-installable Python package (e.g. tuneos-core) that contains:
trainer/ (finetune, dataset, callbacks, config, loader, lora, qlora, merge, evaluate)
workers/ (celery_app, train_task, merge_task, status)
Both the local app and the HF Space Dockerfile would then install from this package (either via PyPI or a local path install in the monorepo).
Acceptance criteria
hf_deploy/trainer/ and hf_deploy/workers/ are deleted
pyproject.toml in both contexts
hf_deploy/Dockerfile updated to install the shared package
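Two of these criteria could be checked automatically, for example in CI. The following is a rough sketch under stated assumptions: `check_acceptance` is a hypothetical helper, the package name defaults to the `tuneos-core` example from the proposal, and the Dockerfile check is a plain substring heuristic that would miss a local path install that never mentions the name. The `pyproject.toml` criterion is left out because it needs a project-specific definition.

```python
from pathlib import Path


def check_acceptance(repo_root: Path, package_name: str = "tuneos-core") -> list[str]:
    """Return a list of unmet criteria; an empty list means all checks passed.

    Hypothetical helper for illustration. `package_name` is an assumption
    taken from the example name in the proposal.
    """
    failures: list[str] = []

    # Criterion: the duplicated directories are gone.
    for stale in ("hf_deploy/trainer", "hf_deploy/workers"):
        if (repo_root / stale).exists():
            failures.append(f"{stale}/ still exists")

    # Criterion: the Dockerfile installs the shared package.
    # Heuristic only: looks for the package name anywhere in the file.
    dockerfile = repo_root / "hf_deploy" / "Dockerfile"
    if not dockerfile.is_file():
        failures.append("hf_deploy/Dockerfile is missing")
    elif package_name not in dockerfile.read_text(encoding="utf-8"):
        failures.append(f"hf_deploy/Dockerfile does not mention {package_name}")

    return failures
```

A CI step could call `check_acceptance(Path("."))` and fail the build when the returned list is not empty.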