We implemented a ResNet-18 classifier trained from scratch (no pretrained weights) with a custom classification head. The architecture and training configuration are:
- Backbone: ResNet-18 (PyTorch, weights=None)
- Classifier Head:
  - Linear(512 → 256) + BatchNorm1d + ReLU + Dropout(0.3)
  - Linear(256 → 128) + BatchNorm1d + ReLU + Dropout(0.3)
  - Linear(128 → 2) for binary classification
- Optimizer: AdamW with weight decay (5e-4)
- Learning Rate: 0.0005 (optimized for from-scratch training)
- Scheduler: CosineAnnealingWarmRestarts (T_0=10, T_mult=2)
- Batch Size: 16
- Epochs: 100
- Regularization: Dropout (0.3), Weight Decay (5e-4), Gradient Clipping (max_norm=1.0)
- Device: CPU
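The configuration above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual `train.py`: the helper name `build_head`, the random stand-in features, and the single training step are hypothetical. In the full model the head would replace the `fc` layer of torchvision's `resnet18(weights=None)`, whose pooled feature size is 512.

```python
import torch
import torch.nn as nn

def build_head(in_features: int = 512, num_classes: int = 2) -> nn.Sequential:
    """Classifier head as listed above (hypothetical helper name)."""
    return nn.Sequential(
        nn.Linear(in_features, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(128, num_classes),
    )

head = build_head()
optimizer = torch.optim.AdamW(head.parameters(), lr=5e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2
)
criterion = nn.CrossEntropyLoss()

# One illustrative optimization step on random stand-in data (batch size 16).
features = torch.randn(16, 512)        # stand-in for ResNet-18 pooled features
labels = torch.randint(0, 2, (16,))    # random binary labels

head.train()
optimizer.zero_grad()
logits = head(features)
loss = criterion(logits, labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(head.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
scheduler.step()  # assumption: the scheduler is stepped once per epoch
```

Clipping is applied after `backward()` and before `optimizer.step()`, so the update uses the clipped gradients.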
We used a moderate augmentation strategy to balance generalization and feature preservation:
- Random resize crop (136→128)
- Random horizontal flip (p=0.5)
- Random rotation (±10°)
- Random affine (shear=8, scale=0.85-1.15)
- Color jitter (brightness/contrast/saturation=0.15)
- Random erasing (p=0.05)
- Training: 4,328 samples (after data cleaning), of which only about 200 carry a class label
  - Chihuahua: ~100 labeled samples
  - Muffin: ~100 labeled samples
  - Undefined: 4,133 samples (excluded from training via zero weights)
- Validation: 1,184 samples (640 chihuahua + 544 muffin)
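Excluding undefined samples "via zero weights" can be illustrated with a per-sample weighted loss. The sketch below is an assumption about how such weighting could work in plain PyTorch; the function name `weighted_cross_entropy` and the toy tensors are hypothetical, and the project may apply the weights through a different mechanism.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, sample_weights):
    """Weighted mean cross-entropy; samples with weight 0 do not contribute."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    total_weight = sample_weights.sum().clamp(min=1e-8)  # avoid division by zero
    return (per_sample * sample_weights).sum() / total_weight

logits = torch.tensor([[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]])
targets = torch.tensor([0, 1, 1])        # third target is a placeholder for an undefined sample
weights = torch.tensor([1.0, 1.0, 0.0])  # the undefined sample gets zero weight

loss = weighted_cross_entropy(logits, targets, weights)
```

Because the third weight is zero, changing the third row of `logits` leaves `loss` unchanged.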
Using the 3LC Dashboard, we identified several critical data quality problems:
- Mislabeled Samples: Found chihuahua images incorrectly labeled as muffins and vice versa
- Poor Quality Images: Blurry, corrupted, or irrelevant images affecting model performance
- Ambiguous Cases: Borderline cases that needed manual review
- Embeddings Visualization with Lasso Tool:
  - Used 3D PaCMAP embeddings to visualize data clusters in the 3LC Dashboard
  - Lasso Tool for Cluster Labeling: Selected clusters of similar images using the lasso tool
  - Auto-labeled entire clusters that clearly belonged to one class (chihuahua or muffin)
  - Identified mislabeled samples that appeared in wrong clusters
  - Found ambiguous or unrelated images that didn't fit any clear cluster
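The idea of spotting labels that sit in the "wrong" cluster can also be approximated programmatically. The sketch below is not the 3LC workflow, which was done interactively in the Dashboard; it swaps in a simple nearest-centroid check on made-up 3D points, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def flag_suspects(embeddings, labels):
    """Return indices whose label differs from the class of the nearest centroid."""
    classes = sorted(set(labels))
    centroids = {
        cls: centroid([e for e, lab in zip(embeddings, labels) if lab == cls])
        for cls in classes
    }
    suspects = []
    for index, (emb, lab) in enumerate(zip(embeddings, labels)):
        nearest = min(classes, key=lambda cls: math.dist(emb, centroids[cls]))
        if nearest != lab:
            suspects.append(index)
    return suspects

# Toy 3D embeddings (made-up values, not project data).
embeddings = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, 0.0),
              (5.0, 5.1, 4.9), (5.2, 4.8, 5.0), (0.1, 0.0, 0.2)]
labels = ["chihuahua", "chihuahua", "chihuahua", "muffin", "muffin", "muffin"]

print(flag_suspects(embeddings, labels))  # [5]
```

The last point is labeled "muffin" but lies inside the chihuahua group, so it is the one flagged for review.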
- Manual Label Correction:
  - Went through sets of images systematically to verify and correct labels
  - Fixed mislabeled samples identified through cluster analysis
  - Corrected edge cases where auto-labeling might have been incorrect
  - Ensured high-quality labels for training data
- Data Cleaning:
  - Removed ambiguous images that couldn't be clearly classified
  - Removed unrelated images that didn't belong to either class
  - Cleaned dataset from 4,733 → 4,328 samples
  - Created new table versions after each improvement iteration
- Cluster-Based Auto-Labeling: Used 3LC lasso tool to identify and auto-label clusters of similar images
- Manual Label Corrections: Systematically reviewed and corrected more than 50 mislabeled samples
- Data Cleaning: Removed ~400 ambiguous or unrelated images that didn't fit clear class boundaries
- Table Versioning: Created multiple table versions tracking each improvement iteration
Workflow Efficiency: The lasso tool was particularly powerful for efficiently labeling large groups of similar images at once, significantly speeding up the data improvement process compared to manual labeling of individual samples.
Configuration:
- Epochs: 10
- Learning Rate: 0.0001
- Batch Size: 16
- No learning rate scheduling
- Basic data augmentation
Results:
- Best Validation Accuracy: 80.83%
- Training Progression:
  - Epoch 1: 56.50%
  - Epoch 2: 60.56%
  - Epoch 3: 71.88%
  - Epoch 4: 74.83%
  - Epoch 5: 80.83% (best)
  - Epochs 6-10: Declined (overfitting)
Issues Identified:
- Accuracy plateaued around 80-82%
- Model overfitting after epoch 5
- No learning rate adaptation
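Since the best result arrived at epoch 5 and later epochs declined, keeping the best checkpoint (and optionally stopping early) is the natural safeguard. The sketch below uses only the accuracies reported above; the helper names and the `patience` rule are assumptions for illustration, not part of the original training script.

```python
def best_epoch(history):
    """Return (epoch, accuracy) for the highest validation accuracy."""
    return max(history.items(), key=lambda item: item[1])

def should_stop(history, patience):
    """True when the best epoch is at least `patience` epochs behind the latest one."""
    epoch, _ = best_epoch(history)
    return max(history) - epoch >= patience

# Validation accuracy (%) for the epochs reported above.
baseline = {1: 56.50, 2: 60.56, 3: 71.88, 4: 74.83, 5: 80.83}

print(best_epoch(baseline))                # (5, 80.83)
print(should_stop(baseline, patience=3))   # False
```

With a `patience` of 3, such a rule would halt a run once three consecutive epochs fail to beat the best score.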
Changes:
- Learning rate: 0.0001 → 0.0005
- Added CosineAnnealingWarmRestarts scheduler (T_0=10, T_mult=2)
- Added gradient clipping (max_norm=1.0)
- Enhanced data augmentation
- Added BatchNorm layers in classifier
- Increased weight decay to 5e-4
Results:
- Best Validation Accuracy: 88-90% (initial runs with 30 epochs)
- Improved stability with learning rate scheduling
- Better convergence with optimized LR
- Learning rate restarts helped escape plateaus
Remaining Issues:
- Still plateauing around 88-90% with limited epochs
- Needed extended training and data quality improvements
Changes:
- Extended training to 100 epochs for full convergence
- Fixed mislabeled samples using 3LC Dashboard
- Removed poor-quality images
- Strategic labeling of undefined samples
- Cleaned dataset: 4,733 → 4,328 samples
- Kept cosine annealing with warm restarts (T_0=10, T_mult=2) from the previous iteration
Results:
- Best Validation Accuracy: 96.37% (Epoch 62)
- Training Progression:
  - Epoch 1: 84.63%
  - Epoch 4: 90.12%
  - Epoch 6: 91.47%
  - Epoch 9: 92.91%
  - Epoch 22: 93.41%
  - Epoch 24: 94.09%
  - Epoch 29: 95.27%
  - Epoch 48: 95.52%
  - Epoch 54: 95.69%
  - Epoch 57: 96.20%
  - Epoch 62: 96.37% (BEST)
  - Epoch 68: 96.28%
  - Epoch 100: 96.28%
Key Insight: Extended training with learning rate restarts enabled the model to escape local minima and achieve significantly higher accuracy. Data quality improvements combined with proper hyperparameter tuning and extended training yielded the best results.
| Metric | Value |
|---|---|
| Best Validation Accuracy | 96.37% |
| Training Samples (after cleaning) | 4,328 |
| Validation Samples | 1,184 |
| Total Epochs | 100 |
| Best Model Epoch | 62 |
| Model Size | ResNet-18 (~11M parameters) |
| Device | CPU |
| Training Time | ~4-5 hours (100 epochs) |
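The headline accuracy can be translated back into sample counts on the 1,184-image validation set. The short calculation below is just arithmetic on the numbers in the table; the function name is hypothetical.

```python
VAL_SAMPLES = 1184  # 640 chihuahua + 544 muffin

def correct_count(accuracy_pct, total=VAL_SAMPLES):
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(total * accuracy_pct / 100)

best = correct_count(96.37)
print(best, VAL_SAMPLES - best)              # 1141 43
print(round(100 * best / VAL_SAMPLES, 2))    # 96.37
```

In other words, the best model misclassifies 43 of the 1,184 validation images.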
Early Training (Epochs 1-10):
- Epoch 1: 84.63% (LR: 0.000488)
- Epoch 2: 86.06%
- Epoch 4: 90.12% (first major jump)
- Epoch 6: 91.47%
- Epoch 9: 92.91%
- Epoch 10: 91.81% (LR restart to 0.000500)
Mid Training (Epochs 11-30):
- Epoch 22: 93.41%
- Epoch 23: 93.50%
- Epoch 24: 94.09%
- Epoch 26: 94.76%
- Epoch 29: 95.27%
- Epoch 30: 94.76% (LR restart to 0.000500)
Late Training (Epochs 31-100):
- Epoch 43: 95.10%
- Epoch 48: 95.52%
- Epoch 54: 95.69%
- Epoch 57: 96.20%
- Epoch 62: 96.37% ⭐ BEST MODEL
- Epoch 68: 96.28%
- Epoch 70: 96.03% (LR restart to 0.000500)
- Epoch 89: 96.03%
- Epoch 98: 96.20%
- Epoch 100: 96.28%
Key Observations:
- Learning rate restarts at epochs 10, 30, 70 helped escape plateaus
- Validation accuracy rose from 84.63% (epoch 1) to a best of 96.37% (epoch 62)
- Model maintained high accuracy (96%+) in later epochs
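The restart epochs and the logged learning rates follow directly from the cosine annealing formula with `T_0=10` and `T_mult=2`. The sketch below recomputes them in plain Python, assuming a minimum learning rate of 0 (the PyTorch default) and one scheduler step per epoch; the function name is hypothetical.

```python
import math

def warm_restart_lr(epoch, base_lr=5e-4, t_0=10, t_mult=2, eta_min=0.0):
    """Learning rate after `epoch` completed epochs with cosine warm restarts."""
    cycle_len, t_cur = t_0, epoch
    while t_cur >= cycle_len:      # walk forward to the cycle containing this epoch
        t_cur -= cycle_len
        cycle_len *= t_mult
    cosine = (1 + math.cos(math.pi * t_cur / cycle_len)) / 2
    return eta_min + (base_lr - eta_min) * cosine

# Epochs at which the learning rate jumps back to its base value.
restarts = [e for e in range(1, 101) if warm_restart_lr(e) == 5e-4]
print(restarts)                        # [10, 30, 70]
print(f"{warm_restart_lr(1):.6f}")     # 0.000488
```

The cycles last 10, 20, and 40 epochs, which is why the restarts land at epochs 10, 30, and 70, and the value after the first epoch matches the logged 0.000488.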
Note: Screenshots should be added showing:
- Embeddings Visualization: 3D scatter plot showing class clusters and outliers
- Metrics Dashboard: Per-sample loss, accuracy, and confidence distributions
- Confusion Analysis: Misclassified samples with predictions
- Training Curves: Validation accuracy and loss over epochs
- Data Quality View: Before/after data cleaning comparison
- Per-Sample Loss Distribution: Identified high-loss samples for review
- Confidence Scores: Used to prioritize undefined sample labeling
- Embedding Clusters: Revealed data quality issues and class boundaries
- Prediction Accuracy: Tracked improvement across iterations
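Using confidence scores to prioritize samples can be sketched as a simple ranking. The example below assumes that "confidence" means the top softmax probability and that the least confident samples are reviewed first; the sample ids and logits are made up, and the actual prioritization in the Dashboard may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_by_uncertainty(logits_by_id):
    """Sort sample ids from least to most confident (top-class probability)."""
    confidence = {sid: max(softmax(lg)) for sid, lg in logits_by_id.items()}
    return sorted(confidence, key=confidence.get)

# Made-up logits for three undefined samples (ids are hypothetical).
logits_by_id = {"img_a": [4.0, -1.0], "img_b": [0.2, 0.1], "img_c": [1.5, 0.0]}
print(rank_by_uncertainty(logits_by_id))  # ['img_b', 'img_c', 'img_a']
```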
Final Training Execution:
- Command: `python train.py`
- Device: CPU
- Total Training Time: ~4-5 hours for 100 epochs
- Training Speed: ~3.2 it/s (iterations per second)
- Validation Speed: ~8.0 it/s
- Best Model Saved: `resnet18_classifier_best.pth` (Epoch 62)
- Metrics Collection: Completed successfully after training
- Embeddings Reduction: PaCMAP applied for visualization
Training Stability:
- Validation accuracy improved through epoch 62, then held between 96.0% and 96.3% for the remaining epochs
- No significant overfitting observed
- Learning rate restarts (epochs 10, 30, 70) helped maintain momentum
- Final epochs (90-100) maintained 96%+ accuracy
- Parquet File Corruption:
  - Issue: Initial table registration caused Parquet corruption errors
  - Root Cause: NumPy type incompatibility with PyTorch/PyArrow
  - Solution: Explicit type conversion to Python native types (int, float, str)
  - Learning: Always use native Python types when writing to Parquet for compatibility
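The type conversion described above might look like the following sketch. The helper name `to_native` and the example column names are assumptions; the original fix may have been applied differently, for example at the point where rows are assembled for the table.

```python
import numpy as np

def to_native(value):
    """Convert NumPy scalars and arrays to built-in Python types."""
    if isinstance(value, np.generic):   # NumPy scalar such as np.int64 or np.float32
        return value.item()
    if isinstance(value, np.ndarray):
        return value.tolist()
    return value                        # already a native type

# Hypothetical row with NumPy-typed values.
row = {
    "label": np.int64(1),
    "confidence": np.float32(0.5),
    "path": np.str_("img_001.jpg"),
}
clean_row = {key: to_native(val) for key, val in row.items()}

print({key: type(val).__name__ for key, val in clean_row.items()})
# {'label': 'int', 'confidence': 'float', 'path': 'str'}
```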
- Accuracy Plateau:
  - Issue: Model stuck at 88-90% accuracy despite hyperparameter tuning
  - Root Cause: Data quality issues (mislabeled samples, poor images)
  - Solution: Used 3LC Dashboard to identify and fix data problems
  - Learning: Data quality > hyperparameter tuning for limited data scenarios
- Training from Scratch:
  - Issue: Slow convergence and instability without pretrained weights
  - Solution: Tuned learning rate (raised from 0.0001 to 0.0005), gradient clipping, BatchNorm
  - Learning: Training from scratch requires more careful hyperparameter selection
- NumPy/PyArrow Compatibility:
  - Issue: NumPy 2.x incompatibility with pandas/pyarrow
  - Solution: Upgraded pandas to 2.3.3+ which supports NumPy 2.x
  - Learning: Keep dependencies updated and check compatibility
- Data-Centric AI is Powerful:
  - Fixing 50 mislabeled samples improved accuracy more than extensive hyperparameter tuning
  - Model feedback (via 3LC) is the best guide for data improvement
  - Quality > Quantity: 200 clean samples > 2000 noisy samples
- Hyperparameter Sensitivity:
  - Small LR changes (0.0001 → 0.0005) made significant differences
  - Learning rate scheduling (cosine annealing) helps escape plateaus
  - Batch size affects convergence speed and stability
- Training from Scratch Requires:
  - A carefully tuned learning rate (0.0005 worked best in our runs)
  - Many more epochs (100 vs 10-30) for full convergence
  - Better regularization (gradient clipping, BatchNorm)
  - Learning rate scheduling with restarts to escape plateaus
  - Patience and careful monitoring
- 3LC Dashboard Value:
  - Embeddings visualization revealed data issues not visible in metrics alone
  - Lasso tool enabled efficient cluster-based labeling, which was much faster than labeling samples one by one
  - Cluster analysis helped identify mislabeled samples and ambiguous cases
  - Manual review complemented automated cluster labeling for accuracy
  - Iterative workflow (train → analyze → fix → retrain) is highly effective
- Data Expansion:
  - Label more undefined samples using active learning
  - Target low-confidence predictions for manual labeling
  - Aim to reach 500+ labeled samples per class
- Model Architecture:
  - Experiment with deeper ResNet variants (ResNet-34, ResNet-50)
  - Try EfficientNet or Vision Transformer architectures
  - Ensemble multiple models for improved robustness
- Advanced Training Techniques:
  - Implement mixup or cutmix augmentation
  - Add label smoothing for better calibration
  - Experiment with focal loss for hard example mining
- Hyperparameter Optimization:
  - Automated hyperparameter search (Optuna, Ray Tune)
  - Learning rate finder for optimal initial LR
  - Architecture search for optimal classifier head
- Active Learning Pipeline:
  - Automate the train → analyze → label → retrain loop
  - Use uncertainty sampling to prioritize labeling
  - Implement semi-supervised learning techniques
- Data Augmentation:
  - Advanced augmentations (AutoAugment, RandAugment)
  - Domain-specific augmentations for dogs/food
  - Test-time augmentation for inference
- Model Interpretability:
  - Grad-CAM visualizations for decision explanation
  - Attention mechanisms to understand model focus
  - Feature importance analysis
- Production Considerations:
  - Model quantization for deployment
  - ONNX export for cross-platform compatibility
  - API development for real-time inference
- Data Expansion: Already achieved 96.37% - could potentially reach 97-98% with more labeled data
- Architecture Improvements: Deeper models (ResNet-34/50) could push to 97-98%
- Advanced Techniques: Ensemble methods could achieve 98%+ accuracy
- GPU Training: Would significantly reduce training time (currently ~4-5 hours on CPU)
This project successfully demonstrated the power of data-centric AI by achieving 96.37% accuracy on a challenging binary classification task with limited labeled data. The key success factors were:
- ✅ Systematic hyperparameter optimization (LR=0.0005, AdamW, weight decay)
- ✅ Data quality improvements using 3LC Dashboard (dataset cleaned from 4,733 to 4,328 samples)
- ✅ Extended training with learning rate scheduling (100 epochs, cosine annealing)
- ✅ Proper regularization (dropout, gradient clipping, BatchNorm)
- ✅ Iterative train-fix-retrain workflow enabled by 3LC
The project highlights that combining data quality improvements with extended training and proper hyperparameter tuning can achieve excellent results even when training from scratch with limited data. The learning rate restarts were particularly effective in helping the model escape accuracy plateaus.
Project Repository: [GitHub Link]
3LC Dashboard: [Dashboard Link]
Best Model Accuracy: 96.37% (Epoch 62)
Training Time: ~4-5 hours (100 epochs, CPU)
Final Validation Loss: 0.1634