Awlee314/neural-net

Neural Network from Scratch — with Interactive Digit Recognizer

A feedforward neural network built from scratch in Python using only NumPy, trained on MNIST to recognize handwritten digits, paired with a Tkinter GUI that lets you draw a digit and watch the network classify it in real time.

No PyTorch or TensorFlow. Every layer, gradient, and optimizer built from scratch.


What this project demonstrates

  • Forward and backward propagation implemented from first principles
  • Manual derivation of gradients for Linear, ReLU, Sigmoid, Softmax, Cross-Entropy, and MSE
  • Two optimizers from scratch: SGD and Adam
  • Mini-batch training loop with shuffling and per-epoch loss tracking
  • He weight initialization and numerically stable Softmax
  • End-to-end project: data loading, training, evaluation, weight persistence, and a live interactive demo

The model reaches ~97% accuracy on the MNIST test set after 10 epochs with Adam, and ~95% with SGD.


Demo

The GUI window has three panels:

┌──────────────┬──────────────┬──────────────────┐
│              │              │                  │
│   Draw a     │   Network    │   Network        │
│   digit      │   sees       │   visualization  │
│   (280×280)  │   (28×28     │   (layers,       │
│              │   centered)  │   activations)   │
│              │              │                  │
└──────────────┴──────────────┴──────────────────┘
  • Left: draw a digit with the mouse
  • Middle: see the preprocessed 28×28 image the network actually receives
  • Right: see the network structure light up with the activation pattern
  • Prediction updates live every time you release the mouse

Project structure

neural_net_from_scratch/
├── neural_net/
│   ├── layers.py          # Linear, ReLU, Sigmoid, Softmax
│   ├── losses.py          # CrossEntropyLoss, MSELoss
│   ├── optimizers.py      # SGD, Adam
│   ├── network.py         # Network class (composes layers)
│   ├── train.py           # mini-batch training loop
│   ├── test.py            # XOR sanity check
│   ├── MNIST_test.py      # full MNIST training script
│   └── GUI.py             # Tkinter digit recognizer
├── mnist_weights.npz      # saved trained weights
└── README.md

Architecture

The default network used in the demo:

Input (784)  →  Linear(784, 128)  →  ReLU
             →  Linear(128, 64)   →  ReLU
             →  Linear(64, 10)    →  Softmax  →  Output (10 class probabilities)

Trained with Cross-Entropy loss and Adam (lr=0.001, batch size 64, 10 epochs).
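
As a rough sketch of what this architecture implies, the snippet below builds He-initialized weights for the three Linear layers and counts the trainable parameters. The `init_linear` helper, the `(out_features, in_features)` weight shape, and the zero-initialized biases are assumptions made for illustration; `layers.py` may do this differently.

```python
import numpy as np

def init_linear(in_features, out_features, rng):
    """He initialization: zero-mean Gaussian with variance 2 / in_features."""
    W = rng.standard_normal((out_features, in_features)) * np.sqrt(2.0 / in_features)
    b = np.zeros((out_features, 1))  # assumption: biases start at zero
    return W, b

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 10]
params = [init_linear(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Each Linear layer contributes in*out weights plus out biases.
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 109386
```

The total breaks down as 100,480 + 8,256 + 650 parameters for the three layers.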


The math, briefly

Forward pass

Each Linear layer computes:

output = W · input + b

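Activations apply nonlinearities between the Linear layers: ReLU works element-wise and clips negatives to zero, while Softmax acts on the whole output vector, exponentiating each entry and normalizing so the outputs sum to 1.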
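
A minimal sketch of the forward pass, assuming samples are stored as columns (shape `(features, batch)`), which matches the `(784, 1)` input shape used elsewhere in this README. The function names and the tiny layer sizes are illustrative and not taken from `layers.py`.

```python
import numpy as np

def linear(W, b, x):
    # x holds one column per sample: shape (in_features, batch)
    return W @ x + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the per-column max leaves the result unchanged
    # but keeps np.exp from overflowing on large logits.
    shifted = z - z.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), np.zeros((5, 1))
W2, b2 = rng.standard_normal((3, 5)), np.zeros((3, 1))

x = rng.standard_normal((4, 2))            # two made-up input samples
hidden = relu(linear(W1, b1, x))
probs = softmax(linear(W2, b2, hidden))    # shape (3, 2), columns sum to 1

# Logits this large would overflow a naive exp(z) / sum(exp(z)).
big = softmax(np.array([[1000.0], [1001.0]]))
```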

Backward pass (the interesting part)

Gradients are computed via the chain rule, layer by layer in reverse:

Linear layer:

grad_W = grad_out @ x.T
grad_b = sum(grad_out, axis=batch)
grad_input = W.T @ grad_out     # passed back to previous layer
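
A quick way to gain confidence in these three formulas is a finite-difference check. The sketch below compares the analytic `grad_W` against a numerical estimate on a tiny random layer. The scalar loss `sum(output * G)` is an arbitrary choice made only so that the upstream gradient `grad_out` equals a known matrix `G`; all names here are illustrative.

```python
import numpy as np

def linear_forward(W, b, x):
    return W @ x + b

def linear_backward(W, x, grad_out):
    grad_W = grad_out @ x.T
    grad_b = grad_out.sum(axis=1, keepdims=True)  # sum over the batch axis
    grad_input = W.T @ grad_out
    return grad_W, grad_b, grad_input

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = rng.standard_normal((3, 1))
x = rng.standard_normal((4, 5))        # 5 samples stored as columns
G = rng.standard_normal((3, 5))        # pretend upstream gradient

def scalar_loss(W_):
    # d(loss)/d(output) is exactly G for this loss.
    return float(np.sum(linear_forward(W_, b, x) * G))

grad_W, grad_b, grad_input = linear_backward(W, x, G)

# Central differences, one weight at a time.
eps = 1e-6
numeric_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        up, down = W.copy(), W.copy()
        up[i, j] += eps
        down[i, j] -= eps
        numeric_grad_W[i, j] = (scalar_loss(up) - scalar_loss(down)) / (2 * eps)

max_err = np.abs(grad_W - numeric_grad_W).max()
```

If `max_err` is not tiny, the shapes or transposes in the backward pass are the first thing to check.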

Softmax + Cross-Entropy combined simplifies beautifully:

grad = predictions − one_hot_targets

This is one reason classification networks so commonly pair Softmax with Cross-Entropy: the gradient with respect to the pre-softmax logits collapses to "how far off was each predicted probability from the truth," and the full softmax Jacobian never has to be formed explicitly.
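
The simplification can be verified numerically. This sketch assumes the loss is summed (not averaged) over the batch, so the gradient with respect to the logits is exactly `predictions − one_hot_targets`; with an averaged loss the same result is divided by the batch size. Names and sizes are illustrative.

```python
import numpy as np

def softmax(z):
    shifted = z - z.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def cross_entropy(logits, one_hot):
    # Summed over the batch so that the gradient is exactly p - y.
    return float(-np.sum(one_hot * np.log(softmax(logits))))

rng = np.random.default_rng(0)
n_classes, batch = 10, 4
logits = rng.standard_normal((n_classes, batch))
labels = np.array([3, 0, 7, 9])
one_hot = np.zeros((n_classes, batch))
one_hot[labels, np.arange(batch)] = 1.0

analytic = softmax(logits) - one_hot

# Central-difference estimate of d(loss)/d(logits).
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(n_classes):
    for j in range(batch):
        up, down = logits.copy(), logits.copy()
        up[i, j] += eps
        down[i, j] -= eps
        numeric[i, j] = (cross_entropy(up, one_hot) - cross_entropy(down, one_hot)) / (2 * eps)

max_err = np.abs(analytic - numeric).max()
```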

Adam optimizer

Maintains running averages of gradients and squared gradients per parameter, bias-corrects them, and uses them to adapt the per-parameter learning rate:

m = β₁ · m + (1 − β₁) · grad
v = β₂ · v + (1 − β₂) · grad²

m̂ = m / (1 − β₁ᵗ)        # bias correction
v̂ = v / (1 − β₂ᵗ)

weight −= lr · m̂ / (√v̂ + ε)
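
The update rule above translates almost line for line into NumPy. In this sketch the defaults `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used values and are an assumption here (only `lr=0.001` is stated in this README); `adam_step` is a hypothetical helper, not the class in `optimizers.py`.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count. Returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -1.0])
g = np.array([0.5, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)

w_new, m, v = adam_step(w, g, m, v, t=1)
```

On the very first step the bias correction gives `m̂ = grad` and `v̂ = grad²`, so each weight moves by almost exactly `lr` in the direction opposite to its gradient, regardless of the gradient's magnitude.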

Getting started

Install

pip install numpy scipy matplotlib pillow scikit-learn

tkinter ships with standard Python on Windows and macOS. On Linux:

sudo apt-get install python3-tk

Train the model

cd neural_net
python MNIST_test.py

This will:

  • Download MNIST (via sklearn.datasets.fetch_openml)
  • Train the network for 10 epochs with Adam
  • Save the trained weights to mnist_weights.npz
  • Plot training loss curves

Expected output:

Epoch 1, Loss 0.7995, correct prob 0.4496
Epoch 2, Loss 0.3492, correct prob 0.7053
Epoch 3, Loss 0.2911, correct prob 0.7474
...
Epoch 10, Loss 0.1421, correct prob 0.8675
Test accuracy: 96.8%
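
The shuffled mini-batch iteration behind each epoch can be sketched as a small generator. This is an illustration under the assumption that samples are stored as columns; `iterate_minibatches` is a hypothetical name and not necessarily what `train.py` uses.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs; samples are stored as columns."""
    n_samples = X.shape[1]
    order = rng.permutation(n_samples)       # new random order on every call
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        yield X[:, idx], Y[:, idx]

rng = np.random.default_rng(0)
X = np.arange(150, dtype=float).reshape(1, 150)   # toy data: 150 samples, 1 feature
Y = X.copy()

batch_sizes = [xb.shape[1] for xb, _ in iterate_minibatches(X, Y, 64, rng)]
print(batch_sizes)  # [64, 64, 22]
```

With 60,000 training images and a batch size of 64, one epoch under this scheme is 938 updates: 937 full batches plus a final batch of 32.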

Run the interactive demo

After training (which generates mnist_weights.npz):

python GUI.py

Draw a digit in the left panel. The prediction updates as soon as you release the mouse.
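
For reference, saving and restoring weights with `.npz` files needs only `np.savez` and `np.load`. The sketch below is one possible layout: the key names `W0`, `b0`, `W1`, ... and the two helper functions are assumptions, and the actual contents of `mnist_weights.npz` may be organized differently.

```python
import os
import tempfile
import numpy as np

def save_weights(path, params):
    """params is a list of (W, b) pairs, stored as W0, b0, W1, b1, ..."""
    arrays = {}
    for i, (W, b) in enumerate(params):
        arrays[f"W{i}"] = W
        arrays[f"b{i}"] = b
    np.savez(path, **arrays)

def load_weights(path):
    with np.load(path) as data:
        n_layers = len(data.files) // 2
        return [(data[f"W{i}"], data[f"b{i}"]) for i in range(n_layers)]

rng = np.random.default_rng(0)
params = [(rng.standard_normal((3, 4)), np.zeros((3, 1))),
          (rng.standard_normal((2, 3)), np.zeros((2, 1)))]

# Round-trip through a temporary file to show that nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_weights.npz")
    save_weights(path, params)
    restored = load_weights(path)
```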

Verify correctness on XOR

Before MNIST, the implementation is validated on XOR — the smallest non-linear classification problem:

python test.py

The loss should drop below 0.1 within 500 epochs. If the network can't learn XOR, backprop is broken.
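
To give a feel for what such a check involves, here is a self-contained sketch of a 2-8-1 network trained on XOR with plain gradient descent. It is not the repository's `test.py`: the hidden size, ReLU/Sigmoid choice, MSE loss, learning rate, epoch count, and seed are all assumptions, and how quickly the loss falls depends on them.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four XOR cases, one per column.
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)
Y = np.array([[0, 1, 1, 0]], dtype=float)

# He-initialized 2 -> 8 -> 1 network.
W1 = rng.standard_normal((8, 2)) * np.sqrt(2.0 / 2)
b1 = np.zeros((8, 1))
W2 = rng.standard_normal((1, 8)) * np.sqrt(2.0 / 8)
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for epoch in range(2000):
    # Forward pass
    h_pre = W1 @ X + b1
    h = np.maximum(h_pre, 0.0)            # ReLU
    p = sigmoid(W2 @ h + b2)
    losses.append(float(np.mean((p - Y) ** 2)))   # MSE

    # Backward pass (chain rule, last layer first)
    grad_p = 2.0 * (p - Y) / Y.size
    grad_z2 = grad_p * p * (1.0 - p)      # sigmoid derivative
    grad_W2 = grad_z2 @ h.T
    grad_b2 = grad_z2.sum(axis=1, keepdims=True)
    grad_h_pre = (W2.T @ grad_z2) * (h_pre > 0)   # ReLU derivative
    grad_W1 = grad_h_pre @ X.T
    grad_b1 = grad_h_pre.sum(axis=1, keepdims=True)

    # Plain gradient-descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

A loss that stays flat or rises here would point at a sign or transpose error in the backward pass rather than at the data.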


Preprocessing pipeline

Real handwriting looks nothing like MNIST out of the box. The GUI's preprocessing function bridges the gap:

  1. Extract pixels from the drawing canvas
  2. Find bounding box of the drawn digit
  3. Crop to that bounding box
  4. Resize to fit in a 20×20 region while preserving aspect ratio
  5. Place in a 28×28 black canvas
  6. Shift by center-of-mass so the digit's mass lands at pixel (14, 14)
  7. Normalize to [0, 1] and reshape to (784, 1)
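
The seven steps above can be sketched in pure NumPy as follows. This is an illustration, not the code in `GUI.py`: it assumes the canvas arrives as a 2-D grayscale array with values in 0..255 and a bright digit on a dark background, it uses a crude nearest-neighbour resize (a real implementation would more likely use an anti-aliased resize such as Pillow's), and all function names are hypothetical.

```python
import numpy as np

def shift_zero_fill(img, dy, dx):
    """Shift a 2-D image by whole pixels, filling vacated areas with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def center_of_mass(img):
    rows, cols = np.indices(img.shape)
    total = img.sum()
    return (rows * img).sum() / total, (cols * img).sum() / total

def preprocess(canvas, threshold=0):
    # Steps 1-3: find the drawn pixels and crop to their bounding box.
    ys, xs = np.nonzero(canvas > threshold)
    if ys.size == 0:                       # nothing drawn
        return np.zeros((784, 1))
    crop = canvas[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)

    # Step 4: nearest-neighbour resize so the longer side becomes 20 pixels.
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    row_idx = (np.arange(new_h) * h / new_h).astype(int)
    col_idx = (np.arange(new_w) * w / new_w).astype(int)
    small = crop[np.ix_(row_idx, col_idx)]

    # Step 5: paste into the middle of a black 28x28 canvas.
    img = np.zeros((28, 28))
    top, left = (28 - new_h) // 2, (28 - new_w) // 2
    img[top:top + new_h, left:left + new_w] = small
    if img.sum() == 0:                     # resize sampled only background
        return np.zeros((784, 1))

    # Step 6: shift so the center of mass lands on (14, 14), to the nearest pixel.
    cy, cx = center_of_mass(img)
    img = shift_zero_fill(img, int(round(14 - cy)), int(round(14 - cx)))

    # Step 7: normalize to [0, 1] and reshape to a column vector.
    return (img / 255.0).reshape(784, 1)

# A made-up "drawing": a bright rectangle in the top-left corner of the canvas.
canvas = np.zeros((280, 280))
canvas[10:60, 20:50] = 255
network_input = preprocess(canvas)
```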

The "Network sees" panel shows the result of this pipeline, so you can immediately see why a digit drawn in the corner might be misclassified without proper centering.


What we learned

Building this taught us, in the most concrete way possible:

  • Why backprop is just the chain rule, applied layer by layer
  • Why initialization matters — try zeros and watch the network refuse to train
  • Why Softmax + Cross-Entropy are paired — the gradient simplifies dramatically
  • Why batch normalization, Adam, and dropout exist — by feeling the problems they solve
  • The gap between benchmark accuracy and real-world performance: the network hits 97% on MNIST and still struggles with our actual handwriting until preprocessing is done right

Known limitations

  • No GPU support — pure NumPy, single-threaded. Training takes ~2 minutes on a modern CPU
  • No convolutions yet — fully-connected only; would need a Conv2D layer for state-of-the-art accuracy
  • Sensitive to drawing style — strokes much thicker or thinner than MNIST's distribution reduce accuracy
  • No data augmentation — the network sees only the original 60,000 training images

Acknowledgments

  • 3Blue1Brown's Neural Networks series — the best intuition for backpropagation anywhere
  • CS231n (Stanford) — the gold-standard reference for the math
