Neural Network from Scratch — with Interactive Digit Recognizer

A feedforward neural network built from scratch in Python using only NumPy, trained on MNIST to recognize handwritten digits, paired with a Tkinter GUI that lets you draw a digit and watch the network classify it in real time.

No PyTorch or TensorFlow. Every layer, gradient, and optimizer is built from scratch.

What this project demonstrates

Forward and backward propagation implemented from first principles
Manual derivation of gradients for Linear, ReLU, Sigmoid, Softmax, Cross-Entropy, and MSE
Two optimizers from scratch: SGD and Adam
Mini-batch training loop with shuffling and per-epoch loss tracking
He weight initialization and numerically stable Softmax
End-to-end project: data loading, training, evaluation, weight persistence, and a live interactive demo
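
As a rough illustration of two items from the list above, here is a minimal sketch of He initialization and a numerically stable Softmax. The names `he_init` and `stable_softmax` are placeholders invented for this example, not identifiers from the repository, and the one-column-per-sample layout is an assumption.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    """He (Kaiming) normal initialization: standard deviation sqrt(2 / fan_in)."""
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

def stable_softmax(z):
    """Softmax over axis 0 (one column per sample).

    Subtracting the column maximum does not change the result,
    but it keeps np.exp from overflowing on large logits.
    """
    shifted = z - z.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
W = he_init(784, 128, rng)                       # weights for a 784 -> 128 layer
probs = stable_softmax(np.array([[1000.0], [1001.0], [1002.0]]))
```

Without the max subtraction, `np.exp(1000.0)` would overflow to infinity and the probabilities would come out as NaN.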

The model reaches ~98% accuracy on the MNIST test set after 10 epochs with Adam and ~95% accuracy with SGD.

Demo

The GUI window has three panels:

┌──────────────┬──────────────┬──────────────────┐
│              │              │                  │
│   Draw a     │   Network    │   Network        │
│   digit      │   sees       │   visualization  │
│   (280×280)  │   (28×28     │   (layers,       │
│              │   centered)  │   activations)   │
│              │              │                  │
└──────────────┴──────────────┴──────────────────┘

Left: draw a digit with the mouse
Middle: see the preprocessed 28×28 image the network actually receives
Right: see the network structure light up with the activation pattern
Prediction updates live every time you release the mouse

Project structure

neural_net_from_scratch/
├── neural_net/
│   ├── layers.py          # Linear, ReLU, Sigmoid, Softmax
│   ├── losses.py          # CrossEntropyLoss, MSELoss
│   ├── optimizers.py      # SGD, Adam
│   ├── network.py         # Network class (composes layers)
│   ├── train.py           # mini-batch training loop
│   ├── test.py            # XOR sanity check
│   ├── MNIST_test.py      # full MNIST training script
│   └── GUI.py             # Tkinter digit recognizer
├── mnist_weights.npz      # saved trained weights
└── README.md

Architecture

The default network used in the demo:

Input (784)  →  Linear(784, 128)  →  ReLU
             →  Linear(128, 64)   →  ReLU
             →  Linear(64, 10)    →  Softmax  →  Output (10 class probabilities)

Trained with Cross-Entropy loss and Adam (lr=0.001, batch size 64, 10 epochs).
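
For a sense of what "batch size 64" means in practice, the sketch below shuffles a dataset and cuts it into mini-batches. `iterate_minibatches` is a hypothetical helper written for this example (the actual loop lives in train.py and may look different), and the random arrays stand in for real MNIST data.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs; samples are stored as columns."""
    n = X.shape[1]
    order = rng.permutation(n)               # fresh shuffle on every call (every epoch)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield X[:, idx], Y[:, idx]

rng = np.random.default_rng(0)
X = rng.random((784, 200))                               # stand-in for flattened images
Y = np.eye(10)[:, rng.integers(0, 10, size=200)]         # one-hot labels as columns
batch_shapes = [xb.shape for xb, _ in iterate_minibatches(X, Y, 64, rng)]
```

With 200 samples and a batch size of 64, the last batch is smaller than the rest (8 samples), which a training loop has to tolerate.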

The math, briefly

Forward pass

Each Linear layer computes:

output = W · input + b

Activations apply element-wise nonlinearities (ReLU clips negatives to zero; Softmax normalizes to probabilities).
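
A small worked example may make the forward pass concrete. This is a sketch under the assumption that inputs are column vectors of shape (n_in, batch); the function names are illustrative, not the ones in layers.py.

```python
import numpy as np

def linear_forward(W, b, x):
    """output = W · input + b, with x of shape (n_in, batch)."""
    return W @ x + b                     # b has shape (n_out, 1) and broadcasts over the batch

def relu(z):
    """Clip negatives to zero, element-wise."""
    return np.maximum(z, 0.0)

W = np.array([[1.0, -2.0],
              [0.5,  0.5]])
b = np.array([[0.0], [-1.0]])
x = np.array([[1.0], [2.0]])

z = linear_forward(W, b, x)              # [[-3.0], [0.5]]
a = relu(z)                              # [[0.0], [0.5]]
```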

Backward pass (the interesting part)

Gradients are computed via the chain rule, layer by layer in reverse:

Linear layer:

grad_W = grad_out @ x.T
grad_b = sum(grad_out, axis=batch)
grad_input = W.T @ grad_out     # passed back to previous layer
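
One way to gain confidence in these three formulas is a finite-difference check. The sketch below assumes the column-per-sample layout and uses a made-up scalar loss whose gradient with respect to the layer output is exactly `grad_out`; none of the names come from the repository.

```python
import numpy as np

def linear_backward(W, x, grad_out):
    """Gradients of a Linear layer, given the gradient flowing in from above."""
    grad_W = grad_out @ x.T
    grad_b = grad_out.sum(axis=1, keepdims=True)    # sum over the batch axis
    grad_input = W.T @ grad_out                     # passed back to the previous layer
    return grad_W, grad_b, grad_input

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = rng.standard_normal((3, 1))
x = rng.standard_normal((4, 5))
grad_out = rng.standard_normal((3, 5))              # pretend upstream gradient

def scalar_loss(W_):
    # Chosen so that d(loss)/d(output) equals grad_out exactly.
    return np.sum((W_ @ x + b) * grad_out)

grad_W, grad_b, grad_input = linear_backward(W, x, grad_out)

# Central finite differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (scalar_loss(Wp) - scalar_loss(Wm)) / (2 * eps)

max_error = np.abs(numeric - grad_W).max()
```

If `max_error` is not tiny, the analytic formula and the numerical estimate disagree, which usually points to a transposed operand or a wrong axis.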

Softmax + Cross-Entropy combined simplifies beautifully:

grad = predictions − one_hot_targets

This is a major reason classification networks so often pair Softmax with Cross-Entropy: taken with respect to the pre-Softmax logits, the gradient collapses to "how far off was each predicted probability from the truth," so the Softmax Jacobian never has to be formed explicitly.
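
The simplification can be checked numerically for a single sample. This sketch compares `predictions − one_hot_targets` against a finite-difference gradient of the cross-entropy with respect to the logits; the helper names are illustrative only. For a mean loss over a batch, the same expression would additionally be divided by the batch size.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(z, y):
    """Cross-entropy of softmax(z) against one-hot y, for a single column."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([[2.0], [1.0], [0.1]])      # logits for one sample
y = np.array([[0.0], [1.0], [0.0]])      # true class is index 1

analytic = softmax(z) - y                # the combined gradient

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    zp, zm = z.copy(), z.copy()
    zp[i, 0] += eps
    zm[i, 0] -= eps
    numeric[i, 0] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)

max_error = np.abs(numeric - analytic).max()
```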

Adam optimizer

Maintains running averages of gradients and squared gradients per parameter, bias-corrects them, and uses them to adapt the per-parameter learning rate:

m = β₁ · m + (1 − β₁) · grad
v = β₂ · v + (1 − β₂) · grad²

m̂ = m / (1 − β₁ᵗ)        # bias correction
v̂ = v / (1 − β₂ᵗ)

weight −= lr · m̂ / (√v̂ + ε)
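
The update rule above translates almost line for line into NumPy. The class below is a sketch, not the code in optimizers.py: the dictionary-based interface and the default β₁ = 0.9, β₂ = 0.999, ε = 1e-8 are assumptions (they are the commonly used defaults), and the toy objective is only there to exercise it.

```python
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0          # step counter, used for bias correction
        self.m = {}         # running average of gradients, per parameter
        self.v = {}         # running average of squared gradients, per parameter

    def step(self, params, grads):
        """Update each array in `params` in place using the matching entry in `grads`."""
        self.t += 1
        for key, w in params.items():
            g = grads[key]
            m = self.beta1 * self.m.get(key, np.zeros_like(w)) + (1 - self.beta1) * g
            v = self.beta2 * self.v.get(key, np.zeros_like(w)) + (1 - self.beta2) * g**2
            self.m[key], self.v[key] = m, v
            m_hat = m / (1 - self.beta1**self.t)      # bias correction
            v_hat = v / (1 - self.beta2**self.t)
            w -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
params = {"w": np.array([0.0])}
opt = Adam(lr=0.1)
for _ in range(500):
    grads = {"w": 2 * (params["w"] - 3.0)}
    opt.step(params, grads)
```

Note that on the very first step the bias-corrected ratio m̂ / √v̂ is ±1, so every parameter moves by almost exactly `lr` regardless of the gradient's scale.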

Getting started

Install

pip install numpy scipy matplotlib pillow scikit-learn

tkinter ships with standard Python on Windows and macOS. On Linux:

sudo apt-get install python3-tk

Train the model

cd neural_net
python MNIST_test.py

This will:

Download MNIST (via sklearn.datasets.fetch_openml)
Train the network for 10 epochs with Adam
Save the trained weights to mnist_weights.npz
Plot training loss curves

Expected output:

Epoch 1, Loss 0.7995, correct prob 0.4496
Epoch 2, Loss 0.3492, correct prob 0.7053
Epoch 3, Loss 0.2911, correct prob 0.7474
...
Epoch 10, Loss 0.1421, correct prob 0.8675
Test accuracy: 96.8%
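
The weight-saving step could look roughly like the sketch below, which round-trips a set of arrays through an .npz file. The key names (`W1`, `b1`, ...) are hypothetical; the layout of the real mnist_weights.npz is not documented here. A temporary directory is used so the example does not overwrite anything.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
weights = {
    "W1": rng.standard_normal((128, 784)), "b1": np.zeros((128, 1)),
    "W2": rng.standard_normal((64, 128)),  "b2": np.zeros((64, 1)),
    "W3": rng.standard_normal((10, 64)),   "b3": np.zeros((10, 1)),
}

path = os.path.join(tempfile.mkdtemp(), "mnist_weights.npz")
np.savez(path, **weights)                 # one named array per parameter

with np.load(path) as data:               # arrays are read lazily by name
    restored = {name: data[name] for name in data.files}
```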

Run the interactive demo

After training (which generates mnist_weights.npz):

python GUI.py

Draw a digit in the left panel. The prediction updates as soon as you release the mouse.

Verify correctness on XOR

Before MNIST, the implementation is validated on XOR, the classic minimal classification problem that is not linearly separable:

python test.py

If the network can't learn XOR (the loss should drop below 0.1 within 500 epochs), something in the backprop implementation is most likely broken.
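
To show the general shape of such a sanity check, here is a self-contained sketch of a tiny 2-4-1 sigmoid network trained on XOR with full-batch gradient descent and MSE. It does not reproduce test.py: the architecture, learning rate, seed, and epoch count are all assumptions, and the final loss it reaches depends on them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)    # four inputs, one per column
Y = np.array([[0, 1, 1, 0]], dtype=float)    # XOR targets

W1, b1 = rng.standard_normal((4, 2)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))
lr = 0.5
losses = []

for epoch in range(5000):
    # Forward pass
    h = sigmoid(W1 @ X + b1)
    out = sigmoid(W2 @ h + b2)
    losses.append(np.mean((out - Y) ** 2))

    # Backward pass: MSE, then sigmoid, then linear, in reverse order
    grad_out = 2 * (out - Y) / Y.shape[1]
    grad_z2 = grad_out * out * (1 - out)
    grad_W2 = grad_z2 @ h.T
    grad_b2 = grad_z2.sum(axis=1, keepdims=True)
    grad_h = W2.T @ grad_z2
    grad_z1 = grad_h * h * (1 - h)
    grad_W1 = grad_z1 @ X.T
    grad_b1 = grad_z1.sum(axis=1, keepdims=True)

    # Plain gradient-descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
```

Plotting or printing `losses` is the quickest diagnostic: a curve that never moves, or that climbs, suggests a sign or transpose error somewhere in the backward pass.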

Preprocessing pipeline

Real handwriting looks nothing like MNIST out of the box. The GUI's preprocessing function bridges the gap:

Extract pixels from the drawing canvas
Find bounding box of the drawn digit
Crop to that bounding box
Resize to fit in a 20×20 region while preserving aspect ratio
Place in a 28×28 black canvas
Shift by center-of-mass so the digit's mass lands at pixel (14, 14)
Normalize to [0, 1] and reshape to (784, 1)
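
The steps above can be sketched in plain NumPy as follows. This is an illustration rather than the code in GUI.py: it assumes the canvas has already been extracted as a 2-D array with bright strokes on a black background, it uses nearest-neighbour resizing (an image library's antialiased resize would likely give smoother results), and `preprocess` and `shift_image` are hypothetical names.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Translate img by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def preprocess(canvas):
    """Turn a 2-D grayscale drawing into a (784, 1) network input."""
    img = canvas.astype(float)

    # Bounding box of the drawn digit, then crop to it.
    rows = np.flatnonzero(img.max(axis=1) > 0)
    cols = np.flatnonzero(img.max(axis=0) > 0)
    if rows.size == 0:                               # nothing drawn yet
        return np.zeros((784, 1))
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Resize so the longer side is 20 px, keeping the aspect ratio.
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    r_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    c_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    small = crop[np.ix_(r_idx, c_idx)]

    # Paste into the middle of a 28×28 black canvas.
    out = np.zeros((28, 28))
    top, left = (28 - new_h) // 2, (28 - new_w) // 2
    out[top:top + new_h, left:left + new_w] = small

    # Shift so the centre of mass lands near pixel (14, 14).
    total = out.sum()
    if total > 0:
        ys, xs = np.indices(out.shape)
        cy, cx = (ys * out).sum() / total, (xs * out).sum() / total
        out = shift_image(out, int(round(14 - cy)), int(round(14 - cx)))

    # Normalize to [0, 1] and flatten to a column vector.
    if out.max() > 0:
        out = out / out.max()
    return out.reshape(784, 1)

canvas = np.zeros((280, 280))
canvas[10:110, 20:70] = 255                          # a block drawn near the top-left corner
x = preprocess(canvas)
```

Even though the block was drawn in a corner of the 280×280 canvas, the resulting 28×28 image has it centred, which is the property the classifier depends on.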

The "Network sees" panel shows the result of this pipeline, which is genuinely educational — you can immediately see why a digit drawn in the corner might be misclassified without proper centering.

What we learned

Building this taught us, in the most concrete way possible:

Why backprop is just the chain rule, applied layer by layer
Why initialization matters — try zeros and watch the network refuse to train
Why Softmax + Cross-Entropy are paired — the gradient simplifies dramatically
Why batch normalization, Adam, and dropout exist — by feeling the problems they solve
The gap between benchmark accuracy and real-world performance: the network hits 97% on MNIST and still struggles with our actual handwriting until preprocessing is done right
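
The point about zero initialization can be seen directly in the gradients. The sketch below runs one forward and backward pass through a two-layer ReLU network with all-zero weights on random stand-in data; the layer sizes and variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((784, 32))                            # fake batch, one column per sample
y = np.eye(10)[:, rng.integers(0, 10, size=32)]      # one-hot targets

W1, b1 = np.zeros((128, 784)), np.zeros((128, 1))
W2, b2 = np.zeros((10, 128)), np.zeros((10, 1))

# Forward pass: Linear -> ReLU -> Linear -> Softmax
h = np.maximum(W1 @ x + b1, 0.0)
z = W2 @ h + b2
e = np.exp(z - z.max(axis=0, keepdims=True))
p = e / e.sum(axis=0, keepdims=True)

# Backward pass
grad_z = (p - y) / x.shape[1]
grad_W2 = grad_z @ h.T                  # zero, because h is all zeros
grad_b2 = grad_z.sum(axis=1, keepdims=True)
grad_h = W2.T @ grad_z                  # zero, because W2 is all zeros
grad_W1 = (grad_h * (h > 0)) @ x.T      # zero as well
```

Only the output bias receives a non-zero gradient here, so both weight matrices stay at zero after every update and the network can learn nothing beyond the class frequencies.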

Known limitations

No GPU support — pure NumPy, single-threaded. Training takes ~2 minutes on a modern CPU
No convolutions yet — fully-connected only; would need a Conv2D layer for state-of-the-art accuracy
Sensitive to drawing style — strokes much thicker or thinner than MNIST's distribution reduce accuracy
No data augmentation — the network sees only the original 60,000 training images

Acknowledgments

3Blue1Brown's Neural Networks series — the best intuition for backpropagation anywhere
CS231n (Stanford) — the gold-standard reference for the math
