A feedforward neural network built from scratch in Python using only NumPy, trained on MNIST to recognize handwritten digits, paired with a Tkinter GUI that lets you draw a digit and watch the network classify it in real time.
No PyTorch or TensorFlow. Every layer, gradient, and optimizer built from scratch.
- Forward and backward propagation implemented from first principles
- Manual derivation of gradients for Linear, ReLU, Sigmoid, Softmax, Cross-Entropy, and MSE
- Two optimizers from scratch: SGD and Adam
- Mini-batch training loop with shuffling and per-epoch loss tracking
- He weight initialization and numerically stable Softmax
- End-to-end project: data loading, training, evaluation, weight persistence, and a live interactive demo
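As an illustration of the He initialization mentioned above, here is a minimal sketch. It assumes the common normal-distribution variant with standard deviation sqrt(2 / fan_in) and weights stored as (fan_out, fan_in); the function name `he_init` is hypothetical and the repository's own code may differ.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    """He (Kaiming) normal initialization, intended for layers followed by ReLU."""
    std = np.sqrt(2.0 / fan_in)                    # assumed variant: N(0, 2 / fan_in)
    W = rng.standard_normal((fan_out, fan_in)) * std
    b = np.zeros((fan_out, 1))                     # biases start at zero
    return W, b

rng = np.random.default_rng(0)
W, b = he_init(784, 128, rng)
print(W.shape, b.shape)                            # (128, 784) (128, 1)
print(float(W.std()), float(np.sqrt(2.0 / 784)))   # empirical vs. target std
```

Scaling by the fan-in keeps the variance of the activations roughly constant from layer to layer, which is what lets a ReLU network start training instead of saturating or collapsing.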
The model reaches ~98% accuracy on the MNIST test set after 10 epochs with Adam and roughly 95% accuracy with SGD.
The GUI window has three panels:
┌──────────────┬──────────────┬──────────────────┐
│              │              │                  │
│   Draw a     │   Network    │   Network        │
│   digit      │   sees       │   visualization  │
│   (280×280)  │   (28×28     │   (layers,       │
│              │   centered)  │   activations)   │
│              │              │                  │
└──────────────┴──────────────┴──────────────────┘
- Left: draw a digit with the mouse
- Middle: see the preprocessed 28×28 image the network actually receives
- Right: see the network structure light up with the activation pattern
- Prediction updates live every time you release the mouse
neural_net_from_scratch/
├── neural_net/
│ ├── layers.py # Linear, ReLU, Sigmoid, Softmax
│ ├── losses.py # CrossEntropyLoss, MSELoss
│ ├── optimizers.py # SGD, Adam
│ ├── network.py # Network class (composes layers)
│ ├── train.py # mini-batch training loop
│ ├── test.py # XOR sanity check
│ ├── MNIST_test.py # full MNIST training script
│ └── GUI.py # Tkinter digit recognizer
├── mnist_weights.npz # saved trained weights
└── README.md
The default network used in the demo:
Input (784) → Linear(784, 128) → ReLU
→ Linear(128, 64) → ReLU
→ Linear(64, 10) → Softmax → Output (10 class probabilities)
Trained with Cross-Entropy loss and Adam (lr=0.001, batch size 64, 10 epochs).
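The mini-batch side of that training setup could look roughly like the sketch below. It assumes samples are stored as columns (matching the `(784, 1)` input shape used elsewhere in this README); `iterate_minibatches` is a hypothetical helper, not necessarily what `train.py` defines, and the data is random stand-in data rather than MNIST.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs; each sample is one column."""
    n = X.shape[1]
    order = rng.permutation(n)                 # new random order on every call
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield X[:, idx], Y[:, idx]

rng = np.random.default_rng(0)
X = rng.random((784, 1000))                    # stand-in for 1000 flattened images
labels = rng.integers(0, 10, size=1000)
Y = np.eye(10)[:, labels]                      # one-hot targets, shape (10, 1000)

batches = list(iterate_minibatches(X, Y, batch_size=64, rng=rng))
print(len(batches))                            # 16: 15 full batches plus one of 40
print(batches[0][0].shape, batches[-1][0].shape)   # (784, 64) (784, 40)
```

Calling the generator once per epoch gives a fresh shuffle each time, so the network never sees the batches in the same order twice.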
Each Linear layer computes:
output = W · input + b
Activations apply element-wise nonlinearities (ReLU clips negatives to zero; Softmax normalizes to probabilities).
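A minimal sketch of this forward pass for the 784 → 128 → 64 → 10 architecture, assuming column-vector inputs and randomly initialized weights (not the trained ones). The helper names are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)                      # clip negatives to zero

def softmax(z):
    # Subtracting the per-column max avoids overflow in exp() without changing the result
    shifted = z - z.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
params = [(rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in),
           np.zeros((n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b                              # Linear layer
        last = (i == len(params) - 1)
        a = softmax(z) if last else relu(z)        # Softmax only on the output layer
    return a

x = rng.random((784, 5))                           # batch of 5 fake images
probs = forward(x, params)
print(probs.shape)                                 # (10, 5)
print(probs.sum(axis=0))                           # each column sums to 1
print(sum(W.size + b.size for W, b in params))     # 109386 trainable parameters
```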
Gradients are computed via the chain rule, layer by layer in reverse:
Linear layer:
grad_W = grad_out @ x.T
grad_b = sum(grad_out, axis=batch)
grad_input = W.T @ grad_out # passed back to previous layer
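One way to convince yourself these formulas are right is a finite-difference check. The sketch below assumes a batch stored as columns and gradients summed (not averaged) over the batch; the function names are hypothetical.

```python
import numpy as np

def linear_forward(W, b, x):
    return W @ x + b

def linear_backward(W, x, grad_out):
    grad_W = grad_out @ x.T
    grad_b = grad_out.sum(axis=1, keepdims=True)   # sum over the batch axis
    grad_input = W.T @ grad_out                    # passed back to the previous layer
    return grad_W, grad_b, grad_input

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = rng.standard_normal((3, 1))
x = rng.standard_normal((4, 5))

# Use loss = sum(output), so the upstream gradient is all ones
grad_out = np.ones((3, 5))
grad_W, grad_b, grad_input = linear_backward(W, x, grad_out)

# Central finite difference on a single weight
eps = 1e-6
i, j = 1, 2
W_plus, W_minus = W.copy(), W.copy()
W_plus[i, j] += eps
W_minus[i, j] -= eps
numeric = (linear_forward(W_plus, b, x).sum()
           - linear_forward(W_minus, b, x).sum()) / (2 * eps)
print(numeric, grad_W[i, j])                       # the two values should agree closely
```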
Softmax + Cross-Entropy combined simplifies beautifully:
grad = predictions − one_hot_targets
This is why classification networks so often pair Softmax with Cross-Entropy: the gradient with respect to the pre-softmax logits collapses to "how far off was each predicted probability from the truth," so the Softmax Jacobian never has to be formed explicitly.
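The simplification can be verified numerically. The sketch below compares `predictions − one_hot_targets` against a finite-difference gradient of the loss with respect to the pre-softmax logits, for a single made-up sample.

```python
import numpy as np

def softmax(z):
    shifted = z - z.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(logits, one_hot):
    return -np.sum(one_hot * np.log(softmax(logits)))

rng = np.random.default_rng(0)
logits = rng.standard_normal((10, 1))
one_hot = np.zeros((10, 1))
one_hot[3, 0] = 1.0                                # pretend the true class is 3

analytic = softmax(logits) - one_hot               # the combined gradient

numeric = np.zeros_like(logits)
eps = 1e-6
for k in range(logits.shape[0]):
    bump = np.zeros_like(logits)
    bump[k, 0] = eps
    numeric[k, 0] = (cross_entropy(logits + bump, one_hot)
                     - cross_entropy(logits - bump, one_hot)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))          # tiny: the two gradients match
```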
Adam maintains running averages of the gradients and squared gradients for each parameter, bias-corrects them, and uses them to scale each parameter's step size:
m = β₁ · m + (1 − β₁) · grad
v = β₂ · v + (1 − β₂) · grad²
m̂ = m / (1 − β₁ᵗ) # bias correction
v̂ = v / (1 − β₂ᵗ)
weight −= lr · m̂ / (√v̂ + ε)
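These update equations translate almost line for line into NumPy. The sketch below is an assumption-laden illustration: the class name `AdamSketch`, the dict-of-arrays interface, and the defaults β₁ = 0.9, β₂ = 0.999, ε = 1e-8 (the values commonly used for Adam) are not taken from `optimizers.py`.

```python
import numpy as np

class AdamSketch:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0                                 # step counter for bias correction
        self.m, self.v = {}, {}                    # per-parameter running averages

    def step(self, params, grads):
        """Update every array in params in place using the matching gradient."""
        self.t += 1
        for key, g in grads.items():
            m = self.m.get(key, np.zeros_like(g))
            v = self.v.get(key, np.zeros_like(g))
            m = self.beta1 * m + (1 - self.beta1) * g
            v = self.beta2 * v + (1 - self.beta2) * g ** 2
            self.m[key], self.v[key] = m, v
            m_hat = m / (1 - self.beta1 ** self.t)     # bias correction
            v_hat = v / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
params = {"w": np.array([1.0, -2.0])}
opt = AdamSketch(lr=0.1)
for _ in range(200):
    opt.step(params, {"w": params["w"].copy()})
print(params["w"])                                 # should end up much closer to zero
```

On the very first step the bias-corrected ratio m̂ / √v̂ is approximately the sign of the gradient, so every parameter moves by about `lr` regardless of the gradient's scale.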
pip install numpy scipy matplotlib pillow scikit-learn
tkinter ships with standard Python on Windows and macOS. On Linux:
sudo apt-get install python3-tk

cd neural_net
python MNIST_test.py
This will:
- Download MNIST (via sklearn.datasets.fetch_openml)
- Train the network for 10 epochs with Adam
- Save the trained weights to mnist_weights.npz
- Plot training loss curves
Expected output:
Epoch 1, Loss 0.7995, correct prob 0.4496
Epoch 2, Loss 0.3492, correct prob 0.7053
Epoch 3, Loss 0.2911, correct prob 0.7474
...
Epoch 10, Loss 0.1421, correct prob 0.8675
Test accuracy: 96.8%
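For readers curious how weights can be persisted in a `.npz` file, here is a hedged sketch using `np.savez` and `np.load`. The key naming scheme (`W0`, `b0`, ...) and the helper names are assumptions made for illustration; the actual layout of `mnist_weights.npz` may be different.

```python
import os
import tempfile
import numpy as np

def save_weights(path, params):
    """Store each (W, b) pair under the hypothetical keys W0, b0, W1, b1, ..."""
    arrays = {}
    for i, (W, b) in enumerate(params):
        arrays[f"W{i}"] = W
        arrays[f"b{i}"] = b
    np.savez(path, **arrays)

def load_weights(path):
    with np.load(path) as data:
        n_layers = len(data.files) // 2
        return [(data[f"W{i}"], data[f"b{i}"]) for i in range(n_layers)]

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
params = [(rng.standard_normal((n_out, n_in)), np.zeros((n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Write to a temporary directory so the real weights file is never touched
path = os.path.join(tempfile.mkdtemp(), "demo_weights.npz")
save_weights(path, params)
restored = load_weights(path)
print(all(np.array_equal(W, R) for (W, _), (R, _) in zip(params, restored)))  # True
```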
After training (which generates mnist_weights.npz):
python GUI.py
Draw a digit in the left panel. The prediction updates as soon as you release the mouse.
Before MNIST, the implementation is validated on XOR — the smallest non-linear classification problem:
python test.py
If the network can't learn XOR (loss should drop below 0.1 within 500 epochs), backprop is broken.
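To show the idea of an XOR check in isolation, here is a self-contained sketch that is not the contents of `test.py`. It hard-codes a 2 → 4 → 1 network with a tanh hidden layer (swapped in here because it tends to train reliably at this size; the repository's own activations are ReLU, Sigmoid, and Softmax), a Sigmoid output, MSE loss, and plain full-batch gradient descent. The learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR truth table, one sample per column
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 1.0, 1.0, 0.0]])

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))
lr = 0.5
losses = []

for epoch in range(2000):
    # Forward pass
    h = np.tanh(W1 @ X + b1)
    out = sigmoid(W2 @ h + b2)
    losses.append(float(np.mean((out - Y) ** 2)))

    # Backward pass: MSE, then Sigmoid, then Linear, then tanh, then Linear
    grad_out = 2.0 * (out - Y) / Y.shape[1]
    grad_z2 = grad_out * out * (1.0 - out)
    grad_W2 = grad_z2 @ h.T
    grad_b2 = grad_z2.sum(axis=1, keepdims=True)
    grad_h = W2.T @ grad_z2
    grad_z1 = grad_h * (1.0 - h ** 2)
    grad_W1 = grad_z1 @ X.T
    grad_b1 = grad_z1.sum(axis=1, keepdims=True)

    # Gradient descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

If any of the backward lines is wrong (a missing transpose, a forgotten activation derivative), the loss typically stalls near its starting value, which is what makes XOR a cheap smoke test.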
Real handwriting looks nothing like MNIST out of the box. The GUI's preprocessing function bridges the gap:
- Extract pixels from the drawing canvas
- Find bounding box of the drawn digit
- Crop to that bounding box
- Resize to fit in a 20×20 region while preserving aspect ratio
- Place in a 28×28 black canvas
- Shift by center-of-mass so the digit's mass lands at pixel (14, 14)
- Normalize to [0, 1] and reshape to (784, 1)
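The steps above can be sketched in plain NumPy as follows. This is an approximation of the pipeline, not the code in `GUI.py`: it assumes the canvas arrives as an 8-bit grayscale array with a black background, uses nearest-neighbour resizing instead of a proper image library, and shifts by whole pixels with `np.roll`. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(canvas):
    """canvas: 2-D array, 0 = background, up to 255 = ink. Returns a (784, 1) column."""
    canvas = np.asarray(canvas, dtype=float)
    rows = np.flatnonzero(canvas.any(axis=1))
    cols = np.flatnonzero(canvas.any(axis=0))
    if rows.size == 0:
        return np.zeros((784, 1))                  # nothing drawn yet

    # Crop to the bounding box of the drawn digit
    crop = canvas[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Resize so the longer side is 20 px, preserving aspect ratio (nearest neighbour)
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_h = min(20, max(1, round(h * scale)))
    new_w = min(20, max(1, round(w * scale)))
    r_idx = (np.arange(new_h) * h) // new_h
    c_idx = (np.arange(new_w) * w) // new_w
    small = crop[np.ix_(r_idx, c_idx)]

    # Place in the middle of a 28x28 black canvas
    img = np.zeros((28, 28))
    top, left = (28 - new_h) // 2, (28 - new_w) // 2
    img[top:top + new_h, left:left + new_w] = small

    # Shift so the centre of mass lands near pixel (14, 14).
    # np.roll wraps around, which is harmless while shifts stay inside the 4 px margin.
    total = img.sum()
    if total > 0:
        cy = (img.sum(axis=1) @ np.arange(28)) / total
        cx = (img.sum(axis=0) @ np.arange(28)) / total
        img = np.roll(img, (int(round(14 - cy)), int(round(14 - cx))), axis=(0, 1))

    # Normalize to [0, 1] (assumes 0..255 input) and flatten to a column vector
    return np.clip(img / 255.0, 0.0, 1.0).reshape(784, 1)

canvas = np.zeros((280, 280))
canvas[10:110, 20:60] = 255                        # a tall blob drawn in the top-left corner
out = preprocess(canvas)
print(out.shape)                                   # (784, 1)
```

Nearest-neighbour sampling can drop very thin strokes when shrinking a 280×280 drawing; an antialiased resize (for example with Pillow) is likely a better choice in practice.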
The "Network sees" panel shows the result of this pipeline, so you can see immediately why a digit drawn in the corner would be misclassified without proper centering.
Building this taught us, in the most concrete way possible:
- Why backprop is just the chain rule, applied layer by layer
- Why initialization matters — try zeros and watch the network refuse to train
- Why Softmax + Cross-Entropy are paired — the gradient simplifies dramatically
- Why batch normalization, Adam, and dropout exist — by feeling the problems they solve
- The gap between benchmark accuracy and real-world performance — the network hits 97% on MNIST and still struggles with my actual handwriting until preprocessing is done right
- No GPU support — pure NumPy, single-threaded. Training takes ~2 minutes on a modern CPU
- No convolutions yet — fully-connected only; would need a Conv2D layer for state-of-the-art accuracy
- Sensitive to drawing style — strokes much thicker or thinner than MNIST's distribution reduce accuracy
- No data augmentation — the network sees only the original 60,000 training images
- 3Blue1Brown's Neural Networks series — the best intuition for backpropagation anywhere
- CS231n (Stanford) — the gold-standard reference for the math