From b37dbdddba0c1b18f068f7cbce13e41f6f0707b6 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Thu, 5 Mar 2026 21:33:10 +0000 Subject: [PATCH 1/3] Initial plan From 6face0be5a50f9401318c1915a4dfb5803b46b12 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Thu, 5 Mar 2026 21:39:01 +0000 Subject: [PATCH 2/3] Add three Jupyter notebook assignments with three-part structure each Co-authored-by: sing-git <183478851+sing-git@users.noreply.github.com> --- README.md | 30 +- assignments/Assignment1_Perceptrons_MLP.ipynb | 509 ++++++++++++++ assignments/Assignment2_CNN.ipynb | 542 +++++++++++++++ assignments/Assignment3_RNN_LSTM.ipynb | 652 ++++++++++++++++++ 4 files changed, 1732 insertions(+), 1 deletion(-) create mode 100644 assignments/Assignment1_Perceptrons_MLP.ipynb create mode 100644 assignments/Assignment2_CNN.ipynb create mode 100644 assignments/Assignment3_RNN_LSTM.ipynb diff --git a/README.md b/README.md index 4bf8b58..5cf18ed 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,30 @@ # DeepLearning_FS26 -DeepLearning_FS26 + +Deep Learning course — Spring Semester 2026 (FS26). 
+ +## Assignments + +Each assignment is provided as a Jupyter notebook and includes three parts: + +| Part | Description | +|------|-------------| +| **Part 1: Task Description** | Description of the problem to be solved, details of tasks and possible solutions, and expected outputs/plots that hint at how solutions could look | +| **Part 2: Implementation** | Guided code cells where you implement the required models and algorithms | +| **Part 3: Experiments & Analysis** | Experiments to run, visualizations to produce, and written reflection questions | + +### Assignment Overview + +| # | Notebook | Topic | +|---|----------|-------| +| 1 | [Assignment1_Perceptrons_MLP.ipynb](assignments/Assignment1_Perceptrons_MLP.ipynb) | Perceptrons and Multi-Layer Perceptrons | +| 2 | [Assignment2_CNN.ipynb](assignments/Assignment2_CNN.ipynb) | Convolutional Neural Networks (CIFAR-10) | +| 3 | [Assignment3_RNN_LSTM.ipynb](assignments/Assignment3_RNN_LSTM.ipynb) | Recurrent Neural Networks and LSTMs | + +## Getting Started + +```bash +pip install torch torchvision numpy matplotlib scikit-learn jupyter +jupyter notebook +``` + +Open the notebook for the assignment you are working on from the `assignments/` folder. diff --git a/assignments/Assignment1_Perceptrons_MLP.ipynb b/assignments/Assignment1_Perceptrons_MLP.ipynb new file mode 100644 index 0000000..b24323e --- /dev/null +++ b/assignments/Assignment1_Perceptrons_MLP.ipynb @@ -0,0 +1,509 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Assignment 1: Perceptrons and Multi-Layer Perceptrons\n", + "\n", + "**Deep Learning FS26**\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Task Description\n", + "\n", + "### Problem Description\n", + "\n", + "In this assignment you will implement a Perceptron and a Multi-Layer Perceptron (MLP) from scratch using NumPy, and then explore the same models using PyTorch. 
The goal is to deepen your understanding of how neural networks learn through forward propagation and backpropagation.\n", + "\n", + "### Tasks Overview\n", + "\n", + "1. **Perceptron Implementation**\n", + " - Implement a single Perceptron with a step activation function.\n", + " - Train it on a linearly separable binary classification dataset (e.g., AND / OR gate).\n", + " - Observe the decision boundary before and after training.\n", + "\n", + "2. **Multi-Layer Perceptron (MLP)**\n", + " - Implement a 2-layer MLP with sigmoid activations from scratch.\n", + " - Train it using stochastic gradient descent (SGD) and backpropagation.\n", + " - Solve the XOR problem, which a single perceptron cannot solve.\n", + "\n", + "3. **PyTorch MLP**\n", + " - Reimplement the MLP using `torch.nn.Module`.\n", + " - Train using `torch.optim.SGD` and `nn.BCELoss`.\n", + " - Compare loss curves with your manual implementation.\n", + "\n", + "### Possible Solutions\n", + "\n", + "- The Perceptron should correctly classify all AND/OR inputs after training (accuracy = 100%).\n", + "- The MLP should learn the XOR function — a task impossible for a single-layer network.\n", + "- The loss should decrease monotonically (or near-monotonically) across epochs.\n", + "\n", + "### Expected Plots\n", + "\n", + "Below are hints for the kinds of visualizations your solutions should produce:\n", + "\n", + "- **Decision boundary plot** for the Perceptron: A linear boundary separating the two classes in 2D.\n", + " - The boundary should clearly separate class 0 and class 1 after training.\n", + "\n", + "- **Loss curve**: A plot of training loss vs. 
epoch showing a decreasing trend.\n", + " - The y-axis is the loss (BCE or MSE), and the x-axis is the epoch number.\n", + "\n", + "- **XOR decision boundary**: A non-linear boundary produced by the MLP separating the four XOR points.\n", + "\n", + "```\n", + "Example: Expected loss curve shape\n", + "\n", + "Loss\n", + "1.0 |*\n", + " | *\n", + "0.5 | **\n", + " | ****\n", + "0.0 |__________***__\n", + " 0 50 100 Epoch\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Implementation\n", + "\n", + "### Setup & Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "\n", + "# For reproducibility\n", + "np.random.seed(42)\n", + "torch.manual_seed(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 1: Perceptron from Scratch" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Dataset: AND gate\n", + "X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n", + "y_and = np.array([0, 0, 0, 1]) # AND labels\n", + "\n", + "class Perceptron:\n", + " def __init__(self, n_features, lr=0.1):\n", + " # TODO: Initialize weights and bias\n", + " self.weights = np.zeros(n_features)\n", + " self.bias = 0.0\n", + " self.lr = lr\n", + "\n", + " def step(self, x):\n", + " # TODO: Implement step activation function\n", + " return 1 if x >= 0 else 0\n", + "\n", + " def predict(self, X):\n", + " # TODO: Compute weighted sum + bias, apply activation\n", + " linear_output = np.dot(X, self.weights) + self.bias\n", + " return np.array([self.step(x) for x in linear_output])\n", + "\n", + " def fit(self, X, y, epochs=100):\n", + " # TODO: Implement the perceptron learning rule\n", + " for epoch in range(epochs):\n", + " for xi, yi in 
zip(X, y):\n", + " y_pred = self.step(np.dot(xi, self.weights) + self.bias)\n", + " error = yi - y_pred\n", + " self.weights += self.lr * error * xi\n", + " self.bias += self.lr * error\n", + "\n", + "# Train\n", + "perceptron = Perceptron(n_features=2)\n", + "perceptron.fit(X_and, y_and, epochs=100)\n", + "preds = perceptron.predict(X_and)\n", + "print(\"AND gate predictions:\", preds)\n", + "print(\"Accuracy:\", np.mean(preds == y_and))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize decision boundary\n", + "def plot_decision_boundary(model, X, y, title):\n", + " x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n", + " y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n", + " xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),\n", + " np.arange(y_min, y_max, 0.01))\n", + " Z = model.predict(np.c_[xx.ravel(), yy.ravel()])\n", + " Z = Z.reshape(xx.shape)\n", + " plt.figure(figsize=(6, 5))\n", + " plt.contourf(xx, yy, Z, alpha=0.4, cmap=plt.cm.RdYlBu)\n", + " plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolors='k', s=100)\n", + " plt.title(title)\n", + " plt.xlabel(\"Feature 1\")\n", + " plt.ylabel(\"Feature 2\")\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + "plot_decision_boundary(perceptron, X_and, y_and, \"Perceptron Decision Boundary (AND gate)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 2: MLP from Scratch (XOR problem)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Dataset: XOR\n", + "X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)\n", + "y_xor = np.array([[0], [1], [1], [0]], dtype=float)\n", + "\n", + "class MLP:\n", + " def __init__(self, n_input, n_hidden, n_output, lr=0.5):\n", + " # TODO: Initialize weights (small random values) and biases (zeros)\n", + " self.W1 = np.random.randn(n_input, n_hidden) * 0.5\n", + " self.b1 
= np.zeros((1, n_hidden))\n", + " self.W2 = np.random.randn(n_hidden, n_output) * 0.5\n", + " self.b2 = np.zeros((1, n_output))\n", + " self.lr = lr\n", + "\n", + " def sigmoid(self, z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def sigmoid_deriv(self, a):\n", + " return a * (1 - a)\n", + "\n", + " def forward(self, X):\n", + " # TODO: Implement forward pass\n", + " self.z1 = X @ self.W1 + self.b1\n", + " self.a1 = self.sigmoid(self.z1)\n", + " self.z2 = self.a1 @ self.W2 + self.b2\n", + " self.a2 = self.sigmoid(self.z2)\n", + " return self.a2\n", + "\n", + " def backward(self, X, y):\n", + " # TODO: Implement backpropagation\n", + " m = X.shape[0]\n", + " dL_da2 = self.a2 - y\n", + " da2_dz2 = self.sigmoid_deriv(self.a2)\n", + " delta2 = dL_da2 * da2_dz2\n", + "\n", + " dW2 = self.a1.T @ delta2 / m\n", + " db2 = delta2.mean(axis=0, keepdims=True)\n", + "\n", + " delta1 = (delta2 @ self.W2.T) * self.sigmoid_deriv(self.a1)\n", + " dW1 = X.T @ delta1 / m\n", + " db1 = delta1.mean(axis=0, keepdims=True)\n", + "\n", + " self.W2 -= self.lr * dW2\n", + " self.b2 -= self.lr * db2\n", + " self.W1 -= self.lr * dW1\n", + " self.b1 -= self.lr * db1\n", + "\n", + " def fit(self, X, y, epochs=5000):\n", + " losses = []\n", + " for epoch in range(epochs):\n", + " y_pred = self.forward(X)\n", + " loss = np.mean((y_pred - y) ** 2)\n", + " losses.append(loss)\n", + " self.backward(X, y)\n", + " return losses\n", + "\n", + " def predict(self, X):\n", + " return (self.forward(X) >= 0.5).astype(int)\n", + "\n", + "mlp = MLP(n_input=2, n_hidden=4, n_output=1, lr=1.0)\n", + "losses = mlp.fit(X_xor, y_xor, epochs=5000)\n", + "\n", + "preds = mlp.predict(X_xor)\n", + "print(\"XOR predictions:\", preds.flatten())\n", + "print(\"Expected: \", y_xor.flatten().astype(int))\n", + "print(\"Accuracy:\", np.mean(preds == y_xor))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Plot training loss curve\n", + 
"plt.figure(figsize=(8, 4))\n", + "plt.plot(losses, color='steelblue', linewidth=1.5)\n", + "plt.xlabel(\"Epoch\")\n", + "plt.ylabel(\"MSE Loss\")\n", + "plt.title(\"Training Loss — MLP on XOR (NumPy)\")\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 3: PyTorch MLP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Prepare data as PyTorch tensors\n", + "X_t = torch.FloatTensor(X_xor)\n", + "y_t = torch.FloatTensor(y_xor)\n", + "\n", + "class TorchMLP(nn.Module):\n", + " def __init__(self, n_input, n_hidden, n_output):\n", + " super(TorchMLP, self).__init__()\n", + " # TODO: Define layers\n", + " self.layer1 = nn.Linear(n_input, n_hidden)\n", + " self.layer2 = nn.Linear(n_hidden, n_output)\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + " def forward(self, x):\n", + " # TODO: Implement forward pass\n", + " x = self.sigmoid(self.layer1(x))\n", + " x = self.sigmoid(self.layer2(x))\n", + " return x\n", + "\n", + "model = TorchMLP(n_input=2, n_hidden=4, n_output=1)\n", + "optimizer = optim.SGD(model.parameters(), lr=1.0)\n", + "criterion = nn.BCELoss()\n", + "\n", + "torch_losses = []\n", + "for epoch in range(5000):\n", + " optimizer.zero_grad()\n", + " y_pred = model(X_t)\n", + " loss = criterion(y_pred, y_t)\n", + " loss.backward()\n", + " optimizer.step()\n", + " torch_losses.append(loss.item())\n", + "\n", + "with torch.no_grad():\n", + " preds_torch = (model(X_t) >= 0.5).float()\n", + "print(\"PyTorch XOR predictions:\", preds_torch.flatten().int().numpy())\n", + "print(\"Accuracy:\", (preds_torch == y_t).float().mean().item())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare loss curves: NumPy vs PyTorch\n", + "plt.figure(figsize=(10, 4))\n", + "plt.plot(losses, label='NumPy MLP (MSE)', color='steelblue', 
linewidth=1.5)\n", + "plt.plot(torch_losses, label='PyTorch MLP (BCE)', color='coral', linewidth=1.5, linestyle='--')\n", + "plt.xlabel(\"Epoch\")\n", + "plt.ylabel(\"Loss\")\n", + "plt.title(\"Training Loss Comparison — NumPy vs PyTorch\")\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Experiments and Analysis\n", + "\n", + "In this section you will run experiments to better understand the behavior of perceptrons and MLPs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 1: Effect of Learning Rate\n", + "\n", + "Train your NumPy MLP on the XOR problem with different learning rates (`0.01`, `0.1`, `1.0`, `5.0`) and compare the resulting loss curves." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "learning_rates = [0.01, 0.1, 1.0, 5.0]\n", + "plt.figure(figsize=(10, 5))\n", + "\n", + "for lr in learning_rates:\n", + " # TODO: Train MLP with each learning rate and plot the loss\n", + " np.random.seed(42)\n", + " model_lr = MLP(n_input=2, n_hidden=4, n_output=1, lr=lr)\n", + " lr_losses = model_lr.fit(X_xor, y_xor, epochs=5000)\n", + " plt.plot(lr_losses, label=f'lr={lr}', linewidth=1.5)\n", + "\n", + "plt.xlabel(\"Epoch\")\n", + "plt.ylabel(\"MSE Loss\")\n", + "plt.title(\"Effect of Learning Rate on Training Loss\")\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "# TODO: Describe what you observe below\n", + "# - Which learning rate converges fastest?\n", + "# - Which causes instability?\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 2: Effect of Hidden Layer Size\n", + "\n", + "Train your MLP with different numbers of hidden units (`2`, `4`, `8`, `16`) and compare accuracy and loss." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hidden_sizes = [2, 4, 8, 16]\n", + "plt.figure(figsize=(10, 5))\n", + "\n", + "for h in hidden_sizes:\n", + " # TODO: Train MLP with each hidden size and plot the loss\n", + " np.random.seed(42)\n", + " model_h = MLP(n_input=2, n_hidden=h, n_output=1, lr=1.0)\n", + " h_losses = model_h.fit(X_xor, y_xor, epochs=5000)\n", + " final_acc = np.mean(model_h.predict(X_xor) == y_xor)\n", + " plt.plot(h_losses, label=f'hidden={h}, acc={final_acc:.2f}', linewidth=1.5)\n", + "\n", + "plt.xlabel(\"Epoch\")\n", + "plt.ylabel(\"MSE Loss\")\n", + "plt.title(\"Effect of Hidden Layer Size on Training Loss\")\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "# TODO: Describe what you observe below\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 3: XOR Decision Boundary Visualization\n", + "\n", + "Visualize the decision boundary of your trained MLP on the XOR problem." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Plot the MLP decision boundary for XOR\n", + "np.random.seed(42)\n", + "mlp_final = MLP(n_input=2, n_hidden=4, n_output=1, lr=1.0)\n", + "mlp_final.fit(X_xor, y_xor, epochs=5000)\n", + "\n", + "x_min, x_max = -0.5, 1.5\n", + "y_min, y_max = -0.5, 1.5\n", + "xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),\n", + " np.arange(y_min, y_max, 0.01))\n", + "Z = mlp_final.predict(np.c_[xx.ravel(), yy.ravel()])\n", + "Z = Z.reshape(xx.shape)\n", + "\n", + "plt.figure(figsize=(6, 5))\n", + "plt.contourf(xx, yy, Z, alpha=0.4, cmap=plt.cm.RdYlBu)\n", + "plt.scatter(X_xor[:, 0], X_xor[:, 1], c=y_xor.flatten(),\n", + " cmap=plt.cm.RdYlBu, edgecolors='k', s=200, zorder=3)\n", + "for i, (xi, yi) in enumerate(zip(X_xor, y_xor)):\n", + " plt.annotate(f'XOR={int(yi[0])}', xy=xi, xytext=(xi[0]+0.05, xi[1]+0.05))\n", + "plt.title(\"MLP Decision Boundary (XOR Problem)\")\n", + "plt.xlabel(\"Input 1\")\n", + "plt.ylabel(\"Input 2\")\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary Questions\n", + "\n", + "Answer the following questions in the cell below:\n", + "\n", + "1. Why can a single Perceptron not solve the XOR problem?\n", + "2. What role does the hidden layer play in solving non-linearly separable problems?\n", + "3. How does the learning rate affect convergence speed and stability?\n", + "4. What happens if you initialize all weights to zero? Why?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Your Answers:**\n", + "\n", + "1. *TODO: Your answer here*\n", + "\n", + "2. *TODO: Your answer here*\n", + "\n", + "3. *TODO: Your answer here*\n", + "\n", + "4. 
*TODO: Your answer here*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/assignments/Assignment2_CNN.ipynb b/assignments/Assignment2_CNN.ipynb new file mode 100644 index 0000000..69970da --- /dev/null +++ b/assignments/Assignment2_CNN.ipynb @@ -0,0 +1,542 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Assignment 2: Convolutional Neural Networks\n", + "\n", + "**Deep Learning FS26**\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Task Description\n", + "\n", + "### Problem Description\n", + "\n", + "In this assignment you will implement and train Convolutional Neural Networks (CNNs) for image classification. You will build a CNN from scratch using PyTorch, apply it to the CIFAR-10 dataset, and study the effect of different architectural choices.\n", + "\n", + "### Tasks Overview\n", + "\n", + "1. **Manual Convolution**\n", + " - Implement a 2D convolution operation manually using NumPy (without `torch.nn.Conv2d`).\n", + " - Apply common filters (edge detection, blur) to a sample image.\n", + " - Visualize the filtered outputs.\n", + "\n", + "2. **CNN Architecture in PyTorch**\n", + " - Build a CNN with the following structure:\n", + " - Conv layer → ReLU → MaxPool → Conv layer → ReLU → MaxPool → Flatten → FC → ReLU → FC (raw logits; `CrossEntropyLoss` applies the softmax internally)\n", + " - Train on the **CIFAR-10** dataset.\n", + " - Use `CrossEntropyLoss` and the `Adam` optimizer.\n", + "\n", + "3. 
**Feature Map Visualization**\n", + " - Visualize the feature maps (activation outputs) produced by the first convolutional layer.\n", + " - Interpret what patterns different filters detect.\n", + "\n", + "### Possible Solutions\n", + "\n", + "- The manual convolution should produce the same result as `torch.nn.functional.conv2d` for verification.\n", + "- Your CNN should reach at least **60% test accuracy** on CIFAR-10 after 10 epochs.\n", + "- Feature maps in early layers should highlight edges, colors, and textures.\n", + "\n", + "### Expected Plots\n", + "\n", + "- **Filter response images**: Side-by-side comparison of the original image and the filtered outputs (edge detection, blur).\n", + "\n", + " ```\n", + " [Original Image] [Edge Filter] [Blur Filter]\n", + " ```\n", + "\n", + "- **Training & Validation Curves**: Loss and accuracy per epoch for both training and validation sets.\n", + "\n", + " ```\n", + " Accuracy\n", + " 1.0 | ___train\n", + " | ___/\n", + " 0.6 | ___/ ___val\n", + " | / ___/\n", + " 0.0 |/________\n", + " 0 5 10 Epoch\n", + " ```\n", + "\n", + "- **Feature map grid**: A grid of 16–32 feature maps from the first conv layer for a single input image."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Implementation\n", + "\n", + "### Setup & Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "from torch.utils.data import DataLoader\n", + "\n", + "# Device setup\n", + "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", + "print(f'Using device: {device}')\n", + "\n", + "torch.manual_seed(42)\n", + "np.random.seed(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 1: Manual 2D Convolution" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def conv2d_manual(image, kernel, stride=1, padding=0):\n", + " \"\"\"\n", + " Perform a 2D convolution on a 2D image with the given kernel.\n", + " \n", + " Args:\n", + " image: 2D numpy array of shape (H, W)\n", + " kernel: 2D numpy array of shape (kH, kW)\n", + " stride: convolution stride (default 1)\n", + " padding: zero-padding size (default 0)\n", + " \n", + " Returns:\n", + " output: 2D numpy array with the convolution result\n", + " \"\"\"\n", + " # TODO: Implement 2D convolution\n", + " H, W = image.shape\n", + " kH, kW = kernel.shape\n", + "\n", + " if padding > 0:\n", + " image = np.pad(image, padding, mode='constant')\n", + "\n", + " out_H = (H + 2 * padding - kH) // stride + 1\n", + " out_W = (W + 2 * padding - kW) // stride + 1\n", + " output = np.zeros((out_H, out_W))\n", + "\n", + " for i in range(out_H):\n", + " for j in range(out_W):\n", + " region = image[i*stride:i*stride+kH, j*stride:j*stride+kW]\n", + " output[i, j] = np.sum(region * kernel)\n", + "\n", + " return 
output\n", + "\n", + "\n", + "# Define standard filters\n", + "edge_kernel = np.array([[-1, -1, -1],\n", + " [-1, 8, -1],\n", + " [-1, -1, -1]])\n", + "\n", + "blur_kernel = np.ones((5, 5)) / 25.0\n", + "\n", + "# Load a sample image using torchvision\n", + "sample_dataset = torchvision.datasets.CIFAR10(\n", + " root='./data', train=True, download=True,\n", + " transform=transforms.ToTensor()\n", + ")\n", + "sample_img, sample_label = sample_dataset[0]\n", + "# Convert to grayscale for convolution demo\n", + "sample_gray = sample_img.mean(dim=0).numpy() # (32, 32)\n", + "\n", + "# Apply filters\n", + "edge_output = conv2d_manual(sample_gray, edge_kernel, padding=1)\n", + "blur_output = conv2d_manual(sample_gray, blur_kernel, padding=2)\n", + "\n", + "# Visualize\n", + "classes = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']\n", + "fig, axes = plt.subplots(1, 3, figsize=(12, 4))\n", + "axes[0].imshow(sample_gray, cmap='gray')\n", + "axes[0].set_title(f'Original ({classes[sample_label]})')\n", + "axes[1].imshow(edge_output, cmap='gray')\n", + "axes[1].set_title('Edge Detection Filter')\n", + "axes[2].imshow(blur_output, cmap='gray')\n", + "axes[2].set_title('Box Blur Filter')\n", + "for ax in axes:\n", + " ax.axis('off')\n", + "plt.suptitle('Manual 2D Convolution Results', fontsize=14)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 2: CNN on CIFAR-10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Data loading with augmentation\n", + "transform_train = transforms.Compose([\n", + " transforms.RandomHorizontalFlip(),\n", + " transforms.RandomCrop(32, padding=4),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),\n", + "])\n", + "\n", + "transform_test = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " 
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),\n", + "])\n", + "\n", + "trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)\n", + "testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)\n", + "\n", + "trainloader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)\n", + "testloader = DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class SimpleCNN(nn.Module):\n", + " def __init__(self):\n", + " super(SimpleCNN, self).__init__()\n", + " # TODO: Define convolutional and fully connected layers\n", + " self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)\n", + " self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)\n", + " self.pool = nn.MaxPool2d(kernel_size=2, stride=2)\n", + " self.fc1 = nn.Linear(64 * 8 * 8, 256)\n", + " self.fc2 = nn.Linear(256, 10)\n", + " self.dropout = nn.Dropout(p=0.5)\n", + "\n", + " def forward(self, x):\n", + " # TODO: Implement forward pass\n", + " x = self.pool(F.relu(self.conv1(x))) # (B, 32, 16, 16)\n", + " x = self.pool(F.relu(self.conv2(x))) # (B, 64, 8, 8)\n", + " x = x.view(x.size(0), -1) # Flatten\n", + " x = F.relu(self.fc1(x))\n", + " x = self.dropout(x)\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "\n", + "def train_epoch(model, loader, optimizer, criterion, device):\n", + " model.train()\n", + " total_loss, correct, total = 0, 0, 0\n", + " for inputs, targets in loader:\n", + " inputs, targets = inputs.to(device), targets.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(inputs)\n", + " loss = criterion(outputs, targets)\n", + " loss.backward()\n", + " optimizer.step()\n", + " total_loss += loss.item() * inputs.size(0)\n", + " _, predicted = outputs.max(1)\n", + " 
correct += predicted.eq(targets).sum().item()\n", + " total += inputs.size(0)\n", + " return total_loss / total, correct / total\n", + "\n", + "\n", + "def evaluate(model, loader, criterion, device):\n", + " model.eval()\n", + " total_loss, correct, total = 0, 0, 0\n", + " with torch.no_grad():\n", + " for inputs, targets in loader:\n", + " inputs, targets = inputs.to(device), targets.to(device)\n", + " outputs = model(inputs)\n", + " loss = criterion(outputs, targets)\n", + " total_loss += loss.item() * inputs.size(0)\n", + " _, predicted = outputs.max(1)\n", + " correct += predicted.eq(targets).sum().item()\n", + " total += inputs.size(0)\n", + " return total_loss / total, correct / total\n", + "\n", + "\n", + "# Train the model\n", + "cnn = SimpleCNN().to(device)\n", + "optimizer = optim.Adam(cnn.parameters(), lr=1e-3)\n", + "criterion = nn.CrossEntropyLoss()\n", + "\n", + "num_epochs = 10\n", + "train_losses, val_losses = [], []\n", + "train_accs, val_accs = [], []\n", + "\n", + "for epoch in range(1, num_epochs + 1):\n", + " tr_loss, tr_acc = train_epoch(cnn, trainloader, optimizer, criterion, device)\n", + " va_loss, va_acc = evaluate(cnn, testloader, criterion, device)\n", + " train_losses.append(tr_loss)\n", + " val_losses.append(va_loss)\n", + " train_accs.append(tr_acc)\n", + " val_accs.append(va_acc)\n", + " print(f'Epoch {epoch:02d} | '\n", + " f'Train Loss: {tr_loss:.3f}, Train Acc: {tr_acc:.3f} | '\n", + " f'Val Loss: {va_loss:.3f}, Val Acc: {va_acc:.3f}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Plot training curves\n", + "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))\n", + "\n", + "ax1.plot(range(1, num_epochs+1), train_losses, label='Train Loss', color='steelblue')\n", + "ax1.plot(range(1, num_epochs+1), val_losses, label='Val Loss', color='coral', linestyle='--')\n", + "ax1.set_xlabel('Epoch')\n", + "ax1.set_ylabel('Cross-Entropy Loss')\n", + "ax1.set_title('Loss 
Curves')\n", + "ax1.legend()\n", + "ax1.grid(True, alpha=0.3)\n", + "\n", + "ax2.plot(range(1, num_epochs+1), [a*100 for a in train_accs], label='Train Acc', color='steelblue')\n", + "ax2.plot(range(1, num_epochs+1), [a*100 for a in val_accs], label='Val Acc', color='coral', linestyle='--')\n", + "ax2.set_xlabel('Epoch')\n", + "ax2.set_ylabel('Accuracy (%)')\n", + "ax2.set_title('Accuracy Curves')\n", + "ax2.legend()\n", + "ax2.grid(True, alpha=0.3)\n", + "\n", + "plt.suptitle('CNN Training on CIFAR-10', fontsize=14)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 3: Feature Map Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Extract feature maps from the first conv layer\n", + "cnn.eval()\n", + "\n", + "# Pick a test image\n", + "test_img_tensor, test_label = testset[0]\n", + "test_img_tensor = test_img_tensor.unsqueeze(0).to(device)\n", + "\n", + "# Hook to capture intermediate activations\n", + "feature_maps = {}\n", + "\n", + "def get_activation(name):\n", + " def hook(model, input, output):\n", + " feature_maps[name] = output.detach()\n", + " return hook\n", + "\n", + "hook_handle = cnn.conv1.register_forward_hook(get_activation('conv1'))\n", + "\n", + "with torch.no_grad():\n", + " _ = cnn(test_img_tensor)\n", + "\n", + "hook_handle.remove()\n", + "\n", + "# Visualize feature maps\n", + "fmaps = feature_maps['conv1'].squeeze(0).cpu().numpy() # (32, H, W)\n", + "n_maps = min(16, fmaps.shape[0])\n", + "\n", + "fig, axes = plt.subplots(4, 4, figsize=(10, 10))\n", + "for i, ax in enumerate(axes.flat):\n", + " if i < n_maps:\n", + " ax.imshow(fmaps[i], cmap='viridis')\n", + " ax.set_title(f'Filter {i+1}', fontsize=9)\n", + " ax.axis('off')\n", + "\n", + "plt.suptitle(f'Conv1 Feature Maps — \"{classes[test_label]}\"', fontsize=14)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## Part 3: Experiments and Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 1: Effect of Batch Normalization\n", + "\n", + "Add Batch Normalization (`nn.BatchNorm2d`) after each convolutional layer and compare training speed and final accuracy against the baseline CNN." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class CNNWithBatchNorm(nn.Module):\n", + " def __init__(self):\n", + " super(CNNWithBatchNorm, self).__init__()\n", + " # TODO: Add BatchNorm layers after each conv layer\n", + " self.conv1 = nn.Conv2d(3, 32, 3, padding=1)\n", + " self.bn1 = nn.BatchNorm2d(32)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.bn2 = nn.BatchNorm2d(64)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64 * 8 * 8, 256)\n", + " self.fc2 = nn.Linear(256, 10)\n", + " self.dropout = nn.Dropout(p=0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.bn1(self.conv1(x))))\n", + " x = self.pool(F.relu(self.bn2(self.conv2(x))))\n", + " x = x.view(x.size(0), -1)\n", + " x = F.relu(self.fc1(x))\n", + " x = self.dropout(x)\n", + " return self.fc2(x)\n", + "\n", + "\n", + "cnn_bn = CNNWithBatchNorm().to(device)\n", + "optimizer_bn = optim.Adam(cnn_bn.parameters(), lr=1e-3)\n", + "\n", + "bn_train_losses, bn_val_losses = [], []\n", + "bn_train_accs, bn_val_accs = [], []\n", + "\n", + "for epoch in range(1, num_epochs + 1):\n", + " tr_loss, tr_acc = train_epoch(cnn_bn, trainloader, optimizer_bn, criterion, device)\n", + " va_loss, va_acc = evaluate(cnn_bn, testloader, criterion, device)\n", + " bn_train_losses.append(tr_loss)\n", + " bn_val_losses.append(va_loss)\n", + " bn_train_accs.append(tr_acc)\n", + " bn_val_accs.append(va_acc)\n", + " print(f'[BN] Epoch {epoch:02d} | Train Acc: {tr_acc:.3f} | Val Acc: {va_acc:.3f}')" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare baseline CNN vs CNN with BatchNorm\n", + "plt.figure(figsize=(10, 4))\n", + "plt.plot([a*100 for a in val_accs], label='Baseline CNN', color='steelblue', linewidth=2)\n", + "plt.plot([a*100 for a in bn_val_accs], label='CNN + BatchNorm', color='coral', linewidth=2, linestyle='--')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Validation Accuracy (%)')\n", + "plt.title('Validation Accuracy: Baseline vs Batch Normalization')\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "# TODO: Describe what you observe\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 2: Confusion Matrix\n", + "\n", + "Compute and plot the confusion matrix for the trained CNN on the test set." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n", + "\n", + "# TODO: Collect all predictions on the test set\n", + "all_preds, all_targets = [], []\n", + "cnn.eval()\n", + "with torch.no_grad():\n", + " for inputs, targets in testloader:\n", + " inputs = inputs.to(device)\n", + " outputs = cnn(inputs)\n", + " _, predicted = outputs.max(1)\n", + " all_preds.extend(predicted.cpu().numpy())\n", + " all_targets.extend(targets.numpy())\n", + "\n", + "cm = confusion_matrix(all_targets, all_preds)\n", + "fig, ax = plt.subplots(figsize=(10, 8))\n", + "disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)\n", + "disp.plot(ax=ax, cmap='Blues', colorbar=False)\n", + "plt.title('Confusion Matrix — CIFAR-10 Test Set', fontsize=14)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "# TODO: Which classes are most often confused? 
Why?\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary Questions\n", + "\n", + "1. What is the receptive field of a 3×3 conv → 3×3 conv stack? How does it differ from a single 5×5 conv?\n", + "2. Why is MaxPooling used in CNNs? What is its effect on spatial resolution and translation invariance?\n", + "3. How does BatchNorm affect training stability and generalization?\n", + "4. Which CIFAR-10 classes are hardest to classify? Can you explain why?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Your Answers:**\n", + "\n", + "1. *TODO: Your answer here*\n", + "\n", + "2. *TODO: Your answer here*\n", + "\n", + "3. *TODO: Your answer here*\n", + "\n", + "4. *TODO: Your answer here*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/assignments/Assignment3_RNN_LSTM.ipynb b/assignments/Assignment3_RNN_LSTM.ipynb new file mode 100644 index 0000000..a700fb9 --- /dev/null +++ b/assignments/Assignment3_RNN_LSTM.ipynb @@ -0,0 +1,652 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Assignment 3: Recurrent Neural Networks and LSTMs\n", + "\n", + "**Deep Learning FS26**\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Task Description\n", + "\n", + "### Problem Description\n", + "\n", + "In this assignment you will implement Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) using PyTorch. You will apply them to **sequence modeling** tasks: character-level language modeling and time-series forecasting. 
The goal is to understand how recurrent architectures handle temporal dependencies and the advantages of LSTMs over vanilla RNNs.\n", + "\n", + "### Tasks Overview\n", + "\n", + "1. **Vanilla RNN: Time-Series Forecasting**\n", + " - Implement a single-layer RNN using `nn.RNN` to predict future values of a synthetic sine wave.\n", + " - Use a sliding window approach to create training sequences.\n", + " - Evaluate the model's ability to generalize to unseen time steps.\n", + "\n", + "2. **LSTM: Time-Series Forecasting**\n", + " - Replace the RNN cell with an LSTM (`nn.LSTM`) and compare performance.\n", + " - Demonstrate that LSTM handles longer-range dependencies more effectively.\n", + "\n", + "3. **Character-Level Language Model**\n", + " - Train an LSTM on a small text corpus to learn character-level language patterns.\n", + " - Use the trained model to **generate** new text sequences by sampling from the predicted character distribution.\n", + "\n", + "### Possible Solutions\n", + "\n", + "- The RNN and LSTM should both converge on the sine wave task, but LSTM typically achieves lower MSE for longer sequences.\n", + "- The generated text from the language model should start to resemble the style and vocabulary of the training corpus after sufficient training.\n", + "- With long sequence lengths, the vanilla RNN may show symptoms of the **vanishing gradient** problem, such as slower convergence or a higher test MSE than the LSTM.\n", + "\n", + "### Expected Plots\n", + "\n", + "- **Time-series prediction plot**: The ground-truth sine curve with the model's prediction overlaid on the same axes.\n", + "\n", + " ```\n", + " Value\n", + " 1.0 | /\\ Ground Truth\n", + " | / \\\n", + " 0.0 |/ \\____/ __Prediction\n", + " -1.0 | \\/\n", + " 0 50 100 150 200 t\n", + " ```\n", + "\n", + "- **RNN vs LSTM loss comparison**: Training loss curves for both models on the same axes, showing whether the LSTM converges faster and to a lower loss (the typical outcome).\n", + "\n", + "- **Generated text sample**: A sequence of characters generated by the language model, 
visible in the notebook output.\n", + "\n", + "- **Hidden state heatmap**: A heatmap of the hidden state activations over time steps for a single input sequence, revealing what the recurrent cell has \"remembered\"." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Implementation\n", + "\n", + "### Setup & Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from torch.utils.data import DataLoader, TensorDataset\n", + "\n", + "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", + "print(f'Using device: {device}')\n", + "\n", + "torch.manual_seed(42)\n", + "np.random.seed(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 1: Vanilla RNN — Sine Wave Prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate a noisy sine wave\n", + "T = 1000\n", + "t = np.linspace(0, 8 * np.pi, T)\n", + "signal = np.sin(t) + 0.1 * np.random.randn(T)\n", + "\n", + "# Normalize\n", + "signal = (signal - signal.mean()) / signal.std()\n", + "\n", + "plt.figure(figsize=(12, 3))\n", + "plt.plot(t, signal, color='steelblue', linewidth=1)\n", + "plt.title('Input Signal (Noisy Sine Wave)')\n", + "plt.xlabel('Time')\n", + "plt.ylabel('Amplitude')\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def create_sequences(data, seq_length):\n", + " \"\"\"\n", + " Create (input, target) pairs using a sliding window.\n", + " Input: data[i : i+seq_length]\n", + " Target: data[i+seq_length]\n", + " \"\"\"\n", + " xs, ys = [], []\n", + " for i in range(len(data) - 
seq_length):\n", + " xs.append(data[i:i+seq_length])\n", + " ys.append(data[i+seq_length])\n", + " return np.array(xs), np.array(ys)\n", + "\n", + "SEQ_LEN = 30\n", + "TRAIN_SIZE = 800\n", + "\n", + "X, y = create_sequences(signal, SEQ_LEN)\n", + "X_train, y_train = X[:TRAIN_SIZE], y[:TRAIN_SIZE]\n", + "X_test, y_test = X[TRAIN_SIZE:], y[TRAIN_SIZE:]\n", + "\n", + "# Convert to PyTorch tensors: shape (batch, seq_len, input_size=1)\n", + "X_train_t = torch.FloatTensor(X_train).unsqueeze(-1).to(device)\n", + "y_train_t = torch.FloatTensor(y_train).unsqueeze(-1).to(device)\n", + "X_test_t = torch.FloatTensor(X_test).unsqueeze(-1).to(device)\n", + "y_test_t = torch.FloatTensor(y_test).unsqueeze(-1).to(device)\n", + "\n", + "train_ds = TensorDataset(X_train_t, y_train_t)\n", + "train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class SimpleRNN(nn.Module):\n", + " def __init__(self, input_size=1, hidden_size=64, num_layers=1):\n", + " super(SimpleRNN, self).__init__()\n", + " # TODO: Define RNN layer and output layer\n", + " self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)\n", + " self.fc = nn.Linear(hidden_size, 1)\n", + "\n", + " def forward(self, x):\n", + " # TODO: Forward pass — use only the last hidden state\n", + " out, _ = self.rnn(x) # out: (batch, seq_len, hidden)\n", + " out = self.fc(out[:, -1, :]) # Last time step\n", + " return out\n", + "\n", + "\n", + "def train_sequence_model(model, train_dl, num_epochs=50):\n", + " criterion = nn.MSELoss()\n", + " optimizer = optim.Adam(model.parameters(), lr=1e-3)\n", + " losses = []\n", + " for epoch in range(num_epochs):\n", + " model.train()\n", + " epoch_loss = 0\n", + " for xb, yb in train_dl:\n", + " optimizer.zero_grad()\n", + " pred = model(xb)\n", + " loss = criterion(pred, yb)\n", + " loss.backward()\n", + " nn.utils.clip_grad_norm_(model.parameters(), 
max_norm=1.0) # Gradient clipping\n", + " optimizer.step()\n", + " epoch_loss += loss.item()\n", + " avg_loss = epoch_loss / len(train_dl)\n", + " losses.append(avg_loss)\n", + " if epoch % 10 == 0:\n", + " print(f'Epoch {epoch:3d} | Loss: {avg_loss:.5f}')\n", + " return losses\n", + "\n", + "\n", + "rnn_model = SimpleRNN(hidden_size=64).to(device)\n", + "print('Training Vanilla RNN...')\n", + "rnn_losses = train_sequence_model(rnn_model, train_dl, num_epochs=50)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 2: LSTM — Sine Wave Prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class SimpleLSTM(nn.Module):\n", + " def __init__(self, input_size=1, hidden_size=64, num_layers=1):\n", + " super(SimpleLSTM, self).__init__()\n", + " # TODO: Replace RNN with LSTM\n", + " self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)\n", + " self.fc = nn.Linear(hidden_size, 1)\n", + "\n", + " def forward(self, x):\n", + " # TODO: Forward pass — use only the last hidden state\n", + " out, _ = self.lstm(x)\n", + " out = self.fc(out[:, -1, :])\n", + " return out\n", + "\n", + "\n", + "lstm_model = SimpleLSTM(hidden_size=64).to(device)\n", + "print('Training LSTM...')\n", + "lstm_losses = train_sequence_model(lstm_model, train_dl, num_epochs=50)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare RNN vs LSTM loss curves\n", + "plt.figure(figsize=(10, 4))\n", + "plt.plot(rnn_losses, label='Vanilla RNN', color='steelblue', linewidth=2)\n", + "plt.plot(lstm_losses, label='LSTM', color='coral', linewidth=2, linestyle='--')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('MSE Loss')\n", + "plt.title('Training Loss: Vanilla RNN vs LSTM')\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize predictions on test set\n", + "rnn_model.eval()\n", + "lstm_model.eval()\n", + "\n", + "with torch.no_grad():\n", + " rnn_preds = rnn_model(X_test_t).cpu().numpy().flatten()\n", + " lstm_preds = lstm_model(X_test_t).cpu().numpy().flatten()\n", + "\n", + "ground_truth = y_test\n", + "time_idx = np.arange(len(ground_truth))\n", + "\n", + "plt.figure(figsize=(12, 4))\n", + "plt.plot(time_idx, ground_truth, label='Ground Truth', color='black', linewidth=1.5)\n", + "plt.plot(time_idx, rnn_preds, label='RNN Prediction', color='steelblue', linestyle='--', linewidth=1)\n", + "plt.plot(time_idx, lstm_preds, label='LSTM Prediction', color='coral', linestyle=':', linewidth=1.5)\n", + "plt.xlabel('Time Step')\n", + "plt.ylabel('Amplitude')\n", + "plt.title('Sine Wave Prediction: RNN vs LSTM')\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "rnn_mse = np.mean((ground_truth - rnn_preds) ** 2)\n", + "lstm_mse = np.mean((ground_truth - lstm_preds) ** 2)\n", + "print(f'RNN Test MSE: {rnn_mse:.5f}')\n", + "print(f'LSTM Test MSE: {lstm_mse:.5f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Task 3: Character-Level Language Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small text corpus — a sample of English text\n", + "corpus = \"\"\"\n", + "Deep learning is part of a broader family of machine learning methods based on artificial neural networks\n", + "with representation learning. Learning can be supervised, semi-supervised or unsupervised. 
Deep learning\n", + "architectures such as deep neural networks, recurrent neural networks, convolutional neural networks and\n", + "transformers have been applied to fields including computer vision, speech recognition, natural language\n", + "processing, machine translation, bioinformatics, drug design, and medical image analysis where they have\n", + "produced results comparable to and in some cases surpassing human expert performance.\n", + "\"\"\".strip().lower()\n", + "\n", + "# Build character vocabulary\n", + "chars = sorted(set(corpus))\n", + "char_to_idx = {ch: idx for idx, ch in enumerate(chars)}\n", + "idx_to_char = {idx: ch for ch, idx in char_to_idx.items()}\n", + "vocab_size = len(chars)\n", + "print(f'Vocabulary size: {vocab_size}')\n", + "print(f'Vocab: {chars}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Encode the corpus\n", + "CHAR_SEQ_LEN = 40\n", + "encoded = [char_to_idx[ch] for ch in corpus]\n", + "\n", + "X_chars, y_chars = [], []\n", + "for i in range(len(encoded) - CHAR_SEQ_LEN):\n", + " X_chars.append(encoded[i:i+CHAR_SEQ_LEN])\n", + " y_chars.append(encoded[i+CHAR_SEQ_LEN])\n", + "\n", + "X_chars = torch.LongTensor(X_chars).to(device)\n", + "y_chars = torch.LongTensor(y_chars).to(device)\n", + "\n", + "char_ds = TensorDataset(X_chars, y_chars)\n", + "char_dl = DataLoader(char_ds, batch_size=64, shuffle=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class CharLSTM(nn.Module):\n", + " def __init__(self, vocab_size, embed_dim, hidden_size, num_layers=2):\n", + " super(CharLSTM, self).__init__()\n", + " # TODO: Embedding layer + LSTM + output FC layer\n", + " self.embedding = nn.Embedding(vocab_size, embed_dim)\n", + " self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers, batch_first=True, dropout=0.3)\n", + " self.fc = nn.Linear(hidden_size, vocab_size)\n", + "\n", + " def forward(self, x, 
hidden=None):\n", + " # x: (batch, seq_len) of character indices\n", + " x = self.embedding(x) # (batch, seq_len, embed_dim)\n", + " out, hidden = self.lstm(x, hidden)\n", + " out = self.fc(out[:, -1, :]) # Predict next char from last time step\n", + " return out, hidden\n", + "\n", + "\n", + "char_model = CharLSTM(vocab_size, embed_dim=32, hidden_size=128, num_layers=2).to(device)\n", + "char_optimizer = optim.Adam(char_model.parameters(), lr=1e-3)\n", + "char_criterion = nn.CrossEntropyLoss()\n", + "\n", + "char_losses = []\n", + "num_char_epochs = 100\n", + "\n", + "for epoch in range(1, num_char_epochs + 1):\n", + " char_model.train()\n", + " epoch_loss = 0\n", + " for xb, yb in char_dl:\n", + " char_optimizer.zero_grad()\n", + " logits, _ = char_model(xb)\n", + " loss = char_criterion(logits, yb)\n", + " loss.backward()\n", + " nn.utils.clip_grad_norm_(char_model.parameters(), max_norm=1.0)\n", + " char_optimizer.step()\n", + " epoch_loss += loss.item()\n", + " avg_loss = epoch_loss / len(char_dl)\n", + " char_losses.append(avg_loss)\n", + " if epoch % 20 == 0:\n", + " print(f'Epoch {epoch:4d} | Char Loss: {avg_loss:.4f}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Text generation function\n", + "def generate_text(model, seed_text, length=200, temperature=1.0):\n", + " \"\"\"\n", + " Generate text by sampling from the model's predicted character distribution.\n", + " \n", + " Args:\n", + " model: trained CharLSTM\n", + " seed_text: starting string (at least CHAR_SEQ_LEN characters)\n", + " length: number of characters to generate\n", + " temperature: sampling temperature (higher = more random)\n", + " \"\"\"\n", + " model.eval()\n", + " generated = seed_text\n", + " current_seq = [char_to_idx.get(ch, 0) for ch in seed_text[-CHAR_SEQ_LEN:]]\n", + "\n", + " with torch.no_grad():\n", + " for _ in range(length):\n", + " x = torch.LongTensor([current_seq]).to(device)\n", + " logits, _ = 
model(x)\n", + " # Apply temperature and sample\n", + " probs = torch.softmax(logits / temperature, dim=-1).squeeze()\n", + " next_idx = torch.multinomial(probs, 1).item()\n", + " generated += idx_to_char[next_idx]\n", + " current_seq = current_seq[1:] + [next_idx]\n", + "\n", + " return generated\n", + "\n", + "\n", + "# Generate text with different temperatures\n", + "seed = corpus[:CHAR_SEQ_LEN]\n", + "print('=== Generated Text (temperature=0.5) ===')\n", + "print(generate_text(char_model, seed, length=200, temperature=0.5))\n", + "print()\n", + "print('=== Generated Text (temperature=1.0) ===')\n", + "print(generate_text(char_model, seed, length=200, temperature=1.0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Plot character-level language model loss\n", + "plt.figure(figsize=(8, 4))\n", + "plt.plot(char_losses, color='mediumseagreen', linewidth=1.5)\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Cross-Entropy Loss')\n", + "plt.title('Character-Level Language Model Training Loss')\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Experiments and Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 1: Effect of Sequence Length on RNN Performance\n", + "\n", + "Train the vanilla RNN and LSTM on sequence lengths of `[10, 30, 60, 100]`. Report the test MSE for each sequence length and discuss how the models cope with longer dependencies." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "seq_lengths = [10, 30, 60, 100]\n", + "rnn_mse_results, lstm_mse_results = [], []\n", + "\n", + "for seq_len in seq_lengths:\n", + " # Prepare data\n", + " X_s, y_s = create_sequences(signal, seq_len)\n", + " split = 800\n", + " X_tr = torch.FloatTensor(X_s[:split]).unsqueeze(-1).to(device)\n", + " y_tr = torch.FloatTensor(y_s[:split]).unsqueeze(-1).to(device)\n", + " X_te = torch.FloatTensor(X_s[split:]).unsqueeze(-1).to(device)\n", + " y_te = torch.FloatTensor(y_s[split:]).unsqueeze(-1).to(device)\n", + "\n", + " ds = TensorDataset(X_tr, y_tr)\n", + " dl = DataLoader(ds, batch_size=32, shuffle=True)\n", + "\n", + " # TODO: Train RNN and LSTM, record test MSE\n", + " torch.manual_seed(42)\n", + " rnn_s = SimpleRNN(hidden_size=64).to(device)\n", + " train_sequence_model(rnn_s, dl, num_epochs=30)\n", + " rnn_s.eval()\n", + " with torch.no_grad():\n", + " rnn_p = rnn_s(X_te).cpu().numpy().flatten()\n", + " rnn_mse_results.append(np.mean((y_s[split:] - rnn_p) ** 2))\n", + "\n", + " torch.manual_seed(42)\n", + " lstm_s = SimpleLSTM(hidden_size=64).to(device)\n", + " train_sequence_model(lstm_s, dl, num_epochs=30)\n", + " lstm_s.eval()\n", + " with torch.no_grad():\n", + " lstm_p = lstm_s(X_te).cpu().numpy().flatten()\n", + " lstm_mse_results.append(np.mean((y_s[split:] - lstm_p) ** 2))\n", + "\n", + " print(f'SeqLen={seq_len:3d} | RNN MSE: {rnn_mse_results[-1]:.5f} | LSTM MSE: {lstm_mse_results[-1]:.5f}')\n", + "\n", + "# Plot\n", + "plt.figure(figsize=(8, 4))\n", + "plt.plot(seq_lengths, rnn_mse_results, 'o-', label='Vanilla RNN', color='steelblue')\n", + "plt.plot(seq_lengths, lstm_mse_results, 's--', label='LSTM', color='coral')\n", + "plt.xlabel('Sequence Length')\n", + "plt.ylabel('Test MSE')\n", + "plt.title('Test MSE vs. 
Sequence Length')\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "# TODO: Interpret these results\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 2: Hidden State Heatmap\n", + "\n", + "Visualize the LSTM hidden state activations over time for a single input sequence. This reveals which time steps cause the strongest response in the recurrent cells." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Extract all hidden states (not just the last one)\n", + "class LSTMWithAllHidden(nn.Module):\n", + " def __init__(self, input_size=1, hidden_size=64):\n", + " super().__init__()\n", + " self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)\n", + " self.fc = nn.Linear(hidden_size, 1)\n", + "\n", + " def forward(self, x):\n", + " all_hidden, _ = self.lstm(x) # all_hidden: (batch, seq, hidden)\n", + " return self.fc(all_hidden[:, -1, :]), all_hidden\n", + "\n", + "\n", + "# Train this model briefly\n", + "torch.manual_seed(42)\n", + "lstm_vis = LSTMWithAllHidden(hidden_size=32).to(device)\n", + "vis_optimizer = optim.Adam(lstm_vis.parameters(), lr=1e-3)\n", + "vis_criterion = nn.MSELoss()\n", + "\n", + "for epoch in range(30):\n", + " lstm_vis.train()\n", + " for xb, yb in train_dl:\n", + " vis_optimizer.zero_grad()\n", + " pred, _ = lstm_vis(xb)\n", + " loss = vis_criterion(pred, yb)\n", + " loss.backward()\n", + " vis_optimizer.step()\n", + "\n", + "# Pick one test sequence\n", + "single_seq = X_test_t[:1] # shape (1, seq_len, 1)\n", + "lstm_vis.eval()\n", + "with torch.no_grad():\n", + " _, hidden_states = lstm_vis(single_seq) # (1, seq_len, hidden)\n", + "\n", + "hidden_np = hidden_states.squeeze(0).cpu().numpy() # (seq_len, hidden_size)\n", + "\n", + "plt.figure(figsize=(12, 5))\n", + "plt.imshow(hidden_np.T, aspect='auto', cmap='RdBu_r', 
interpolation='nearest')\n", + "plt.colorbar(label='Activation')\n", + "plt.xlabel('Time Step')\n", + "plt.ylabel('Hidden Unit')\n", + "plt.title('LSTM Hidden State Activations Over Time')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experiment 3: Temperature Sampling\n", + "\n", + "Generate text at several temperature values (`0.2`, `0.5`, `1.0`, `1.5`) and observe how the diversity and coherence of the generated text changes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "temperatures = [0.2, 0.5, 1.0, 1.5]\n", + "for temp in temperatures:\n", + " print(f'\\n=== Temperature = {temp} ===')\n", + " print(generate_text(char_model, seed, length=150, temperature=temp))\n", + "\n", + "# TODO: Describe the effect of temperature on the generated text\n", + "# - Low temperature (e.g. 0.2): ...\n", + "# - High temperature (e.g. 1.5): ...\n", + "# YOUR ANSWER HERE:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary Questions\n", + "\n", + "1. What is the vanishing gradient problem in RNNs? How does LSTM address it?\n", + "2. Explain the role of the forget gate, input gate, and output gate in an LSTM cell.\n", + "3. Why does a higher temperature in text generation lead to more diverse but less coherent output?\n", + "4. What are the main limitations of character-level language models compared to word-level or subword models?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Your Answers:**\n", + "\n", + "1. *TODO: Your answer here*\n", + "\n", + "2. *TODO: Your answer here*\n", + "\n", + "3. *TODO: Your answer here*\n", + "\n", + "4. 
*TODO: Your answer here*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From acfff0fd8781751ba3dd35459709a0fca502e2b4 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Thu, 5 Mar 2026 22:32:05 +0000 Subject: [PATCH 3/3] Add Jupyter launcher script and detailed server URL instructions Co-authored-by: sing-git <183478851+sing-git@users.noreply.github.com> --- README.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++-- start_jupyter.sh | 31 ++++++++++++++++++++++++++++++ 2 files changed, 78 insertions(+), 2 deletions(-) create mode 100755 start_jupyter.sh diff --git a/README.md b/README.md index 5cf18ed..9464124 100644 --- a/README.md +++ b/README.md @@ -22,9 +22,54 @@ Each assignment is provided as a Jupyter notebook and includes three parts: ## Getting Started +### Option A — Quickstart script (recommended) + +Run the provided launcher script from the repository root. It installs all required packages and starts the Jupyter server automatically: + ```bash -pip install torch torchvision numpy matplotlib scikit-learn jupyter -jupyter notebook +bash start_jupyter.sh +``` + +The terminal will print output similar to: + ``` +[2/2] Starting Jupyter Notebook server... + The server URL with login token will appear below. +-------------------------------------- +[I 2026-03-05 22:00:00.000 ServerApp] Jupyter Server 2.x is running at: +[I 2026-03-05 22:00:00.000 ServerApp] http://127.0.0.1:8888/tree?token=abc123def456... +[I 2026-03-05 22:00:00.000 ServerApp] or http://127.0.0.1:8888/?token=abc123def456... +``` + +**Copy the full `http://127.0.0.1:8888/...?token=...` URL** from your terminal and: + +- **Browser**: paste it directly into the address bar. 
+- **VS Code**: open the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`) → **"Jupyter: Specify Jupyter Server for Connections"** → paste the URL. +- **PyCharm / DataSpell**: go to *Settings → Tools → Jupyter → Jupyter Servers* → add a new server and paste the URL. + +--- + +### Option B — Manual setup + +```bash +# 1. Install dependencies +pip install torch torchvision numpy matplotlib scikit-learn notebook + +# 2. Start the server (copy the token URL from terminal output) +jupyter notebook --no-browser --notebook-dir=assignments + +# 3. Open the printed URL in your browser or IDE +``` + +--- + +### Option C — Run in the cloud (no local install) + +Click one of the badges below to open the notebooks directly in your browser — no installation needed: + +| Service | Link | +|---------|------| +| Google Colab | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sing-git/DeepLearning_FS26/blob/main/assignments/Assignment1_Perceptrons_MLP.ipynb) | +| Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sing-git/DeepLearning_FS26/HEAD?urlpath=tree/assignments) | Open the notebook for the assignment you are working on from the `assignments/` folder. diff --git a/start_jupyter.sh b/start_jupyter.sh new file mode 100755 index 0000000..2db0319 --- /dev/null +++ b/start_jupyter.sh @@ -0,0 +1,31 @@ +#!/usr/bin/env bash +# start_jupyter.sh — Install dependencies (if needed) and launch Jupyter Notebook. +# The server URL (including the login token) is printed to the terminal. + +set -e + +echo "======================================" +echo " DeepLearning FS26 — Jupyter Launcher" +echo "======================================" + +# 1. Install Python dependencies if any are missing +PACKAGES="torch torchvision numpy matplotlib scikit-learn notebook" +echo "" +echo "[1/2] Checking / installing Python packages..." +pip install --quiet $PACKAGES +echo " Done." + +# 2. 
Launch Jupyter Notebook and print the URL +echo "" +echo "[2/2] Starting Jupyter Notebook server..." +echo " The server URL with login token will appear below." +echo " Copy the http://127.0.0.1:8888/... link and paste it" +echo " into your browser or into the 'Jupyter Server URL' dialog" +echo " in VS Code / PyCharm." +echo "" +echo " Press Ctrl+C to stop the server." +echo "--------------------------------------" + +# --no-browser: don't try to open a browser automatically +# --notebook-dir: open directly in the assignments folder +jupyter notebook --no-browser --notebook-dir="$(dirname "$0")/assignments"