google-deepmind · boiled-darvari · Oct 25, 2025 · Oct 30, 2025 · Oct 30, 2025 · Nov 1, 2025
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ initially through a private beta program.
 
 ## The AI Research Foundations Courses
 
-The eight courses are in the curriculum are:
+The eight courses in the curriculum are:
 
 1. Build Your Own Small Language Model
 2. Represent Your Language Data

diff --git a/ai_foundations/visualizations/plots.py b/ai_foundations/visualizations/plots.py
@@ -290,7 +290,7 @@ def plot_data_and_decision_boundary(
 
     if sum(classification_errors) == 0:
       print(
-          "\n\n✅ Well done! Your decision boundary correclty separates"
+          "\n\n✅ Well done! Your decision boundary correctly separates"
           " all data points."
       )
     else:

diff --git a/course_1/gdm_lab_1_3_compare_n_gram_models_and_transformer_language_models.ipynb b/course_1/gdm_lab_1_3_compare_n_gram_models_and_transformer_language_models.ipynb
@@ -237,8 +237,22 @@
         "print(\"Loaded trigram model.\\n\")\n",
         "\n",
         "print(\"Loading Gemma-1B model...\")\n",
-        "gemma_model = generation.load_gemma()\n",
-        "print(\"Loaded Gemma-1B model.\")"
+        "import time\n",
+        "\n",
+        "# Retry loading the Gemma model with exception handling\n",
+        "max_load_retries = 3\n",
+        "for i in range(max_load_retries):\n",
+        "    try:\n",
+        "        gemma_model = generation.load_gemma()\n",
+        "        print(\"Loaded Gemma-1B model.\")\n",
+        "        break  # Exit the loop if loading is successful\n",
+        "    except Exception as e:\n",
+        "        print(f\"Attempt {i+1}/{max_load_retries}: Error loading Gemma model: {e}\")\n",
+        "        if i < max_load_retries - 1:\n",
+        "            print(\"Retrying in 10 seconds...\")\n",
+        "            time.sleep(10)\n",
+        "        else:\n",
+        "            print(\"Failed to load Gemma model after multiple retries.\")"
       ]
     },
     {
@@ -626,7 +640,7 @@
         "\n",
         "You have now directly compared the generations of a trigram model and a transformer model and have observed many differences. These comparisons highlighted contrasts in terms of fluency, coherence and relevance between the two models. While the n-gram model often generated word salads or failed to generate a continuation at all, the transformer model generally generated quite reasonable responses (though sometimes they may have not been entirely perfect either).\n",
         "\n",
-        "Note that this comparison was stacked against the n-gram model. That is because the difference between the trigram model and the Gemma-1B model, which were both trained the Africa Galore dataset, is not only one of implementation. The Gemma-1B model has also been trained on a very large dataset. In comparison, the trigram model has only been trained on the paragraphs in the Africa Galore dataset. That being said, even if you had trained the n-gram model on as much data as the Gemma-1B model, the transformer model would have still performed much better.\n",
+        "Note that this comparison was stacked against the n-gram model. That is because the difference between the trigram model and the Gemma-1B model, which were both trained on the Africa Galore dataset, is not only one of implementation. The Gemma-1B model has also been trained on a very large dataset. In comparison, the trigram model has only been trained on the paragraphs in the Africa Galore dataset. That being said, even if you had trained the n-gram model on as much data as the Gemma-1B model, the transformer model would have still performed much better.\n",
         "\n",
         "There are two primary reasons for this:\n",
         "- Transformers have much larger context windows and can therefore consider the information of tokens that are further away from the token to be generated. N-gram models, on the other hand, only have a context window of $n-1$. So in the case of the trigram model, the model only considered the last two tokens for making predictions.\n",

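Review note on the retry loop added to lab 1.3 above: when every attempt fails, the final `else` branch only prints a message, so `gemma_model` stays undefined and later cells would fail with a `NameError`. One possible alternative is sketched below, assuming it is acceptable to re-raise the last exception. The helper name `load_with_retries` and its parameters are hypothetical and not part of `ai_foundations`.

```python
import time


def load_with_retries(load_fn, max_retries=3, delay_seconds=10.0):
    """Call load_fn until it succeeds or max_retries attempts have failed.

    Hypothetical helper for illustration; not part of the ai_foundations package.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return load_fn()
        except Exception as e:  # Broad on purpose: the loader's failure modes are unknown here.
            print(f"Attempt {attempt}/{max_retries}: Error loading model: {e}")
            if attempt == max_retries:
                # Re-raise so that later cells do not run without a model.
                raise
            print(f"Retrying in {delay_seconds} seconds...")
            time.sleep(delay_seconds)
```

In the notebook cell this would be called as `gemma_model = load_with_retries(generation.load_gemma)`, which keeps the three attempts and the 10-second pause from the diff.
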
diff --git a/course_2/gdm_lab_2_1_preprocess_data.ipynb b/course_2/gdm_lab_2_1_preprocess_data.ipynb
@@ -350,7 +350,7 @@
         "id": "Qr23mRhp5fit"
       },
       "source": [
-        "Test your function. Make sure that `&lt;` is replaced with `<`, and `&gt;` is replaced with `>` and and `&amp` is replaced with `&`:"
+        "Test your function. Make sure that `&lt;` is replaced with `<`, `&gt;` is replaced with `>`, and `&amp;` is replaced with `&`:"
       ]
     },
     {

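Review note on the lab 2.1 instruction above: for reference, the three replacements it lists can be checked against Python's standard-library `html.unescape`. This is only a sketch for comparison. The function name `unescape_html` is hypothetical, and the lab's own exercise may expect explicit string replacements instead.

```python
import html


def unescape_html(text: str) -> str:
    """Replace HTML character references such as &lt;, &gt; and &amp;."""
    return html.unescape(text)


sample = "a &lt; b &amp;&amp; c &gt; d"
print(unescape_html(sample))  # prints: a < b && c > d
```
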
diff --git a/course_2/gdm_lab_2_2_tokenize_texts_into_characters_and_words.ipynb b/course_2/gdm_lab_2_2_tokenize_texts_into_characters_and_words.ipynb
@@ -429,7 +429,7 @@
         "id": "2zqyYr9pSpNx"
       },
       "source": [
-        "As a first step, take a look again at the first paragraph in the Africa Galore dataset to to remind yourself what the data looks like.\n"
+        "As a first step, take a look again at the first paragraph in the Africa Galore dataset to remind yourself what the data looks like.\n"
       ]
     },
     {

diff --git a/course_3/gdm_lab_3_1_distinguish_between_signal_and_noise.ipynb b/course_3/gdm_lab_3_1_distinguish_between_signal_and_noise.ipynb
@@ -76,7 +76,7 @@
       "source": [
         "### Tasks\n",
         "\n",
-        "You will work with three small language models that have all been trained on a noisy version of the Africa Galore dataset. In this dataset, one of the paragraphs includes a spelling mistake. The phrase \"a vibrant symbol of\" is misspelled as \"a vibrant symbol fo\". Furthermore, this is the only occurence of the phrase \"a vibrant symbol\". All other paragraphs that include the word symbol do not include the adjective \"vibrant\".\n",
+        "You will work with three small language models that have all been trained on a noisy version of the Africa Galore dataset. In this dataset, one of the paragraphs includes a spelling mistake. The phrase \"a vibrant symbol of\" is misspelled as \"a vibrant symbol fo\". Furthermore, this is the only occurrence of the phrase \"a vibrant symbol\". All other paragraphs that include the word symbol do not include the adjective \"vibrant\".\n",
         "\n",
         "**In this lab, you will**:\n",
         "* Compare the continuations to different prompts for models that have been trained for 10, 400, and 1,000 epochs.\n",

diff --git a/course_3/gdm_lab_3_6_mitigate_overfitting.ipynb b/course_3/gdm_lab_3_6_mitigate_overfitting.ipynb
@@ -257,7 +257,7 @@
       "source": [
         "## Tune hyperparameters\n",
         "\n",
-        "In the following cells, you will change one hyperparameter at at time. For each set of hyperparameters, you will train and evaluate a model. You will also inspect the loss curves and accuracy curves for each training run.\n",
+        "In the following cells, you will change one hyperparameter at a time. For each set of hyperparameters, you will train and evaluate a model. You will also inspect the loss curves and accuracy curves for each training run.\n",
         "\n",
         "Run the following cell to define a function that performs the training and visualizations for a given set of hyperparameters. In this function, you will see all components required for training a model, such as the loss function and the optimizer. For now, ignore these details. You will learn more about each of these components in later articles and labs."
       ]
@@ -756,7 +756,7 @@
         "\n",
         "You probably noticed that for both the dropout rate and weight decay strength, the model's performance initially improved compared to the baseline when you set them to a smaller value. However, performance worsened significantly when you set either of these values too high. This is a very common pattern and usually you have to try several values until you find the one that works best for your model and dataset.\n",
         "\n",
-        "In this lab, you also modified one hyperparameter at a time. In practice, you often want to combine overfitting methods, for example dropout and early stopping. If you want to experiment further, add more cells to this lab and try out additional combinations fo hyperparameters."
+        "In this lab, you also modified one hyperparameter at a time. In practice, you often want to combine overfitting mitigation methods, for example dropout and early stopping. If you want to experiment further, add more cells to this lab and try out additional combinations of hyperparameters."
       ]
     },
     {

diff --git a/course_4/gdm_lab_4_5_reflection_on_trainable_parameters.ipynb b/course_4/gdm_lab_4_5_reflection_on_trainable_parameters.ipynb
@@ -1283,7 +1283,7 @@
       "source": [
         "## Optional: Training the model\n",
         "\n",
-        "As a last optional exercise, if you would like to see this model in action, you can run the following hidden cell to load the Africa Galore dataset, tokenize and pad the data, and train the model. This will take about one minute to run on a Colab instance with a a GPU or 10 minutes on a Colab instance with a CPU.\n",
+        "As a last optional exercise, if you would like to see this model in action, you can run the following hidden cell to load the Africa Galore dataset, tokenize and pad the data, and train the model. This will take about one minute to run on a Colab instance with a GPU or 10 minutes on a Colab instance with a CPU.\n",
         "\n",
         "You can then sample continuations to a prompt from the model in the cell after the training loop.\n",
         "\n"