Ryan-W31/myGPT2

myGPT2

Building a GPT-2 Model from scratch!

This repo is inspired by and follows along with Andrej Karpathy's Neural Networks: Zero to Hero YouTube playlist.

In this repo, I built a GPT-2 clone based on the GPT-2 paper and the official GPT-2 repository. The model is built using only decoder self-attention blocks. Like GPT-2 itself, it does not implement the encoder or the cross-attention layers of the original Transformer architecture.
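To make "decoder self-attention" concrete, here is a minimal single-head sketch in plain Python. It is only an illustration of the causal masking idea, not the code from this repo: the function name `causal_self_attention` and the list-of-lists representation are assumptions made for the example, and a real implementation would use batched tensors, multiple heads, and learned projections.

```python
import math

def causal_self_attention(q, k, v):
    """Single-head causal attention over T x d lists of floats (illustrative only)."""
    seq_len, dim = len(q), len(q[0])
    out = []
    for t in range(seq_len):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_self_attention(q, k, v)
print(out[0])  # the first position can only see itself
```

Because of the mask, changing a later token never changes the output at an earlier position, which is what lets the model be trained to predict the next token.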

The model in this repo has 124M parameters and is trained on the 10B-token sample of the FineWeb-Edu dataset.
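As a rough check on the 124M figure, the parameter count can be worked out from the layer shapes. The sketch below assumes the standard GPT-2 small hyperparameters (vocabulary 50257, context length 1024, 12 layers, embedding width 768) and an output head that shares weights with the token embedding. The README does not list this repo's exact configuration, so treat these values and the helper name `gpt2_param_count` as assumptions.

```python
def gpt2_param_count(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768):
    """Count parameters of a GPT-2 style decoder (assumed hyperparameters)."""
    wte = vocab_size * n_embd   # token embeddings (shared with the output head)
    wpe = block_size * n_embd   # learned position embeddings
    layer_norm = 2 * n_embd     # scale and bias
    # Attention: fused q/k/v projection plus the output projection (weights + biases).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x the width, then project back (weights + biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp
    # Final layer norm after the last block.
    return wte + wpe + n_layer * block + layer_norm

print(gpt2_param_count())  # 124439808
```

Under these assumptions, roughly 85M parameters sit in the 12 transformer blocks and about 39M in the embeddings.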

Evaluation was done using HellaSwag, and the results were compared to the 124M parameter GPT-2 model and the 124M parameter GPT-3 model.
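HellaSwag gives a context and four candidate endings, and the model is correct when it prefers the true ending. One common way to score this with a language model is to pick the ending whose tokens have the lowest average cross-entropy loss. The sketch below assumes the per-token losses have already been computed by the model. The helper names `pick_ending` and `hellaswag_accuracy` are hypothetical, and the README does not state which scoring rule this repo uses.

```python
def pick_ending(token_losses_per_ending):
    """Return the index of the ending with the lowest mean per-token loss."""
    averages = [sum(losses) / len(losses) for losses in token_losses_per_ending]
    return min(range(len(averages)), key=averages.__getitem__)

def hellaswag_accuracy(examples):
    """examples: list of (token_losses_per_ending, correct_index) pairs."""
    correct = sum(1 for losses, label in examples if pick_ending(losses) == label)
    return correct / len(examples)

# Made-up losses for illustration only; these are not results from this repo.
example = [[2.0, 3.0], [1.0, 1.5, 1.2], [4.0], [2.5, 2.5]]
print(pick_ending(example))  # 1
```

Averaging over tokens, rather than summing, keeps longer endings from being penalized just for their length.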

Results

After evaluating with both cross-entropy loss and HellaSwag, I obtained these results:

[Image: myGPT2 Evaluation]

The image above shows the minimum training and validation loss and the maximum HellaSwag accuracy.

The chart on the left compares the loss of myGPT2 with that of the OpenAI 124M GPT-2 model. It's very cool that myGPT2 was able to beat the OpenAI model!

The chart on the right compares the HellaSwag accuracy of myGPT2 with that of the OpenAI 124M GPT-2 model and the OpenAI 124M GPT-3 model. Again, myGPT2 was able to beat the OpenAI 124M GPT-2 model, but it did not get close to the OpenAI 124M GPT-3 model.
