myGPT2

Building a GPT-2 Model from scratch!

This repo is inspired by and follows along with Andrej Karpathy's Neural Networks: Zero to Hero YouTube playlist.

In this repo, I built a GPT-2 clone based on the GPT-2 paper and the official GPT-2 repository. The model is built only from decoder blocks with masked (causal) self-attention; unlike the original Transformer described in the "Attention Is All You Need" paper, it has no encoder and no cross-attention.
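To make the "decoder-only" distinction concrete, here is a minimal pure-Python sketch of single-head causal self-attention, the operation at the heart of each decoder block. It is an illustration, not code from this repo: it assumes queries, keys, and values are passed in directly (no learned projections) and leaves out multiple heads, the MLP, layer norm, and residual connections.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v have shape (T, d). Position t may only attend to positions <= t,
    so no information flows backwards from future tokens.
    """
    T, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t in range(T):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [scale * sum(qi * ki for qi, ki in zip(q[t], k[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the value vectors at positions 0..t.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
print(y[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Because of the mask, changing a later token never changes the outputs at earlier positions. That is what lets a decoder-only model learn next-token prediction at every position of a sequence in parallel.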

The model in this repo has 124M parameters and is trained on the FineWeb-Edu dataset, specifically its 10B-token subset.
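As a sanity check on the 124M figure, the count below assumes the standard GPT-2 small configuration from the official release (12 layers, 768-dimensional embeddings, a 1024-token context, a 50,257-token vocabulary) and a language-model head tied to the token embeddings. These settings are assumptions about this repo, as is the hypothetical batch of 2**19 tokens used to estimate how many optimizer steps one pass over roughly 10B tokens takes.

```python
def gpt2_param_count(vocab_size: int = 50257, block_size: int = 1024,
                     n_layer: int = 12, n_embd: int = 768) -> int:
    """Count parameters of a GPT-2-style decoder with a tied output head."""
    embeddings = vocab_size * n_embd + block_size * n_embd  # token + position embeddings
    layer_norm = 2 * n_embd                                  # weight + bias
    # Attention: fused q/k/v projection, then output projection (weights + biases).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back (weights + biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    per_block = 2 * layer_norm + attn + mlp
    # Final layer norm; the LM head reuses the token embedding matrix
    # (weight tying), so it adds no parameters of its own.
    return embeddings + n_layer * per_block + layer_norm


TOKENS = 10_000_000_000    # approximate size of the FineWeb-Edu subset
TOKENS_PER_STEP = 2**19    # hypothetical batch size in tokens (524,288)
steps_per_epoch = TOKENS // TOKENS_PER_STEP

print(gpt2_param_count())  # 124439808
print(steps_per_epoch)     # 19073
```

If a training script pads the vocabulary (for example to 50,304), the same formula gives 124,475,904 parameters.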

Evaluation was done on the HellaSwag benchmark, and the results were compared with the 124M-parameter GPT-2 model and the 124M-parameter GPT-3 model.
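Each HellaSwag example gives the model a context and four candidate endings, exactly one of which is correct. One common way to score a base language model on it is to pick the ending with the lowest average per-token loss. The sketch below assumes that approach; the helper names and the log-probability numbers are hypothetical, and hellaswag_eval.py may differ in details such as length normalization.

```python
from typing import Sequence


def pick_ending(ending_token_logprobs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate ending with the lowest average
    per-token loss (negative log-probability) under the model."""
    avg_losses = [-sum(lp) / len(lp) for lp in ending_token_logprobs]
    return min(range(len(avg_losses)), key=avg_losses.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples where the predicted ending matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical per-token log-probs of each ending's tokens, given the context.
example = [
    [-2.0, -1.5],        # ending 0: average loss 1.75
    [-0.5, -0.7, -0.6],  # ending 1: average loss 0.60
    [-3.0],              # ending 2: average loss 3.00
    [-1.0, -1.2],        # ending 3: average loss 1.10
]
print(pick_ending(example))  # 1
```

Averaging over tokens keeps longer endings from being penalized simply for having more tokens.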

Results

After evaluating with both the cross-entropy loss and the HellaSwag benchmark, I obtained the results shown in the figure (myGPT2_eval.png), which reports the minimum training and validation loss and the maximum HellaSwag accuracy.
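The loss in these charts is next-token cross-entropy. The plain-Python sketch below computes it from raw logits with the log-sum-exp trick; it is an illustration rather than code from train_mygpt2.py, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mean_cross_entropy(seq_logits: Sequence[Sequence[float]],
                       targets: Sequence[int]) -> float:
    """Average per-token loss over a sequence of next-token predictions."""
    losses = [cross_entropy(l, t) for l, t in zip(seq_logits, targets)]
    return sum(losses) / len(losses)


# Uniform logits over a 50,257-token vocabulary: loss = ln(50257) ~ 10.82.
print(cross_entropy([0.0] * 50257, 0))
```

A model that guesses uniformly over the vocabulary sits at about 10.82, so the training and validation losses reported in such charts are read relative to that starting point.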

The chart on the left compares the loss of myGPT-2 with that of the OpenAI 124M GPT-2 model. It's very cool that myGPT-2 reached a lower loss than the OpenAI model!

The chart on the right compares the HellaSwag accuracy of myGPT-2 with the OpenAI 124M GPT-2 and 124M GPT-3 models. Again, myGPT-2 outperformed the OpenAI 124M GPT-2 model, but it did not come close to the OpenAI 124M GPT-3 model.
