Rocket Lander PPO

A custom reinforcement learning project that trains a rocket to land on a pad under continuous control, using Proximal Policy Optimization (PPO) in PyTorch.

What This Project Does

Simulates a 2D rocket landing environment with thrust, torque, fuel use, and strict landing constraints
Trains a PPO agent with observation normalization and generalized advantage estimation
Uses curriculum learning to start from easier states and ramp toward full difficulty
Uses expert imitation pretraining to bootstrap the policy before PPO fine-tuning
Saves checkpoints with both model weights and normalization statistics
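As a rough illustration of the generalized advantage estimation step listed above, here is a minimal pure-Python sketch. The function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions made for this example and are not taken from train.py, which may compute advantages differently (for instance with PyTorch tensors).

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,  # assumed discount factor
    lam: float = 0.95,    # assumed GAE lambda
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout of length T."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # value estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        # A terminal step cuts off both the bootstrap value and the running sum.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets are the advantages added back onto the value estimates.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `gamma=1` and `lam=1` this reduces to the undiscounted reward-to-go minus the value baseline, which is a convenient sanity check.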

Why This Project Exists

This project started as a reinforcement learning experiment and turned into a debugging exercise around:

reward hacking
PPO training stability
observation normalization consistency
expert warm-start training

The current codebase fixes the main learning bugs and produces real landing behavior, but PPO fine-tuning is still unstable over long runs: performance can regress after early gains. For that reason, the best saved checkpoint is currently a more meaningful measure of performance than the final checkpoint.
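One common way to keep observation normalization consistent is to update the running statistics only during training, freeze them for evaluation, and store them alongside the model. The sketch below assumes that approach; the class `RunningNormalizer` and its methods are hypothetical names for illustration and do not describe the actual implementation in train.py.

```python
import math
from typing import Dict, List, Sequence


class RunningNormalizer:
    """Per-dimension running mean and variance (Welford's algorithm)."""

    def __init__(self, dim: int, eps: float = 1e-8, clip: float = 10.0) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps
        self.clip = clip
        self.frozen = False    # set True for evaluation

    def update(self, obs: Sequence[float]) -> None:
        if self.frozen:
            return  # evaluation must not shift the statistics
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs: Sequence[float]) -> List[float]:
        out = []
        for i, x in enumerate(obs):
            # Fall back to unit variance until at least two samples are seen.
            var = self.m2[i] / self.count if self.count > 1 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out

    def state_dict(self) -> Dict[str, object]:
        return {"count": self.count, "mean": list(self.mean), "m2": list(self.m2)}

    def load_state_dict(self, state: Dict[str, object]) -> None:
        self.count = int(state["count"])
        self.mean = list(state["mean"])
        self.m2 = list(state["m2"])
```

Saving `state_dict()` in the same checkpoint as the policy weights means an evaluation run sees observations on the same scale the policy was trained on.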

Current Status

Environment works
Expert controller lands reliably
PPO learns non-trivial landing behavior
Training is still being tuned for long-run stability

As a portfolio project, it demonstrates custom environment design, RL debugging, and training-system iteration.

Tech Stack

Python
PyTorch
NumPy
Matplotlib

Project Structure

train.py: environment, PPO agent, expert controller, training loop, checkpointing, evaluation, and plotting

Running Locally

Create a Python environment.
Install dependencies:

pip install -r requirements.txt

Run training:

python train.py

Optionally, set the checkpoint output directory with the ROCKET_RL_SAVE_DIR environment variable:

export ROCKET_RL_SAVE_DIR=./rocket_rl_runs
python train.py
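For reference, a script can pick up this variable along the following lines. This is a sketch only: the helper name `resolve_save_dir` and the fallback directory are assumptions, and this README does not state what train.py does when the variable is unset.

```python
import os
from pathlib import Path


def resolve_save_dir(default: str = "./rocket_rl_runs") -> Path:
    """Return the checkpoint directory, creating it if it does not exist.

    The default path is a placeholder chosen for this example.
    """
    save_dir = Path(os.environ.get("ROCKET_RL_SAVE_DIR", default))
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir
```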

Running In Google Colab

Open a new Colab notebook.
Set runtime type to GPU.
Upload train.py.
Install dependencies if needed:

!pip install torch matplotlib numpy

Run:

!python /content/train.py

To persist checkpoints to Google Drive:

from google.colab import drive
drive.mount('/content/drive')

import os
os.environ["ROCKET_RL_SAVE_DIR"] = "/content/drive/MyDrive/rocket_rl_runs"
!python /content/train.py

Known Limitations

PPO fine-tuning can regress after early gains
Best saved checkpoint is often stronger than the final model
Results can vary by run and hyperparameter choice
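Because the final model can be weaker than an earlier one, it helps to keep a snapshot of the best-scoring state seen during periodic evaluation. The sketch below shows one way to do that; `BestCheckpointTracker` and the idea of a single scalar evaluation score are assumptions for illustration, not a description of how train.py selects its best checkpoint.

```python
import copy
from typing import Any, Dict, Optional


class BestCheckpointTracker:
    """Keep a deep copy of the state with the highest evaluation score."""

    def __init__(self) -> None:
        self.best_score = float("-inf")
        self.best_update: Optional[int] = None
        self.best_state: Optional[Dict[str, Any]] = None

    def maybe_update(self, update_idx: int, eval_score: float, state: Dict[str, Any]) -> bool:
        """Record `state` if `eval_score` beats the best so far; return True if it did."""
        if eval_score > self.best_score:
            self.best_score = eval_score
            self.best_update = update_idx
            # Deep copy so later training steps cannot mutate the snapshot.
            self.best_state = copy.deepcopy(state)
            return True
        return False
```

In practice the `state` dictionary would hold the model weights together with the normalization statistics, so the best snapshot can be evaluated on its own.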

Portfolio Summary

Highlights:

Designed a custom continuous-control RL environment from scratch
Diagnosed and fixed reward misalignment and PPO data-consistency bugs
Added expert imitation pretraining and curriculum learning
Built checkpoint save/load flow that preserves normalization state for correct evaluation
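To make the curriculum idea concrete, here is a minimal sketch of a linear difficulty ramp that widens the range of initial states over training. The function names, the starting difficulty of 0.2, and the example altitude values are all hypothetical; the schedule actually used in train.py may differ.

```python
def curriculum_difficulty(update_idx: int, ramp_updates: int, start: float = 0.2) -> float:
    """Linearly ramp difficulty from `start` to 1.0 over `ramp_updates` updates."""
    if ramp_updates <= 0:
        return 1.0  # no ramp configured: train at full difficulty
    frac = min(max(update_idx / ramp_updates, 0.0), 1.0)
    return start + (1.0 - start) * frac


def scale_by_difficulty(difficulty: float, easy: float, hard: float) -> float:
    """Interpolate an environment parameter between its easy and hard value."""
    return easy + (hard - easy) * difficulty


# Example: maximum initial altitude grows as the curriculum advances.
# The values 20.0 and 100.0 are placeholders, not the project's settings.
max_start_altitude = scale_by_difficulty(
    curriculum_difficulty(update_idx=50, ramp_updates=100), easy=20.0, hard=100.0
)
```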
