This project teaches you the basics of Reinforcement Learning (RL) using Blackjack as an example game! It's totally beginner-friendly, and every file and piece of code comes with thorough comments and explanations.
We implement a simple Q-Learning agent that learns how to play Blackjack by playing thousands of games against the house. We provide our own step-by-step Blackjack environment, a thoroughly commented agent, and a visible, easy-to-understand training loop!
blackjack_env.py # The Blackjack game environment (like OpenAI Gym's)
qlearning_agent.py # The RL agent that learns how to play
train_blackjack.py # Script to train the agent (extremely verbose)
README.md # This file!
- blackjack_env.py:
  - Implements the Blackjack game logic from scratch.
  - Modeled on OpenAI Gym environments, so it fits typical RL training code.
  - Super commented for learning.
- qlearning_agent.py:
  - Contains a tabular Q-Learning RL agent, one of the simplest RL methods.
  - All the key RL learning math is implemented and explained in comments.
- train_blackjack.py:
  - Main script to train the agent!
  - Uses a progress bar and prints stats so you can see learning as it happens.
  - Every line has a comment and every step is broken down to help you learn RL (and practice Python).
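To give a feel for what a Gym-style environment looks like, here is a minimal sketch. It is not the code in blackjack_env.py: the class name `TinyBlackjackEnv`, the action encoding (0 = stick, 1 = hit), the state tuple `(player_total, dealer_card, usable_ace)`, and the simplified rules (infinite deck, no special payout for a natural) are all assumptions made for illustration.

```python
import random


class TinyBlackjackEnv:
    """Hypothetical, simplified stand-in for a Gym-style Blackjack environment."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.player = []
        self.dealer = []

    def _draw(self):
        # Infinite deck: 1 is an ace, 11-13 (J, Q, K) all count as 10.
        return min(self.rng.randint(1, 13), 10)

    @staticmethod
    def _hand_value(cards):
        # Count one ace as 11 when that does not bust the hand.
        total = sum(cards)
        usable_ace = 1 in cards and total + 10 <= 21
        return (total + 10 if usable_ace else total), usable_ace

    def _state(self):
        total, usable_ace = self._hand_value(self.player)
        return (total, self.dealer[0], usable_ace)

    def reset(self):
        """Deal a fresh hand and return the first state."""
        self.player = [self._draw(), self._draw()]
        self.dealer = [self._draw(), self._draw()]
        return self._state()

    def step(self, action):
        """Apply an action; return (state, reward, done, info) like classic Gym."""
        if action == 1:  # hit
            self.player.append(self._draw())
            if self._hand_value(self.player)[0] > 21:
                return self._state(), -1.0, True, {}  # player busts
            return self._state(), 0.0, False, {}
        # stick: the dealer draws until reaching 17 or more
        while self._hand_value(self.dealer)[0] < 17:
            self.dealer.append(self._draw())
        player_total = self._hand_value(self.player)[0]
        dealer_total = self._hand_value(self.dealer)[0]
        if dealer_total > 21 or player_total > dealer_total:
            reward = 1.0
        elif player_total == dealer_total:
            reward = 0.0
        else:
            reward = -1.0
        return self._state(), reward, True, {}


# Play one hand with a fixed rule: hit below 17, otherwise stick.
env = TinyBlackjackEnv(seed=0)
state = env.reset()
done = False
while not done:
    action = 1 if state[0] < 17 else 0
    state, reward, done, _ = env.step(action)
print("final state:", state, "reward:", reward)
```

The `reset()` / `step()` pair is what lets an agent and a training loop stay separate from the game rules: the loop only ever sees states, rewards, and a done flag.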
Requirements: Python 3.7 or newer (3.13+ is fine), and the packages listed below.
- Create your environment and install dependencies (if not done):
    python3 -m venv venv
    source venv/bin/activate
    pip install numpy tqdm
- Train the agent by running:
    python train_blackjack.py
- (Optional) Tweak the files and parameters to see how the agent and learning process change. Try smaller or larger episode counts!
- The BlackjackEnv lets an agent play against a simulated dealer.
- The QLearningAgent starts with no idea how to play, and slowly improves using Q-learning:
  - It sometimes picks a random action to explore and otherwise picks the best-known one (epsilon-greedy)
  - It builds a Q-table of state-action values over time
  - Using these values, it learns which actions lead to winning in each state
- The training script prints updates as the agent gets better!
- You can explore the Q-table after training, or add code for an evaluation loop.
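The learning steps described above can be sketched in a few lines. This is an illustration, not the code in qlearning_agent.py: the class name `SketchQAgent`, its method names, the default hyperparameter values, and the Q-table layout (a dict keyed by `(state, action)`) are assumptions. The update itself is the standard tabular Q-learning rule: move Q(s, a) a fraction alpha toward `reward + gamma * max Q(next_state, ·)`.

```python
import random
from collections import defaultdict


class SketchQAgent:
    """Hypothetical minimal tabular Q-learning agent (illustration only)."""

    def __init__(self, n_actions=2, alpha=0.1, gamma=1.0, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random action
        self.q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
        self.rng = random.Random(seed)

    def choose_action(self, state):
        # Explore with probability epsilon, otherwise exploit the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        best = max(values)
        # Break ties randomly so fresh states do not always favor action 0.
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, state, action, reward, next_state, done):
        # A finished game has no future value, so the bootstrap term is 0.
        if done:
            best_next = 0.0
        else:
            best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Two hand-made transitions: sticking on 19 wins, and hitting on 15 led to that 19.
agent = SketchQAgent(alpha=0.5, epsilon=0.0, seed=0)
win_state, earlier_state = (19, 10, False), (15, 10, False)
agent.update(win_state, 0, 1.0, None, True)
agent.update(earlier_state, 1, 0.0, win_state, False)
print(agent.q[(win_state, 0)], agent.q[(earlier_state, 1)])  # 0.5 0.25
```

Notice how the win's value flows backward: the earlier state only gains value because the state it led to already had some. Repeating this over many games is what fills in the Q-table.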
- Display the learned Q-table or visualize the policy!
- Try different hyperparameters (alpha, gamma, epsilon)
- Add learning curve plots
- Implement "double down" or "split" actions to make things trickier
- Rewrite the environment with OpenAI Gym's interface
- Try deep Q-learning (DQN) with neural networks for fun!
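As a starting point for the first idea in the list, here is one possible way to print a learned policy as a text grid. It assumes a Q-table stored as a dict keyed by `((player_total, dealer_card, usable_ace), action)` with actions 0 = stick and 1 = hit. The function name `greedy_policy_grid` and that layout are hypothetical, so adapt them to however the agent actually stores its values.

```python
def greedy_policy_grid(q_table, usable_ace=False, n_actions=2):
    """Return text rows: player totals 21 down to 12, one column per dealer card 1-10."""
    symbols = {0: "S", 1: "H"}  # assumed encoding: 0 = stick, 1 = hit
    rows = []
    for total in range(21, 11, -1):
        cells = []
        for dealer_card in range(1, 11):
            state = (total, dealer_card, usable_ace)
            values = [q_table.get((state, a)) for a in range(n_actions)]
            if any(v is None for v in values):
                cells.append("?")  # at least one action was never tried here
            else:
                cells.append(symbols[values.index(max(values))])
        rows.append(f"{total:>2} " + " ".join(cells))
    return rows


# Tiny hand-made Q-table just to show the output shape.
demo_q = {
    ((20, 5, False), 0): 1.0, ((20, 5, False), 1): -1.0,
    ((12, 5, False), 0): -0.5, ((12, 5, False), 1): 0.2,
}
print("\n".join(greedy_policy_grid(demo_q)))
```

Cells marked `?` are states where the table has no value for some action, which is a quick visual check of how much of the state space training actually covered.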
Feel free to experiment and modify anything! Learning happens best by doing :)