Agent Implementations RL

Here I am implementing the agents as I read through them in Sutton and Barto's book, Reinforcement Learning: An Introduction (2nd edition). Unless stated otherwise, each agent is trained to estimate q* using an ε-greedy policy, then evaluated using a greedy policy on the estimate of q* obtained through training.
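As a rough illustration of the difference between the training and evaluation policies, here is a minimal sketch of ε-greedy and greedy action selection over a row of action values. The function names `epsilon_greedy_action` and `greedy_action` are hypothetical and are not taken from the repository's code; the random tie-breaking is also an assumption.

```python
import numpy as np


def greedy_action(q_values, rng=None):
    """Return an action with the highest estimated value.

    Ties are broken at random when an rng is given, otherwise the
    lowest-index best action is returned.
    """
    q_values = np.asarray(q_values)
    best = np.flatnonzero(q_values == q_values.max())
    if rng is None:
        return int(best[0])
    return int(rng.choice(best))


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return greedy_action(q_values, rng)


rng = np.random.default_rng(0)
q_row = [0.1, 0.9, 0.3]                                   # hypothetical action values for one state
train_action = epsilon_greedy_action(q_row, 0.1, rng)     # used while learning
eval_action = greedy_action(q_row)                        # used when evaluating
```

During training the exploration step keeps every action reachable; at evaluation time only the greedy choice is used, so the measured return reflects the learned value estimates alone.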

Agents Implemented:

Chapter 5 - Monte Carlo Methods

On-policy first-visit Monte Carlo
Off-policy Monte Carlo

Chapter 6 - Temporal-Difference Learning

SARSA(0)
Q-Learning
Expected SARSA
Double Q-Learning

Chapter 7 - n-step Bootstrapping

n-step SARSA
Off-Policy n-step SARSA
n-step Tree Backup

Chapter 8 - Planning and Learning with Tabular Methods

Tabular Dyna-Q

Chapter 10 - On-policy Control with Approximation

Episodic Semi-Gradient SARSA

Chapter 12 - Eligibility Traces

SARSA(λ)

Testing:

I am testing each of these agents on different Gymnasium environments.
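A greedy evaluation run might look roughly like the sketch below. It assumes a tabular Q array indexed by state and relies only on the Gymnasium-style `reset()` and `step()` return shapes. `TinyChainEnv` and `evaluate_greedy` are hypothetical stand-ins written for this example so that it runs without Gymnasium installed; they are not the repository's test code.

```python
import numpy as np


class TinyChainEnv:
    """Hypothetical 3-state chain with a Gymnasium-style interface.

    Action 1 moves one state to the right, action 0 stays put.
    Reaching state 2 ends the episode with reward 1.
    """

    def reset(self, seed=None):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if action == 1:
            self.state += 1
        terminated = self.state == 2
        reward = 1.0 if terminated else 0.0
        # (observation, reward, terminated, truncated, info)
        return self.state, reward, terminated, False, {}


def evaluate_greedy(env, q_table, episodes=10, max_steps=100):
    """Average undiscounted return of the greedy policy derived from q_table."""
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = int(np.argmax(q_table[state]))
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            if terminated or truncated:
                break
        returns.append(total)
    return float(np.mean(returns))


# Hypothetical learned values: moving right is preferred in every state.
q_table = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
mean_return = evaluate_greedy(TinyChainEnv(), q_table)
```

Because the loop touches only `reset()` and `step()`, an environment created with `gymnasium.make(...)` should be usable in place of the stub, provided its observations are discrete states that can index the table.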

Files:

.gitignore
README.md
acrobot.py
agent_helpers.py
blackjack.py
cliff_walking.py
dyna_q.py
frozen_lake.py
monte_carlo.py
mountain_car.py
semi_gradient_sarsa.py
taxi.py
td_learning.py
test_all.py
tiles3.py
visualizer.py
