A research-oriented Snake reinforcement-learning environment built with PyTorch.
This demo shows a trained agent exhibiting stable navigation and late-game risk avoidance.
This project implements a reinforcement-learning Snake agent designed to study learning behavior, reward shaping, and control dynamics under delayed consequences. The agent learns purely from numerical rewards and penalties, without scripted rules or hard-coded strategies.
The repository is intended as an experimental sandbox for observing emergent behavior and failure modes in reinforcement learning.
- A reinforcement-learning Snake environment
- A testbed for studying:
  - delayed reward effects
  - policy collapse
  - reward exploitation (looping, stalling)
  - survival vs. reward tradeoffs
- A compact environment where small reward changes produce large behavioral shifts
❌ A scripted or rule-based Snake bot
❌ A shortest-path solver
❌ A benchmark or “perfect” Snake agent
The agent does not know explicit rules like “avoid walls.”
It updates its policy solely through gradient-based learning from outcomes.
- Observe environment state
- Predict action values
- Select an action (with exploration)
- Receive reward or penalty
- Compute prediction error
- Update network weights
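The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: a five-cell corridor stands in for the Snake grid, a linear model over one-hot features stands in for the PyTorch network, and the prediction error is assumed to be a Q-learning temporal-difference error. All names (`features`, `step`, `q_values`, `train`) and hyperparameters are hypothetical.

```python
import random

N = 5             # corridor length (hypothetical toy stand-in for the Snake grid)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def features(pos):
    """One-hot encoding of the agent's position."""
    return [1.0 if i == pos else 0.0 for i in range(N)]


def step(pos, action):
    """Return (next_pos, reward, done) for the toy corridor."""
    nxt = pos + (1 if action == 1 else -1)
    if nxt < 0:
        return pos, -1.0, True    # ran into the wall: penalty, episode ends
    if nxt == N - 1:
        return nxt, 1.0, True     # reached the food: reward, episode ends
    return nxt, 0.0, False


def q_values(w, x):
    """Linear action-value prediction: one weight vector per action."""
    return [sum(wi * xi for wi, xi in zip(w[a], x)) for a in ACTIONS]


def train(episodes=500, gamma=0.9, lr=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    w = [[0.0] * N for _ in ACTIONS]
    for _ in range(episodes):
        pos, done, steps = 0, False, 0
        while not done and steps < 50:
            x = features(pos)                          # observe environment state
            q = q_values(w, x)                         # predict action values
            if rng.random() < epsilon:                 # select an action (with exploration)
                action = rng.choice(ACTIONS)
            else:
                best = max(q)
                action = rng.choice([a for a in ACTIONS if q[a] == best])
            nxt, reward, done = step(pos, action)      # receive reward or penalty
            future = 0.0 if done else gamma * max(q_values(w, features(nxt)))
            td_error = (reward + future) - q[action]   # compute prediction error
            for i in range(N):                         # update weights (semi-gradient step)
                w[action][i] += lr * td_error * x[i]
            pos, steps = nxt, steps + 1
    return w


if __name__ == "__main__":
    weights = train()
    for s in range(N - 1):
        print(s, [round(v, 3) for v in q_values(weights, features(s))])
```

Nothing in the loop tells the agent that the wall is dangerous. The preference for moving right comes only from the penalty and reward signals flowing through the update.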
This process allows the agent to:
- learn from failure
- propagate consequences backward
- trade short-term reward for long-term survival
- exploit reward structures when misaligned
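One way to picture how consequences propagate backward is to compute the discounted return at each step of an episode that ends in a penalty. The sketch below is illustrative only: the function name `returns_to_go`, the reward sequence, and the discount factor are assumptions, not values from this repository.

```python
def returns_to_go(rewards, gamma):
    """Discounted return from each time step to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backward so each step inherits a discounted
    # share of everything that happened after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Four uneventful steps followed by a fatal collision (assumed penalty of -1).
episode = [0.0, 0.0, 0.0, 0.0, -1.0]
print(returns_to_go(episode, gamma=0.5))
# prints [-0.0625, -0.125, -0.25, -0.5, -1.0]
```

Every step before the collision receives a share of the penalty. The share shrinks with distance from the collision, so the moves that led into the trap are penalized as well as the final one.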
Snake is well suited to reinforcement-learning research because it combines:
- simple mechanics
- a large state space
- delayed consequences
- clear and observable failure modes
This makes it an effective environment for studying learning dynamics and control behavior.
- Learning behavior emerges from reward structure
- Optimization alone does not guarantee stability
- Control mechanisms influence long-term performance
- Misaligned incentives produce predictable failure patterns
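The looping failure mode can be illustrated with simple arithmetic. The sketch below compares the discounted return of two hypothetical behaviors under an assumed reward scheme: circling safely for a whole episode, versus eating once and then dying. The reward values, discount factor, episode length, and function names are illustrative assumptions, not settings used by this project.

```python
GAMMA = 0.99    # assumed discount factor
HORIZON = 200   # assumed cap on steps per episode


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def compare(step_reward, food_reward=10.0, death_penalty=-10.0):
    """Return (loop_return, seek_return) for a given per-step reward."""
    # Behavior A: circle safely until the step cap, never eating.
    loop = [step_reward] * HORIZON
    # Behavior B: eat food on the 10th step, then die on the 30th step.
    seek = [step_reward] * 30
    seek[9] += food_reward
    seek[29] += death_penalty
    return discounted_return(loop), discounted_return(seek)


if __name__ == "__main__":
    for step_reward in (-0.01, 0.0, 0.1):
        loop_ret, seek_ret = compare(step_reward)
        preferred = "loop" if loop_ret > seek_ret else "seek food"
        print(f"step reward {step_reward:+.2f}: "
              f"loop={loop_ret:.2f} seek={seek_ret:.2f} -> {preferred}")
```

Under these assumed numbers, a per-step reward of 0 or -0.01 favors seeking food, while a survival bonus of 0.1 per step makes endless looping the higher-return behavior. A change of about a tenth of a point per step is enough to flip which behavior the optimizer prefers.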
python main.py