Reinforcement Learning for Rocket Trajectory Optimization
Studied REINFORCE on the Lunar Lander environment to better understand policy-gradient methods for continuous control. Implemented the algorithm with continuous actions in PyTorch and tuned its hyperparameters to see which ones have the largest effect on performance.
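To make the core of REINFORCE with continuous actions concrete, here is a minimal sketch of the per-episode arithmetic. It is not the project's actual code: the function names (`discounted_returns`, `gaussian_log_prob`, `reinforce_loss`) and the diagonal-Gaussian policy are assumptions for illustration, and the sketch uses only the standard library so the calculation is visible. In a PyTorch version, the log-probability would typically come from `torch.distributions.Normal(mean, std).log_prob(action).sum(-1)`, with autograd differentiating the loss.

```python
import math


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def gaussian_log_prob(action, mean, std):
    """Log-density of an action under a diagonal Gaussian policy.

    Assumption: each action dimension is independent, so the per-dimension
    log-densities are summed.
    """
    total = 0.0
    for a, m, s in zip(action, mean, std):
        total += -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
    return total


def reinforce_loss(log_probs, returns):
    """Surrogate loss whose gradient is the REINFORCE estimate: -sum_t log pi(a_t|s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(rewards, gamma=0.5))
```

The discount factor `gamma` and the policy's standard deviation both appear directly in these formulas, and the learning rate is another common tuning target for this kind of method. A common variance-reduction step, not shown here, is to subtract a baseline (for example the mean return) from the returns before computing the loss.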