Reinforcement Learning for Rocket Trajectory Optimization

Studied the REINFORCE algorithm on the Lunar Lander environment to better understand policy-gradient methods for continuous control. Implemented REINFORCE with continuous actions in PyTorch and tuned its hyperparameters to see which ones most affect performance.
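
For illustration, here is a minimal sketch of how REINFORCE with continuous actions might look in PyTorch. It is not the project's actual code. The names GaussianPolicy, discounted_returns, and reinforce_loss are hypothetical, and the network size, the state-independent standard deviation, and the discount factor are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class GaussianPolicy(nn.Module):
    """Hypothetical policy: maps an observation to a diagonal Gaussian over actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, act_dim)
        # Assumption: one learnable, state-independent log std per action dimension.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> Normal:
        return Normal(self.mean(self.body(obs)), self.log_std.exp())


def discounted_returns(rewards, gamma: float = 0.99) -> torch.Tensor:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, accumulated from the last step backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return torch.tensor(returns, dtype=torch.float32)


def reinforce_loss(policy, obs, actions, rewards, gamma: float = 0.99) -> torch.Tensor:
    """Negative of sum_t log pi(a_t | s_t) * G_t for a single episode."""
    returns = discounted_returns(rewards, gamma)
    # Sum log-probabilities over action dimensions: one value per time step.
    log_probs = policy(obs).log_prob(actions).sum(dim=-1)
    return -(log_probs * returns).sum()
```

Minimizing this loss with any standard optimizer performs gradient ascent on the expected return. For the continuous Lunar Lander variant in Gymnasium, obs_dim would be 8 and act_dim would be 2. The learning rate, the discount factor gamma, and the initial log_std are typical candidates for the kind of hyperparameter tuning described above.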
