# deep_rl

**Repository Path**: sheldonzhou/deep_rl

## Basic Information

- **Project Name**: deep_rl
- **Description**: PyTorch implementations of Deep Reinforcement Learning algorithms (DQN, DDQN, A2C, VPG, TRPO, PPO, DDPG, TD3, SAC, ASAC, TAC, ATAC)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2020-05-20
- **Last Updated**: 2021-10-13

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Deep Reinforcement Learning (DRL) Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms. For a TensorFlow implementation of the same algorithms, take a look at [tsallis_actor_critic_mujoco](https://github.com/rllab-snu/tsallis_actor_critic_mujoco).

## Algorithms Implemented

1. Deep Q-Network (DQN) ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf))
2. Double DQN (DDQN) ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461))
3. Advantage Actor-Critic (A2C)
4. Vanilla Policy Gradient (VPG)
5. Natural Policy Gradient (NPG) ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf))
6. Trust Region Policy Optimization (TRPO) ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477))
7. Proximal Policy Optimization (PPO) ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347))
8. Deep Deterministic Policy Gradient (DDPG) ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971))
9. Twin Delayed DDPG (TD3) ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477))
10. Soft Actor-Critic (SAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290))
11. SAC with automatic entropy adjustment (ASAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905))
12. Tsallis Actor-Critic (TAC) ([K. Lee et al. 2019](https://arxiv.org/abs/1902.00137))
13. TAC with automatic entropy adjustment (ATAC)

## Environments Implemented

1. CartPole-v1 (described [here](https://gym.openai.com/envs/CartPole-v1/))
2. Pendulum-v0 (described [here](https://gym.openai.com/envs/Pendulum-v0/))
3. MuJoCo environments (HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.; described [here](https://gym.openai.com/envs/#mujoco))

## Results

### CartPole-v1

- Observation space: 4
- Action space: 2

### Pendulum-v0

- Observation space: 3
- Action space: 1

### HalfCheetah-v2

- Observation space: 17
- Action space: 6

### Ant-v2

- Observation space: 111
- Action space: 8

### Humanoid-v2

- Observation space: 376
- Action space: 17

## Requirements

- [PyTorch 1.2.0](https://pytorch.org/get-started/previous-versions/)
- [TensorBoard](https://pytorch.org/docs/stable/tensorboard.html)
- [gym](https://github.com/openai/gym)
- [mujoco-py](https://github.com/openai/mujoco-py)

## Usage

The repository's high-level structure is:

    ├── agents
    │   └── common
    ├── results
    │   ├── data
    │   └── graphs
    └── tests
        └── save_model

### 1) To train the agents on the environments

To train all the different agents on the MuJoCo environments, follow these steps:

```commandline
git clone https://github.com/dongminlee94/deep_rl.git
cd deep_rl
python run_mujoco.py
```

For other environments, change the last line to `python run_cartpole.py` or `python run_pendulum.py`.
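If an environment fails to construct when a run script starts (most often because `mujoco-py` is not installed correctly), it can help to load the environments independently of this repository. The snippet below is only an illustrative check of my own, not part of the codebase; it reproduces the observation and action dimensions listed under Results:

```python
import gym

# Standalone check that the gym environments used above can be created.
# 'HalfCheetah-v2' requires mujoco-py; remove it if MuJoCo is not set up.
for env_id in ['CartPole-v1', 'Pendulum-v0', 'HalfCheetah-v2']:
    env = gym.make(env_id)
    obs_dim = env.observation_space.shape[0]
    if hasattr(env.action_space, 'n'):    # discrete actions (CartPole)
        act_dim = env.action_space.n
    else:                                 # continuous actions (Pendulum, MuJoCo)
        act_dim = env.action_space.shape[0]
    print(f'{env_id}: observation space {obs_dim}, action space {act_dim}')
    env.close()
```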
To change the agents' configuration, pass the options on the command line, for example:

```commandline
python run_mujoco.py \
    --env=Humanoid-v2 \
    --algo=atac \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000
```

### 2) To watch the learned agents on the above environments

To watch the learned agents on the MuJoCo environments, follow these steps:

```commandline
cd tests
python mujoco_test.py --load=envname_algoname_...
```

Copy the name of the saved model file in `tests/save_model/` and paste it into the `--load` argument in place of `envname_algoname_...` so that the saved model is loaded.
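Under the hood, watching an agent amounts to restoring the saved policy network and rolling it out with rendering enabled. The sketch below only illustrates that idea; the actual network class, checkpoint format, and file names are defined by the code in `agents/` and `tests/`, so every name here is an assumption:

```python
import gym
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    """Placeholder actor network; the real architectures live in agents/."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

env = gym.make('HalfCheetah-v2')
policy = MLPPolicy(env.observation_space.shape[0], env.action_space.shape[0])

# Replace the placeholder with the actual file name found under tests/save_model/.
policy.load_state_dict(torch.load('save_model/envname_algoname_...'))
policy.eval()

obs, done, episode_return = env.reset(), False, 0.0
while not done:
    env.render()
    with torch.no_grad():
        action = policy(torch.as_tensor(obs, dtype=torch.float32)).numpy()
    obs, reward, done, _ = env.step(action)
    episode_return += reward
print('Episode return:', episode_return)
```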