# deep_rl
**Repository Path**: sheldonzhou/deep_rl
## Basic Information
- **Project Name**: deep_rl
- **Description**: PyTorch implementations of Deep Reinforcement Learning algorithms (DQN, DDQN, A2C, VPG, TRPO, PPO, DDPG, TD3, SAC, ASAC, TAC, ATAC)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 2
- **Forks**: 0
- **Created**: 2020-05-20
- **Last Updated**: 2021-10-13
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Deep Reinforcement Learning (DRL) Algorithms with PyTorch
This repository contains PyTorch implementations of deep reinforcement learning algorithms. For TensorFlow implementations of the same algorithms, take a look at [tsallis_actor_critic_mujoco](https://github.com/rllab-snu/tsallis_actor_critic_mujoco).
## Algorithms Implemented
1. Deep Q-Network (DQN) ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf))
2. Double DQN (DDQN) ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461))
3. Advantage Actor Critic (A2C)
4. Vanilla Policy Gradient (VPG)
5. Natural Policy Gradient (NPG) ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf))
6. Trust Region Policy Optimization (TRPO) ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477))
7. Proximal Policy Optimization (PPO) ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347))
8. Deep Deterministic Policy Gradient (DDPG) ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971))
9. Twin Delayed DDPG (TD3) ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477))
10. Soft Actor-Critic (SAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290))
11. Automating entropy adjustment on SAC (ASAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905))
12. Tsallis Actor-Critic (TAC) ([K. Lee et al. 2019](https://arxiv.org/abs/1902.00137))
13. Automating entropy adjustment on TAC (ATAC)
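The value-based methods at the top of this list (DQN, DDQN) are built around the one-step temporal-difference target. As a quick illustration only, here is a minimal sketch of the DQN loss in PyTorch; the `QNetwork` class and `dqn_loss` function are illustrative names, not this repository's actual code.

```python
# Minimal sketch of the DQN TD loss (illustrative, not this repo's implementation).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping an observation to one Q-value per discrete action."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)


def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss: Q(s,a) regressed toward r + gamma * max_a' Q_target(s', a')."""
    obs, act, rew, next_obs, done = batch  # tensors sampled from a replay buffer
    q_sa = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```

Double DQN differs only in the target: the action is chosen by the online network but evaluated by the target network.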
## Environments Implemented
1. CartPole-v1 (described [here](https://gym.openai.com/envs/CartPole-v1/))
2. Pendulum-v0 (described [here](https://gym.openai.com/envs/Pendulum-v0/))
3. MuJoCo environments (HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.) (described [here](https://gym.openai.com/envs/#mujoco))
## Results
### CartPole-v1
- Observation space: 4
- Action space: 2
### Pendulum-v0
- Observation space: 3
- Action space: 1
### HalfCheetah-v2
- Observation space: 17
- Action space: 6
### Ant-v2
- Observation space: 111
- Action space: 8
### Humanoid-v2
- Observation space: 376
- Action space: 17
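These dimensions can be checked directly with gym. A small sketch, assuming gym (and, for the MuJoCo tasks, mujoco-py) is installed:

```python
# Print observation and action spaces for the environments used above.
import gym

for name in ["CartPole-v1", "Pendulum-v0", "HalfCheetah-v2"]:
    env = gym.make(name)
    print(name, env.observation_space.shape, env.action_space)
    env.close()
```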
## Requirements
- [PyTorch 1.2.0](https://pytorch.org/get-started/previous-versions/)
- [TensorBoard](https://pytorch.org/docs/stable/tensorboard.html)
- [gym](https://github.com/openai/gym)
- [mujoco-py](https://github.com/openai/mujoco-py)
## Usage
The repository's high-level structure is:
```
├── agents
│   └── common
├── results
│   ├── data
│   └── graphs
└── tests
    └── save_model
```
### 1) To train the agents on the environments
To train all the different agents on MuJoCo environments, follow these steps:
```commandline
git clone https://github.com/dongminlee94/deep_rl.git
cd deep_rl
python run_mujoco.py
```
For the other environments, change the last command to `python run_cartpole.py` or `python run_pendulum.py`.
If you want to change the configuration of an agent, pass arguments like this:
```commandline
python run_mujoco.py \
--env=Humanoid-v2 \
--algo=atac \
--seed=0 \
--iterations=200 \
--steps_per_iter=5000 \
--max_step=1000
```
### 2) To watch the learned agents on the above environments
To watch all the learned agents on MuJoCo environments, follow these steps:
```commandline
cd tests
python mujoco_test.py --load=envname_algoname_...
```
Copy the name of the saved model under `tests/save_model/` and paste it in place of `envname_algoname_...`, so that the saved model is loaded.
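For reference, a rough sketch of what watching a trained agent looks like, assuming the checkpoint is a PyTorch `state_dict` for a deterministic policy network; the `Policy` class and checkpoint path below are placeholders, not this repository's actual classes or file names.

```python
# Hypothetical sketch: load a saved policy and render it in the environment.
import gym
import torch
import torch.nn as nn


class Policy(nn.Module):
    """Hypothetical policy network: observation -> action in [-1, 1]."""

    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)


env = gym.make("HalfCheetah-v2")
policy = Policy(env.observation_space.shape[0], env.action_space.shape[0])
policy.load_state_dict(torch.load("save_model/envname_algoname_..."))  # placeholder file name
policy.eval()

obs, done = env.reset(), False
while not done:
    env.render()
    with torch.no_grad():
        action = policy(torch.as_tensor(obs, dtype=torch.float32)).numpy()
    obs, _, done, _ = env.step(action)
env.close()
```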