# DRL-Pytorch
Clean, Robust, and Unified PyTorch implementation of popular DRL Algorithms


## 0. Star History

## 1. Dependencies

This repository uses the following Python dependencies unless explicitly stated otherwise:

```python
gymnasium==0.29.1
numpy==1.26.1
pytorch==2.1.0
python==3.11.5
```
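To sanity-check a fresh setup (e.g. after `pip install gymnasium==0.29.1 numpy==1.26.1 torch==2.1.0` — note that PyTorch installs as `torch` on PyPI), a minimal smoke test against the gymnasium 0.29 API might look like the sketch below; the environment ID `CartPole-v1` is just an example:

```python
# Minimal smoke test for the pinned versions above.
# gymnasium 0.29 API: reset() -> (obs, info), step() -> (obs, reward, terminated, truncated, info)
import gymnasium as gym
import torch

print(torch.__version__)  # expect 2.1.0

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # random policy, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```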
## 2. How to use my code

Enter the folder of the algorithm that you want to use, and run **main.py** to train from scratch:

```bash
python main.py
```

For more details, please check the **README.md** file in the corresponding algorithm folder.
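For example, to train the continuous-action SAC agent (the folder name below is an assumption, taken from the standalone repository names in Section 3 — check the actual directory layout of this repo):

```bash
cd SAC-Continuous-Pytorch  # assumed folder name; see Section 3 for the algorithm list
python main.py
```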
## 3. Separate links of the code

+ [1. Q-learning](https://github.com/XinJingHao/Q-learning)
+ [2.1 Duel Double DQN](https://github.com/XinJingHao/Duel-Double-DQN-Pytorch)
+ [2.2 Noisy Duel DDQN on Atari Game](https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch)
+ [2.3 Prioritized Experience Replay (PER) DQN/DDQN](https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch)
+ [2.4 Categorical DQN (C51)](https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch)
+ [2.5 NoisyNet DQN](https://github.com/XinJingHao/NoisyNet-DQN-Pytorch)
+ [3.1 Proximal Policy Optimization (PPO) for Discrete Action Space](https://github.com/XinJingHao/PPO-Discrete-Pytorch)
+ [3.2 Proximal Policy Optimization (PPO) for Continuous Action Space](https://github.com/XinJingHao/PPO-Continuous-Pytorch)
+ [4.1 Deep Deterministic Policy Gradient (DDPG)](https://github.com/XinJingHao/DDPG-Pytorch)
+ [4.2 Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://github.com/XinJingHao/TD3-Pytorch)
+ [5.1 Soft Actor-Critic (SAC) for Discrete Action Space](https://github.com/XinJingHao/SAC-Discrete-Pytorch)
+ [5.2 Soft Actor-Critic (SAC) for Continuous Action Space](https://github.com/XinJingHao/SAC-Continuous-Pytorch)
+ [6. Actor-Sharer-Learner (ASL)](https://github.com/XinJingHao/Actor-Sharer-Learner)
## 4. Recommended Resources for DRL

### 4.1 Simulation Environments

+ [gym](https://www.gymlibrary.dev/) and [gymnasium](https://gymnasium.farama.org/) (Lightweight & standard envs for DRL; easy to start; slow)

+ [Isaac Sim](https://developer.nvidia.com/isaac/sim#isaac-lab) (NVIDIA’s physics simulation environment; GPU-accelerated; superfast)

+ [Sparrow](https://github.com/XinJingHao/Sparrow-V2) (Lightweight simulator for mobile robots; DRL-friendly)


+ [ROS](https://www.ros.org/) (Popular & comprehensive robotics middleware, typically paired with Gazebo for physics simulation; heavy and slow)

+ [Webots](https://cyberbotics.com/) (Popular physics simulator for robots; faster than ROS; less realistic)

+ [Envpool](https://envpool.readthedocs.io/en/latest/index.html) (Fast vectorized envs)
+ [Other Popular Envs](https://github.com/clvrai/awesome-rl-envs)

### 4.2 Books

+ [Reinforcement Learning: An Introduction](https://books.google.com.sg/books?hl=zh-CN&lr=&id=uWV0DwAAQBAJ&oi=fnd&pg=PR7&dq=Reinforcement+Learning&ots=mivIu01Xp6&sig=zQ6jkZRxJop4fkAgScMgzULGlbY&redir_esc=y#v=onepage&q&f=false) -- Richard S. Sutton & Andrew G. Barto
+ Deep Learning from Scratch: Theory and Implementation with Python (深度学习入门:基于Python的理论与实现) -- Koki Saito (斋藤康毅)

### 4.3 Online Courses

+ [RL Courses (bilibili)](https://www.bilibili.com/video/BV1UE411G78S?p=1&vd_source=df4b7370976f5ca5034cc18488eec368) -- Hung-yi Lee (李宏毅)
+ [RL Courses (YouTube)](https://www.youtube.com/watch?v=z95ZYgPgXOY&list=PLJV_el3uVTsODxQFgzMzPLa16h6B8kWM_) -- Hung-yi Lee (李宏毅)
+ [UCL Course on RL](https://www.davidsilver.uk/teaching/) -- David Silver
+ [Hands-on Reinforcement Learning (动手强化学习)](https://hrl.boyuai.com/chapter/1/%E5%88%9D%E6%8E%A2%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0) -- Shanghai Jiao Tong University
+ [DRL Courses](https://github.com/wangshusen/DRL) -- Shusen Wang

### 4.4 Blogs

+ [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/)
+ [Policy Gradient Theorem -- Cangxi](https://zhuanlan.zhihu.com/p/491647161)
+ [Policy Gradient Algorithms -- Lilian Weng](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/)
+ [Theory of PPO](https://zhuanlan.zhihu.com/p/563166533)
+ [The 37 Implementation Details of Proximal Policy Optimization](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)
+ [Prioritized Experience Replay](https://zhuanlan.zhihu.com/p/631171588)
+ [Soft Actor Critic](https://zhuanlan.zhihu.com/p/566722896)
+ [A (Long) Peek into Reinforcement Learning -- Lilian Weng](https://lilianweng.github.io/posts/2018-02-19-rl-overview/)
+ [Introduction to TD3](https://zhuanlan.zhihu.com/p/409536699)
## 5. Important Papers

DQN: [Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.](https://www.nature.com/articles/nature14236/?source=post_page)

Double DQN: [Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2016, 30(1).](https://ojs.aaai.org/index.php/AAAI/article/view/10295)

Duel DQN: [Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2016.](https://proceedings.mlr.press/v48/wangf16.pdf)

PER: [Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.](https://arxiv.org/abs/1511.05952)

C51: [Bellemare M G, Dabney W, Munos R. A distributional perspective on reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2017: 449-458.](https://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf)

NoisyNet DQN: [Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration[J]. arXiv preprint arXiv:1706.10295, 2017.](https://arxiv.org/abs/1706.10295)

PPO: [Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.](https://arxiv.org/pdf/1707.06347.pdf)

DDPG: [Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.](https://arxiv.org/abs/1509.02971)

TD3: [Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning. PMLR, 2018: 1587-1596.](https://proceedings.mlr.press/v80/fujimoto18a.html)

SAC: [Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International Conference on Machine Learning. PMLR, 2018: 1861-1870.](https://proceedings.mlr.press/v80/haarnoja18b)

ASL: [Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity](https://arxiv.org/abs/2305.04180)

ColorDynamic: [Generalizable, Scalable, Real-time, End-to-end Local Planner for Unstructured and Dynamic Environments](https://arxiv.org/abs/2502.19892)
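For quick orientation, the core update targets and objectives behind a few of the papers above, in the standard notation of the cited works (a reference sketch, not tied to this repo's code):

```latex
% DQN target (Mnih et al., 2015): bootstrap from a frozen target network \theta^-
y^{\mathrm{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-)

% Double DQN target (Van Hasselt et al., 2016): the online network selects the
% action, the target network evaluates it
y^{\mathrm{DDQN}} = r + \gamma \, Q\!\left(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta);\, \theta^-\right)

% PPO clipped surrogate (Schulman et al., 2017), with probability ratio
% r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]
```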
## 6. Citation

```bibtex
@misc{DRL-Pytorch,
  author = {Jinghao Xin},
  title = {DRL-Pytorch},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/XinJingHao/DRL-Pytorch}},
}
```
## 7. Training Curves of My Code

### [Q-learning](https://github.com/XinJingHao/Q-learning)

### [Duel Double DQN](https://github.com/XinJingHao/Duel-Double-DQN-Pytorch)
*(Training curves: CartPole | LunarLander)*

### [Noisy Duel DDQN on Atari Game](https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch)
*(Training curves: Pong | Enduro)*

### [Prioritized DQN/DDQN](https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch)
*(Training curves: CartPole | LunarLander)*

### [Categorical DQN](https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch)
*(Training curves: CartPole | LunarLander)*

### [NoisyNet DQN](https://github.com/XinJingHao/NoisyNet-DQN-Pytorch)
*(Training curves: CartPole | LunarLander)*

### [PPO Discrete](https://github.com/XinJingHao/PPO-Discrete-Pytorch)

### [PPO Continuous](https://github.com/XinJingHao/PPO-Continuous-Pytorch)

### [DDPG](https://github.com/XinJingHao/DDPG-Pytorch)
*(Training curves: Pendulum | LunarLanderContinuous)*

### [TD3](https://github.com/XinJingHao/TD3-Pytorch)

### [SAC Continuous](https://github.com/XinJingHao/SAC-Continuous-Pytorch)

### [SAC Discrete](https://github.com/XinJingHao/SAC-Discrete-Pytorch)

### [Actor-Sharer-Learner (ASL)](https://github.com/XinJingHao/Actor-Sharer-Learner)