# mindrl
**Repository Path**: digitdance/mindrl
## Basic Information
- **Project Name**: mindrl
- **Description**: MindSpore Reinforcement is an open-source reinforcement learning framework that supports distributed training of agents using reinforcement learning algorithms.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 3
- **Created**: 2025-07-11
- **Last Updated**: 2025-07-11
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# MindSpore Reinforcement
[查看中文](./README_CN.md)
[PyPI](https://pypi.org/project/mindspore-rl/) [LICENSE](https://github.com/mindspore-ai/reinforcement/blob/master/LICENSE) [PRs Welcome](https://github.com/mindspore-lab/mindrl/pulls)
- [MindSpore Reinforcement](#mindspore-reinforcement)
    - [Overview](#overview)
    - [Installation](#installation)
        - [Version dependency](#version-dependency)
        - [Installing from pip command](#installing-from-pip-command)
        - [Installing from source code](#installing-from-source-code)
        - [Verification](#verification)
    - [Quick Start](#quick-start)
    - [Features](#features)
        - [Algorithm](#algorithm)
        - [Environment](#environment)
        - [ReplayBuffer](#replaybuffer)
        - [Distribution](#distribution)
    - [Future Roadmap](#future-roadmap)
    - [Community](#community)
        - [Governance](#governance)
        - [Communication](#communication)
    - [Contributions](#contributions)
    - [License](#license)
## Overview
MindSpore Reinforcement is an open-source reinforcement learning framework that supports the **distributed training** of agents using reinforcement learning algorithms. MindSpore Reinforcement offers a **clean API abstraction** for writing reinforcement learning algorithms, which decouples the algorithm from deployment and execution considerations, including the use of accelerators, the level of parallelism and the distribution of computation across a cluster of workers. MindSpore Reinforcement translates the reinforcement learning algorithm into a series of compiled **computational graphs**, which are then run efficiently by the MindSpore framework on CPUs, GPUs and Ascend AI processors. An architecture diagram is available in the [MindSpore Reinforcement documentation](https://www.mindspore.cn/reinforcement/docs/en/master/index.html).
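As a concrete flavour of what "compiled computational graphs" means in practice, the following minimal MindSpore snippet (illustrative only, not the MindSpore Reinforcement API) defines a small policy network and runs it in GRAPH_MODE on the configured backend:
```python
# Minimal MindSpore GRAPH_MODE sketch: a tiny policy network compiled and
# executed as a computational graph. Illustrative only; it does not use the
# MindSpore Reinforcement API.
import numpy as np
import mindspore as ms
from mindspore import nn, Tensor

ms.set_context(mode=ms.GRAPH_MODE, device_target="CPU")  # or "GPU" / "Ascend"

class PolicyNet(nn.Cell):
    def __init__(self, state_dim=4, num_actions=2):
        super().__init__()
        self.fc1 = nn.Dense(state_dim, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Dense(64, num_actions)

    def construct(self, x):
        # construct() is traced and compiled into a MindSpore graph.
        return self.fc2(self.relu(self.fc1(x)))

net = PolicyNet()
state = Tensor(np.zeros((1, 4), np.float32))
print(net(state).shape)  # (1, 2): one value per action
```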

## Installation
MindSpore Reinforcement depends on the MindSpore training and inference framework. Therefore, please first install [MindSpore](https://www.mindspore.cn/install/en) following the instruction on the official website, then install MindSpore Reinforcement. You can install from `pip` or source code.
### Version dependency
Due to the dependency between MindSpore Reinforcement and MindSpore, please follow the table below and install the corresponding MindSpore version from the [MindSpore download page](https://www.mindspore.cn/versions/en).
```shell
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore-Version}/MindSpore/cpu/ubuntu_x86/mindspore-{MindSpore-Version}-cp37-cp37m-linux_x86_64.whl
```
| MindSpore Reinforcement Version | Branch | MindSpore version |
| :-----------------------------: | :----------------------------------------------------------: | :---------------: |
| 0.7.0 | [r0.7](https://github.com/mindspore-lab/mindrl/tree/r0.7/) | 2.1.0 |
| 0.6.0 | [r0.6](https://github.com/mindspore-lab/mindrl/tree/r0.6/) | 2.0.0 |
| 0.5.0 | [r0.5](https://gitee.com/mindspore/reinforcement/tree/r0.5/) | 1.8.0 |
| 0.3.0 | [r0.3](https://gitee.com/mindspore/reinforcement/tree/r0.3/) | 1.7.0 |
| 0.2.0 | [r0.2](https://gitee.com/mindspore/reinforcement/tree/r0.2/) | 1.6.0 |
| 0.1.0 | [r0.1](https://gitee.com/mindspore/reinforcement/tree/r0.1/) | 1.5.0 |
### Installing from pip command
If you use the pip command, please download the whl package from the [MindSpore Reinforcement](https://www.mindspore.cn/versions/en) page and install it.
```shell
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore_version}/Reinforcement/any/mindspore_rl-{Reinforcement_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
```
> - Installing the whl package will automatically download the MindSpore Reinforcement dependencies (listed in requirements.txt); other dependencies should be installed manually.
> - `{MindSpore_version}` stands for the version of MindSpore. For the version matching relationship between MindSpore and Reinforcement, please refer to [page](https://www.mindspore.cn/versions).
> - `{Reinforcement_version}` stands for the version of Reinforcement. For example, to download version 0.1.0, fill in 1.5.0 for `{MindSpore_version}` and 0.1.0 for `{Reinforcement_version}`.
### Installing from source code
Download [source code](https://github.com/mindspore-lab/mindrl), then enter the `mindrl` directory.
```shell
git clone https://github.com/mindspore-lab/mindrl.git
cd mindrl/
bash build.sh
pip install output/mindspore_rl-{Reinforcement_version}-py3-none_{ARCH}.whl
```
`build.sh` is the build script in the `mindrl` directory. `{Reinforcement_version}` is the version of MindSpore Reinforcement, and `{ARCH}` is the platform of your device, such as `x86_64` or `aarch64`.
Install the dependencies:
```shell
cd mindrl && pip install -r requirements.txt
```
### Verification
If the following command executes successfully, the installation is complete.
```python
import mindspore_rl
```
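You can additionally check which version is installed by querying the packaging metadata with the standard library (this assumes the distribution name `mindspore-rl`, as published on PyPI):
```python
# Optional check: print the installed MindSpore Reinforcement version using
# standard-library packaging metadata. The distribution name "mindspore-rl"
# is assumed to match the PyPI project name.
from importlib.metadata import version

print(version("mindspore-rl"))
```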
## Quick Start
The algorithm examples of MindSpore Reinforcement are located under `reinforcement/example/`. A simple algorithm, [Deep Q-Learning (DQN)](https://www.mindspore.cn/reinforcement/docs/zh-CN/master/dqn.html), is used below to demonstrate how to use MindSpore Reinforcement.
The first way is to run the provided script directly:
```shell
cd reinforcement/example/dqn/scripts
bash run_standalone_train.sh
```
The second way is to use `config.py` and `train.py` to modify the configuration more flexibly:
```shell
cd reinforcement/example/dqn
python train.py --episode 1000 --device_target GPU
```
The first way generates the log file `dqn_train_log.txt` in the current directory. The second way prints log information to the screen:
```shell
Episode 0: loss is 0.396, rewards is 42.0
Episode 1: loss is 0.226, rewards is 15.0
Episode 2: loss is 0.202, rewards is 9.0
Episode 3: loss is 0.122, rewards is 15.0
Episode 4: loss is 0.107, rewards is 12.0
Episode 5: loss is 0.078, rewards is 10.0
Episode 6: loss is 0.075, rewards is 8.0
Episode 7: loss is 0.084, rewards is 12.0
Episode 8: loss is 0.069, rewards is 10.0
Episode 9: loss is 0.067, rewards is 10.0
Episode 10: loss is 0.056, rewards is 8.0
-----------------------------------------
Evaluate for episode 10 total rewards is 9.600
-----------------------------------------
```
For more details about the installation guide, tutorials, and APIs, see [MindSpore Reinforcement API Docs](https://www.mindspore.cn/reinforcement/docs/en/master/index.html).
## Features
### Algorithm
### Environment
In reinforcement learning, an agent interacts with an environment and learns a policy that maximizes a numerical reward signal. As the problem to be solved, the environment is therefore a key element of reinforcement learning. A list of the currently supported environments is available in the project documentation.
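To make the agent-environment loop concrete, here is a generic interaction loop written against the classic OpenAI Gym API (shown for illustration only; it is not the MindSpore Reinforcement environment interface, which may differ):
```python
# Generic agent-environment interaction loop, using the classic `gym` (<0.26)
# API for illustration. Newer gym/gymnasium versions return (obs, info) from
# reset() and a 5-tuple from step(). This is not the MindSpore Reinforcement
# environment interface.
import gym

env = gym.make("CartPole-v1")
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()            # random policy placeholder
    state, reward, done, info = env.step(action)  # observe the transition
    total_reward += reward
env.close()
print("episode return:", total_reward)
```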
### ReplayBuffer
In reinforcement learning, a ReplayBuffer is a commonly used data storage structure. It stores the data obtained from the agent's interaction with the environment and addresses the following problems:
1. Stored historical experience can be drawn by uniform sampling or according to a priority, which breaks the correlation between consecutive training samples and makes the sampled data approximately independent and identically distributed.
2. It provides temporary storage of data and improves data utilization, since each transition can be reused for multiple updates.
In general, researchers build a ReplayBuffer from native Python or NumPy data structures, or use the standard APIs provided by general reinforcement learning frameworks. The difference is that MindSpore Reinforcement implements the ReplayBuffer on the device. On the one hand, this reduces frequent copying of data between the host and the device when using GPU/Ascend hardware; on the other hand, expressing the ReplayBuffer as MindSpore operators allows a complete IR graph to be built, enabling MindSpore GRAPH_MODE optimizations that improve overall performance (a host-side sketch for comparison follows the table below).
| Type | Features | CPU | GPU | Ascend |
| :-------------------: | :------------------------------------------------------------------------------------------ | :---: | :---: | :----: |
| UniformReplayBuffer | 1. FIFO (first in, first out). 2. Supports batch input. | ✔️ | ✔️ | / |
| PriorityReplayBuffer | 1. Proportional-based priority strategy. 2. Uses a sum tree to improve sampling performance. | ✔️ | ✔️ | ✔️ |
| ReservoirReplayBuffer | Keeps an 'unbiased' sample of previous iterations. | ✔️ | ✔️ | ✔️ |
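For contrast with the device-side, operator-based buffers above, the following is a minimal host-side uniform (FIFO) replay buffer in NumPy. It is only a conceptual sketch of the conventional approach described earlier, not code from MindSpore Reinforcement:
```python
# Minimal host-side uniform (FIFO) replay buffer in NumPy, shown only to
# illustrate the conventional approach that MindSpore Reinforcement's
# device-side, operator-based buffers replace.
import numpy as np

class SimpleUniformBuffer:
    def __init__(self, capacity, state_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), np.float32)
        self.actions = np.zeros((capacity,), np.int32)
        self.rewards = np.zeros((capacity,), np.float32)
        self.next_states = np.zeros((capacity, state_dim), np.float32)
        self.head = 0   # next slot to overwrite (FIFO)
        self.size = 0   # number of stored transitions

    def insert(self, state, action, reward, next_state):
        i = self.head
        self.states[i], self.actions[i] = state, action
        self.rewards[i], self.next_states[i] = reward, next_state
        self.head = (self.head + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)  # uniform sampling
        return (self.states[idx], self.actions[idx],
                self.rewards[idx], self.next_states[idx])
```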
### Distribution
We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies governing how RL training computation is parallelised and distributed across cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them into low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. Refer to the [details](https://github.com/mindspore-lab/mindrl/tree/master/mindspore_rl/distribution/README.md).
The following distribution policies are currently supported:
| Policy Type | Policy | Example |
| :----------------------------------: | :---------------------------------------------------------------------------- | :-----: |
| MultiActorSingleLearnerDP | Structure of a single learner with multiple actors | ppo |
| AsyncMultiActorSingleLearnerDP | Asynchronous structure of a single learner with multiple actors | a3c |
| SingleActorLearnerWithMultEnvDP | Structure of a single actor-learner with multiple environments | ppo |
| SingleActorLearnerWithMultEnvHeterDP | Structure of a single actor-learner with multiple heterogeneous environments | ppo |
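To make the MultiActorSingleLearnerDP pattern concrete, here is a plain-Python sketch (using `multiprocessing`, not the MSRL distribution API) in which several actor processes push trajectories into a queue that a single learner process consumes:
```python
# Conceptual sketch of the MultiActorSingleLearnerDP pattern using plain Python
# multiprocessing. This is NOT the MSRL distribution API; it only illustrates
# how several actors can feed a single learner through a shared queue.
import multiprocessing as mp
import random

def actor(actor_id, queue, num_episodes=5):
    for _ in range(num_episodes):
        # Placeholder "trajectory"; a real actor would collect env rollouts.
        trajectory = [(random.random(), random.randint(0, 1)) for _ in range(10)]
        queue.put((actor_id, trajectory))
    queue.put((actor_id, None))  # signal that this actor is done

def learner(queue, num_actors):
    finished, updates = 0, 0
    while finished < num_actors:
        actor_id, trajectory = queue.get()
        if trajectory is None:
            finished += 1
            continue
        updates += 1  # placeholder for a gradient update on the trajectory
    print(f"learner performed {updates} updates")

if __name__ == "__main__":
    num_actors = 3
    q = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, q)) for i in range(num_actors)]
    for p in actors:
        p.start()
    learner(q, num_actors)
    for p in actors:
        p.join()
```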
## Future Roadmap
This initial release of MindSpore Reinforcement contains a stable API for implementing reinforcement learning algorithms and executing computation using MindSpore's computational graphs. It now supports automatic distributed execution of algorithms, multi-agent algorithms, offline RL, MCTS, and more. Optimized automatic distributed execution and support for LLMs are planned for subsequent versions of MindSpore Reinforcement.
## Community
### Governance
[MindSpore Open Governance](https://gitee.com/mindspore/community/blob/master/governance.md)
### Communication
- [MindSpore Slack](https://join.slack.com/t/mindspore/shared_invite/zt-dgk65rli-3ex4xvS4wHX7UDmsQmfu8w) developer communication platform
- [MindSpore Forum](https://bbs.huaweicloud.com/forum/forum-1076-1.html) Welcome to post.
- [Reinforcement issues](https://github.com/mindspore-lab/mindrl/issues) Welcome to submit issues.
## Contributions
Contributions to MindSpore Reinforcement are welcome.
MindSpore Reinforcement is updated every three months. If you encounter any problems, please let us know in time. We appreciate all contributions; questions and changes can be submitted as issues or pull requests.
## License
[Apache License 2.0](https://gitee.com/mindspore/reinforcement/blob/master/LICENSE)