diff --git a/docs/golden_stick/docs/source_en/index.rst b/docs/golden_stick/docs/source_en/index.rst
index 702e87ecb37a177ced622ccbff41c0eef5245ed0..6dcb2c36b3f7520306b5a3819451022c64af6c46 100644
--- a/docs/golden_stick/docs/source_en/index.rst
+++ b/docs/golden_stick/docs/source_en/index.rst
@@ -28,9 +28,9 @@ In addition to providing rich model compression algorithms, an important design

 There are many types of model compression algorithms, such as quantization-aware training algorithms, pruning algorithms, matrix decomposition algorithms, knowledge distillation algorithms, etc. In each type of compression algorithm, there are also various specific algorithms, such as LSQ and PACT, which are both quantization-aware training algorithms. Different algorithms are often applied in different ways, which increases the learning cost for users to apply algorithms. MindSpore Golden Stick sorts out and abstracts the algorithm application process, and provides a set of unified algorithm application interfaces to minimize the learning cost of algorithm application. At the same time, this also facilitates the exploration of advanced technologies such as AMC (automatic model compression technology), NAS (network structure search), and HAQ (hardware-aware automatic quantization) based on the algorithm ecology.

-2. Provide front-end network modification capabilities to reduce algorithm development costs:
+2. Provide front-end network modification capabilities to reduce algorithm development costs:

-   Model compression algorithms are often designed or optimized for specific network structures. For example, perceptual quantization algorithms often insert fake-quantization nodes on the Conv2d, Conv2d + BatchNorm2d, or Conv2d + BatchNorm2d + Relu structures in the network. MindSpore Golden Stick provides the ability to modify the front-end network through API. Based on this ability, algorithm developers can formulate general network transform rules to implement the algorithm logic without needing to implement the algorithm logic for each specific network. In addition, MindSpore Golden Stick also provides some debugging capabilities, including visualization tool, profiler tool, summary tool, aiming to help algorithm developers improve development and research efficiency, and help users find algorithms that meet their needs.
+   Model compression algorithms are often designed or optimized for specific network structures. For example, quantization aware algorithms often insert fake-quantization nodes on the Conv2d, Conv2d + BatchNorm2d, or Conv2d + BatchNorm2d + Relu structures in the network. MindSpore Golden Stick provides the ability to modify the front-end network through APIs. Based on this ability, algorithm developers can formulate general network transform rules to implement the algorithm logic without needing to reimplement it for each specific network. In addition, MindSpore Golden Stick also provides debugging capabilities, including network dump, level-wise profiling, algorithm effect analysis, and visualization tools, aiming to help algorithm developers improve development and research efficiency, and to help users find algorithms that meet their needs.
 General Process of Applying the MindSpore Golden Stick
 ------------------------------------------------------
diff --git a/docs/golden_stick/docs/source_en/install.md b/docs/golden_stick/docs/source_en/install.md
index 743a33f20994b950ed7b9c06fb38ed77fcf7f289..cdc0d9247df858c117c2a1e75ec5670ac5806867 100644
--- a/docs/golden_stick/docs/source_en/install.md
+++ b/docs/golden_stick/docs/source_en/install.md
@@ -18,10 +18,6 @@ The following table lists the environment required for installing, compiling and

 The MindSpore Golden Stick depends on the MindSpore training and inference framework, please refer to the table below and the [MindSpore Installation Guide](https://mindspore.cn/install) to install the corresponding MindSpore version.

-```shell
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore-Version}/MindSpore/cpu/ubuntu_x86/mindspore-{MindSpore-Version}-cp37-cp37m-linux_x86_64.whl
-```
-
 | MindSpore Golden Stick Version | Branch | MindSpore version |
 | :-----------------------------: | :----------------------------------------------------------: | :-------: |
 | 0.1.0 | [r0.1](https://gitee.com/mindspore/golden-stick/tree/r0.1/) | 1.8.0 |
@@ -36,7 +32,7 @@ If you use the pip command, please download the whl package from [MindSpore Gold
 pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore_version}/golden_stick/any/mindspore_gs-{mg_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```

-> - Installing whl package will download MindSpore Golden Stick dependencies automatically (detail of dependencies is shown in requirement.txt), other dependencies should install manually.
+> - Installing the whl package will download the MindSpore Golden Stick dependencies automatically (details of the dependencies are listed in requirement.txt); other dependencies should be installed manually.
 > - `{MindSpore_version}` stands for the version of MindSpore. For the version matching relationship between MindSpore and MindSpore Golden Stick, please refer to [page](https://www.mindspore.cn/versions).
 > - `{mg_version}` stands for the version of MindSpore Golden Stick. For example, if you would like to download version 0.1.0, you should fill 1.8.0 in `{MindSpore_version}` and fill 0.1.0 in `{mg_version}`.
diff --git a/docs/golden_stick/docs/source_en/quantization/overview.md b/docs/golden_stick/docs/source_en/quantization/overview.md
index d923adf538d28fa413be8e66df9cab1a632d65e2..a99f60331500434eee64a71ff6f61e781f8fe28a 100644
--- a/docs/golden_stick/docs/source_en/quantization/overview.md
+++ b/docs/golden_stick/docs/source_en/quantization/overview.md
@@ -24,7 +24,7 @@ Second, traditional convolution operations use FP32, which takes a lot of time t

 As shown in the preceding figure, compared with the FP32 type, low-precision data representation types such as FP16 and INT8 occupy less space. Replacing the high-precision data representation type with the low-precision data representation type can greatly reduce the storage space and transmission time. Low-bit computing has higher performance. Compared with FP32, INT8 has a three-fold or even higher acceleration ratio. For the same computing, INT8 has obvious advantages in power consumption.

-Currently, there are two types of quantization solutions in the industry: quantization aware training and post-training quantization.
+Currently, there are two types of quantization solutions in the industry: **quantization aware training** and **post-training quantization**.

 (1) **Quantization aware training** requires training data and generally achieves better network accuracy. It is applicable to scenarios that have high requirements on the network compression rate and accuracy. Its purpose is to reduce accuracy loss: the quantization error is simulated in the forward pass during training, so the network learns to compensate for it.
Gradient updates are still computed in floating point; therefore, the quantization operations do not take part in backward propagation.
diff --git a/docs/golden_stick/docs/source_en/quantization/simqat.md b/docs/golden_stick/docs/source_en/quantization/simqat.md
index c11ac81773e26d2ae72a855a3a4d15d212fb93ec..950400139b6f4119313e4f122539b71083001aa8 100644
--- a/docs/golden_stick/docs/source_en/quantization/simqat.md
+++ b/docs/golden_stick/docs/source_en/quantization/simqat.md
@@ -109,7 +109,7 @@ quanted_network = algo.apply(network)
 print(quanted_network)
 ```

-The quantized network structure is as follows:
+The quantized network structure is as follows. QuantizeWrapperCell is the wrapper class that applies quantization aware training to the original Conv2d or Dense, containing the original operator together with the fake-quantization nodes for its input, output and weight. Users can refer to the [API](https://www.mindspore.cn/golden_stick/docs/en/r0.1/mindspore_gs.html#mindspore_gs.SimulatedQuantizationAwareTraining) to modify the algorithm configuration, and can verify that the algorithm is configured successfully by checking the QuantizeWrapperCell properties.

 ```commandline
 LeNet5Opt<
@@ -229,6 +229,8 @@ print(acc)

 The accuracy of LeNet-5 does not decrease after quantization aware training.

+> The model here is not the final deployment model. Because fake-quantization nodes are added, the ckpt size increases slightly compared with the original model.
+
 ## Summary

 This document describes the functions of quantization and principles of common quantization algorithms, and provides examples to describe how to use the quantization aware algorithm of MindSpore Golden Stick. The quantization algorithm can greatly reduce the model size and improve the model inference performance without decreasing the accuracy. Try it out yourself.
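The fake-quantization nodes that this patch documents round-trip a float value through a low-bit integer representation so that training sees the quantization error. The sketch below is a minimal, framework-agnostic illustration in plain Python; `fake_quantize` is a hypothetical helper name, not part of the mindspore_gs API.

```python
def fake_quantize(x, num_bits=8, x_min=-1.0, x_max=1.0):
    """Quantize x to a num_bits integer, then dequantize back to float.

    The round trip introduces the same rounding/clipping error that a real
    low-bit kernel would, which is what QAT trains the network against.
    """
    qmin, qmax = 0, (1 << num_bits) - 1        # e.g. 0..255 for 8 bits
    scale = (x_max - x_min) / (qmax - qmin)    # float step per integer level
    zero_point = round(qmin - x_min / scale)   # integer that maps to x == 0.0
    q = round(x / scale) + zero_point          # quantize
    q = max(qmin, min(qmax, q))                # clip to the integer range
    return (q - zero_point) * scale            # dequantize

# Values inside [x_min, x_max] survive with at most half a step of error;
# values outside the range are clipped to the nearest representable value.
print(fake_quantize(0.5))   # close to 0.5
print(fake_quantize(3.0))   # clipped near the upper bound 1.0
```

This also makes the checkpoint-size note above concrete: each wrapped operator carries extra quantization parameters (scale and zero point per fake-quant node), so the trained ckpt grows slightly.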
diff --git a/docs/golden_stick/docs/source_zh_cn/install.md b/docs/golden_stick/docs/source_zh_cn/install.md
index 868912cbeb86fc010ea01c37a5b9f20473ca49d0..65ecc4b2b6ee210bea9125e14ac2b85e0b5fc1b9 100644
--- a/docs/golden_stick/docs/source_zh_cn/install.md
+++ b/docs/golden_stick/docs/source_zh_cn/install.md
@@ -16,7 +16,7 @@

 ## MindSpore Version Dependency

-MindSpore Golden Stick depends on the MindSpore training and inference framework. Please follow the correspondence indicated in the table below and refer to the [MindSpore Installation Guide](https://mindspore.cn/install) to install the matching version of MindSpore:
+MindSpore Golden Stick depends on the MindSpore training and inference framework. Please follow the correspondence indicated in the table below and refer to the [MindSpore Installation Guide](https://mindspore.cn/install) to install the matching version of MindSpore.

 | MindSpore Golden Stick Version | Branch | MindSpore Version |
 | :---------------------: | :----------------------------------------------------------: | :-------: |
@@ -49,7 +49,7 @@ pip install output/mindspore_gs-0.1.0-py3-none-any.whl

 ## Verifying the Installation

-Run the following command to verify the installation. If importing the Python module raises no error, the installation succeeded:
+Run the following command to verify the installation. If importing the Python module raises no error, the installation succeeded.

 ```python
 import mindspore_gs
diff --git a/docs/golden_stick/docs/source_zh_cn/pruner/scop.md b/docs/golden_stick/docs/source_zh_cn/pruner/scop.md
index 1ffceff2d251a349440a96b2543eab232023c81b..300626523dafeb4d30d5cc393d1c85c25f0347c9 100644
--- a/docs/golden_stick/docs/source_zh_cn/pruner/scop.md
+++ b/docs/golden_stick/docs/source_zh_cn/pruner/scop.md
@@ -185,7 +185,7 @@ if __name__ == "__main__":
     print("result:", res, "prune_rate=", config.prune_rate, "ckpt=", config.checkpoint_file_path, "params=", total_params)
 ```

-The evaluation accuracy (top_1_accuracy), prune rate (prune_rate), checkpoint location (ckpt) and parameter count (params) of the model are as follows:
+The evaluation accuracy (top_1_accuracy), prune rate (prune_rate), checkpoint location (ckpt) and parameter count (params) of the model are as follows:

 ```text
 result:{'top_1_accuracy': 0.9273838141025641} prune_rate=0.45 ckpt=~/resnet50_cifar10/train_parallel0/resnet-400_390.ckpt params=10587835
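The version-matching rule in install.md (fill the matching MindSpore version and the Golden Stick version into the whl URL template) can be sketched as a tiny helper. This is purely illustrative, not part of any MindSpore tooling; it assumes the corrected `mindspore_gs` whl name and the single 0.1.0 / 1.8.0 pairing from the compatibility table.

```python
# Illustrative only: map a MindSpore Golden Stick release to its required
# MindSpore version (per the compatibility table) and fill the whl URL template.
COMPAT = {"0.1.0": "1.8.0"}  # golden-stick version -> required MindSpore version

URL_TEMPLATE = (
    "https://ms-release.obs.cn-north-4.myhuaweicloud.com/"
    "{ms}/golden_stick/any/mindspore_gs-{gs}-py3-none-any.whl"
)

def whl_url(gs_version):
    """Return the download URL for a golden-stick release, or raise if unknown."""
    if gs_version not in COMPAT:
        raise ValueError(f"no known MindSpore match for golden-stick {gs_version}")
    return URL_TEMPLATE.format(ms=COMPAT[gs_version], gs=gs_version)

print(whl_url("0.1.0"))  # URL containing MindSpore 1.8.0 and golden-stick 0.1.0
```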