From 5452291b366369e87c7269aa9624e22ec7a65c54 Mon Sep 17 00:00:00 2001
From: huanxiaoling <3174348550@qq.com>
Date: Wed, 12 Oct 2022 11:12:29 +0800
Subject: [PATCH] modify the wrong links in files in 1.9

---
 .../source_en/faq/implement_problem.md        | 34 +++++++++++++++++++
 .../learning_rate_and_optimizer.md            |  2 +-
 .../source_en/migration_guide/overview.md     |  2 +-
 .../source_zh_cn/faq/implement_problem.md     | 34 +++++++++++++++++++
 tutorials/source_en/beginner/introduction.md  |  2 +-
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/docs/mindspore/source_en/faq/implement_problem.md b/docs/mindspore/source_en/faq/implement_problem.md
index f07f52a580..7fc2a49c84 100644
--- a/docs/mindspore/source_en/faq/implement_problem.md
+++ b/docs/mindspore/source_en/faq/implement_problem.md
@@ -619,4 +619,38 @@ A: The reason for this error is that the user did not configure the operator par
 
 Therefore, the user needs to set the operator parameters appropriately to avoid such errors.
 
+<br/>
+
+**Q: How do I interpret the "Ascend Error Message" in an error message?**
+
+A: The "Ascend Error Message" is the fault information that CANN (Ascend Heterogeneous Computing Architecture) reports when an error occurs while MindSpore is calling a CANN interface. It contains information such as the error code and the error description. For example:
+
+```text
+Traceback (most recent call last):
+  File "train.py", line 292, in <module>
+    train_net()
+  File "/home/resnet_csj2/scripts/train_parallel0/src/model_utils/moxing_adapter.py", line 104, in wrapped_func
+    run_func(*args, **kwargs)
+  File "train.py", line 227, in train_net
+    set_parameter()
+  File "train.py", line 114, in set_parameter
+    init()
+  File "/home/miniconda3/envs/ms/lib/python3.7/site-packages/mindspore/communication/management.py", line 149, in init
+    init_hccl()
+RuntimeError: Ascend kernel runtime initialization failed.
+
+----------------------------------------------------
+- Ascend Error Message:
+----------------------------------------------------
+EJ0001: Failed to initialize the HCCP process. Reason: Maybe the last training process is running.  // EJ0001 is the error code, followed by the error description and its cause. In this example, distributed training on the same 8 nodes was started several times, causing a process conflict.
+Solution: Wait for 10s after killing the last training process and try again.  // This line gives the suggested solution; in this example, the user is advised to clean up the previous process.
+TraceBack (most recent call last):  // What follows is stack information that developers use to locate the fault; users generally do not need to pay attention to it.
+```
+
+```text
+tsd client wait response fail, device response code[1]. unknown device error.[FUNC:WaitRsp][FILE:process_mode_manager.cpp][LINE:233]
+```
+
+In addition, CANN may throw internal errors (Inner Error), for example with the error code "EI9999: Inner Error". If you cannot find a description of such a case on the MindSpore official website or forum, you can ask the community for help by raising an issue.
+<br/>
\ No newline at end of file
diff --git a/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md b/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md
index d824916742..99928dc24a 100644
--- a/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md
+++ b/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md
@@ -1,6 +1,6 @@
 # Learning Rate and Optimizer
 
-
+
 
 Before reading this chapter, please read the official MindSpore tutorial [Optimizer](https://www.mindspore.cn/tutorials/en/r1.9/advanced/modules/optim.html).
 
diff --git a/docs/mindspore/source_en/migration_guide/overview.md b/docs/mindspore/source_en/migration_guide/overview.md
index 058d3735ca..e6353bd48d 100644
--- a/docs/mindspore/source_en/migration_guide/overview.md
+++ b/docs/mindspore/source_en/migration_guide/overview.md
@@ -41,6 +41,6 @@ This chapter will introduce some methods of debugging and tuning from three aspe
 
 This chapter contains a complete network migration sample. From the analysis and replication of the benchmark network, it details the steps of script development and precision debugging and tuning, and finally lists the common problems and corresponding optimization methods during the migration process, framework performance issues.
 
-## [FAQs](https://www.mindspore.cn/docs/en/r1.9/migration_guide/faq.html)
+## FAQs
 
 This chapter lists the frequently-asked questions and corresponding solutions.
diff --git a/docs/mindspore/source_zh_cn/faq/implement_problem.md b/docs/mindspore/source_zh_cn/faq/implement_problem.md
index 8131db7a01..b71eb622bb 100644
--- a/docs/mindspore/source_zh_cn/faq/implement_problem.md
+++ b/docs/mindspore/source_zh_cn/faq/implement_problem.md
@@ -604,4 +604,38 @@ A: The cause of this problem is that the user did not configure the operator parameters correctly, causing the operator to request
 
 Therefore, the user needs to set the operator parameters appropriately to avoid such errors.
 
+<br/>
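+
+**Q: How do I interpret the "Ascend Error Message" in an error message?**
+
+A: The "Ascend Error Message" is the fault information that CANN (Ascend Heterogeneous Computing Architecture) reports when an error occurs while MindSpore is calling a CANN interface. It contains information such as the error code and the error description, as in the following example:
+
+```text
+Traceback (most recent call last):
+  File "train.py", line 292, in <module>
+    train_net()
+  File "/home/resnet_csj2/scripts/train_parallel0/src/model_utils/moxing_adapter.py", line 104, in wrapped_func
+    run_func(*args, **kwargs)
+  File "train.py", line 227, in train_net
+    set_parameter()
+  File "train.py", line 114, in set_parameter
+    init()
+  File "/home/miniconda3/envs/ms/lib/python3.7/site-packages/mindspore/communication/management.py", line 149, in init
+    init_hccl()
+RuntimeError: Ascend kernel runtime initialization failed.
+
+----------------------------------------------------
+- Ascend Error Message:
+----------------------------------------------------
+EJ0001: Failed to initialize the HCCP process. Reason: Maybe the last training process is running.  // EJ0001 is the error code, followed by the error description and its cause. In this example, distributed training on the same 8 nodes was started several times, causing a process conflict.
+Solution: Wait for 10s after killing the last training process and try again.  // This line gives the solution to the problem; in this example, the user is advised to clean up the process.
+TraceBack (most recent call last):  // The information printed from here on is stack information that developers use to locate the fault; users generally do not need to pay attention to it.
+```
+
+```text
+tsd client wait response fail, device response code[1]. unknown device error.[FUNC:WaitRsp][FILE:process_mode_manager.cpp][LINE:233]
+```
+
+In addition, in some cases CANN throws internal errors (Inner Error), for example with the error code "EI9999: Inner Error". If you cannot find a description of such a case on the MindSpore official website or forum, you can raise an issue in the community to ask for help.
+<br/>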
\ No newline at end of file
diff --git a/tutorials/source_en/beginner/introduction.md b/tutorials/source_en/beginner/introduction.md
index 69bf40b872..15d35403b0 100644
--- a/tutorials/source_en/beginner/introduction.md
+++ b/tutorials/source_en/beginner/introduction.md
@@ -75,7 +75,7 @@ After the neural network model is trained, you can export the model or load the
 
 To support network building, entire graph execution, subgraph execution, and single-operator execution, MindSpore provides users with three levels of APIs. In descending order, these are High-Level Python API, Medium-Level Python API, and Low-Level Python API.
 
-![MindSpore API](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.9/tutorials/source_zh_cn/beginner/images/introduction3.png)
+![MindSpore API](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.9/tutorials/source_en/beginner/images/introduction3.png)
 
 - High-Level Python API
 
-- 
Gitee
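The new FAQ entry in this patch describes the layout of an "Ascend Error Message" block: an error code such as `EJ0001` followed by a description and cause, then a `Solution:` line, then a developer-facing `TraceBack`. The Python snippet below is a minimal sketch that pulls those fields out of a captured log so they can be searched on the MindSpore website or quoted in an issue. It is not part of MindSpore or CANN. The names `parse_ascend_errors` and `AscendError` are hypothetical, and the code makes two further assumptions based only on the FAQ's example log: that error codes consist of two uppercase letters followed by four digits (as `EJ0001` and `EI9999` do), and that the `TraceBack` line marks the end of the user-relevant part.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical helper, not a MindSpore/CANN API. Assumes error codes look like
# two uppercase letters followed by four digits (e.g. EJ0001, EI9999), as in
# the FAQ example; real CANN codes may follow other patterns.
_CODE_RE = re.compile(r"^\s*([A-Z]{2}\d{4}):\s*(.*)$")
_SOLUTION_RE = re.compile(r"^\s*Solution:\s*(.*)$")
_MARKER = "Ascend Error Message:"


@dataclass
class AscendError:
    code: str                       # e.g. "EJ0001"
    description: str               # text after the code, including any "Reason:"
    solution: Optional[str] = None  # text of a following "Solution:" line, if any


def parse_ascend_errors(log: str) -> List[AscendError]:
    """Collect error code, description and suggested solution from the
    "Ascend Error Message" section of a captured log."""
    start = log.find(_MARKER)
    if start == -1:
        return []  # no Ascend Error Message section in this log
    errors: List[AscendError] = []
    for line in log[start + len(_MARKER):].splitlines():
        if line.strip().startswith("TraceBack"):
            break  # developer-facing stack trace follows; stop here
        code_match = _CODE_RE.match(line)
        if code_match:
            errors.append(AscendError(code_match.group(1), code_match.group(2).strip()))
            continue
        solution_match = _SOLUTION_RE.match(line)
        if solution_match and errors:
            # Attach the solution to the most recently seen error code.
            errors[-1].solution = solution_match.group(1).strip()
    return errors


sample_log = """RuntimeError: Ascend kernel runtime initialization failed.

----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
EJ0001: Failed to initialize the HCCP process. Reason: Maybe the last training process is running.
Solution: Wait for 10s after killing the last training process and try again.
TraceBack (most recent call last):
"""

for err in parse_ascend_errors(sample_log):
    print(err.code, "->", err.solution)
```

The parser stops at the `TraceBack` line, following the FAQ's note that the stack printed after it is intended for developers. The regular expression is deliberately narrow. It would need widening if logs contain error codes in another format.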