From dd75ee403a184b632f1b3f302b567636cdd214ab Mon Sep 17 00:00:00 2001
From: zhangyihuiben
Date: Fri, 31 Oct 2025 10:56:09 +0800
Subject: [PATCH] =?UTF-8?q?=E6=95=B4=E6=94=B9=E7=B2=BE=E5=BA=A6=E6=96=87?=
 =?UTF-8?q?=E6=A1=A3?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../advanced_development/accuracy_comparison.md | 12 ++++++------
 .../advanced_development/accuracy_comparison.md |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/mindformers/docs/source_en/advanced_development/accuracy_comparison.md b/docs/mindformers/docs/source_en/advanced_development/accuracy_comparison.md
index d098180d13..b79afb3f86 100644
--- a/docs/mindformers/docs/source_en/advanced_development/accuracy_comparison.md
+++ b/docs/mindformers/docs/source_en/advanced_development/accuracy_comparison.md
@@ -191,18 +191,18 @@ The following tables describe the configuration comparison with Megatron-LM.
  | `moe-router-topk-scaling-factor` | Top-*k* score scaling factor. | `routed_scaling_factor` | Top-*k* score scaling factor. |
  | `moe-router-enable-expert-bias` | Specifies whether to use the bias of an expert. | `balance_via_topk_bias` | Specifies whether to use the bias of an expert. |
  | `moe-router-bias-update-rate` | Update rate of expert bias. | `topk_bias_update_rate` | Update rate of expert bias. |
- | `moe-use-legacy-grouped-gemm` | Specifies whether to use the source version of Grouped GEMM. | Not supported. | |
- | `moe-aux-loss-coeff` | Auxiliary loss coefficient of MoE. | Not supported. | |
- | `moe-z-loss-coeff` | MoE z-loss coefficient. | Not supported. | |
+ | `moe-use-legacy-grouped-gemm` | Specifies whether to use the source version of Grouped GEMM. | Not supported. | |
+ | `moe-aux-loss-coeff` | Auxiliary loss coefficient of MoE. | Not supported. | |
+ | `moe-z-loss-coeff` | MoE z-loss coefficient. | Not supported. | |
  | `moe-input-jitter-eps` | Input jitter noise of MoE. | `moe_input_jitter_eps` | Input jitter noise of MoE. |
- | `moe-token-dispatcher-type` | Token scheduling policy (for example, **allgather**). | Not supported. | |
+ | `moe-token-dispatcher-type` | Token scheduling policy (for example, **allgather**). | `moe_token_dispatcher_type` | Token scheduling policy (for example, **allgather**). |
  | `moe-enable-deepep` | Specifies whether to enable DeepEP hybrid expert optimization. | `moe_enable_deepep` | Specifies whether to enable DeepEP hybrid expert optimization. |
  | `moe-per-layer-logging` | Prints logs at each MoE layer. | `moe_per_layer_logging` | Prints logs at each MoE layer. |
  | `moe-expert-capacity-factor` | Expansion ratio of the expert capacity. | `capacity_factor` | Expansion ratio of the expert capacity. |
  | `moe-pad-expert-input-to-capacity` | Specifies whether to fill the expert input to the capacity upper limit. | `moe_pad_expert_input_to_capacity` | Specifies whether to fill the expert input to the capacity upper limit. |
  | `moe-token-drop-policy` | Token discarding policy (for example, **probs** or **position**).| `enable_sdrop` | Token discarding policy (for example, **probs** or **position**).|
- | `moe-extended-tp` | Enables extended tensor parallelism. | Not supported. | |
- | `moe-use-upcycling` | Specifies whether to enable expert upcycling. | Not supported. | |
+ | `moe-extended-tp` | Enables extended tensor parallelism. | Not supported. | |
+ | `moe-use-upcycling` | Specifies whether to enable expert upcycling. | Not supported. | |
  | `moe-permute-fusion` | Enables internal permute fusion optimization of experts. | `moe_permute_fusion` | Enables internal permute fusion optimization of experts. |
  | `mtp-num-layers` | Number of MoE layers. | `mtp_depth` | Number of MoE layers. |
  | `mtp-loss-scaling-factor` | Loss scaling in the MoE architecture. | `mtp_loss_factor` | Loss scaling in the MoE architecture. |
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md b/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md
index d75faadc99..8b77d80cf6 100644
--- a/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md
@@ -195,7 +195,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架，具备高度
  | `moe-aux-loss-coeff` | MoE 辅助损失系数 | 不支持配置 | |
  | `moe-z-loss-coeff` | MoE z-loss 系数 | 不支持配置 | |
  | `moe-input-jitter-eps` | MoE 输入 jitter 噪声量 | `moe_input_jitter_eps` | MoE 输入 jitter 噪声量 |
- | `moe-token-dispatcher-type` | token 调度策略（allgather 等） | 不支持配置 | |
+ | `moe-token-dispatcher-type` | token 调度策略（allgather 等） | `moe_token_dispatcher_type` | token 调度策略（allgather 等） |
  | `moe-enable-deepep` | 是否启用 DeepEP 混合专家优化 | `moe_enable_deepep` | 是否启用 DeepEP 混合专家优化 |
  | `moe-per-layer-logging` | 每层 MoE 打印日志 | `moe_per_layer_logging` | 每层 MoE 打印日志 |
  | `moe-expert-capacity-factor` | expert 容量扩展比例 | `capacity_factor` | expert 容量扩展比例 |
--
Gitee
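For orientation, a minimal sketch of how the newly documented mapping could look in practice: Megatron-LM takes `--moe-token-dispatcher-type` as a command-line flag, while MindFormers reads `moe_token_dispatcher_type` from its YAML configuration, as the updated table states. The `moe_config` section name and the neighboring keys and values below are illustrative assumptions, not taken from this patch.

```yaml
# Megatron-LM side (CLI flag, as named in the comparison table):
#   --moe-token-dispatcher-type allgather

# MindFormers side (YAML key, as named in the comparison table).
# NOTE: the `moe_config` section and the example values are assumptions
# used only to illustrate the parameter mapping.
moe_config:
  moe_token_dispatcher_type: "allgather"  # token scheduling policy (e.g. allgather)
  moe_input_jitter_eps: 0.01              # input jitter noise of MoE
  capacity_factor: 1.1                    # expansion ratio of the expert capacity
```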