forked from mindspore-Ecosystem/mindspore
update readme of model_zoo
This commit is contained in:
parent d939ea8d79
commit 1ccbee1fe2

@@ -102,6 +102,8 @@ Here is the ModelZoo for MindSpore which supports different devices including Asc

If you are looking for models exclusive to Ascend that use other ML platforms, refer to the [Ascend ModelZoo](https://hiascend.com/software/modelzoo) and the corresponding [gitee repository](https://gitee.com/ascend/modelzoo).

The ModelZoo will be moved to a new repository: [models](https://gitee.com/mindspore/models).

## Disclaimers
MindSpore only provides scripts that download and preprocess public datasets. We do not own these datasets and are not responsible for their quality or maintenance. Please make sure you have permission to use each dataset under its license. Models trained on these datasets are for non-commercial research and educational purposes only.

@@ -119,3 +121,7 @@ MindSpore is Apache 2.0 licensed. Please see the LICENSE file.

- **Q: How do I resolve out-of-memory errors in `PYNATIVE_MODE`, such as *Failed to alloc memory pool memory*?**

**A**: `PYNATIVE_MODE` usually requires more memory than `GRAPH_MODE`, especially during training, where back propagation must be handled. You could try a smaller batch size, as in the sketch below.

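A minimal sketch of both mitigations, assuming the MindSpore 1.x `context` API; the `NumpySlicesDataset` here merely stands in for your real data pipeline:

```python
import numpy as np
import mindspore.dataset as ds
from mindspore import context

# PYNATIVE_MODE executes operators eagerly and keeps more intermediate
# tensors alive, so it usually needs more device memory than GRAPH_MODE.
context.set_context(mode=context.PYNATIVE_MODE, device_target="Ascend")  # or "GPU"/"CPU"

# If allocation fails, shrink the batch size until the model fits,
# e.g. 32 -> 16. Random data stands in for your real dataset.
data = np.random.randn(1024, 32).astype(np.float32)
dataset = ds.NumpySlicesDataset(data, column_names=["x"]).batch(16)
```
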
- **Q: How do I resolve errors about unsupported interfaces, such as `cannot import`?**

**A**: Please check that your MindSpore version matches the branch from which you fetched the ModelZoo scripts. Some model scripts in the latest branch use new interfaces that are only available in the latest version of MindSpore. You can verify the installed version as below.

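A quick check before picking a branch (the branch name in the comment is only an example):

```python
# Print the installed MindSpore version, then fetch the matching
# model_zoo branch, e.g. `git checkout r1.3` for MindSpore 1.3.x.
import mindspore
print(mindspore.__version__)
```
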
@@ -102,6 +102,8 @@

Correspondingly, for multi-framework models exclusive to the Ascend platform, refer to the [Ascend ModelZoo](https://hiascend.com/software/modelzoo) and the corresponding [code repository](https://gitee.com/ascend/modelzoo).

The ModelZoo will be moved to a separate new repository: [models](https://gitee.com/mindspore/models).

## Disclaimers

MindSpore only provides scripts that download and preprocess public datasets. We do not own these datasets and are not responsible for their quality or maintenance. Please make sure you have permission to use each dataset under its license. Models trained on these datasets are for non-commercial research and educational purposes only.

@@ -119,3 +121,7 @@ MindSpore is Apache 2.0 licensed; see the LICENSE file.

- **Q: How do I handle out-of-memory errors such as *Failed to alloc memory pool memory* when running a model in `PYNATIVE_MODE`?**

**A**: `PYNATIVE_MODE` usually uses more memory than `GRAPH_MODE`, especially in training graphs that require back-propagation computation. You can try a smaller batch size.

- **Q: Some networks report at runtime that an interface does not exist, such as `cannot import`. How do I handle this?**

**A**: First check that the branch from which you fetched the network scripts matches the MindSpore version you are using. Model scripts in some newer branches use interfaces that are only supported by newer versions of MindSpore, so they raise errors on older versions.

@@ -40,6 +40,7 @@

- [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
- [FAQ](#faq)

# [BERT Description](#contents)
@@ -825,5 +826,10 @@ Refer to the [ModelZoo FAQ](https://gitee.com/mindspore/mindspore/tree/master/mo

You could try a lower `learning_rate` to use a smaller base learning rate, or a higher `power` to make the learning rate decay faster, in the yaml config; the sketch below shows the effect of `power`.

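To see why a higher `power` helps, here is a plain-Python sketch of a polynomial decay schedule of the kind these configs typically drive; the exact schedule used by the scripts may differ:

```python
def poly_decay_lr(step, total_steps, base_lr, end_lr=0.0, power=1.0):
    """Polynomial decay: a higher `power` drives the rate down faster early on."""
    frac = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr

# At 10% of training, power=1.0 keeps 90% of the base rate,
# while power=2.0 already cuts it to 81%.
for power in (1.0, 2.0):
    print(power, [poly_decay_lr(s, 100, 1e-4, power=power) for s in (10, 50, 90)])
```
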
- **Q: Why does training fail with a shape mismatch error?**

**A**: This is usually caused by the model's `seq_length` config not matching the dataset. Check and modify `seq_length` in the yaml config according to the dataset you use.

The model parameters do not change with `seq_length`; parameter shapes depend only on the model config `max_position_embeddings`. A quick dataset check is sketched below.

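A hedged pre-flight check, assuming the dataset is in MindRecord format with an `input_ids` column (the path and column name are placeholders for your own data):

```python
import mindspore.dataset as ds

SEQ_LENGTH = 128  # must equal `seq_length` in the yaml config

# MindRecord is the usual BERT pretraining format; adjust to your data.
dataset = ds.MindDataset("/path/to/pretrain.mindrecord", columns_list=["input_ids"])
sample = next(dataset.create_dict_iterator(output_numpy=True))
actual = sample["input_ids"].shape[-1]
assert actual == SEQ_LENGTH, f"dataset seq_length {actual} != config {SEQ_LENGTH}"
```
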
- **Q: Why does training fail with an error about the `Gather` operator?**

**A**: BERT uses the `Gather` operator for the embedding lookup. The vocabulary size is configured by `vocab_size` in the yaml config file. If the vocabulary used to build the dataset is larger than the configured size, the operator fails with an out-of-bounds access. See the sketch below.

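A small sketch of the mechanism: the embedding lookup is a `Gather` over a `[vocab_size, hidden]` table, so any token id at or above `vocab_size` is an out-of-bounds access (the sizes below are examples, not the model's actual config):

```python
import numpy as np
import mindspore as ms
import mindspore.ops as ops

vocab_size, hidden = 21128, 768  # example `vocab_size`/hidden size
table = ms.Tensor(np.zeros((vocab_size, hidden), np.float32))  # embedding table
ids = ms.Tensor(np.array([[0, 5, 21127]], np.int32))  # all ids < vocab_size: ok

out = ops.Gather()(table, ids, 0)  # gather rows of the table along axis 0
print(out.shape)  # (1, 3, 768)
# Any id >= vocab_size (e.g. 21128) would make this Gather an
# out-of-bounds access and abort on device.
```
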
@@ -41,6 +41,7 @@

- [Inference Performance](#推理性能)
- [Description of Random Situation](#随机情况说明)
- [ModelZoo Homepage](#modelzoo主页)
- [FAQ](#faq)

<!-- /TOC -->
@@ -779,7 +780,13 @@ run_pretrain.py sets a random seed to ensure that each node in distributed training

First consult the [ModelZoo FAQ](https://gitee.com/mindspore/mindspore/tree/master/model_zoo#FAQ) for common issues.

- **Q: What should I do when continuous overflow occurs during training?**

**A**: Continuous overflow is usually caused by a learning rate that is too high, which keeps training from converging. Consider adjusting the yaml config: lower `learning_rate` to reduce the initial learning rate, or raise `power` to speed up learning rate decay.

- **Q: What causes a shape mismatch error at runtime?**

**A**: A shape mismatch in the BERT model usually means the model configuration does not match the dataset specification, mostly the sequence length. Consider modifying the `seq_length` parameter to match the dataset you use. Changing this parameter does not affect the weight shapes; those depend only on the `max_position_embeddings` parameter.

- **Q: What does an error about the `Gather` operator during training mean?**

**A**: The BERT model uses the `Gather` operator for the embedding operation: it maps input token ids into the vocabulary table, whose size is set by `vocab_size` in the config file. If the vocabulary used to encode the dataset is larger than the configured size, `Gather` performs an out-of-bounds access, reports an error, and aborts the program. A pre-training sanity check is sketched below.

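One way to catch this before training is to scan the dataset for its maximum token id and compare it against the configured `vocab_size`; a hedged sketch, with the path and column name as placeholders for your own MindRecord data:

```python
import mindspore.dataset as ds

VOCAB_SIZE = 21128  # `vocab_size` from the yaml config (value is an example)

dataset = ds.MindDataset("/path/to/pretrain.mindrecord", columns_list=["input_ids"])
max_id = max(int(r["input_ids"].max())
             for r in dataset.create_dict_iterator(output_numpy=True))
if max_id >= VOCAB_SIZE:
    raise ValueError(f"token id {max_id} out of range for vocab_size={VOCAB_SIZE}")
```
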