modified shell in network

2021-03-16 19:44:23 +08:00 · 2021-03-16 19:44:23 +08:00 · 8bdea4ab54
parent df4da9ca85
commit 8bdea4ab54
3 changed files with 85 additions and 20 deletions
--- a/model_zoo/official/cv/yolov3_darknet53/README.md
+++ b/model_zoo/official/cv/yolov3_darknet53/README.md
@@ -47,8 +47,23 @@ Dataset used: [COCO2014](https://cocodataset.org/#download)
    - Train：13G, 82,783 images
    - Val：6G, 40,504 images
    - Annotations: 241M, Train/Val annotations
- Data format：zip files
-    - Note：Data will be processed in yolo_dataset.py, and unzip files before uses it.
+- The directory structure is as follows.
+
+    ```text
+        ├── dataset
+            ├── coco2014
+                ├── annotations
+                │   ├─ train.json
+                │   └─ val.json
+                ├─ train
+                │   ├─picture1.jpg
+                │   ├─ ...
+                │   └─picturen.jpg
+                └─ val
+                    ├─picture1.jpg
+                    ├─ ...
+                    └─picturen.jpg
+    ```

 ## [Environment Requirements](#contents)

@@ -62,11 +77,29 @@ Dataset used: [COCO2014](https://cocodataset.org/#download)

 ## [Quick Start](#contents)

-After installing MindSpore via the official website, you can start training and evaluation in as follows. If running on GPU, please add `--device_target=GPU` in the python command or use the "_gpu" shell script ("xxx_gpu.sh").
+- After installing MindSpore via the official website, you can start training and evaluation as follows. If running on GPU, add `--device_target=GPU` to the python command or use the "_gpu" shell script ("xxx_gpu.sh").
+- Before running the network, prepare the backbone_darknet53.ckpt and hccl_8p.json files.
+    - To obtain the pretrained backbone, use src/convert_weight.py to convert darknet53.conv.74 into a MindSpore ckpt file.
+
+      ```
+      python convert_weight.py --input_file ./darknet53.conv.74
+      ```
+
+      darknet53.conv.74 can be downloaded from [download](https://pjreddie.com/media/files/darknet53.conv.74).
+      On Linux, you can also fetch it with the following command.
+
+      ```
+      wget https://pjreddie.com/media/files/darknet53.conv.74
+      ```
+
+    - Generate hccl_8p.json by running the script model_zoo/utils/hccl_tools/hccl_tools.py.
+      In the following command, the parameter "[0,8)" indicates that the hccl_8p.json file is generated for cards 0 to 7.
+
+      ```
+      python hccl_tools.py --device_num "[0,8)"
+      ```

 ```network
-# The darknet53_backbone.ckpt in the follow script is got from darknet53 training like paper.
-# pretrained_backbone can use src/convert_weight.py, convert darknet53.conv.74 to mindspore ckpt, darknet53.conv.74 can get from `https://pjreddie.com/media/files/darknet53.conv.74` .
 # The training_shape parameter defines the image shape for the network; the default is "".
 # The default means that 10 kinds of shapes are used as input shapes; alternatively, a specific shape can be set.
 # Run the training example (1p) with the python command.
@@ -309,15 +342,15 @@ This is the standard format from `pycocotools`, you can refer to [cocodataset](http
 | Model Version              | YOLOv3                                                      |YOLOv3                                                       |
 | Resource                   | Ascend 910; CPU 2.60GHz, 192cores; Memory, 755G             | NV SMX2 V100-16G; CPU 2.10GHz, 96cores; Memory, 251G        |
 | Uploaded Date              | 09/15/2020 (month/day/year)                                 | 09/02/2020 (month/day/year)                                 |
-| MindSpore Version          | 1.0.0                                                       | 1.0.0                                                       |
+| MindSpore Version          | 1.1.1                                                       | 1.1.1                                                       |
 | Dataset                    | COCO2014                                                    | COCO2014                                                    |
-| Training Parameters        | epoch=320, batch_size=32, lr=0.001, momentum=0.9            | epoch=320, batch_size=32, lr=0.001, momentum=0.9            |
+| Training Parameters        | epoch=320, batch_size=32, lr=0.001, momentum=0.9            | epoch=320, batch_size=32, lr=0.1, momentum=0.9            |
 | Optimizer                  | Momentum                                                    | Momentum                                                    |
 | Loss Function              | Sigmoid Cross Entropy with logits                           | Sigmoid Cross Entropy with logits                           |
 | outputs                    | boxes and label                                             | boxes and label                                             |
 | Loss                       | 34                                                          | 34                                                          |
 | Speed                      | 1pc: 350 ms/step;                                           | 1pc: 600 ms/step;                                           |
-| Total time                 | 8pc: 18.5 hours                                             | 8pc: 18 hours(shape=416)                                    |
+| Total time                 | 8pc: 13 hours                                               | 8pc: 18 hours(shape=416)                                    |
 | Parameters (M)             | 62.1                                                        | 62.1                                                        |
 | Checkpoint for Fine tuning | 474M (.ckpt file)                                           | 474M (.ckpt file)                                           |
 | Scripts                    | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/yolov3_darknet53 | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/yolov3_darknet53 |
@@ -329,7 +362,7 @@ This is the standard format from `pycocotools`, you can refer to [cocodataset](http
 | Model Version       | YOLOv3                      | YOLOv3                       |
 | Resource            | Ascend 910                  | NV SMX2 V100-16G             |
 | Uploaded Date       | 09/15/2020 (month/day/year) | 08/20/2020 (month/day/year)  |
-| MindSpore Version   | 1.0.0                       | 1.0.0                        |
+| MindSpore Version   | 1.1.1                       | 1.1.1                        |
 | Dataset             | COCO2014, 40,504  images    | COCO2014, 40,504  images     |
 | batch_size          | 1                           | 1                            |
 | outputs             | mAP                         | mAP                          |
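
The README change above documents the dataset layout expected under `dataset/coco2014`. As a minimal sketch, that layout could be checked before training with a few lines of Python. The helper name `missing_entries` and the `EXPECTED` tuple are hypothetical and not part of this commit; only the relative paths are taken from the directory tree in the diff.

```python
from pathlib import Path

# Relative paths copied from the directory tree in the README diff.
# The names EXPECTED and missing_entries are illustrative assumptions.
EXPECTED = (
    "annotations/train.json",
    "annotations/val.json",
    "train",
    "val",
)


def missing_entries(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Report what is missing under the default dataset location, if anything.
print(missing_entries("dataset/coco2014"))
```

An empty list means every documented file and folder is present; the check does not inspect the images or validate the annotation JSON.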
--- a/model_zoo/official/cv/yolov3_darknet53/README_CN.md
+++ b/model_zoo/official/cv/yolov3_darknet53/README_CN.md
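@@ -49,13 +49,28 @@ YOLOv3 uses DarkNet53 for feature extraction, which is the Darknet-19 from YOLOv2 and residual
    - Training set: 13G, 82,783 images
    - Validation set: 6G, 40,504 images
    - Annotations: 241M, training/validation annotations
- Data format: zip files
-    - Note: Data will be processed in yolo_dataset.py; unzip the files before use.
+- The directory structure of the dataset is as follows.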
+
+    ```text
+        ├── dataset
+            ├── coco2014
+                ├── annotations
+                │   ├─ train.json
+                │   └─ val.json
+                ├─ train
+                │   ├─picture1.jpg
+                │   ├─ ...
+                │   └─picturen.jpg
+                └─ val
+                    ├─picture1.jpg
+                    ├─ ...
+                    └─picturen.jpg
+    ```

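 # Environment Requirements

 - Hardware (Ascend/GPU)
-    - Prepare the hardware environment with Ascend or GPU processors. To try Ascend processors, send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx)to ascend@huawei.com; resources are granted once the application is approved.
+    - Prepare the hardware environment with Ascend or GPU processors. To try Ascend processors, send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com; resources are granted once the application is approved.
 - Framework
    - [MindSpore](https://www.mindspore.cn/install)
 - For more information, see the following resources: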
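@@ -64,11 +79,28 @@ YOLOv3 uses DarkNet53 for feature extraction, which is the Darknet-19 from YOLOv2 and residual

 # Quick Start

-After installing MindSpore via the official website, you can start training and evaluation as follows. If running on GPU, add `--device_target=GPU` to the python command or use the "_gpu" shell script ("xxx_gpu.sh").
+- After installing MindSpore via the official website, you can start training and evaluation as follows. If running on GPU, add `--device_target=GPU` to the python command or use the "_gpu" shell script ("xxx_gpu.sh").
+- Before running a job, prepare the backbone_darknet53.ckpt and hccl_8p.json files.
+    - Use the convert_weight.py script under the src directory to convert darknet53.conv.74 into the MindSpore ckpt format.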
+
+      ```command
+      python convert_weight.py --input_file ./darknet53.conv.74
+      ```
+
+      The darknet53.conv.74 file can be [downloaded](https://pjreddie.com/media/files/darknet53.conv.74) from the website.
+      On Linux, you can also download the file with the following command.
+
+      ```command
+      wget https://pjreddie.com/media/files/darknet53.conv.74
+      ```
+
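+    - Run the hccl_tools.py script under model_zoo/utils/hccl_tools/ to generate the hccl_8p.json file. In the command below, the parameter "[0,8)" indicates that an 8-card hccl_8p.json file is generated for cards 0 to 7.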
+
+      ```command
+      python hccl_tools.py --device_num "[0,8)"
+      ```

 ```python
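-# The darknet53_backbone.ckpt in the following script is obtained from darknet53 training.
-# pretrained_backbone can be produced with src/convert_weight.py, which converts darknet53.conv.74 into a MindSpore checkpoint. darknet53.conv.74 is available at `https://pjreddie.com/media/files/darknet53.conv.74`.
 # The training_shape parameter defines the image shape for the network; the default is "".
 # The default means that 10 kinds of shapes are used as input shapes; alternatively, a specific shape can be set.
 # Run the training example (1 card) with the python command.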
@@ -313,15 +345,15 @@ sh run_eval.sh dataset/coco2014/ checkpoint/0-319_102400.ckpt
 | Model Version              | YOLOv3                                                      | YOLOv3                                                      |
 | Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory, 755G            | NV SMX2 V100-16G; CPU 2.10GHz, 96 cores; Memory, 251G       |
 | Uploaded Date              | 2020-06-31                                                  | 2020-09-02                                                  |
-| MindSpore Version          | 0.5.0-alpha                                                 | 0.7.0                                                       |
+| MindSpore Version          | 1.1.1                                                       | 1.1.1                                                       |
 | Dataset                    | COCO2014                                                    | COCO2014                                                    |
-| Training Parameters        | epoch=320, batch_size=32, lr=0.001, momentum=0.9            | epoch=320, batch_size=32, lr=0.001, momentum=0.9            |
+| Training Parameters        | epoch=320, batch_size=32, lr=0.001, momentum=0.9            | epoch=320, batch_size=32, lr=0.1, momentum=0.9              |
 | Optimizer                  | Momentum                                                    | Momentum                                                    |
 | Loss Function              | Sigmoid Cross Entropy with logits                           | Sigmoid Cross Entropy with logits                           |
 | Outputs                    | boxes and labels                                            | boxes and labels                                            |
 | Loss                       | 34                                                          | 34                                                          |
 | Speed                      | 1pc: 350 ms/step;                                           | 1pc: 600 ms/step;                                           |
-| Total time                 | 8pc: 18.5 hours                                             | 8pc: 18 hours(shape=416)                                    |
+| Total time                 | 8pc: 13 hours                                               | 8pc: 18 hours(shape=416)                                    |
 | Parameters (M)             | 62.1                                                        | 62.1                                                        |
 | Checkpoint for Fine tuning | 474M (.ckpt file)                                           | 474M (.ckpt file)                                           |
 | 脚本                    | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/yolov3_darknet53 | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/yolov3_darknet53 |
@@ -333,7 +365,7 @@ sh run_eval.sh dataset/coco2014/ checkpoint/0-319_102400.ckpt
 | Model Version       | YOLOv3                      | YOLOv3                       |
 | Resource            | Ascend 910                  | NV SMX2 V100-16G             |
 | Uploaded Date       | 2020-06-31                  | 2020-08-20                   |
-| MindSpore Version   | 0.5.0-alpha                 | 0.7.0                        |
+| MindSpore Version   | 1.1.1                       | 1.1.1                        |
 | Dataset             | COCO2014, 40,504 images     | COCO2014, 40,504 images      |
 | batch_size          | 1                           | 1                            |
 | Outputs             | mAP                         | mAP                          |
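
Both README changes above pass `--device_num "[0,8)"` to hccl_tools.py and describe it as covering cards 0 to 7. The notation is a half-open interval: the start is included and the end is excluded. The following sketch illustrates only that interpretation; `parse_device_range` is a hypothetical helper, and this is not necessarily how hccl_tools.py parses the argument.

```python
def parse_device_range(spec):
    """Parse a half-open range such as "[0,8)" into a list of device ids.

    Illustrative only: this mirrors the interval notation used in the
    README, not the actual argument handling inside hccl_tools.py.
    """
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith(")")):
        raise ValueError(f"expected the form '[start,end)', got {spec!r}")
    # Drop the brackets, then split into the two bounds.
    start_text, end_text = spec[1:-1].split(",")
    start, end = int(start_text), int(end_text)
    if start >= end:
        raise ValueError("start must be smaller than end")
    # range() is half-open as well, so the end bound is excluded.
    return list(range(start, end))


print(parse_device_range("[0,8)"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this reading, `"[4,8)"` would select the four cards 4 to 7.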
--- a/model_zoo/official/cv/yolov3_darknet53/train.py
+++ b/model_zoo/official/cv/yolov3_darknet53/train.py
@ -171,7 +171,7 @@ def train():
    args = parse_args()
    network_init(args)
    if args.need_profiler:
-        from mindspore.profiler.profiling import Profiler
+        from mindspore.profiler import Profiler
        profiler = Profiler(output_path=args.outputs_dir, is_detail=True, is_show_op_path=True)

    loss_meter = AverageMeter('loss')
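
The train.py hunk above switches the import from `mindspore.profiler.profiling` to `mindspore.profiler`. If a script had to run against both the older and the newer layout, one possible approach is to try the two module paths in order. This is a sketch under that assumption, not something the commit does; `load_profiler_class` is a hypothetical helper, and the two module paths are the only names taken from the diff.

```python
import importlib


def load_profiler_class():
    """Return the Profiler class, or None if it cannot be imported.

    Tries the new module path from this commit first, then the old one.
    """
    for module_name in ("mindspore.profiler", "mindspore.profiler.profiling"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # MindSpore is missing or this module path does not exist.
            continue
        profiler_cls = getattr(module, "Profiler", None)
        if profiler_cls is not None:
            return profiler_cls
    return None


Profiler = load_profiler_class()
print("Profiler available:", Profiler is not None)
```

The commit itself simply pins the new path, which is the simpler choice when only one MindSpore version is supported.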