fix resnext152 eval data path and use bash in readme

l00486551 2021-08-10 17:43:38 +08:00
parent 3b0c3e640b
commit e91cd800bd
5 changed files with 71 additions and 75 deletions

@ -37,8 +37,8 @@ The overall network architecture of ResNeXt is show below:
Dataset used: [imagenet](http://www.image-net.org/)
- Dataset size: ~125G, 1.2W colorful images in 1000 classes
- Train: 120G, 1.2W images
- Test: 5G, 50000 images
- Train: 120G, 1.2W images
- Test: 5G, 50000 images
- Data format: RGB images
- Note: Data will be processed in src/dataset.py
@ -53,12 +53,11 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
# [Environment Requirements](#contents)
- Hardware (Ascend)
- Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- [MindSpore](https://www.mindspore.cn/install)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# [Script description](#contents)
@ -145,18 +144,18 @@ or shell script:
```script
Ascend:
# distribute training example(8p)
sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
bash run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training
sh run_standalone_train.sh DEVICE_ID DATA_PATH
bash run_standalone_train.sh DEVICE_ID DATA_PATH
```
#### Launch
```bash
# distributed training example(8p) for Ascend
sh scripts/run_distribute_train.sh RANK_TABLE_FILE /dataset/train
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training example for Ascend
sh scripts/run_standalone_train.sh 0 /dataset/train
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
```
You can find checkpoint file together with result in log.
@ -175,7 +174,7 @@ or shell script:
```script
# Evaluation
sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH PLATFORM
bash run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH PLATFORM
```
PLATFORM is Ascend, default is Ascend.
@ -184,10 +183,10 @@ PLATFORM is Ascend, default is Ascend.
```bash
# Evaluation with checkpoint
sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext152_100.ckpt Ascend
bash scripts/run_eval.sh DEVICE_ID PRETRAINED_CKPT_PATH PLATFORM
#Directly use the script to run
python eval.py --data_dir /opt/npu/pvc/dataset/storage/imagenet/val/ --platform Ascend --pretrained /root/test/resnext152_64x4d/outputs_demo/best_acc_4.ckpt
# Directly use the script to run
python eval.py --data_dir ~/imagenet/val/ --platform Ascend --pretrained ~/best_acc_4.ckpt
```
#### Result
@ -213,31 +212,31 @@ python export.py --device_target [PLATFORM] --ckpt_file [CKPT_PATH] --file_forma
### Training Performance
| Parameters | ResNeXt152 | |
| -------------------------- | --------------------------------------------- | ---- |
| Resource | Ascend 910, cpu:2.60GHz 192cores, memory:755G | |
| uploaded Date | 06/30/2021 | |
| MindSpore Version | 1.2 | |
| Dataset | ImageNet | |
| Training Parameters | src/config.py | |
| Optimizer | Momentum | |
| Loss Function | SoftmaxCrossEntropy | |
| Loss | 1.28923 | |
| Accuracy | 80.08%(TOP1) | |
| Total time | 7.8 h 8ps | |
| Checkpoint for Fine tuning | 192 M(.ckpt file) | |
| Parameters | ResNeXt152 |
| -------------------------- | --------------------------------------------- |
| Resource | Ascend 910, cpu:2.60GHz 192cores, memory:755G |
| uploaded Date | 06/30/2021 |
| MindSpore Version | 1.2 |
| Dataset | ImageNet |
| Training Parameters | src/config.py |
| Optimizer | Momentum |
| Loss Function | SoftmaxCrossEntropy |
| Loss | 1.28923 |
| Accuracy | 80.08%(TOP1) |
| Total time | 7.8 h 8ps |
| Checkpoint for Fine tuning | 192 M(.ckpt file) |
#### Inference Performance
| Parameters | | | |
| ----------------- | ---- | ---- | ---------------- |
| Resource | | | Ascend 910 |
| uploaded Date | | | 06/20/2021 |
| MindSpore Version | | | 1.2 |
| Dataset | | | ImageNet, 1.2W |
| batch_size | | | 1 |
| outputs | | | probability |
| Accuracy | | | acc=80.08%(TOP1) |
| Parameters | |
| ----------------- | ---------------- |
| Resource | Ascend 910 |
| uploaded Date | 06/20/2021 |
| MindSpore Version | 1.2 |
| Dataset | ImageNet, 1.2W |
| batch_size | 1 |
| outputs | probability |
| Accuracy | acc=80.08%(TOP1) |
# [Description of Random Situation](#contents)

@ -57,13 +57,11 @@ The overall network architecture of ResNeXt is as follows
# Environment Requirements
- Hardware (Ascend)
- Prepare the hardware environment with an Ascend processor. To try Ascend processors, send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com; resources become available once the application is approved.
- Framework
- [MindSpore](https://www.mindspore.cn/install)
- For more information, see the following resources:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html)
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Script Description
@ -149,18 +147,18 @@ python train.py --data_dir ~/imagenet/train/ --platform Ascend --is_distributed
```shell
Ascend:
# distributed training example (8 devices)
sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
bash run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training
sh run_standalone_train.sh DEVICE_ID DATA_PATH
bash run_standalone_train.sh DEVICE_ID DATA_PATH
```
### Example
```shell
# distributed training example (8 devices) for Ascend
sh scripts/run_distribute_train.sh RANK_TABLE_FILE /dataset/train
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training example for Ascend
sh scripts/run_standalone_train.sh 0 /dataset/train
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
```
You can find the checkpoint file and the results in the log.
@ -179,7 +177,7 @@ python eval.py --data_dir ~/imagenet/val/ --platform Ascend --pretrained resnext
```shell
# Evaluation
sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH PLATFORM
bash run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH PLATFORM
```
PLATFORM is Ascend, default is Ascend.
@ -188,10 +186,10 @@ PLATFORM is Ascend, default is Ascend.
```shell
# Evaluation with checkpoint
sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext152_100.ckpt Ascend
bash scripts/run_eval.sh DEVICE_ID PRETRAINED_CKPT_PATH PLATFORM
# Or run the script directly
python eval.py --data_dir /opt/npu/pvc/dataset/storage/imagenet/val/ --platform Ascend --pretrained /root/test/resnext152_64x4d/outputs_demo/best_acc_0.ckpt
python eval.py --data_dir ~/imagenet/val/ --platform Ascend --pretrained ~/best_acc_0.ckpt
```
#### Result
@ -217,31 +215,31 @@ python export.py --device_target [PLATFORM] --ckpt_file [CKPT_PATH] --file_forma
### Training Performance
| Parameters                 | ResNeXt152                                         |      |
| -------------------------- | -------------------------------------------------- | ---- |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |      |
| Uploaded date              | 2021-6-30                                          |      |
| MindSpore version          | 1.2                                                |      |
| Dataset                    | ImageNet                                           |      |
| Training parameters        | src/config.py                                      |      |
| Optimizer                  | Momentum                                           |      |
| Loss function              | Softmax cross entropy                              |      |
| Loss                       | 1.2892                                             |      |
| Accuracy                   | 80.08% (TOP1)                                      |      |
| Total time                 | 7.8 hours, 8 devices                               |      |
| Checkpoint for fine tuning | 192 M (.ckpt file)                                 |      |
| Parameters                 | ResNeXt152                                         |
| -------------------------- | -------------------------------------------------- |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded date              | 2021-6-30                                          |
| MindSpore version          | 1.2                                                |
| Dataset                    | ImageNet                                           |
| Training parameters        | src/config.py                                      |
| Optimizer                  | Momentum                                           |
| Loss function              | Softmax cross entropy                              |
| Loss                       | 1.2892                                             |
| Accuracy                   | 80.08% (TOP1)                                      |
| Total time                 | 7.8 hours, 8 devices                               |
| Checkpoint for fine tuning | 192 M (.ckpt file)                                 |
#### Inference Performance
| Parameters        |      |      |                  |
| ----------------- | ---- | ---- | ---------------- |
| Resource          |      |      | Ascend 910       |
| Uploaded date     |      |      | 2021-6-20        |
| MindSpore version |      |      | 1.2              |
| Dataset           |      |      | ImageNet, 1.2W   |
| batch_size        |      |      | 1                |
| Outputs           |      |      | probability      |
| Accuracy          |      |      | acc=80.08%(TOP1) |
| Parameters        |                  |
| ----------------- | ---------------- |
| Resource          | Ascend 910       |
| Uploaded date     | 2021-6-20        |
| MindSpore version | 1.2              |
| Dataset           | ImageNet, 1.2W   |
| batch_size        | 1                |
| Outputs           | probability      |
| Accuracy          | acc=80.08%(TOP1) |
# Description of Random Situation

@ -52,6 +52,7 @@ do
--is_distribute=1 \
--device_id=$DEVICE_ID \
--pretrained=$PATH_CHECKPOINT \
--data_dir=$DATA_DIR > log_less.txt 2>&1 &
--data_dir=$DATA_DIR \
--run_eval=False > log_less.txt 2>&1 &
cd ../
done

@ -26,5 +26,6 @@ python train.py \
--is_distribute=0 \
--device_id=$DEVICE_ID \
--pretrained=$PATH_CHECKPOINT \
--data_dir=$DATA_DIR > log.txt 2>&1 &
--data_dir=$DATA_DIR \
--run_eval=False > log.txt 2>&1 &
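
Both launch scripts now pass `--run_eval=False` as a string on the command line. How `train.py` declares this flag is not shown in this diff. If it were declared with `type=bool`, the string `"False"` would still be parsed as `True`, because any non-empty string is truthy in Python. The sketch below shows one way to parse such a value explicitly; `str2bool` and `build_parser` are hypothetical names used for illustration and are not taken from the repository.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map common true/false spellings to a bool."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(
        "expected a boolean value, got {!r}".format(value))


def build_parser():
    """Hypothetical parser holding only the flag discussed here."""
    parser = argparse.ArgumentParser()
    # Assumption: the default is False; the real default lives in train.py.
    parser.add_argument("--run_eval", type=str2bool, default=False,
                        help="run evaluation during training")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--run_eval=False"])
    print(args.run_eval)
```

With this kind of converter, `--run_eval=False` and `--run_eval=True` map to the booleans a reader would expect, and an unrecognised spelling is rejected by argparse instead of being silently treated as true.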

@ -146,7 +146,7 @@ def parse_args(cloud_args=None):
#dataset of eval dataset
parser.add_argument('--eval_data_dir',
type=str,
default='/opt/npu/pvc/dataset/storage/imagenet/val',
default='',
help='eval data dir')
parser.add_argument('--eval_per_batch_size',
default=32,
@ -289,9 +289,6 @@ def train(cloud_args=None):
# checkpoint save
progress_cb = ProgressMonitor(args)
callbacks = [progress_cb,]
#eval dataset
if args.eval_data_dir is None or (not os.path.isdir(args.eval_data_dir)):
raise ValueError("{} is not a existing path.".format(args.eval_data_dir))
#code like eval.py
#if run eval
if args.run_eval:
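
The hunk above drops the unconditional check on `eval_data_dir`, which would otherwise raise now that the default is an empty string and the scripts pass `--run_eval=False`. A minimal sketch of keeping a similar check while limiting it to runs that actually evaluate is given below. It is an illustration under assumptions, not the code in `train.py`: `check_eval_data_dir` is a hypothetical helper, and `args` is modelled with `SimpleNamespace` instead of the real argparse result.

```python
import os
from types import SimpleNamespace


def check_eval_data_dir(args):
    """Hypothetical helper: validate eval_data_dir only when run_eval is set."""
    if not args.run_eval:
        # Evaluation is disabled, so an empty or missing path is acceptable.
        return
    if not args.eval_data_dir or not os.path.isdir(args.eval_data_dir):
        raise ValueError(
            "{} is not an existing path.".format(args.eval_data_dir))


if __name__ == "__main__":
    # Training without evaluation: the empty default does not raise.
    check_eval_data_dir(SimpleNamespace(run_eval=False, eval_data_dir=""))
    # Training with evaluation: the directory must exist.
    check_eval_data_dir(SimpleNamespace(run_eval=True, eval_data_dir=os.getcwd()))
```

The design choice here is to fail early with a clear message when evaluation is requested but the dataset path is wrong, while leaving training-only runs unaffected.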