Add cache demo for modelzoo resnet

Lixia Chen 2021-04-15 17:13:18 -04:00
parent fcac556d58
commit 357ae7833c
10 changed files with 277 additions and 19 deletions

@ -155,7 +155,8 @@ python eval.py --net=[resnet50|resnet101] --dataset=[cifar10|imagenet2012] --dat
├── run_eval_gpu.sh # launch gpu evaluation
├── run_standalone_train_gpu.sh # launch gpu standalone training(1 pcs)
├── run_gpu_resnet_benchmark.sh # launch gpu benchmark train for resnet50 with imagenet2012
└── run_eval_gpu_resnet_benckmark.sh # launch gpu benchmark eval for resnet50 with imagenet2012
├── run_eval_gpu_resnet_benckmark.sh # launch gpu benchmark eval for resnet50 with imagenet2012
└── cache_util.sh # a collection of helper functions to manage cache
├── src
├── config.py # parameter configuration
├── dataset.py # data preprocessing
@ -330,7 +331,26 @@ bash run_parameter_server_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet201
#### Evaluation while training
To run evaluation while training, add `run_eval` to the launch script and set it to True. When `run_eval` is True, you can also set the options `eval_dataset_path`, `save_best_ckpt`, `eval_start_epoch`, and `eval_interval`.
```bash
# evaluation while distributed training Ascend example:
bash run_distribute_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while standalone training Ascend example:
bash run_standalone_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while distributed training GPU example:
bash run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while standalone training GPU example:
bash run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
```
`RUN_EVAL` and `EVAL_DATASET_PATH` are optional arguments. Setting `RUN_EVAL` to True enables evaluation while training; when `RUN_EVAL` is set, `EVAL_DATASET_PATH` must also be set.
When `RUN_EVAL` is True, you can also set the optional arguments `save_best_ckpt`, `eval_start_epoch`, and `eval_interval` for the python script.
By default, a standalone cache server is started to cache all eval images in memory in tensor format, which improves evaluation performance. Make sure the dataset fits in memory (around 30GB for the ImageNet2012 eval dataset and 6GB for the CIFAR-10 eval dataset).
You can shut down the cache server after training or leave it running for future use.
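
The memory figures above can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes that each cached eval image is stored as a 224x224x3 float32 tensor after preprocessing, and that the eval sets hold 50,000 (ImageNet2012) and 10,000 (CIFAR-10) images. These are assumptions for illustration, not values reported by the cache server, and `cached_size_gb` is a hypothetical helper.

```python
# Rough estimate of the memory needed to cache an eval set as tensors.
# Assumption (not read from the cache server): every image is stored
# after preprocessing as a 224x224x3 float32 tensor.

BYTES_PER_FLOAT32 = 4


def cached_size_gb(num_images, height=224, width=224, channels=3):
    """Return the approximate size in GB (10**9 bytes) of the cached tensors."""
    bytes_per_image = height * width * channels * BYTES_PER_FLOAT32
    return num_images * bytes_per_image / 1e9


if __name__ == "__main__":
    print("ImageNet2012 eval: {:.1f} GB".format(cached_size_gb(50000)))
    print("CIFAR-10 eval: {:.1f} GB".format(cached_size_gb(10000)))
```

Under these assumptions the estimate comes out at roughly 30.1 GB and 6.0 GB, in line with the figures quoted above. Actual usage may differ with the server's own bookkeeping overhead.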
### Result

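@ -67,7 +67,7 @@ The overall network architecture of ResNet is as follows
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 1000 classes, 224*224 color images
- Training set: 1,281,167 images
- Training set: 1,281,167 images
- Test set: 50,000 images
- Data format: JPEG
- Note: Data is processed in dataset.py.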
@ -143,7 +143,8 @@ bash run_eval_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH]
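├── run_distribute_train_gpu.sh # launch GPU distributed training (8 pcs)
├── run_parameter_server_train_gpu.sh # launch GPU parameter server training (8 pcs)
├── run_eval_gpu.sh # launch GPU evaluation
└── run_standalone_train_gpu.sh # launch GPU standalone training (1 pcs)
├── run_standalone_train_gpu.sh # launch GPU standalone training (1 pcs)
└── cache_util.sh # helper functions for using a single-node cache
├── src
├── config.py # parameter configuration
├── dataset.py # data preprocessing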
@ -304,7 +305,25 @@ bash run_parameter_server_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet201
#### Evaluation while training
To run evaluation while training, add `run_eval` to the launch script and set it to True. You also need to set: `eval_dataset_path`, `save_best_ckpt`, `eval_start_epoch`, `eval_interval`
```bash
# evaluation while distributed training Ascend example:
bash run_distribute_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while standalone training Ascend example:
bash run_standalone_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while distributed training GPU example:
bash run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
# evaluation while standalone training GPU example:
bash run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)
```
To run evaluation while training, set `RUN_EVAL` to True and also set `EVAL_DATASET_PATH`. In addition, when `RUN_EVAL` is True, you can set parameters such as `save_best_ckpt`, `eval_start_epoch`, and `eval_interval` for the python script.
By default, a standalone cache server is started to keep the eval dataset images in memory as tensors, which improves evaluation performance. Before using the cache, make sure there is enough memory to hold the eval images (caching the ImageNet2012 eval set requires about 30GB of memory; caching the CIFAR-10 eval set requires about 6GB).
After training finishes, you can either shut down the cache server or leave it running to serve future evaluation runs.
### Result

@ -0,0 +1,49 @@
#!/usr/bin/env bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
bootup_cache_server()
{
echo "Booting up cache server..."
result=$(cache_admin --start 2>&1)
rc=$?
echo "${result}"
if [ "${rc}" -ne 0 ] && [[ ! ${result} =~ "Cache server is already up and running" ]]; then
echo "cache_admin command failure!" "${result}"
exit 1
fi
}
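generate_cache_session()
{
# Capture the output first so that rc reflects cache_admin rather than awk.
result=$(cache_admin -g)
rc=$?
if [ "${rc}" -ne 0 ]; then
# Report on stderr so the message is not captured as a session id.
echo "cache_admin command failure!" "${result}" >&2
exit 1
fi
# The session id is the last field of the last output line.
echo "${result}" | awk 'END {print $NF}'
}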
shutdown_cache_server()
{
echo "Shutting down cache server..."
result=$(cache_admin --stop 2>&1)
rc=$?
echo "${result}"
if [ "${rc}" -ne 0 ] && [[ ! ${result} =~ "Server on port 50052 is not up or has been shutdown already" ]]; then
echo "cache_admin command failure!" "${result}"
exit 1
fi
}
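
For readers less familiar with awk, the extraction step in `generate_cache_session` (`awk 'END {print $NF}'`) takes the last whitespace-separated field of the last output line. A minimal Python sketch of that step follows. The sample text is made up for illustration and is not the documented output format of `cache_admin -g`; `last_field_of_last_line` is a hypothetical helper.

```python
def last_field_of_last_line(text):
    """Return the last whitespace-separated field of the last line of text.

    This mirrors the intent of `awk 'END {print $NF}'` for ordinary output.
    An empty string is returned when there is no output at all.
    """
    lines = text.splitlines()
    if not lines:
        return ""
    fields = lines[-1].split()
    return fields[-1] if fields else ""


if __name__ == "__main__":
    # Hypothetical output; the real wording of cache_admin may differ.
    sample = "Session created for server on port 50052: 1456416665\n"
    print(last_field_of_last_line(sample))
```

Because only the last field is kept, any message printed on the final line would be treated as the session id, which is why the exit status of `cache_admin` is worth checking before the value is used.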

@ -14,9 +14,12 @@
# limitations under the License.
# ============================================================================
if [ $# != 4 ] && [ $# != 5 ]
. cache_util.sh
if [ $# != 4 ] && [ $# != 5 ] && [ $# != 6 ]
then
echo "Usage: bash run_distribute_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)"
echo " bash run_distribute_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)"
exit 1
fi
@ -60,6 +63,12 @@ then
PATH3=$(get_real_path $5)
fi
if [ $# == 6 ]
then
RUN_EVAL=$5
EVAL_DATASET_PATH=$(get_real_path $6)
fi
if [ ! -f $PATH1 ]
then
echo "error: RANK_TABLE_FILE=$PATH1 is not a file"
@ -78,6 +87,18 @@ then
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ] && [ ! -d $EVAL_DATASET_PATH ]
then
echo "error: EVAL_DATASET_PATH=$EVAL_DATASET_PATH is not a directory"
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ]
then
bootup_cache_server
CACHE_SESSION_ID=$(generate_cache_session)
fi
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
@ -108,5 +129,10 @@ do
python train.py --net=$1 --dataset=$2 --run_distribute=True --device_num=$DEVICE_NUM --dataset_path=$PATH2 --pre_trained=$PATH3 &> log &
fi
if [ $# == 6 ]
then
python train.py --net=$1 --dataset=$2 --run_distribute=True --device_num=$DEVICE_NUM --dataset_path=$PATH2 \
--run_eval=$RUN_EVAL --eval_dataset_path=$EVAL_DATASET_PATH --enable_cache=True --cache_session_id=$CACHE_SESSION_ID &> log &
fi
cd ..
done
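
The launch script above decides what its trailing positional arguments mean purely from the argument count. A minimal Python sketch of that dispatch is shown below for clarity; `parse_distribute_args` and the dictionary keys are hypothetical names, and the sketch does not perform the path checks done by the script.

```python
def parse_distribute_args(args):
    """Interpret positional arguments the way run_distribute_train.sh does.

    4 args: net, dataset, rank table file, dataset path
    5 args: the above plus a pretrained checkpoint path
    6 args: the first four plus RUN_EVAL and EVAL_DATASET_PATH
    """
    if len(args) not in (4, 5, 6):
        raise ValueError("expected 4, 5 or 6 arguments, got {}".format(len(args)))
    parsed = {
        "net": args[0],
        "dataset": args[1],
        "rank_table_file": args[2],
        "dataset_path": args[3],
        "pretrained_ckpt_path": None,
        "run_eval": None,
        "eval_dataset_path": None,
    }
    if len(args) == 5:
        parsed["pretrained_ckpt_path"] = args[4]
    elif len(args) == 6:
        parsed["run_eval"] = args[4]
        parsed["eval_dataset_path"] = args[5]
    return parsed


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    print(parse_distribute_args(
        ["resnet50", "imagenet2012", "rank.json", "/data/train", "True", "/data/val"]))
```

One consequence of this count-based scheme is that a pretrained checkpoint and evaluation while training cannot be requested in the same invocation, since the fifth argument is read differently depending on how many arguments follow.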

@ -14,9 +14,12 @@
# limitations under the License.
# ============================================================================
if [ $# != 3 ] && [ $# != 4 ]
. cache_util.sh
if [ $# != 3 ] && [ $# != 4 ] && [ $# != 5 ]
then
echo "Usage: bash run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)"
echo " bash run_distribute_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)"
exit 1
fi
@ -54,6 +57,12 @@ then
PATH2=$(get_real_path $4)
fi
if [ $# == 5 ]
then
RUN_EVAL=$4
EVAL_DATASET_PATH=$(get_real_path $5)
fi
if [ ! -d $PATH1 ]
then
@ -67,6 +76,18 @@ then
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ] && [ ! -d $EVAL_DATASET_PATH ]
then
echo "error: EVAL_DATASET_PATH=$EVAL_DATASET_PATH is not a directory"
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ]
then
bootup_cache_server
CACHE_SESSION_ID=$(generate_cache_session)
fi
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
@ -91,3 +112,11 @@ then
python train.py --net=$1 --dataset=$2 --run_distribute=True \
--device_num=$DEVICE_NUM --device_target="GPU" --dataset_path=$PATH1 --pre_trained=$PATH2 &> log &
fi
if [ $# == 5 ]
then
mpirun --allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
python train.py --net=$1 --dataset=$2 --run_distribute=True \
--device_num=$DEVICE_NUM --device_target="GPU" --dataset_path=$PATH1 --run_eval=$RUN_EVAL \
--eval_dataset_path=$EVAL_DATASET_PATH --enable_cache=True --cache_session_id=$CACHE_SESSION_ID &> log &
fi

@ -14,9 +14,12 @@
# limitations under the License.
# ============================================================================
if [ $# != 3 ] && [ $# != 4 ]
. cache_util.sh
if [ $# != 3 ] && [ $# != 4 ] && [ $# != 5 ]
then
echo "Usage: bash run_standalone_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)"
echo " bash run_standalone_train.sh [resnet18|resnet50|resnet101|se-resnet50] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)"
exit 1
fi
@ -59,6 +62,12 @@ then
PATH2=$(get_real_path $4)
fi
if [ $# == 5 ]
then
RUN_EVAL=$4
EVAL_DATASET_PATH=$(get_real_path $5)
fi
if [ ! -d $PATH1 ]
then
echo "error: DATASET_PATH=$PATH1 is not a directory"
@ -71,6 +80,18 @@ then
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ] && [ ! -d $EVAL_DATASET_PATH ]
then
echo "error: EVAL_DATASET_PATH=$EVAL_DATASET_PATH is not a directory"
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ]
then
bootup_cache_server
CACHE_SESSION_ID=$(generate_cache_session)
fi
ulimit -u unlimited
export DEVICE_NUM=1
export RANK_ID=0
@ -96,4 +117,10 @@ if [ $# == 4 ]
then
python train.py --net=$1 --dataset=$2 --dataset_path=$PATH1 --pre_trained=$PATH2 &> log &
fi
if [ $# == 5 ]
then
python train.py --net=$1 --dataset=$2 --dataset_path=$PATH1 --run_eval=$RUN_EVAL \
--eval_dataset_path=$EVAL_DATASET_PATH --enable_cache=True --cache_session_id=$CACHE_SESSION_ID &> log &
fi
cd ..

@ -14,9 +14,12 @@
# limitations under the License.
# ============================================================================
. cache_util.sh
if [ $# != 3 ] && [ $# != 4 ] && [ $# != 5 ]
then
echo "Usage: bash run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)"
echo " bash run_standalone_train_gpu.sh [resnet50|resnet101] [cifar10|imagenet2012] [DATASET_PATH] [RUN_EVAL](optional) [EVAL_DATASET_PATH](optional)"
exit 1
fi
@ -54,6 +57,12 @@ then
PATH2=$(get_real_path $4)
fi
if [ $# == 5 ]
then
RUN_EVAL=$4
EVAL_DATASET_PATH=$(get_real_path $5)
fi
if [ ! -d $PATH1 ]
then
echo "error: DATASET_PATH=$PATH1 is not a directory"
@ -66,6 +75,19 @@ then
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ] && [ ! -d $EVAL_DATASET_PATH ]
then
echo "error: EVAL_DATASET_PATH=$EVAL_DATASET_PATH is not a directory"
exit 1
fi
if [ "x${RUN_EVAL}" == "xTrue" ]
then
bootup_cache_server
CACHE_SESSION_ID=$(generate_cache_session)
fi
ulimit -u unlimited
export DEVICE_NUM=1
export DEVICE_ID=0
@ -92,4 +114,10 @@ if [ $# == 4 ]
then
python train.py --net=$1 --dataset=$2 --device_target="GPU" --dataset_path=$PATH1 --pre_trained=$PATH2 &> log &
fi
if [ $# == 5 ]
then
python train.py --net=$1 --dataset=$2 --device_target="GPU" --dataset_path=$PATH1 --run_eval=$RUN_EVAL \
--eval_dataset_path=$EVAL_DATASET_PATH --enable_cache=True --cache_session_id=$CACHE_SESSION_ID &> log &
fi
cd ..

@ -23,7 +23,8 @@ import mindspore.dataset.transforms.c_transforms as C2
from mindspore.communication.management import init, get_rank, get_group_size
def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False,
enable_cache=False, cache_session_id=None):
"""
create a train or evaluate cifar10 dataset for resnet50
Args:
@ -33,6 +34,8 @@ def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target=
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
enable_cache(bool): whether the tensor caching service is used for eval. Default: False
cache_session_id(int): the cache session id; must be provided when enable_cache is True. Default: None
Returns:
dataset
@ -70,7 +73,16 @@ def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target=
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
# only enable cache for eval
if do_train:
enable_cache = False
if enable_cache:
if not cache_session_id:
raise ValueError("A cache session_id must be provided to use cache.")
eval_cache = ds.DatasetCache(session_id=int(cache_session_id), size=0)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8, cache=eval_cache)
else:
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
@ -80,7 +92,8 @@ def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target=
return data_set
def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False,
enable_cache=False, cache_session_id=None):
"""
create a train or eval imagenet2012 dataset for resnet50
@ -91,6 +104,8 @@ def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target=
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
enable_cache(bool): whether the tensor caching service is used for eval. Default: False
cache_session_id(int): the cache session id; must be provided when enable_cache is True. Default: None
Returns:
dataset
@ -135,7 +150,17 @@ def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target=
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# only enable cache for eval
if do_train:
enable_cache = False
if enable_cache:
if not cache_session_id:
raise ValueError("A cache session_id must be provided to use cache.")
eval_cache = ds.DatasetCache(session_id=int(cache_session_id), size=0)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8,
cache=eval_cache)
else:
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
@ -146,7 +171,8 @@ def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target=
return data_set
def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False,
enable_cache=False, cache_session_id=None):
"""
create a train or eval imagenet2012 dataset for resnet101
Args:
@ -156,6 +182,8 @@ def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target=
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
enable_cache(bool): whether the tensor caching service is used for eval. Default: False
cache_session_id(int): the cache session id; must be provided when enable_cache is True. Default: None
Returns:
dataset
@ -199,7 +227,17 @@ def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target=
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# only enable cache for eval
if do_train:
enable_cache = False
if enable_cache:
if not cache_session_id:
raise ValueError("A cache session_id must be provided to use cache.")
eval_cache = ds.DatasetCache(session_id=int(cache_session_id), size=0)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8,
cache=eval_cache)
else:
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
@ -209,7 +247,8 @@ def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target=
return data_set
def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False,
enable_cache=False, cache_session_id=None):
"""
create a train or eval imagenet2012 dataset for se-resnet50
@ -220,6 +259,8 @@ def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target=
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
enable_cache(bool): whether the tensor caching service is used for eval. Default: False
cache_session_id(int): the cache session id; must be provided when enable_cache is True. Default: None
Returns:
dataset
@ -261,7 +302,17 @@ def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target=
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=12)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
# only enable cache for eval
if do_train:
enable_cache = False
if enable_cache:
if not cache_session_id:
raise ValueError("A cache session_id must be provided to use cache.")
eval_cache = ds.DatasetCache(session_id=int(cache_session_id), size=0)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12,
cache=eval_cache)
else:
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
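
All four `create_dataset*` functions apply the same rule before attaching the cache to the last `map` operation: caching is forced off for training, and a session id is mandatory once caching is on. The sketch below isolates that rule in plain Python with no MindSpore dependency; `resolve_cache_session` is a hypothetical helper that does not exist in the commit.

```python
def resolve_cache_session(do_train, enable_cache, cache_session_id):
    """Return the integer session id to cache with, or None when caching is off.

    Follows the rule used in the dataset functions: the cache is only used
    for evaluation, and enabling it without a session id is an error.
    """
    if do_train:
        # Training data is never cached, whatever the caller asked for.
        return None
    if not enable_cache:
        return None
    if not cache_session_id:
        raise ValueError("A cache session_id must be provided to use cache.")
    return int(cache_session_id)


if __name__ == "__main__":
    print(resolve_cache_session(do_train=False, enable_cache=True,
                                cache_session_id="1456416665"))
```

In the commit, a non-None result corresponds to building `ds.DatasetCache(session_id=..., size=0)` and passing it as `cache=` to the final `map` call.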

@ -16,10 +16,12 @@
import os
import stat
import time
from mindspore import save_checkpoint
from mindspore import log as logger
from mindspore.train.callback import Callback
class EvalCallBack(Callback):
"""
Evaluation callback when training.
@ -72,8 +74,11 @@ class EvalCallBack(Callback):
cb_params = run_context.original_args()
cur_epoch = cb_params.cur_epoch_num
if cur_epoch >= self.eval_start_epoch and (cur_epoch - self.eval_start_epoch) % self.interval == 0:
eval_start = time.time()
res = self.eval_function(self.eval_param_dict)
print("epoch: {}, {}: {}".format(cur_epoch, self.metrics_name, res), flush=True)
eval_cost = time.time() - eval_start
print("epoch: {}, {}: {}, eval_cost:{:.2f}".format(cur_epoch, self.metrics_name, res, eval_cost),
flush=True)
if res >= self.best_res:
self.best_res = res
self.best_epoch = cur_epoch
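
The callback change above wraps the evaluation call in a wall-clock timer, which makes the effect of the cache visible in the log. A minimal standalone sketch of the scheduling condition and the timing pattern follows; `should_eval` and `timed_eval` are hypothetical helpers, not part of `EvalCallBack`.

```python
import time


def should_eval(cur_epoch, eval_start_epoch, interval):
    """Same condition as the callback: start at eval_start_epoch, then every interval epochs."""
    return (cur_epoch >= eval_start_epoch
            and (cur_epoch - eval_start_epoch) % interval == 0)


def timed_eval(eval_function, eval_param_dict):
    """Run eval_function and return (result, elapsed seconds)."""
    eval_start = time.time()
    res = eval_function(eval_param_dict)
    eval_cost = time.time() - eval_start
    return res, eval_cost


if __name__ == "__main__":
    # Stand-in evaluation function for illustration; it only returns a fixed value.
    res, cost = timed_eval(lambda params: params["value"], {"value": 0.5})
    print("acc: {}, eval_cost:{:.2f}".format(res, cost))
```

With the cache enabled, the first evaluation presumably still pays the full preprocessing cost while the cache is populated, so the benefit should show up as a lower `eval_cost` from the second evaluation onward.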

@ -60,6 +60,9 @@ parser.add_argument("--eval_start_epoch", type=int, default=40,
help="Evaluation start epoch when run_eval is True, default is 40.")
parser.add_argument("--eval_interval", type=int, default=1,
help="Evaluation interval when run_eval is True, default is 1.")
parser.add_argument('--enable_cache', type=ast.literal_eval, default=False,
help='Caching the eval dataset in memory to speed up evaluation, default is False.')
parser.add_argument('--cache_session_id', type=str, default="", help='The session id for cache service.')
args_opt = parser.parse_args()
set_seed(1)
@ -239,10 +242,11 @@ if __name__ == '__main__':
if args_opt.eval_dataset_path is None or (not os.path.isdir(args_opt.eval_dataset_path)):
raise ValueError("{} is not an existing path.".format(args_opt.eval_dataset_path))
eval_dataset = create_dataset(dataset_path=args_opt.eval_dataset_path, do_train=False,
batch_size=config.batch_size, target=target)
batch_size=config.batch_size, target=target, enable_cache=args_opt.enable_cache,
cache_session_id=args_opt.cache_session_id)
eval_param_dict = {"model": model, "dataset": eval_dataset, "metrics_name": "acc"}
eval_cb = EvalCallBack(apply_eval, eval_param_dict, interval=args_opt.eval_interval,
eval_start_epoch=args_opt.eval_start_epoch, save_best_ckpt=True,
eval_start_epoch=args_opt.eval_start_epoch, save_best_ckpt=args_opt.save_best_ckpt,
ckpt_directory=ckpt_save_dir, besk_ckpt_name="best_acc.ckpt",
metrics_name="acc")
cb += [eval_cb]
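
The new `--enable_cache` flag uses `ast.literal_eval` as its argparse type, so the command-line value must be a Python literal such as `True` or `False`. The standalone sketch below shows how the two new arguments parse; `build_parser` is a hypothetical wrapper, and only the two `add_argument` calls mirror the commit.

```python
import argparse
import ast


def build_parser():
    """Build a parser containing only the two cache-related arguments."""
    parser = argparse.ArgumentParser(description="cache argument sketch")
    parser.add_argument("--enable_cache", type=ast.literal_eval, default=False,
                        help="Cache the eval dataset in memory, default is False.")
    parser.add_argument("--cache_session_id", type=str, default="",
                        help="The session id for cache service.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--enable_cache=True", "--cache_session_id=1456416665"])
    print(args.enable_cache, args.cache_session_id)
```

Note that the session id stays a string here and is converted with `int()` only when the `DatasetCache` is built, and that a lowercase `true` is not a Python literal, so argparse rejects it.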