Fix chinese api info

parent 3008a3cee8
commit 73bea6d638

@@ -171,9 +171,9 @@ mindspore.dataset.CLUEDataset

- **ValueError** - `task` 参数不为 'AFQMC'、'TNEWS'、'IFLYTEK'、'CMNLI'、'WSC' 或 'CSL'。
- **ValueError** - `usage` 参数不为 'train'、'test' 或 'eval'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。

**关于CLUE数据集:**
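A minimal usage sketch for the constraints listed above — the file path is a placeholder and is not part of this commit:

.. code-block:: python

    import mindspore.dataset as ds

    # Hypothetical CLUE file list; replace with real AFQMC json files.
    clue_files = ["/path/to/afqmc/train.json"]

    # `task` must be one of 'AFQMC'/'TNEWS'/'IFLYTEK'/'CMNLI'/'WSC'/'CSL' and
    # `usage` one of 'train'/'test'/'eval'; `num_shards` and `shard_id` must
    # be given together, with 0 <= shard_id < num_shards.
    dataset = ds.CLUEDataset(clue_files, task='AFQMC', usage='train',
                             num_shards=2, shard_id=0)

    for row in dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
        print(list(row.keys()))
        break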
@ -204,5 +204,5 @@ mindspore.dataset.CLUEDataset
|
|||
howpublished = {https://github.com/CLUEbenchmark/CLUE}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@@ -29,7 +29,7 @@

- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。

.. include:: mindspore.dataset.api_list_nlp.rst

.. include:: mindspore.dataset.api_list_nlp.rst
@ -29,13 +29,13 @@ mindspore.dataset.Caltech101Dataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -110,5 +110,5 @@ mindspore.dataset.Caltech101Dataset
|
|||
url = {http://data.caltech.edu/records/20086},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -20,13 +20,13 @@ mindspore.dataset.Caltech256Dataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -97,5 +97,5 @@ mindspore.dataset.Caltech256Dataset
|
|||
year = {2007}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -23,13 +23,13 @@ mindspore.dataset.CelebADataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `usage` 参数取值不为'train'、'valid'、'test'或'all'。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `usage` 参数取值不为'train'、'valid'、'test'或'all'。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -120,5 +120,5 @@ mindspore.dataset.CelebADataset
|
|||
howpublished = {http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -21,13 +21,13 @@ mindspore.dataset.Cifar100Dataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards`)。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -84,5 +84,5 @@ mindspore.dataset.Cifar100Dataset
|
|||
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@@ -21,13 +21,13 @@ mindspore.dataset.Cifar10Dataset

异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。

.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
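A brief sketch of the mutually exclusive `sampler` / `shuffle` combinations mentioned in the note above — the dataset path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds

    data_dir = "/path/to/cifar-10-batches-bin"  # placeholder path

    # Option 1: let the dataset shuffle itself.
    d1 = ds.Cifar10Dataset(data_dir, usage='train', shuffle=True)

    # Option 2: hand ordering over to an explicit sampler and leave `shuffle` unset.
    sampler = ds.RandomSampler(replacement=False, num_samples=1024)
    d2 = ds.Cifar10Dataset(data_dir, usage='train', sampler=sampler)

    # Passing both `sampler` and `shuffle`, or `sampler` together with
    # `num_shards`/`shard_id`, raises RuntimeError as listed above.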
@ -88,4 +88,4 @@ mindspore.dataset.Cifar10Dataset
|
|||
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
|
||||
}
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -25,16 +25,16 @@ mindspore.dataset.CityscapesDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `dataset_dir` 路径非法或不存在。
|
||||
- **ValueError** - `task` 参数取值不为'instance'、'semantic'、'polygon'或'color'。
|
||||
- **ValueError** - `quality_mode` 参数取值不为'fine'或'coarse'。
|
||||
- **ValueError** - `usage` 参数取值不在给定的字段中。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -121,5 +121,5 @@ mindspore.dataset.CityscapesDataset
|
|||
year = {2016}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -67,7 +67,7 @@
|
|||
- **ValueError** - `task` 参数取值不为 `Detection` 、 `Stuff` 、`Panoptic` 或 `Keypoint` 。
|
||||
- **ValueError** - `annotation_file` 参数对应的文件不存在。
|
||||
- **ValueError** - `dataset_dir` 参数路径不存在。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note::
|
||||
- 当参数 `extra_metadata` 为True时,还需使用 `rename` 操作删除额外数据列'_meta-filename'的前缀'_meta-',
|
||||
|
@ -151,5 +151,5 @@
|
|||
bibsource = {dblp computer science bibliography, https://dblp.org}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -29,7 +29,7 @@ mindspore.dataset.DBpediaDataset
|
|||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数值错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
**关于DBpedia数据集:**
|
||||
|
||||
|
|
|
@ -35,7 +35,7 @@ mindspore.dataset.DIV2KDataset
|
|||
- **ValueError** - `scale` 参数取值不在给定的字段中,或与 `downgrade` 参数的值不匹配。
|
||||
- **ValueError** - `scale` 参数取值为8,但 `downgrade` 参数的值不为 'bicubic'。
|
||||
- **ValueError** - `downgrade` 参数取值为'mild'、'difficult'或'wild',但 `scale` 参数的值不为4。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -138,5 +138,5 @@ mindspore.dataset.DIV2KDataset
|
|||
year = {2017}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -28,7 +28,7 @@
|
|||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `annotation_file` 参数对应的文件不存在。
|
||||
- **ValueError** - `dataset_dir` 参数路径不存在。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -126,5 +126,5 @@
|
|||
bibsource = {dblp computer science bibliography, https://dblp.org}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -29,7 +29,7 @@ mindspore.dataset.Flowers102Dataset
|
|||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数值错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
|
|
@@ -14,9 +14,8 @@

- **column_names** (Union[str, list[str]],可选) - 指定数据集生成的列名。默认值:None,不指定。用户可以通过此参数或 `schema` 参数指定列名。
- **column_types** (list[mindspore.dtype],可选) - 指定生成数据集各个数据列的数据类型。默认值:None,不指定。
如果未指定该参数,则自动推断类型;如果指定了该参数,将在数据输出时做类型匹配检查。
- **schema** (Union[Schema, str],可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。
用户可以通过提供 `column_names` 或 `schema` 指定数据集的列名,但如果同时指定两者,则将优先从 `schema` 中获取列名信息。
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式)。默认值:1。
- **shuffle** (bool,可选) - 是否混洗数据集。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None。下表中会展示不同配置的预期行为。
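A minimal sketch of the `column_names` option described above, using a random-access Python source; all names and shapes are illustrative only:

.. code-block:: python

    import numpy as np
    import mindspore.dataset as ds

    class RandomAccessSource:
        """Random-access source: __getitem__ is what makes `shuffle`/`sampler` usable."""
        def __init__(self, n=8):
            self._data = np.random.sample((n, 2)).astype(np.float32)
            self._label = np.random.randint(0, 2, (n,)).astype(np.int32)
        def __getitem__(self, index):
            return self._data[index], self._label[index]
        def __len__(self):
            return len(self._data)

    # Column names are given explicitly; the column types are inferred from the arrays.
    dataset = ds.GeneratorDataset(RandomAccessSource(),
                                  column_names=["data", "label"], shuffle=True)
    for row in dataset.create_tuple_iterator(output_numpy=True, num_epochs=1):
        print(row[0].shape, row[1])
        break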
@ -34,7 +33,7 @@
|
|||
- **ValueError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **ValueError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **ValueError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note::
|
||||
- `source` 参数接收用户自定义的Python函数(PyFuncs),不要将 `mindspore.nn` 和 `mindspore.ops` 目录下或其他的网络计算算子添加
|
||||
|
@ -67,5 +66,5 @@
|
|||
- False
|
||||
- 不允许
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -27,7 +27,7 @@ mindspore.dataset.IMDBDataset
|
|||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数值错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
|
|
@@ -29,7 +29,7 @@ mindspore.dataset.ImageFolderDataset

- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **RuntimeError** - `class_indexing` 参数的类型不是dict。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。

.. note::
- 如果 `decode` 参数的值为False,则得到的 `image` 列的shape为[undecoded_image_size],如果为True则 `image` 列的shape为[H,W,C]。
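A brief sketch of the `decode` behaviour noted above; the directory layout and class names are assumptions:

.. code-block:: python

    import mindspore.dataset as ds
    import mindspore.dataset.vision as vision

    data_dir = "/path/to/image_folder"  # one sub-directory per class (assumed)

    # decode=False keeps the raw encoded bytes; decode later in the pipeline.
    dataset = ds.ImageFolderDataset(data_dir, decode=False,
                                    class_indexing={"cat": 0, "dog": 1})
    dataset = dataset.map(operations=vision.Decode(), input_columns=["image"])
    # After Decode() the `image` column has shape [H, W, C].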
@ -30,7 +30,7 @@
|
|||
- **ValueError** - `num_parallel_workers` 参数超过最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -60,5 +60,5 @@
|
|||
- False
|
||||
- 不允许
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -27,7 +27,7 @@ mindspore.dataset.MnistDataset
|
|||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
|
|
@@ -54,7 +54,7 @@ mindspore.dataset.NumpySlicesDataset

- **ValueError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **ValueError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **ValueError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。

.. include:: mindspore.dataset.api_list_nlp.rst

.. include:: mindspore.dataset.api_list_nlp.rst
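A small sketch of the `num_shards` / `shard_id` pairing discussed above, applied to in-memory NumPy data; the column names are illustrative:

.. code-block:: python

    import numpy as np
    import mindspore.dataset as ds

    features = np.random.sample((10, 3)).astype(np.float32)
    labels = np.random.randint(0, 2, (10,)).astype(np.int32)

    # Two shards: this process reads shard 0; another worker would pass shard_id=1.
    dataset = ds.NumpySlicesDataset(data=(features, labels),
                                    column_names=["feature", "label"],
                                    num_shards=2, shard_id=0, shuffle=False)
    print(dataset.get_dataset_size())  # 5 rows in this shard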
@ -33,12 +33,12 @@
|
|||
- **ValueError** - `columns_list` 参数无效。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note::
|
||||
- 需要用户提前在云存储上创建同步用的目录,然后通过 `sync_obs_path` 指定。
|
||||
- 如果线下训练,建议为每次训练设置 `BATCH_JOB_ID` 环境变量。
|
||||
- 分布式训练中,假如使用多个节点(服务器),则必须使用每个节点全部的8张卡。如果只有一个节点(服务器),则没有这样的限制。
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@@ -3,7 +3,7 @@ mindspore.dataset.Places365Dataset

.. py:class:: mindspore.dataset.Places365Dataset(dataset_dir, usage=None, small=True, decode=False, num_samples=None, num_parallel_workers=None, shuffle=None, sampler=None, num_shards=None, shard_id=None, cache=None)

读取和解析PhotoTour数据集的源数据集。
读取和解析Places365数据集的源数据集。

生成的数据集有两列: `[image, label]`。
`image` 列的数据类型为uint8。 `label` 列的数据类型为uint32。
@ -23,12 +23,12 @@ mindspore.dataset.Places365Dataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,参数小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `usage` 不是['train-standard', 'train-challenge', 'val']中的任何一个。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
|
|
@ -21,12 +21,12 @@ mindspore.dataset.QMnistDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -87,5 +87,5 @@ mindspore.dataset.QMnistDataset
|
|||
publisher = {Curran Associates, Inc.},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@@ -7,9 +7,9 @@ mindspore.dataset.RandomDataset

参数:
- **total_rows** (int, 可选) - 随机生成样本数据的数量。默认值:None,生成随机数量的样本。
- **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。
- **columns_list** (list[str], 可选) - 指定生成数据集的列名。默认值:None,生成的数据列将以"c0","c1","c2" ... "cn"的规则命名。
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None。
- **columns_list** (list[str], 可选) - 指定生成数据集的列名。默认值:None,生成的数据列将以"c0"、"c1"、"c2" ... "cn"的规则命名。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值:None,不使用缓存。

@@ -17,5 +17,16 @@ mindspore.dataset.RandomDataset

- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。

.. include:: mindspore.dataset.api_list_nlp.rst

异常:
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
- **TypeError** - `total_rows` 的类型不是int。
- **TypeError** - `num_shards` 的类型不是int。
- **TypeError** - `num_parallel_workers` 的类型不是int。
- **TypeError** - `shuffle` 的类型不是bool。
- **TypeError** - `columns_list` 的类型不是list。

.. include:: mindspore.dataset.api_list_nlp.rst
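A small sketch of driving RandomDataset with an explicit Schema, as the `schema` parameter above describes; the column names and shapes are illustrative:

.. code-block:: python

    import mindspore.dataset as ds
    from mindspore import dtype as mstype

    # Describe two columns: a 2-element uint8 vector and a 1-element label.
    schema = ds.Schema()
    schema.add_column(name="image", de_type=mstype.uint8, shape=[2])
    schema.add_column(name="label", de_type=mstype.uint8, shape=[1])

    # total_rows bounds how many random rows are generated.
    dataset = ds.RandomDataset(schema=schema, total_rows=4)
    for row in dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
        print(row["image"], row["label"])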
@ -5,7 +5,7 @@ mindspore.dataset.SBDataset
|
|||
|
||||
读取和解析Semantic Boundaries数据集的源文件构建数据集。
|
||||
|
||||
根据给定的 `task` 配置,生成数据集具有不同的输出列:
|
||||
通过配置 `task` 参数,生成的数据集具有不同的输出列:
|
||||
|
||||
- `task` = 'Boundaries',有两个输出列: `image` 列的数据类型为uint8,`label` 列包含1个的数据类型为uint8的图像。
|
||||
- `task` = 'Segmentation',有两个输出列: `image` 列的数据类型为uint8。 `label` 列包含20个的数据类型为uint8的图像。
|
||||
|
@ -15,7 +15,7 @@ mindspore.dataset.SBDataset
|
|||
- **task** (str, 可选) - 指定读取SB数据集的任务类型,支持'Boundaries'和'Segmentation'。默认值:'Boundaries'。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'val'、'train_noval'和'all'。默认值:'train'。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。
|
||||
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。
|
||||
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。
|
||||
|
@ -24,15 +24,15 @@ mindspore.dataset.SBDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `dataset_dir` 不存在。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `task` 不是['Boundaries', 'Segmentation']中的任何一个。
|
||||
- **ValueError** - `usage` 不是['train', 'val', 'train_noval', 'all']中的任何一个。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -102,5 +102,5 @@ mindspore.dataset.SBDataset
|
|||
year = "2011",
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -9,10 +9,10 @@ mindspore.dataset.SBUDataset
|
|||
|
||||
参数:
|
||||
- **dataset_dir** (str) - 包含数据集文件的根目录的路径。
|
||||
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。
|
||||
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。
|
||||
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。
|
||||
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。
|
||||
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。
|
||||
|
@ -20,12 +20,12 @@ mindspore.dataset.SBUDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -83,5 +83,5 @@ mindspore.dataset.SBUDataset
|
|||
Year = {2011},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -22,13 +22,13 @@ mindspore.dataset.STL10Dataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `usage` 参数无效。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -95,5 +95,5 @@ mindspore.dataset.STL10Dataset
|
|||
}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -11,7 +11,7 @@ mindspore.dataset.SVHNDataset
|
|||
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'extra'或'all'。默认值:None,读取全部样本图片。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。
|
||||
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。
|
||||
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。
|
||||
|
@ -19,13 +19,13 @@ mindspore.dataset.SVHNDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `usage` 参数无效。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
|
|
@ -19,12 +19,12 @@ mindspore.dataset.SemeionDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -77,5 +77,5 @@ mindspore.dataset.SemeionDataset
|
|||
author={M Buscema, MetaNet},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -11,9 +11,8 @@ mindspore.dataset.SogouNewsDataset
|
|||
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train','test'或'all'。默认值:None,读取全部样本。
|
||||
取值为'train'时将会读取45万个训练样本,取值为'test'时将会读取6万个测试样本,取值为'all'时将会读取全部51万个样本。默认值:None,读取全部样本。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None, 读取全部样本。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值: `Shuffle.GLOBAL` 。
|
||||
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
|
||||
通过传入枚举变量设置数据混洗的模式:
|
||||
|
||||
|
@ -22,13 +21,14 @@ mindspore.dataset.SogouNewsDataset
|
|||
|
||||
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。
|
||||
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值:None,不使用缓存。
|
||||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
**关于SogouNew数据集:**
|
||||
|
||||
|
@ -60,5 +60,5 @@ mindspore.dataset.SogouNewsDataset
|
|||
primaryClass={cs.LG}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -22,12 +22,12 @@ mindspore.dataset.SpeechCommandsDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -87,5 +87,5 @@ mindspore.dataset.SpeechCommandsDataset
|
|||
year={2018}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
|
|
@@ -7,8 +7,8 @@ mindspore.dataset.TFRecordDataset

参数:
- **dataset_files** (Union[str, list[str]]) - 数据集文件路径,支持单文件路径字符串、多文件路径字符串列表或可被glob库模式匹配的字符串,文件列表将在内部进行字典排序。
- **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None。
- **columns_list** (list[str], 可选) - 指定从TFRecord文件中读取的数据列。默认值:None,读取所有列。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
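A minimal sketch of the `schema` / `columns_list` options above — the file path and column names are placeholders:

.. code-block:: python

    import mindspore.dataset as ds
    from mindspore import dtype as mstype

    tfrecord_files = ["/path/to/data.tfrecord"]  # placeholder

    # Either rely on the metadata stored in the TFRecord file, or describe the
    # columns explicitly with a Schema object (a JSON schema file path also works).
    schema = ds.Schema()
    schema.add_column(name="image", de_type=mstype.uint8)
    schema.add_column(name="label", de_type=mstype.int64, shape=[1])

    dataset = ds.TFRecordDataset(tfrecord_files, schema=schema,
                                 columns_list=["image", "label"],
                                 shuffle=ds.Shuffle.GLOBAL)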
@ -33,7 +33,7 @@ mindspore.dataset.TFRecordDataset
|
|||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -25,12 +25,12 @@ mindspore.dataset.TedliumDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -149,5 +149,5 @@ mindspore.dataset.TedliumDataset
|
|||
biburl={https://www.openslr.org/51/}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
|
|
@ -25,7 +25,7 @@
|
|||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@@ -10,9 +10,9 @@ mindspore.dataset.UDPOSDataset

参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。
取值为'train'时将会读取12,543个样本,取值为'test'时将会读取2,077个测试样本,取值为'test'时将会读取9,981个样本,取值为'valid'时将会读取2,002个样本,取值为'all'时将会读取全部16,622个样本。默认值:None,读取全部样本。
取值为'train'时将会读取12,543个样本,取值为'test'时将会读取2,077个测试样本,取值为'valid'时将会读取2,002个样本,取值为'all'时将会读取全部16,622个样本。默认值:None,读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值: `Shuffle.GLOBAL` 。
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -26,9 +26,27 @@ mindspore.dataset.UDPOSDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
**关于UDPOS数据集:**
|
||||
|
||||
UDPOS是一个解析的文本语料库数据集,用于阐明句法或者语义句子结构。
|
||||
该语料库包含254,830个单词和16,622个句子,取自各种网络媒体,包括博客、新闻组、电子邮件和评论。
|
||||
|
||||
**引用:**
|
||||
|
||||
.. code-block::
|
||||
|
||||
@inproceedings{silveira14gold,
|
||||
year = {2014},
|
||||
author = {Natalia Silveira and Timothy Dozat and Marie-Catherine de Marneffe and Samuel Bowman
|
||||
and Miriam Connor and John Bauer and Christopher D. Manning},
|
||||
title = {A Gold Standard Dependency Corpus for {E}nglish},
|
||||
booktitle = {Proceedings of the Ninth International Conference on Language
|
||||
Resources and Evaluation (LREC-2014)}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@@ -3,17 +3,17 @@ mindspore.dataset.USPSDataset

.. py:class:: mindspore.dataset.USPSDataset(dataset_dir, usage=None, num_samples=None, num_parallel_workers=None, shuffle=Shuffle.GLOBAL, num_shards=None, shard_id=None, cache=None)

读取和解析UDPOS数据集的源数据集。
读取和解析USPS数据集的源数据集。

生成的数据集有两列: `[image, label]`。 `image` 列的数据类型为uint8。 `label` 列的数据类型为uint32。

参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、或'all'。
取值为'train'时将会读取7,291个样本,取值为'test'时将会读取2,077个测试样本,取值为'test'时将会读取2,007个样本,取值为'all'时将会读取全部9,298个样本。默认值:None,读取全部样本。
取值为'train'时将会读取7,291个样本,取值为'test'时将会读取2,007个测试样本,取值为'all'时将会读取全部9,298个样本。默认值:None,读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:`Shuffle.GLOBAL` 。
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -26,11 +26,11 @@ mindspore.dataset.USPSDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `usage` 参数无效。
|
||||
- **ValueError** - `shard_id` 参数错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
**关于USPS数据集:**
|
||||
|
||||
|
@ -61,5 +61,5 @@ mindspore.dataset.USPSDataset
|
|||
publisher={IEEE}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -43,7 +43,7 @@ mindspore.dataset.VOCDataset
|
|||
- **ValueError** - 指定的任务不为'Segmentation'或'Detection'。
|
||||
- **ValueError** - 指定任务为'Segmentation'时, `class_indexing` 参数不为None。
|
||||
- **ValueError** - 与 `usage` 参数相关的txt文件不存在。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note::
|
||||
- 当参数 `extra_metadata` 为True时,还需使用 `rename` 操作删除额外数据列'_meta-filename'的前缀'_meta-',
|
||||
|
@ -125,5 +125,5 @@ mindspore.dataset.VOCDataset
|
|||
howpublished = {http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -11,7 +11,7 @@ mindspore.dataset.WIDERFaceDataset
|
|||
参数:
|
||||
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。
|
||||
取值为'train'时将会读取12,880个样本,取值为'test'时将会读取2,077个测试样本,取值为'test'时将会读取16,097个样本,取值为'valid'时将会读取3,226个样本,取值为'all'时将会读取全部类别样本。默认值:None,读取全部样本。
|
||||
取值为'train'时将会读取12,880个样本,取值为'test'时将会读取16,097个样本,取值为'valid'时将会读取3,226个样本,取值为'all'时将会读取全部类别样本。默认值:None,读取全部样本。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。
|
||||
|
@ -23,13 +23,13 @@ mindspore.dataset.WIDERFaceDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 不包含任何数据文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
|
||||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `usage` 不在['train', 'test', 'valid', 'all']中。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **ValueError** - `annotation_file` 不存在。
|
||||
- **ValueError** - `dataset_dir` 不存在。
|
||||
|
||||
|
@ -109,5 +109,5 @@ mindspore.dataset.WIDERFaceDataset
|
|||
year={2016},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_vision.rst
|
||||
|
|
|
@ -9,10 +9,10 @@ mindspore.dataset.WikiTextDataset
|
|||
|
||||
参数:
|
||||
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train', 'test', 'valid'或'all'。默认值:None,读取全部样本。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。默认值:None,读取全部样本。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:`Shuffle.GLOBAL` 。
|
||||
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
|
||||
通过传入枚举变量设置数据混洗的模式:
|
||||
|
||||
|
@ -22,14 +22,14 @@ mindspore.dataset.WikiTextDataset
|
|||
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。
|
||||
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。
|
||||
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值:None,不使用缓存。
|
||||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **ValueError** - `num_samples` 参数值错误(小于0)。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `num_samples` 参数值错误,小于0。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
**关于WikiText数据集:**
|
||||
|
||||
|
@ -59,5 +59,5 @@ mindspore.dataset.WikiTextDataset
|
|||
year={2016}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -9,11 +9,11 @@ mindspore.dataset.YahooAnswersDataset
|
|||
|
||||
参数:
|
||||
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train', 'test'或'all'。
|
||||
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'或'all'。
|
||||
取值为'train'时将会读取1,400,000个训练样本,取值为'test'时将会读取60,000个测试样本,取值为'all'时将会读取全部1,460,000个样本。默认值:None,读取全部样本。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
|
||||
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:`Shuffle.GLOBAL` 。
|
||||
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
|
||||
通过传入枚举变量设置数据混洗的模式:
|
||||
|
||||
|
@ -26,10 +26,10 @@ mindspore.dataset.YahooAnswersDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
**关于YahooAnswers数据集:**
|
||||
|
||||
|
@ -59,5 +59,5 @@ mindspore.dataset.YahooAnswersDataset
|
|||
howpublished = {}
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -13,7 +13,7 @@ mindspore.dataset.YelpReviewDataset
|
|||
对于Polarity数据集,'train'将读取560,000个训练样本,'test'将读取38,000个测试样本,'all'将读取所有598,000个样本。
|
||||
对于Full数据集,'train'将读取650,000个训练样本,'test'将读取50,000个测试样本,'all'将读取所有700,000个样本。默认值:None,读取所有样本。
|
||||
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。
|
||||
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:`Shuffle.GLOBAL` 。
|
||||
如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
|
||||
通过传入枚举变量设置数据混洗的模式:
|
||||
|
||||
|
@ -27,9 +27,9 @@ mindspore.dataset.YelpReviewDataset
|
|||
|
||||
异常:
|
||||
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
|
||||
|
||||
**关于YelpReview数据集:**
|
||||
|
||||
|
@ -88,5 +88,5 @@ mindspore.dataset.YelpReviewDataset
|
|||
year = {2015},
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_nlp.rst
|
||||
|
|
|
@ -25,7 +25,7 @@ mindspore.dataset.YesNoDataset
|
|||
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
|
||||
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
|
||||
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards` )。
|
||||
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards` 。
|
||||
|
||||
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
|
||||
|
||||
|
@ -79,5 +79,5 @@ mindspore.dataset.YesNoDataset
|
|||
url = "http://wwww.openslr.org/1/"
|
||||
}
|
||||
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
||||
.. include:: mindspore.dataset.api_list_audio.rst
|
||||
|
|
|
@@ -11,7 +11,7 @@ mindspore.dataset.vision.AdjustGamma

更多详细信息,请参见 `Gamma矫正 <https://en.wikipedia.org/wiki/Gamma_correction>`_ 。

参数:
- **gamma** (float) - 输出图像像素值与输入图像像素值呈指数相关。 `gamma` 大于1使阴影更暗,而 `gamma` 小于1使黑暗区域更亮。
- **gamma** (float) - 非负实数。输出图像像素值与输入图像像素值呈指数相关。 `gamma` 大于1使阴影更暗,而 `gamma` 小于1使黑暗区域更亮。
- **gain** (float, 可选) - 常数乘数。默认值:1.0。

异常:
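A short sketch of applying the transform documented above in a map pipeline; the gamma relation is roughly `out = gain * in ** gamma` for pixel values scaled to [0, 1], and the dataset path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds
    import mindspore.dataset.vision as vision

    data_dir = "/path/to/image_folder"  # placeholder

    dataset = ds.ImageFolderDataset(data_dir, decode=True)
    # gamma > 1 darkens shadows; gamma < 1 brightens dark regions.
    dataset = dataset.map(operations=vision.AdjustGamma(gamma=0.8, gain=1.0),
                          input_columns=["image"])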
@@ -11,7 +11,7 @@ mindspore.dataset.vision.PadToSize

- **offset** (Union[int, Sequence[int, int]], 可选) - 顶部和左侧要填充的长度。
如果输入整型,使用此值填充图像上侧和左侧。
如果提供了序列[int, int],则应按[top, left]的顺序排列,填充图像上侧和左侧。
默认值:None,表示对称填充。
默认值:None,表示对称填充,保持原始图像处于中心位置。
- **fill_value** (Union[int, tuple[int, int, int]], 可选) - 填充的像素值,仅在 `padding_mode` 取值为Border.CONSTANT时有效。
如果是3元素元组,则分别用于填充R、G、B通道。
如果是整数,则用于所有 RGB 通道。
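A brief sketch of the `offset` / `fill_value` behaviour described above; the target size and pad values are assumptions:

.. code-block:: python

    import mindspore.dataset.vision as vision
    from mindspore.dataset.vision import Border

    # offset=None pads symmetrically, keeping the original image centered;
    # offset=[10, 20] pads 10 pixels on top and 20 on the left, with the
    # remainder added on the bottom/right to reach the target size.
    pad_center = vision.PadToSize(size=[256, 256], fill_value=0,
                                  padding_mode=Border.CONSTANT)
    pad_topleft = vision.PadToSize(size=[256, 256], offset=[10, 20],
                                   fill_value=(114, 114, 114),
                                   padding_mode=Border.CONSTANT)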
@ -3,7 +3,7 @@ mindspore.dataset.vision.RandomAdjustSharpness
|
|||
|
||||
.. py:class:: mindspore.dataset.vision.RandomAdjustSharpness(degree, prob=0.5)
|
||||
|
||||
以给定的概率随机调整输入图像的清晰度。
|
||||
以给定的概率随机调整输入图像的锐度。
|
||||
|
||||
参数:
|
||||
- **degree** (float) - 锐度调整度,必须是非负的。
|
||||
|
|
|
@ -7,7 +7,7 @@ mindspore.dataset.vision.RandomAutoContrast
|
|||
|
||||
参数:
|
||||
- **cutoff** (float, 可选) - 输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内。默认值:0.0。
|
||||
- **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内。默认值:None。
|
||||
- **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,该值必须在 [0, 255] 范围内。默认值:None。
|
||||
- **prob** (float, 可选) - 图像被调整对比度的概率,取值范围:[0.0, 1.0]。默认值:0.5。
|
||||
|
||||
异常:
|
||||
|
|
|
@ -52,9 +52,9 @@ class CMUArcticDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, will use value set in the config.
|
||||
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the
|
||||
dataset. Default: None, expected order behavior shown in the table.
|
||||
dataset. Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
|
||||
When this argument is specified, `num_samples` reflects the max sample number of per shard.
|
||||
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
|
||||
|
@ -188,9 +188,9 @@ class GTZANDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, will use value set in the config.
|
||||
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the
|
||||
dataset. Default: None, expected order behavior shown in the table.
|
||||
dataset. Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
|
||||
When this argument is specified, `num_samples` reflects the max sample number of per shard.
|
||||
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
|
||||
|
@ -324,9 +324,9 @@ class LibriTTSDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, will use value set in the config.
|
||||
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the
|
||||
dataset. Default: None, expected order behavior shown in the table.
|
||||
dataset. Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
|
||||
When this argument is specified, `num_samples` reflects the max sample number of per shard.
|
||||
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
|
||||
|
@ -610,9 +610,9 @@ class SpeechCommandsDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, will use value set in the config.
|
||||
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
|
||||
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
|
||||
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified
|
||||
|
@ -623,11 +623,11 @@ class SpeechCommandsDataset(MappableDataset, AudioBaseDataset):
|
|||
|
||||
Raises:
|
||||
RuntimeError: If `dataset_dir` does not contain data files.
|
||||
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
|
||||
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
|
||||
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
|
||||
RuntimeError: If `num_shards` is specified but `shard_id` is None.
|
||||
RuntimeError: If `shard_id` is specified but `num_shards` is None.
|
||||
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
|
||||
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
|
||||
|
||||
Note:
|
||||
|
@ -743,9 +743,9 @@ class TedliumDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, number set in the config.
|
||||
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
|
||||
order behavior shown in the table.
|
||||
order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the
|
||||
dataset. Default: None, expected order behavior shown in the table.
|
||||
dataset. Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided
|
||||
into. Default: None. When this argument is specified, `num_samples` reflects
|
||||
the maximum sample number of per shard.
|
||||
|
@ -757,11 +757,11 @@ class TedliumDataset(MappableDataset, AudioBaseDataset):
|
|||
|
||||
Raises:
|
||||
RuntimeError: If `dataset_dir` does not contain stm files.
|
||||
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
|
||||
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
|
||||
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
|
||||
RuntimeError: If `num_shards` is specified but `shard_id` is None.
|
||||
RuntimeError: If `shard_id` is specified but `num_shards` is None.
|
||||
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
|
||||
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
|
||||
|
||||
Note:
|
||||
|
@ -942,9 +942,9 @@ class YesNoDataset(MappableDataset, AudioBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
|
||||
Default: None, will use value set in the config.
|
||||
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
|
||||
Default: None, expected order behavior shown in the table.
|
||||
Default: None, expected order behavior shown in the table below.
|
||||
sampler (Sampler, optional): Object used to choose samples from the
|
||||
dataset. Default: None, expected order behavior shown in the table.
|
||||
dataset. Default: None, expected order behavior shown in the table below.
|
||||
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
|
||||
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
|
||||
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only
|
||||
|
|
|
@ -248,8 +248,9 @@ class TFRecordDataset(SourceDataset, UnionBaseDataset):
Args:
dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search for a
pattern of files. The list will be sorted in a lexicographical order.
schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None.
If the schema is not provided, the meta data from the TFData file is considered the schema.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
column to be read. Both JSON file path and objects constructed by mindspore.dataset.Schema are acceptable.
Default: None.
columns_list (list[str], optional): List of columns to be read. Default: None, read all columns.
num_samples (int, optional): The number of samples (rows) to be included in the dataset. Default: None.
If num_samples is None and numRows(parsed from schema) does not exist, read the full dataset;
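For illustration, a minimal sketch of the `schema` usage described in the hunk above; the file path and column definitions here are hypothetical, not part of the commit:

>>> import mindspore.dataset as ds
>>> from mindspore import dtype as mstype
>>>
>>> # Build a Schema object instead of pointing to a JSON schema file
>>> schema = ds.Schema()
>>> schema.add_column('image', de_type=mstype.uint8)
>>> schema.add_column('label', de_type=mstype.int32, shape=[1])
>>> # Hypothetical file list; only the columns declared in the schema are parsed
>>> dataset = ds.TFRecordDataset(dataset_files=["/path/to/file.tfrecord"], schema=schema)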
@ -402,9 +402,9 @@ class CLUEDataset(SourceDataset, TextBaseDataset):
ValueError: task is not in 'AFQMC', 'TNEWS', 'IFLYTEK', 'CMNLI', 'WSC' or 'CSL'.
ValueError: usage is not in 'train', 'test' or 'eval'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Examples:
>>> clue_dataset_dir = ["/path/to/clue_dataset_file"] # contains 1 or multiple clue files
@ -1118,10 +1118,10 @@ class Multi30kDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
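As a hedged illustration of the shuffle modes described in this hunk; the directory path is hypothetical and the exact Multi30kDataset keyword arguments should be checked against the installed version:

>>> import mindspore.dataset as ds
>>>
>>> # Global shuffle: both files and samples are shuffled (same as shuffle=True)
>>> dataset = ds.Multi30kDataset(dataset_dir="/path/to/multi30k_dataset_dir", usage='train',
...                              shuffle=ds.Shuffle.GLOBAL)
>>> # File-level shuffle only: shuffle the file order, keep sample order within each file
>>> dataset = ds.Multi30kDataset(dataset_dir="/path/to/multi30k_dataset_dir", usage='train',
...                              shuffle=ds.Shuffle.FILES)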
@ -1312,10 +1312,10 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):
'all' will read from all 510,000 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, read all samples.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, performs global shuffle.
There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples, same as setting shuffle to True.
@ -1332,9 +1332,9 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Examples:
>>> sogou_news_dataset_dir = "/path/to/sogou_news_dataset_dir"
@ -1404,10 +1404,10 @@ class SQuADDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1565,10 +1565,10 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):
'all' will read from all 16,622 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, reads the full dataset.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1586,13 +1586,32 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Examples:
>>> udpos_dataset_dir = "/path/to/udpos_dataset_dir"
>>> dataset = ds.UDPOSDataset(dataset_dir=udpos_dataset_dir, usage='all')

About UDPOS dataset:

Text corpus dataset that clarifies syntactic or semantic sentence structure.
The corpus comprises 254,830 words and 16,622 sentences, taken from various web media including
weblogs, newsgroups, emails and reviews.

Citation:

.. code-block::

@inproceedings{silveira14gold,
year = {2014},
author = {Natalia Silveira and Timothy Dozat and Marie-Catherine de Marneffe and Samuel Bowman
and Miriam Connor and John Bauer and Christopher D. Manning},
title = {A Gold Standard Dependency Corpus for {E}nglish},
booktitle = {Proceedings of the Ninth International Conference on Language
Resources and Evaluation (LREC-2014)}
}
"""

@check_udpos_dataset
@ -1622,10 +1641,10 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1641,11 +1660,11 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files or invalid.
ValueError: If `num_samples` is invalid (< 0).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_samples` is invalid (< 0).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

About WikiTextDataset dataset:
@ -1711,10 +1730,10 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1730,10 +1749,10 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Examples:
>>> yahoo_answers_dataset_dir = "/path/to/yahoo_answers_dataset_directory"
@ -1804,10 +1823,10 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):
'all' will read from all 700,000 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, reads all samples.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1824,9 +1843,9 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Examples:
>>> yelp_review_dataset_dir = "/path/to/yelp_review_dataset_dir"
@ -510,15 +510,16 @@ class GeneratorDataset(MappableDataset, UnionBaseDataset):
required to provide either column_names or schema.
column_types (list[mindspore.dtype], optional): List of column data types of the dataset. Default: None.
If provided, sanity check will be performed on generator output.
schema (Union[Schema, str], optional): Path to the JSON schema file or schema object. Default: None. Users are
required to provide either column_names or schema. If both are provided, schema will be used.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
column to be read. Both JSON file path and objects constructed by mindspore.dataset.Schema are acceptable.
Default: None.
num_samples (int, optional): The number of samples to be included in the dataset.
Default: None, all images.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
input is required. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the maximum
sample number of per shard.
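A brief sketch of the column_names/schema choice discussed in this hunk; the generator and column name below are illustrative only:

>>> import numpy as np
>>> import mindspore.dataset as ds
>>>
>>> def my_generator():
...     # one column per yielded tuple element; names are supplied via column_names
...     for i in range(3):
...         yield (np.array([i], dtype=np.int32),)
>>> dataset = ds.GeneratorDataset(source=my_generator, column_names=["data"], shuffle=False)
>>> for row in dataset.create_dict_iterator(output_numpy=True):
...     print(row["data"])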
@ -844,15 +845,14 @@ class NumpySlicesDataset(GeneratorDataset):
otherwise they will be named like column_0, column_1 ...
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the max
sample number of per shard.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
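For the shuffle behaviour summarised in this hunk, a small NumpySlicesDataset sketch; the data values are made up:

>>> import mindspore.dataset as ds
>>>
>>> # Dict input: keys become column names; shuffle=False keeps the original order
>>> data = {"feature": [[1, 2], [3, 4], [5, 6]], "label": [0, 1, 0]}
>>> dataset = ds.NumpySlicesDataset(data=data, shuffle=False)
>>> dataset.get_dataset_size()
3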
@ -132,10 +132,10 @@ class Caltech101Dataset(GeneratorDataset):
Default: None, all images.
num_parallel_workers (int, optional): Number of workers to read the data. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Whether or not to decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -144,13 +144,13 @@ class Caltech101Dataset(GeneratorDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `target_type` is not set correctly.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `target_type` is not set correctly.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -293,10 +293,10 @@ class Caltech256Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Whether or not to decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -308,13 +308,13 @@ class Caltech256Dataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `target_type` is not 'category', 'annotation' or 'all'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `target_type` is not 'category', 'annotation' or 'all'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -440,13 +440,13 @@ class CelebADataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'valid', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'valid', 'test' or 'all'.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -595,9 +595,9 @@ class Cifar10Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -609,13 +609,13 @@ class Cifar10Dataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
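A hedged sketch of the num_shards/shard_id contract these Raises entries guard; the dataset path is hypothetical:

>>> import mindspore.dataset as ds
>>>
>>> # num_shards and shard_id must be given together, with 0 <= shard_id < num_shards
>>> shard0 = ds.Cifar10Dataset(dataset_dir="/path/to/cifar-10-batches-bin",
...                            usage='train', num_shards=4, shard_id=0)
>>> # Passing only one of them raises RuntimeError; shard_id=4 here would raise ValueError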
@ -727,9 +727,9 @@ class Cifar100Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the maximum sample number of per shard.
@ -741,13 +741,13 @@ class Cifar100Dataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -856,10 +856,10 @@ class CityscapesDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -871,11 +871,11 @@ class CityscapesDataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` is invalid or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `dataset_dir` is not exist.
ValueError: If `task` is invalid.
ValueError: If `quality_mode` is invalid.
@ -1024,10 +1024,10 @@ class CocoDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the configuration file.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -1273,10 +1273,10 @@ class DIV2KDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -1803,10 +1803,10 @@ class FlickrDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: None.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -2208,9 +2208,9 @@ class ImageFolderDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
extensions (list[str], optional): List of file extensions to be
included in the dataset. Default: None.
class_indexing (dict, optional): A str-to-int mapping from folder name to index
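To illustrate the sampler/shuffle exclusivity referred to above, a small sketch with an assumed image folder layout:

>>> import mindspore.dataset as ds
>>>
>>> # When a sampler is given, leave shuffle unset; the two are mutually exclusive
>>> sampler = ds.RandomSampler(replacement=False, num_samples=64)
>>> dataset = ds.ImageFolderDataset(dataset_dir="/path/to/image_folder_directory",
...                                 sampler=sampler, decode=True)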
@ -2354,10 +2354,10 @@ class KITTIDataset(MappableDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2621,10 +2621,10 @@ class LFWDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2771,10 +2771,10 @@ class LSUNDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2899,9 +2899,9 @@ class ManifestDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
class_indexing (dict, optional): A str-to-int mapping from label name to index.
Default: None, the folder names will be sorted alphabetically and each
class will be given a unique index starting from 0.
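A hedged example of the class_indexing mapping described above; the manifest path and label names are invented:

>>> import mindspore.dataset as ds
>>>
>>> # Map label names in the manifest file to integer class ids explicitly
>>> dataset = ds.ManifestDataset(dataset_file="/path/to/manifest_file",
...                              class_indexing={"cat": 0, "dog": 1})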
@ -3021,9 +3021,9 @@ class MnistDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3142,10 +3142,10 @@ class OmniglotDataset(MappableDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -3414,22 +3414,22 @@ class Places365Dataset(MappableDataset, VisionBaseDataset):

The generated dataset has two columns :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is a scalar of the uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.

Args:
dataset_dir (str): Path to the root directory that contains the dataset.
usage (str, optional): Usage of this dataset, can be 'train-standard', 'train-challenge' or 'val'.
Default: None, will be set to 'train-standard'.
small (bool, optional): Use 256 * 256 images (True) or high resolution images (False). Default: False.
decode (bool, optional): Decode the images after reading. Default: True.
decode (bool, optional): Decode the images after reading. Default: False.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, will read all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
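A minimal usage sketch for the parameters covered in this hunk; the directory is hypothetical, and whether decode should be passed explicitly depends on the default stated in the shipped docstring:

>>> import mindspore.dataset as ds
>>>
>>> dataset = ds.Places365Dataset(dataset_dir="/path/to/places365_dataset_directory",
...                               usage='train-standard', small=True, decode=True)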
@ -3440,11 +3440,11 @@ class Places365Dataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `usage` is not in ["train-standard", "train-challenge", "val"].
@ -3556,7 +3556,7 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):

The generated dataset has two columns :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is a scalar when `compat` is True else a tensor both of the uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.

Args:
dataset_dir (str): Path to the root directory that contains the dataset.
@ -3569,9 +3569,9 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3582,12 +3582,12 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.

Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -3683,8 +3683,9 @@ class RandomDataset(SourceDataset, VisionBaseDataset):
Args:
total_rows (int, optional): Number of samples for the dataset to generate.
Default: None, number of samples is random.
schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None.
If the schema is not provided, the random dataset generates a random schema.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
column to be read. Both JSON file path and objects constructed by mindspore.dataset.Schema are acceptable.
Default: None.
columns_list (list[str], optional): List of column names of the dataset.
Default: None, the columns will be named like this "c0", "c1", "c2" etc.
num_samples (int, optional): The number of samples to be included in the dataset.
@ -3695,12 +3696,33 @@ class RandomDataset(SourceDataset, VisionBaseDataset):
`Single-Node Data Cache <https://www.mindspore.cn/tutorials/experts/en/master/dataset/cache.html>`_ .
Default: None, which means no cache is used.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
argument can only be specified when `num_shards` is also specified.

Raises:
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
TypeError: If `total_rows` is not of type int.
TypeError: If `num_shards` is not of type int.
TypeError: If `num_parallel_workers` is not of type int.
TypeError: If `shuffle` is not of type bool.
TypeError: If `columns_list` is not of type list.

Examples:
>>> from mindspore import dtype as mstype
>>> import mindspore.dataset as ds
>>>
>>> schema = ds.Schema()
>>> schema.add_column('image', de_type=mstype.uint8, shape=[2])
>>> schema.add_column('label', de_type=mstype.uint8, shape=[1])
>>> # apply dataset operations
>>> ds1 = ds.RandomDataset(schema=schema, total_rows=50, num_parallel_workers=4)
"""

@check_random_dataset
@ -3788,11 +3810,12 @@ class SBDataset(GeneratorDataset):
"""
A source dataset that reads and parses Semantic Boundaries Dataset.

The generated dataset has two columns: :py:obj:`[image, task]`.
By configuring the 'Task' parameter, the generated dataset has different output columns.

- The tensor of column :py:obj:`image` is of the uint8 type.
- The tensor of column :py:obj:`task` contains 20 images of the uint8 type if `task` is 'Boundaries' otherwise
contains 1 image of the uint8 type.
- 'task' = 'Boundaries' , there are two output columns: the 'image' column has the data type uint8 and
the 'label' column contains one image of the data type uint8.
- 'task' = 'Segmentation' , there are two output columns: the 'image' column has the data type uint8 and
the 'label' column contains 20 images of the data type uint8.

Args:
dataset_dir (str): Path to the root directory that contains the dataset.
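To make the task-dependent output columns concrete, a hedged sketch; the directory is hypothetical and the column contents follow the new wording of this hunk:

>>> import mindspore.dataset as ds
>>>
>>> # task='Boundaries': per the updated docstring, 'label' holds one image per sample
>>> sb_bound = ds.SBDataset(dataset_dir="/path/to/sb_dataset_directory",
...                         task='Boundaries', usage='all')
>>> # task='Segmentation': per the updated docstring, 'label' holds 20 images per sample
>>> sb_seg = ds.SBDataset(dataset_dir="/path/to/sb_dataset_directory",
...                       task='Segmentation', usage='all')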
@ -3800,13 +3823,12 @@ class SBDataset(GeneratorDataset):
usage (str, optional): Acceptable usages include 'train', 'val', 'train_noval' and 'all'. Default: 'all'.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
num_parallel_workers (int, optional): Number of workers to read the data. Default: 1, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: None.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -3815,12 +3837,12 @@ class SBDataset(GeneratorDataset):

Raises:
RuntimeError: If `dataset_dir` is not valid or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `dataset_dir` is not exist.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `task` is not in ['Boundaries', 'Segmentation'].
ValueError: If `usage` is not in ['train', 'val', 'train_noval', 'all'].
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
@ -3928,15 +3950,15 @@ class SBUDataset(MappableDataset, VisionBaseDataset):

Args:
dataset_dir (str): Path to the root directory that contains the dataset.
decode (bool, optional): Decode the images after reading. Default: False.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, will read all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3947,11 +3969,11 @@ class SBUDataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Note:
@ -4048,9 +4070,9 @@ class SemeionDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -4062,11 +4084,11 @@ class SemeionDataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Note:
@ -4176,9 +4198,9 @@ class STL10Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -4190,12 +4212,12 @@ class STL10Dataset(MappableDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Note:
@ -4341,24 +4363,23 @@ class SVHNDataset(GeneratorDataset):
Default: None, will read all samples.
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all images.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
input is required. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, 'num_samples' reflects the max
sample number of per shard.
When this argument is specified, 'num_samples' reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.

Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Note:
@ -4443,7 +4464,7 @@ class USPSDataset(SourceDataset, VisionBaseDataset):

The generated dataset has two columns: :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is of a scalar of uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.

Args:
dataset_dir (str): Path to the root directory that contains the dataset.
@ -4455,10 +4476,10 @@ class USPSDataset(SourceDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:

- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -4474,10 +4495,10 @@ class USPSDataset(SourceDataset, VisionBaseDataset):

Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).

Examples:
@ -4560,10 +4581,10 @@ class VOCDataset(MappableDataset, VisionBaseDataset):
|
|||
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
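
A short sketch of the `shuffle`/`sampler` interplay these lines describe (the VOC root below is a placeholder); the two arguments are mutually exclusive across these mappable datasets, so each construction uses only one of them:

    import mindspore.dataset as ds

    voc_dir = "/path/to/VOCdevkit/VOC2012"  # placeholder path

    # Let the dataset handle shuffling ...
    shuffled = ds.VOCDataset(voc_dir, task="Detection", usage="train",
                             decode=True, shuffle=True)

    # ... or control ordering with an explicit sampler instead of shuffle.
    sampler = ds.SequentialSampler(start_index=0, num_samples=8)
    ordered = ds.VOCDataset(voc_dir, task="Detection", usage="train",
                            decode=True, sampler=sampler)
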
@@ -4759,10 +4780,10 @@ class WIDERFaceDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified
@@ -4773,13 +4794,13 @@ class WIDERFaceDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `usage` is not in ['train', 'test', 'valid', 'all'].
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `annotation_file` is not exist.
ValueError: If `dataset_dir` is not exist.
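
A hedged usage sketch tying the `usage` and sharding constraints above together (the WIDER FACE directory is a placeholder):

    import mindspore.dataset as ds

    wider_dir = "/path/to/widerface_dataset_directory"  # placeholder path

    # usage must be 'train', 'test', 'valid' or 'all';
    # shard_id must lie in [0, num_shards).
    train_shard = ds.WIDERFaceDataset(wider_dir, usage="train",
                                      num_shards=4, shard_id=0)
    print(train_shard.get_dataset_size())
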
@@ -1282,13 +1282,13 @@ class InMemoryGraphDataset(GeneratorDataset):
Default: 'graph'.
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the max
When this argument is specified, `num_samples` reflects the max
sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
option could be beneficial if the Python operation is computational heavy. Default: True.
max_rowsize(int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
@@ -1386,8 +1386,8 @@ class ArgoverseDataset(InMemoryGraphDataset):
recommend to specify it with
`column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"]` like the following example.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
option could be beneficial if the Python operation is computational heavy. Default: True.
perf_mode(bool, optional): mode for obtaining higher performance when iterate created dataset(will call
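
The ArgoverseDataset hunk above recommends passing explicit `column_names`, but the example the docstring refers to falls outside this diff. As an editorial sketch of that construction (the data directory is a placeholder and the remaining arguments are left at their defaults):

    import mindspore.dataset as ds

    argoverse_dir = "/path/to/argoverse_forecasting/train/data"  # placeholder path

    graph_dataset = ds.ArgoverseDataset(
        argoverse_dir,
        column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"],
    )
    for item in graph_dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
        print(sorted(item.keys()))
        break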