Fix chinese api info

This commit is contained in:
shenwei41 2022-11-04 14:39:15 +08:00
parent 3008a3cee8
commit 73bea6d638
51 changed files with 394 additions and 325 deletions

View File

@ -171,9 +171,9 @@ mindspore.dataset.CLUEDataset
- **ValueError** - `task` 参数不为 'AFQMC'、'TNEWS'、'IFLYTEK'、'CMNLI'、'WSC' 或 'CSL'。
- **ValueError** - `usage` 参数不为 'train'、'test' 或 'eval'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
**关于CLUE数据集**
@ -204,5 +204,5 @@ mindspore.dataset.CLUEDataset
howpublished = {https://github.com/CLUEbenchmark/CLUE}
}
.. include:: mindspore.dataset.api_list_nlp.rst
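For quick reference, a minimal usage sketch matching the `task` and `usage` values listed above, in the same doctest style as the docstring examples (the file path is a placeholder):

>>> import mindspore.dataset as ds
>>> clue_files = ["/path/to/clue_dataset_file"]  # placeholder path
>>> dataset = ds.CLUEDataset(clue_files, task='AFQMC', usage='train')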

View File

@ -29,7 +29,7 @@
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -29,13 +29,13 @@ mindspore.dataset.Caltech101Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -110,5 +110,5 @@ mindspore.dataset.Caltech101Dataset
url = {http://data.caltech.edu/records/20086},
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -20,13 +20,13 @@ mindspore.dataset.Caltech256Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `target_type` 参数取值不为'category'、'annotation'或'all'。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -97,5 +97,5 @@ mindspore.dataset.Caltech256Dataset
year = {2007}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -23,13 +23,13 @@ mindspore.dataset.CelebADataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'valid'、'test'或'all'。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'valid'、'test'或'all'。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -120,5 +120,5 @@ mindspore.dataset.CelebADataset
howpublished = {http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -21,13 +21,13 @@ mindspore.dataset.Cifar100Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`)。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -84,5 +84,5 @@ mindspore.dataset.Cifar100Dataset
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -21,13 +21,13 @@ mindspore.dataset.Cifar10Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `usage` 参数取值不为'train'、'test'或'all'。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -88,4 +88,4 @@ mindspore.dataset.Cifar10Dataset
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
}
.. include:: mindspore.dataset.api_list_vision.rst
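To make the `num_shards` / `shard_id` sharding used throughout these exception lists concrete, a minimal sketch (the directory is a placeholder; in a real distributed job `shard_id` comes from the process rank):

>>> import mindspore.dataset as ds
>>> cifar10_dir = "/path/to/cifar10_dataset_directory"  # placeholder path
>>> # rank 0 of a 4-way data-parallel job reads shard 0; the other ranks pass shard_id 1..3
>>> dataset = ds.Cifar10Dataset(cifar10_dir, usage='train', num_shards=4, shard_id=0)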

View File

@ -25,16 +25,16 @@ mindspore.dataset.CityscapesDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `dataset_dir` 路径非法或不存在。
- **ValueError** - `task` 参数取值不为'instance'、'semantic'、'polygon'或'color'。
- **ValueError** - `quality_mode` 参数取值不为'fine'或'coarse'。
- **ValueError** - `usage` 参数取值不在给定的字段中。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -121,5 +121,5 @@ mindspore.dataset.CityscapesDataset
year = {2016}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -67,7 +67,7 @@
- **ValueError** - `task` 参数取值不为 `Detection`、`Stuff`、`Panoptic` 或 `Keypoint`。
- **ValueError** - `annotation_file` 参数对应的文件不存在。
- **ValueError** - `dataset_dir` 参数路径不存在。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note::
- 当参数 `extra_metadata` 为True时还需使用 `rename` 操作删除额外数据列'_meta-filename'的前缀'_meta-'
@ -151,5 +151,5 @@
bibsource = {dblp computer science bibliography, https://dblp.org}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -29,7 +29,7 @@ mindspore.dataset.DBpediaDataset
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
**关于DBpedia数据集**

View File

@ -35,7 +35,7 @@ mindspore.dataset.DIV2KDataset
- **ValueError** - `scale` 参数取值不在给定的字段中,或与 `downgrade` 参数的值不匹配。
- **ValueError** - `scale` 参数取值为8且 `downgrade` 参数的值不为 'bicubic'。
- **ValueError** - `downgrade` 参数取值为'mild'、'difficult'或'wild',但 `scale` 参数的值不为4。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -138,5 +138,5 @@ mindspore.dataset.DIV2KDataset
year = {2017}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -28,7 +28,7 @@
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `annotation_file` 参数对应的文件不存在。
- **ValueError** - `dataset_dir` 参数路径不存在。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -126,5 +126,5 @@
bibsource = {dblp computer science bibliography, https://dblp.org}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -29,7 +29,7 @@ mindspore.dataset.Flowers102Dataset
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。

View File

@ -14,9 +14,8 @@
- **column_names** (Union[str, list[str]],可选) - 指定数据集生成的列名。默认值None不指定。用户可以通过此参数或 `schema` 参数指定列名。
- **column_types** (list[mindspore.dtype],可选) - 指定生成数据集各个数据列的数据类型。默认值None不指定。
如果未指定该参数,则自动推断类型;如果指定了该参数,将在数据输出时做类型匹配检查。
- **schema** (Union[Schema, str],可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None不指定。
用户可以通过提供 `column_names` 或 `schema` 指定数据集的列名,但如果同时指定两者,则将优先从 `schema` 中获取列名信息。
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式)。默认值1。
- **shuffle** (bool,可选) - 是否混洗数据集。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)才可以指定该参数。默认值None。下表中会展示不同配置的预期行为。
@ -34,7 +33,7 @@
- **ValueError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **ValueError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **ValueError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note::
- `source` 参数接收用户自定义的Python函数(PyFuncs),不要将 `mindspore.nn` 和 `mindspore.ops` 目录下或其他的网络计算算子添加
@ -67,5 +66,5 @@
- False
- 不允许
.. include:: mindspore.dataset.api_list_nlp.rst
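A minimal sketch of the random-accessible `source` / `column_names` usage described above (data and names are illustrative):

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> class RandomAccessSource:
...     def __init__(self):
...         self._data = np.random.rand(10, 2).astype(np.float32)
...     def __getitem__(self, index):
...         return (self._data[index],)
...     def __len__(self):
...         return len(self._data)
>>> dataset = ds.GeneratorDataset(source=RandomAccessSource(), column_names=["data"], shuffle=False)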

View File

@ -27,7 +27,7 @@ mindspore.dataset.IMDBDataset
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。

View File

@ -29,7 +29,7 @@ mindspore.dataset.ImageFolderDataset
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **RuntimeError** - `class_indexing` 参数的类型不是dict。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note::
- 如果 `decode` 参数的值为False则得到的 `image` 列的shape为[undecoded_image_size]如果为True则 `image` 列的shape为[H,W,C]。

View File

@ -30,7 +30,7 @@
- **ValueError** - `num_parallel_workers` 参数超过最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -60,5 +60,5 @@
- False
- 不允许
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -27,7 +27,7 @@ mindspore.dataset.MnistDataset
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
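A minimal sketch of that exclusivity: pass a `sampler` and leave `shuffle` unset (the directory is a placeholder):

>>> import mindspore.dataset as ds
>>> mnist_dir = "/path/to/mnist_dataset_directory"  # placeholder path
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=64)
>>> dataset = ds.MnistDataset(mnist_dir, sampler=sampler)  # do not also pass shuffle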

View File

@ -54,7 +54,7 @@ mindspore.dataset.NumpySlicesDataset
- **ValueError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **ValueError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **ValueError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. include:: mindspore.dataset.api_list_nlp.rst
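For reference, a minimal NumpySlicesDataset sketch with in-memory data (column names are taken from the dict keys; values are illustrative):

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> data = {"feature": np.arange(10, dtype=np.float32), "label": np.arange(10, dtype=np.int32)}
>>> dataset = ds.NumpySlicesDataset(data, shuffle=False)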

View File

@ -33,12 +33,12 @@
- **ValueError** - `columns_list` 参数无效。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note::
- 需要用户提前在云存储上创建同步用的目录,然后通过 `sync_obs_path` 指定。
- 如果线下训练,建议为每次训练设置 `BATCH_JOB_ID` 环境变量。
- 分布式训练中假如使用多个节点服务器则必须使用每个节点全部的8张卡。如果只有一个节点服务器则没有这样的限制。
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -3,7 +3,7 @@ mindspore.dataset.Places365Dataset
.. py:class:: mindspore.dataset.Places365Dataset(dataset_dir, usage=None, small=True, decode=False, num_samples=None, num_parallel_workers=None, shuffle=None, sampler=None, num_shards=None, shard_id=None, cache=None)
读取和解析PhotoTour数据集的源数据集。
读取和解析Places365数据集的源数据集。
生成的数据集有两列: `[image, label]`
`image` 列的数据类型为uint8。 `label` 列的数据类型为uint32。
@ -23,12 +23,12 @@ mindspore.dataset.Places365Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误参数小于0或者大于等于 `num_shards`
- **ValueError** - `usage` 不是['train-standard', 'train-challenge', 'val']中的任何一个。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。

View File

@ -21,12 +21,12 @@ mindspore.dataset.QMnistDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -87,5 +87,5 @@ mindspore.dataset.QMnistDataset
publisher = {Curran Associates, Inc.},
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -7,9 +7,9 @@ mindspore.dataset.RandomDataset
参数:
- **total_rows** (int, 可选) - 随机生成样本数据的数量。默认值None生成随机数量的样本。
- **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None,不指定
- **columns_list** (list[str], 可选) - 指定生成数据集的列名。默认值None生成的数据列将以"c0""c1""c2" ... "cn"的规则命名。
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None。
- **columns_list** (list[str], 可选) - 指定生成数据集的列名。默认值None生成的数据列将以"c0"、"c1"、"c2" ... "cn"的规则命名。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取所有样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值None不使用缓存。
@ -17,5 +17,16 @@ mindspore.dataset.RandomDataset
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值None。指定此参数后 `num_samples` 表示每个分片的最大样本数。
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值None。只有当指定了 `num_shards` 时才能指定此参数。
.. include:: mindspore.dataset.api_list_nlp.rst
异常:
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **TypeError** - `total_rows` 的类型不是int。
- **TypeError** - `num_shards` 的类型不是int。
- **TypeError** - `num_parallel_workers` 的类型不是int。
- **TypeError** - `shuffle` 的类型不是bool。
- **TypeError** - `columns_list` 的类型不是list。
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -5,7 +5,7 @@ mindspore.dataset.SBDataset
读取和解析Semantic Boundaries数据集的源文件构建数据集。
根据给定的 `task` 配置,生成数据集具有不同的输出列:
通过配置 `task` 参数,生成的数据集具有不同的输出列:
- `task` = 'Boundaries',有两个输出列: `image` 列的数据类型为uint8`label` 列包含1个的数据类型为uint8的图像。
- `task` = 'Segmentation',有两个输出列: `image` 列的数据类型为uint8。 `label` 列包含20个的数据类型为uint8的图像。
@ -15,7 +15,7 @@ mindspore.dataset.SBDataset
- **task** (str, 可选) - 指定读取SB数据集的任务类型支持'Boundaries'和'Segmentation'。默认值:'Boundaries'。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'val'、'train_noval'和'all'。默认值:'train'。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None所有图像样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None使用mindspore.dataset.config中配置的线程数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1使用mindspore.dataset.config中配置的线程数。
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值None。下表中会展示不同参数配置的预期行为。
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值False不解码。
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值None。下表中会展示不同配置的预期行为。
@ -24,15 +24,15 @@ mindspore.dataset.SBDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `dataset_dir` 不存在。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `task` 不是['Boundaries', 'Segmentation']中的任何一个。
- **ValueError** - `usage` 不是['train', 'val', 'train_noval', 'all']中的任何一个。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -102,5 +102,5 @@ mindspore.dataset.SBDataset
year = "2011",
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -9,10 +9,10 @@ mindspore.dataset.SBUDataset
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录的路径。
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值False不解码。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None所有图像样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值None。下表中会展示不同参数配置的预期行为。
- **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值False不解码。
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值None。下表中会展示不同配置的预期行为。
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值None。指定此参数后 `num_samples` 表示每个分片的最大样本数。
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值None。只有当指定了 `num_shards` 时才能指定此参数。
@ -20,12 +20,12 @@ mindspore.dataset.SBUDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -83,5 +83,5 @@ mindspore.dataset.SBUDataset
Year = {2011},
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -22,13 +22,13 @@ mindspore.dataset.STL10Dataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `usage` 参数无效。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -95,5 +95,5 @@ mindspore.dataset.STL10Dataset
}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -11,7 +11,7 @@ mindspore.dataset.SVHNDataset
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'extra'或'all'。默认值None读取全部样本图片。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数可以小于数据集总数。默认值None读取全部样本图片。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None使用mindspore.dataset.config中配置的线程数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1使用mindspore.dataset.config中配置的线程数。
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值None。下表中会展示不同参数配置的预期行为。
- **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值None。下表中会展示不同配置的预期行为。
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值None。指定此参数后 `num_samples` 表示每个分片的最大样本数。
@ -19,13 +19,13 @@ mindspore.dataset.SVHNDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `usage` 参数无效。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。

View File

@ -19,12 +19,12 @@ mindspore.dataset.SemeionDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -77,5 +77,5 @@ mindspore.dataset.SemeionDataset
author={M Buscema, MetaNet},
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -11,9 +11,8 @@ mindspore.dataset.SogouNewsDataset
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train''test'或'all'。默认值None读取全部样本。
取值为'train'时将会读取45万个训练样本取值为'test'时将会读取6万个测试样本取值为'all'时将会读取全部51万个样本。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None 读取全部样本。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值 `Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -22,13 +21,14 @@ mindspore.dataset.SogouNewsDataset
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值None。指定此参数后 `num_samples` 表示每个分片的最大样本数。
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值None。只有当指定了 `num_shards` 时才能指定此参数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值None不使用缓存。
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
**关于SogouNew数据集**
@ -60,5 +60,5 @@ mindspore.dataset.SogouNewsDataset
primaryClass={cs.LG}
}
.. include:: mindspore.dataset.api_list_nlp.rst
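To illustrate the `shuffle` modes documented above, a minimal sketch using the Shuffle enum (the directory is a placeholder):

>>> import mindspore.dataset as ds
>>> from mindspore.dataset import Shuffle
>>> sogou_news_dir = "/path/to/sogou_news_dataset_dir"  # placeholder path
>>> dataset = ds.SogouNewsDataset(sogou_news_dir, usage='train', shuffle=Shuffle.FILES)  # shuffle files only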

View File

@ -22,12 +22,12 @@ mindspore.dataset.SpeechCommandsDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -87,5 +87,5 @@ mindspore.dataset.SpeechCommandsDataset
year={2018}
}
.. include:: mindspore.dataset.api_list_audio.rst

View File

@ -7,8 +7,8 @@ mindspore.dataset.TFRecordDataset
参数:
- **dataset_files** (Union[str, list[str]]) - 数据集文件路径支持单文件路径字符串、多文件路径字符串列表或可被glob库模式匹配的字符串文件列表将在内部进行字典排序。
- **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None,不指定
- **schema** (Union[str, Schema], 可选) - 数据格式策略,用于指定读取数据列的数据类型、数据维度等信息。
支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值None。
- **columns_list** (list[str], 可选) - 指定从TFRecord文件中读取的数据列。默认值None读取所有列。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
@ -33,7 +33,7 @@ mindspore.dataset.TFRecordDataset
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. include:: mindspore.dataset.api_list_nlp.rst
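A minimal sketch of passing a mindspore.dataset.Schema object as the `schema` argument (the column layout and file path here are hypothetical):

>>> import mindspore.dataset as ds
>>> from mindspore import dtype as mstype
>>> schema = ds.Schema()
>>> schema.add_column(name='image', de_type=mstype.uint8)             # hypothetical column
>>> schema.add_column(name='label', de_type=mstype.int32, shape=[1])  # hypothetical column
>>> dataset = ds.TFRecordDataset(dataset_files=["/path/to/file.tfrecord"], schema=schema)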

View File

@ -25,12 +25,12 @@ mindspore.dataset.TedliumDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -149,5 +149,5 @@ mindspore.dataset.TedliumDataset
biburl={https://www.openslr.org/51/}
}
.. include:: mindspore.dataset.api_list_audio.rst

View File

@ -25,7 +25,7 @@
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -10,9 +10,9 @@ mindspore.dataset.UDPOSDataset
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。
取值为'train'时将会读取12,543个样本取值为'test'时将会读取2,077个测试样本取值为'test'时将会读取9,981个样本取值为'valid'时将会读取2,002个样本取值为'all'时将会读取全部16,622个样本。默认值None读取全部样本。
取值为'train'时将会读取12,543个样本取值为'test'时将会读取2,077个测试样本取值为'valid'时将会读取2,002个样本取值为'all'时将会读取全部16,622个样本。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值 `Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -26,9 +26,27 @@ mindspore.dataset.UDPOSDataset
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
.. include:: mindspore.dataset.api_list_nlp.rst
**关于UDPOS数据集**
UDPOS是一个解析的文本语料库数据集用于阐明句法或者语义句子结构。
该语料库包含254,830个单词和16,622个句子取自各种网络媒体包括博客、新闻组、电子邮件和评论。
**引用:**
.. code-block::
@inproceedings{silveira14gold,
year = {2014},
author = {Natalia Silveira and Timothy Dozat and Marie-Catherine de Marneffe and Samuel Bowman
and Miriam Connor and John Bauer and Christopher D. Manning},
title = {A Gold Standard Dependency Corpus for {E}nglish},
booktitle = {Proceedings of the Ninth International Conference on Language
Resources and Evaluation (LREC-2014)}
}
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -3,17 +3,17 @@ mindspore.dataset.USPSDataset
.. py:class:: mindspore.dataset.USPSDataset(dataset_dir, usage=None, num_samples=None, num_parallel_workers=None, shuffle=Shuffle.GLOBAL, num_shards=None, shard_id=None, cache=None)
读取和解析UDPOS数据集的源数据集。
读取和解析USPS数据集的源数据集。
生成的数据集有两列: `[image, label]``image` 列的数据类型为uint8。 `label` 列的数据类型为uint32。
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、或'all'。
取值为'train'时将会读取7,291个样本取值为'test'时将会读取2,077个测试样本取值为'test'时将会读取2,007个样本,取值为'all'时将会读取全部9,298个样本。默认值None读取全部样本。
取值为'train'时将会读取7,291个样本取值为'test'时将会读取2,007个测试样本,取值为'all'时将会读取全部9,298个样本。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值`Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -26,11 +26,11 @@ mindspore.dataset.USPSDataset
异常:
- **RuntimeError** - `dataset_dir` 路径下不包含数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `usage` 参数无效。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
**关于USPS数据集**
@ -61,5 +61,5 @@ mindspore.dataset.USPSDataset
publisher={IEEE}
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -43,7 +43,7 @@ mindspore.dataset.VOCDataset
- **ValueError** - 指定的任务不为'Segmentation'或'Detection'。
- **ValueError** - 指定任务为'Segmentation'时, `class_indexing` 参数不为None。
- **ValueError** - 与 `usage` 参数相关的txt文件不存在。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note::
- 当参数 `extra_metadata` 为True时还需使用 `rename` 操作删除额外数据列'_meta-filename'的前缀'_meta-'
@ -125,5 +125,5 @@ mindspore.dataset.VOCDataset
howpublished = {http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html}
}
.. include:: mindspore.dataset.api_list_vision.rst
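A minimal sketch of the VOCDataset task modes referenced above (the directory is a placeholder):

>>> import mindspore.dataset as ds
>>> voc_dir = "/path/to/voc_dataset_directory"  # placeholder path
>>> dataset = ds.VOCDataset(voc_dir, task='Detection', usage='train', decode=True)  # decode images while reading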

View File

@ -11,7 +11,7 @@ mindspore.dataset.WIDERFaceDataset
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。
取值为'train'时将会读取12,880个样本取值为'test'时将会读取2,077个测试样本取值为'test'时将会读取16,097个样本取值为'valid'时将会读取3,226个样本取值为'all'时将会读取全部类别样本。默认值None读取全部样本。
取值为'train'时将会读取12,880个样本取值为'test'时将会读取16,097个样本取值为'valid'时将会读取3,226个样本取值为'all'时将会读取全部类别样本。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (bool, 可选) - 是否混洗数据集。默认值None。下表中会展示不同参数配置的预期行为。
@ -23,13 +23,13 @@ mindspore.dataset.WIDERFaceDataset
异常:
- **RuntimeError** - `dataset_dir` 不包含任何数据文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 同时指定了 `sampler` 和 `shuffle` 参数。
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
- **ValueError** - `usage` 不在['train', 'test', 'valid', 'all']中。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **ValueError** - `annotation_file` 不存在。
- **ValueError** - `dataset_dir` 不存在。
@ -109,5 +109,5 @@ mindspore.dataset.WIDERFaceDataset
year={2016},
}
.. include:: mindspore.dataset.api_list_vision.rst

View File

@ -9,10 +9,10 @@ mindspore.dataset.WikiTextDataset
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train', 'test', 'valid'或'all'。默认值None读取全部样本。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值`Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -22,14 +22,14 @@ mindspore.dataset.WikiTextDataset
- **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值None。指定此参数后 `num_samples` 表示每个分片的最大样本数。
- **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值None。只有当指定了 `num_shards` 时才能指定此参数。
- **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 <https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/cache.html>`_ 。默认值None不使用缓存。
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **ValueError** - `num_samples` 参数值错误小于0
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards`
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards`数。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_samples` 参数值错误小于0
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数
**关于WikiText数据集**
@ -59,5 +59,5 @@ mindspore.dataset.WikiTextDataset
year={2016}
}
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -9,11 +9,11 @@ mindspore.dataset.YahooAnswersDataset
参数:
- **dataset_dir** (str) - 包含数据集文件的根目录路径。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train', 'test'或'all'。
- **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'或'all'。
取值为'train'时将会读取1,400,000个训练样本取值为'test'时将会读取60,000个测试样本取值为'all'时将会读取全部1,460,000个样本。默认值None读取全部样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值None使用mindspore.dataset.config中配置的线程数。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值`Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -26,10 +26,10 @@ mindspore.dataset.YahooAnswersDataset
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误小于0或者大于等于 `num_shards` )。
- **ValueError** - `shard_id` 参数错误小于0或者大于等于 `num_shards`
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
**关于YahooAnswers数据集**
@ -59,5 +59,5 @@ mindspore.dataset.YahooAnswersDataset
howpublished = {}
}
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -13,7 +13,7 @@ mindspore.dataset.YelpReviewDataset
对于Polarity数据集'train'将读取560,000个训练样本'test'将读取38,000个测试样本'all'将读取所有598,000个样本。
对于Full数据集'train'将读取650,000个训练样本'test'将读取50,000个测试样本'all'将读取所有700,000个样本。默认值None读取所有样本。
- **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值None读取全部样本。
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值mindspore.dataset.Shuffle.GLOBAL
- **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式支持传入bool类型与枚举类型进行指定。默认值`Shuffle.GLOBAL`
如果 `shuffle` 为False,则不混洗;如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。
通过传入枚举变量设置数据混洗的模式:
@ -27,9 +27,9 @@ mindspore.dataset.YelpReviewDataset
异常:
- **RuntimeError** - `dataset_dir` 参数所指向的文件目录不存在或缺少数据集文件。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `num_parallel_workers` 参数超过系统最大线程数。
**关于YelpReview数据集**
@ -88,5 +88,5 @@ mindspore.dataset.YelpReviewDataset
year = {2015},
}
.. include:: mindspore.dataset.api_list_nlp.rst

View File

@ -25,7 +25,7 @@ mindspore.dataset.YesNoDataset
- **RuntimeError** - 同时指定了 `sampler` 和 `num_shards` 参数或同时指定了 `sampler` 和 `shard_id` 参数。
- **RuntimeError** - 指定了 `num_shards` 参数,但是未指定 `shard_id` 参数。
- **RuntimeError** - 指定了 `shard_id` 参数,但是未指定 `num_shards` 参数。
- **ValueError** - `shard_id` 参数值错误(小于0或者大于等于 `num_shards`
- **ValueError** - `shard_id` 参数错误,小于0或者大于等于 `num_shards`
.. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。
@ -79,5 +79,5 @@ mindspore.dataset.YesNoDataset
url = "http://wwww.openslr.org/1/"
}
.. include:: mindspore.dataset.api_list_audio.rst

View File

@ -11,7 +11,7 @@ mindspore.dataset.vision.AdjustGamma
更多详细信息,请参见 `Gamma矫正 <https://en.wikipedia.org/wiki/Gamma_correction>`_
参数:
- **gamma** (float) - 输出图像像素值与输入图像像素值呈指数相关。 `gamma` 大于1使阴影更暗`gamma` 小于1使黑暗区域更亮。
- **gamma** (float) - 非负实数。输出图像像素值与输入图像像素值呈指数相关。 `gamma` 大于1使阴影更暗`gamma` 小于1使黑暗区域更亮。
- **gain** (float, 可选) - 常数乘数。默认值1.0。
异常:

View File

@ -11,7 +11,7 @@ mindspore.dataset.vision.PadToSize
- **offset** (Union[int, Sequence[int, int]], 可选) - 顶部和左侧要填充的长度。
如果输入整型,使用此值填充图像上侧和左侧。
如果提供了序列[int, int],则应按[top, left]的顺序排列,填充图像上侧和左侧。
默认值None表示对称填充。
默认值None表示对称填充,保持原始图像处于中心位置
- **fill_value** (Union[int, tuple[int, int, int]], 可选) - 填充的像素值,仅在 `padding_mode` 取值为Border.CONSTANT时有效。
如果是3元素元组则分别用于填充R、G、B通道。
如果是整数,则用于所有 RGB 通道。
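A minimal sketch of PadToSize used as a map operation, assuming `image_dataset` is an existing decoded image dataset (the name is illustrative):

>>> import mindspore.dataset.vision as vision
>>> pad_op = vision.PadToSize(size=[256, 256], offset=None, fill_value=0)  # offset=None pads symmetrically
>>> image_dataset = image_dataset.map(operations=pad_op, input_columns=["image"])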

View File

@ -3,7 +3,7 @@ mindspore.dataset.vision.RandomAdjustSharpness
.. py:class:: mindspore.dataset.vision.RandomAdjustSharpness(degree, prob=0.5)
以给定的概率随机调整输入图像的清晰度。
以给定的概率随机调整输入图像的度。
参数:
- **degree** (float) - 锐度调整度,必须是非负的。

View File

@ -7,7 +7,7 @@ mindspore.dataset.vision.RandomAutoContrast
参数:
- **cutoff** (float, 可选) - 输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内。默认值0.0。
- **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内。默认值None。
- **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,值必须在 [0, 255] 范围内。默认值None。
- **prob** (float, 可选) - 图像被调整对比度的概率,取值范围:[0.0, 1.0]。默认值0.5。
异常:

View File

@ -52,9 +52,9 @@ class CMUArcticDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -188,9 +188,9 @@ class GTZANDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -324,9 +324,9 @@ class LibriTTSDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -610,9 +610,9 @@ class SpeechCommandsDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified
@ -623,11 +623,11 @@ class SpeechCommandsDataset(MappableDataset, AudioBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -743,9 +743,9 @@ class TedliumDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -757,11 +757,11 @@ class TedliumDataset(MappableDataset, AudioBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain stm files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -942,9 +942,9 @@ class YesNoDataset(MappableDataset, AudioBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only

View File

@ -248,8 +248,9 @@ class TFRecordDataset(SourceDataset, UnionBaseDataset):
Args:
dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search for a
pattern of files. The list will be sorted in a lexicographical order.
schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None.
If the schema is not provided, the meta data from the TFData file is considered the schema.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
column to be read. Both JSON file path and objects constructed by mindspore.dataset.Schema are acceptable.
Default: None.
columns_list (list[str], optional): List of columns to be read. Default: None, read all columns.
num_samples (int, optional): The number of samples (rows) to be included in the dataset. Default: None.
If num_samples is None and numRows(parsed from schema) does not exist, read the full dataset;

View File

@ -402,9 +402,9 @@ class CLUEDataset(SourceDataset, TextBaseDataset):
ValueError: task is not in 'AFQMC', 'TNEWS', 'IFLYTEK', 'CMNLI', 'WSC' or 'CSL'.
ValueError: usage is not in 'train', 'test' or 'eval'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Examples:
>>> clue_dataset_dir = ["/path/to/clue_dataset_file"] # contains 1 or multiple clue files
@ -1118,10 +1118,10 @@ class Multi30kDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1312,10 +1312,10 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):
'all' will read from all 510,000 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, read all samples.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, performs global shuffle.
There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples, same as setting shuffle to True.
@ -1332,9 +1332,9 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Examples:
>>> sogou_news_dataset_dir = "/path/to/sogou_news_dataset_dir"
@ -1404,10 +1404,10 @@ class SQuADDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1565,10 +1565,10 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):
'all' will read from all 16,622 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, reads the full dataset.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1586,13 +1586,32 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Examples:
>>> udpos_dataset_dir = "/path/to/udpos_dataset_dir"
>>> dataset = ds.UDPOSDataset(dataset_dir=udpos_dataset_dir, usage='all')
About UDPOS dataset:
Text corpus dataset that clarifies syntactic or semantic sentence structure.
The corpus comprises 254,830 words and 16,622 sentences, taken from various web media including
weblogs, newsgroups, emails and reviews.
Citation:
.. code-block::
@inproceedings{silveira14gold,
year = {2014},
author = {Natalia Silveira and Timothy Dozat and Marie-Catherine de Marneffe and Samuel Bowman
and Miriam Connor and John Bauer and Christopher D. Manning},
title = {A Gold Standard Dependency Corpus for {E}nglish},
booktitle = {Proceedings of the Ninth International Conference on Language
Resources and Evaluation (LREC-2014)}
}
"""
@check_udpos_dataset
@ -1622,10 +1641,10 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1641,11 +1660,11 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files or invalid.
ValueError: If `num_samples` is invalid (< 0).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_samples` is invalid (< 0).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
About WikiTextDataset dataset:
@ -1711,10 +1730,10 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1730,10 +1749,10 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Examples:
>>> yahoo_answers_dataset_dir = "/path/to/yahoo_answers_dataset_directory"
@ -1804,10 +1823,10 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):
'all' will read from all 700,000 samples. Default: None, all samples.
num_samples (int, optional): Number of samples (rows) to read. Default: None, reads all samples.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -1824,9 +1843,9 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Examples:
>>> yelp_review_dataset_dir = "/path/to/yelp_review_dataset_dir"

View File

@ -510,15 +510,16 @@ class GeneratorDataset(MappableDataset, UnionBaseDataset):
required to provide either column_names or schema.
column_types (list[mindspore.dtype], optional): List of column data types of the dataset. Default: None.
If provided, sanity check will be performed on generator output.
schema (Union[Schema, str], optional): Path to the JSON schema file or schema object. Default: None. Users are
required to provide either column_names or schema. If both are provided, schema will be used.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
columns to be read. Both a JSON file path and an object constructed by mindspore.dataset.Schema are acceptable.
Default: None.
num_samples (int, optional): The number of samples to be included in the dataset.
Default: None, all images.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
input is required. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the maximum
sample number of per shard.
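As a rough, non-authoritative illustration of the `schema` parameter described above (the generator function and column definition are invented for this sketch), a Schema object can be passed instead of `column_names`:

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import dtype as mstype
>>>
>>> def my_generator():
...     for i in range(5):
...         yield (np.array([i], dtype=np.int64),)
>>>
>>> schema = ds.Schema()
>>> schema.add_column('data', de_type=mstype.int64, shape=[1])
>>> # the schema supplies the column name, type and shape, so column_names is omitted
>>> dataset = ds.GeneratorDataset(source=my_generator, schema=schema)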
@ -844,15 +845,14 @@ class NumpySlicesDataset(GeneratorDataset):
otherwise they will be named like column_0, column_1 ...
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the max
sample number of per shard.
When this argument is specified, `num_samples` reflects the maximum number of samples per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
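A small sketch of the mutual exclusivity noted above, using invented in-memory data: either `shuffle` or a `sampler` is supplied, never both:

>>> import numpy as np
>>> import mindspore.dataset as ds
>>>
>>> data = {"x": np.arange(10), "y": np.arange(10) * 2}
>>> # option 1: let the dataset shuffle the slices
>>> ds1 = ds.NumpySlicesDataset(data, shuffle=True)
>>> # option 2: delegate ordering to a sampler and leave `shuffle` unset
>>> ds2 = ds.NumpySlicesDataset(data, sampler=ds.SequentialSampler())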

View File

@ -132,10 +132,10 @@ class Caltech101Dataset(GeneratorDataset):
Default: None, all images.
num_parallel_workers (int, optional): Number of workers to read the data. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Whether or not to decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -144,13 +144,13 @@ class Caltech101Dataset(GeneratorDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `target_type` is not set correctly.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `target_type` is not set correctly.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
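For instance (the directory below is a placeholder), `target_type` selects which annotation columns Caltech101Dataset emits, and the `sampler`/`shuffle` rule above still applies:

>>> import mindspore.dataset as ds
>>>
>>> caltech101_dataset_dir = "/path/to/caltech101_dataset_directory"  # hypothetical path
>>> # read category labels and annotations together, in sequential order
>>> dataset = ds.Caltech101Dataset(caltech101_dataset_dir, target_type="all", shuffle=False)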
@ -293,10 +293,10 @@ class Caltech256Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Whether or not to decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -308,13 +308,13 @@ class Caltech256Dataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `target_type` is not 'category', 'annotation' or 'all'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `target_type` is not 'category', 'annotation' or 'all'.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -440,13 +440,13 @@ class CelebADataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'valid', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'valid', 'test' or 'all'.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -595,9 +595,9 @@ class Cifar10Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -609,13 +609,13 @@ class Cifar10Dataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
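A brief sketch (path invented) combining `usage` with a sampler, in line with the rule above that a `sampler` replaces `shuffle`:

>>> import mindspore.dataset as ds
>>>
>>> cifar10_dataset_dir = "/path/to/cifar10_dataset_directory"  # hypothetical path
>>> # draw 1000 random training samples through a sampler instead of `shuffle`
>>> sampler = ds.RandomSampler(num_samples=1000)
>>> dataset = ds.Cifar10Dataset(cifar10_dataset_dir, usage='train', sampler=sampler)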
@ -727,9 +727,9 @@ class Cifar100Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the maximum sample number of per shard.
@ -741,13 +741,13 @@ class Cifar100Dataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `usage` is not 'train', 'test' or 'all'.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -856,10 +856,10 @@ class CityscapesDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -871,11 +871,11 @@ class CityscapesDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` is invalid or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `dataset_dir` does not exist.
ValueError: If `task` is invalid.
ValueError: If `quality_mode` is invalid.
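A hedged illustration of the `task` and `quality_mode` values validated above (the directory is a placeholder):

>>> import mindspore.dataset as ds
>>>
>>> cityscapes_dataset_dir = "/path/to/cityscapes_dataset_directory"  # hypothetical path
>>> # fine-quality semantic segmentation split, decoding images while reading
>>> dataset = ds.CityscapesDataset(cityscapes_dataset_dir, usage="train",
...                                quality_mode="fine", task="semantic", decode=True)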
@ -1024,10 +1024,10 @@ class CocoDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the configuration file.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -1273,10 +1273,10 @@ class DIV2KDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -1803,10 +1803,10 @@ class FlickrDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: None.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -2208,9 +2208,9 @@ class ImageFolderDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
extensions (list[str], optional): List of file extensions to be
included in the dataset. Default: None.
class_indexing (dict, optional): A str-to-int mapping from folder name to index
@ -2354,10 +2354,10 @@ class KITTIDataset(MappableDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2621,10 +2621,10 @@ class LFWDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2771,10 +2771,10 @@ class LSUNDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -2899,9 +2899,9 @@ class ManifestDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
class_indexing (dict, optional): A str-to-int mapping from label name to index.
Default: None, the folder names will be sorted alphabetically and each
class will be given a unique index starting from 0.
@ -3021,9 +3021,9 @@ class MnistDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3142,10 +3142,10 @@ class OmniglotDataset(MappableDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -3414,22 +3414,22 @@ class Places365Dataset(MappableDataset, VisionBaseDataset):
The generated dataset has two columns :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is a scalar of the uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.
Args:
dataset_dir (str): Path to the root directory that contains the dataset.
usage (str, optional): Usage of this dataset, can be 'train-standard', 'train-challenge' or 'val'.
Default: None, will be set to 'train-standard'.
small (bool, optional): Use 256 * 256 images (True) or high resolution images (False). Default: False.
decode (bool, optional): Decode the images after reading. Default: True.
decode (bool, optional): Decode the images after reading. Default: False.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, will read all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3440,11 +3440,11 @@ class Places365Dataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `usage` is not in ["train-standard", "train-challenge", "val"].
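A quick sketch (placeholder path) tying together the `usage`, `small` and `decode` parameters discussed above:

>>> import mindspore.dataset as ds
>>>
>>> places365_dataset_dir = "/path/to/places365_dataset_directory"  # hypothetical path
>>> # low-resolution validation split, decoded while reading
>>> dataset = ds.Places365Dataset(places365_dataset_dir, usage="val", small=True, decode=True)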
@ -3556,7 +3556,7 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):
The generated dataset has two columns :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is a scalar when `compat` is True else a tensor both of the uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.
Args:
dataset_dir (str): Path to the root directory that contains the dataset.
@ -3569,9 +3569,9 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3582,12 +3582,12 @@ class QMnistDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
Note:
- This dataset can take in a `sampler`. `sampler` and `shuffle` are mutually exclusive.
@ -3683,8 +3683,9 @@ class RandomDataset(SourceDataset, VisionBaseDataset):
Args:
total_rows (int, optional): Number of samples for the dataset to generate.
Default: None, number of samples is random.
schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None.
If the schema is not provided, the random dataset generates a random schema.
schema (Union[str, Schema], optional): Data format policy, which specifies the data types and shapes of the data
columns to be read. Both a JSON file path and an object constructed by mindspore.dataset.Schema are acceptable.
Default: None.
columns_list (list[str], optional): List of column names of the dataset.
Default: None, the columns will be named like this "c0", "c1", "c2" etc.
num_samples (int, optional): The number of samples to be included in the dataset.
@ -3695,12 +3696,33 @@ class RandomDataset(SourceDataset, VisionBaseDataset):
`Single-Node Data Cache <https://www.mindspore.cn/tutorials/experts/en/master/dataset/cache.html>`_ .
Default: None, which means no cache is used.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
argument can only be specified when `num_shards` is also specified.
Raises:
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
TypeError: If `total_rows` is not of type int.
TypeError: If `num_shards` is not of type int.
TypeError: If `num_parallel_workers` is not of type int.
TypeError: If `shuffle` is not of type bool.
TypeError: If `columns_list` is not of type list.
Examples:
>>> from mindspore import dtype as mstype
>>> import mindspore.dataset as ds
>>>
>>> schema = ds.Schema()
>>> schema.add_column('image', de_type=mstype.uint8, shape=[2])
>>> schema.add_column('label', de_type=mstype.uint8, shape=[1])
>>> # apply dataset operations
>>> ds1 = ds.RandomDataset(schema=schema, total_rows=50, num_parallel_workers=4)
"""
@check_random_dataset
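Building on the RandomDataset example above, a minimal sketch of consuming the generated rows (`ds1` and the column names come from the schema defined there):

>>> # iterate the randomly generated rows as NumPy arrays
>>> for item in ds1.create_dict_iterator(num_epochs=1, output_numpy=True):
...     image, label = item["image"], item["label"]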
@ -3788,11 +3810,12 @@ class SBDataset(GeneratorDataset):
"""
A source dataset that reads and parses Semantic Boundaries Dataset.
The generated dataset has two columns: :py:obj:`[image, task]`.
By configuring the `task` parameter, the generated dataset has different output columns.
- The tensor of column :py:obj:`image` is of the uint8 type.
- The tensor of column :py:obj:`task` contains 20 images of the uint8 type if `task` is 'Boundaries' otherwise
contains 1 image of the uint8 type.
- If `task` is 'Boundaries', there are two output columns: the 'image' column has the data type uint8 and
the 'label' column contains one image of the data type uint8.
- If `task` is 'Segmentation', there are two output columns: the 'image' column has the data type uint8 and
the 'label' column contains 20 images of the data type uint8.
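A minimal sketch (the directory is hypothetical) of selecting between the two `task` modes listed above:

>>> import mindspore.dataset as ds
>>>
>>> sb_dataset_dir = "/path/to/sb_dataset_directory"  # hypothetical path
>>> # request segmentation labels rather than boundary maps
>>> dataset = ds.SBDataset(sb_dataset_dir, task='Segmentation', usage='train')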
Args:
dataset_dir (str): Path to the root directory that contains the dataset.
@ -3800,13 +3823,12 @@ class SBDataset(GeneratorDataset):
usage (str, optional): Acceptable usages include 'train', 'val', 'train_noval' and 'all'. Default: 'all'.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
num_parallel_workers (int, optional): Number of workers to read the data. Default: 1.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: None.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the max sample number of per shard.
@ -3815,12 +3837,12 @@ class SBDataset(GeneratorDataset):
Raises:
RuntimeError: If `dataset_dir` is not valid or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `dataset_dir` does not exist.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `task` is not in ['Boundaries', 'Segmentation'].
ValueError: If `usage` is not in ['train', 'val', 'train_noval', 'all'].
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
@ -3928,15 +3950,15 @@ class SBUDataset(MappableDataset, VisionBaseDataset):
Args:
dataset_dir (str): Path to the root directory that contains the dataset.
decode (bool, optional): Decode the images after reading. Default: False.
num_samples (int, optional): The number of images to be included in the dataset.
Default: None, will read all images.
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the max sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
@ -3947,11 +3969,11 @@ class SBUDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -4048,9 +4070,9 @@ class SemeionDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -4062,11 +4084,11 @@ class SemeionDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -4176,9 +4198,9 @@ class STL10Dataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the
dataset. Default: None, expected order behavior shown in the table.
dataset. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, 'num_samples' reflects
the max sample number of per shard.
@ -4190,12 +4212,12 @@ class STL10Dataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -4341,24 +4363,23 @@ class SVHNDataset(GeneratorDataset):
Default: None, will read all samples.
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all images.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
sampler (Sampler, optional): Object used to choose samples from the dataset. Random accessible
input is required. Default: None, expected order behavior shown in the table.
input is required. Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, 'num_samples' reflects the max
sample number of per shard.
When this argument is specified, `num_samples` reflects the maximum number of samples per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.
Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Note:
@ -4443,7 +4464,7 @@ class USPSDataset(SourceDataset, VisionBaseDataset):
The generated dataset has two columns: :py:obj:`[image, label]`.
The tensor of column :py:obj:`image` is of the uint8 type.
The tensor of column :py:obj:`label` is of a scalar of uint32 type.
The tensor of column :py:obj:`label` is of the uint32 type.
Args:
dataset_dir (str): Path to the root directory that contains the dataset.
@ -4455,10 +4476,10 @@ class USPSDataset(SourceDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in.
If shuffle is False, no shuffling will be performed;
If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
Otherwise, there are two levels of shuffling:
Bool type and Shuffle enum are both supported to pass in. Default: `Shuffle.GLOBAL` .
If shuffle is False, no shuffling will be performed.
If shuffle is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
Set the mode of data shuffling by passing in enumeration variables:
- Shuffle.GLOBAL: Shuffle both the files and samples.
@ -4474,10 +4495,10 @@ class USPSDataset(SourceDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `usage` is invalid.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
Examples:
@ -4560,10 +4581,10 @@ class VOCDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, number set in the config.
shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
order behavior shown in the table.
order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided
into. Default: None. When this argument is specified, `num_samples` reflects
the maximum sample number of per shard.
@ -4759,10 +4780,10 @@ class WIDERFaceDataset(MappableDataset, VisionBaseDataset):
num_parallel_workers (int, optional): Number of workers to read the data.
Default: None, will use value set in the config.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
decode (bool, optional): Decode the images after reading. Default: False.
sampler (Sampler, optional): Object used to choose samples from the dataset.
Default: None, expected order behavior shown in the table.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified
@ -4773,13 +4794,13 @@ class WIDERFaceDataset(MappableDataset, VisionBaseDataset):
Raises:
RuntimeError: If `dataset_dir` does not contain data files.
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
RuntimeError: If `sampler` and `shuffle` are specified at the same time.
RuntimeError: If `sampler` and `num_shards`/`shard_id` are specified at the same time.
RuntimeError: If `num_shards` is specified but `shard_id` is None.
RuntimeError: If `shard_id` is specified but `num_shards` is None.
ValueError: If `shard_id` is invalid (< 0 or >= `num_shards`).
ValueError: If `usage` is not in ['train', 'test', 'valid', 'all'].
ValueError: If `num_parallel_workers` exceeds the max thread numbers.
ValueError: If `annotation_file` does not exist.
ValueError: If `dataset_dir` does not exist.
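A short sketch (path invented) of the `usage` values accepted above for WIDERFaceDataset:

>>> import mindspore.dataset as ds
>>>
>>> wider_face_dir = "/path/to/wider_face_dataset_directory"  # hypothetical path
>>> # read the validation split and decode images while reading
>>> dataset = ds.WIDERFaceDataset(wider_face_dir, usage='valid', decode=True)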

View File

@ -1282,13 +1282,13 @@ class InMemoryGraphDataset(GeneratorDataset):
Default: 'graph'.
num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
Random accessible input is required. When this argument is specified, `num_samples` reflects the max
When this argument is specified, `num_samples` reflects the maximum number of samples per shard.
shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
when num_shards is also specified.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
option could be beneficial if the Python operation is computational heavy. Default: True.
max_rowsize(int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
@ -1386,8 +1386,8 @@ class ArgoverseDataset(InMemoryGraphDataset):
recommend to specify it with
`column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"]` like the following example.
num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
Default: None, expected order behavior shown in the table.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
Default: None, expected order behavior shown in the table below.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
option could be beneficial if the Python operation is computational heavy. Default: True.
perf_mode (bool, optional): mode for obtaining higher performance when iterating over the created dataset (will call