diff --git a/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.get_repeat_count.rst b/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.get_repeat_count.rst index bfc82f33d8b..19cbdf2132b 100644 --- a/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.get_repeat_count.rst +++ b/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.get_repeat_count.rst @@ -3,7 +3,7 @@ mindspore.dataset.Dataset.get_repeat_count .. py:method:: mindspore.dataset.Dataset.get_repeat_count() - 获取 `RepeatDataset` 中定义的repeat操作的次数,默认值:1。 + 获取 `RepeatDataset` 中定义的repeat操作的次数。默认值:1。 返回: int,repeat操作的次数。 diff --git a/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.output_shapes.rst b/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.output_shapes.rst index 096f40f4341..df36aa4b55c 100644 --- a/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.output_shapes.rst +++ b/docs/api/api_python/dataset/dataset_method/attribute/mindspore.dataset.Dataset.output_shapes.rst @@ -7,7 +7,7 @@ mindspore.dataset.Dataset.output_shapes 参数: - **estimate** (bool) - 如果 `estimate` 为 False,将返回数据集第一条数据的shape。 - 否则将遍历整个数据集以获取数据集的真实shape信息,其中动态变化的维度将被标记为None(可用于动态shape数据集场景),默认值:False。 + 否则将遍历整个数据集以获取数据集的真实shape信息,其中动态变化的维度将被标记为None(可用于动态shape数据集场景)。默认值:False。 返回: list,每列数据的shape列表。 diff --git a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.rst b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.rst index a9bd83c57a4..fec893306f7 100644 --- a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.rst +++ b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.rst @@ -62,5 +62,5 @@ mindspore.dataset.Dataset.save 参数: - **file_name** (str) - 数据集文件的路径。 - - **num_files** (int, 可选) - 数据集文件的数量,默认值:1。 - - **file_type** (str, 可选) - 数据集格式,默认值:'mindrecord'。 + - **num_files** (int, 可选) - 数据集文件的数量。默认值:1。 + - **file_type** (str, 可选) - 数据集格式。默认值:'mindrecord'。 diff --git a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.split.rst b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.split.rst index 95fc4963502..5a9b3e19436 100644 --- a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.split.rst +++ b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.split.rst @@ -14,7 +14,7 @@ mindspore.dataset.Dataset.split - 如果子数据集大小的总和小于K,K - sigma(round(fi * k))的值将添加到第一个子数据集,sigma为求和操作。 - 如果子数据集大小的总和大于K,sigma(round(fi * K)) - K的值将从第一个足够大的子数据集中删除,且删除后的子数据集大小至少大于1。 - - **randomize** (bool, 可选) - 确定是否随机拆分数据,默认值:True,数据集将被随机拆分。否则将按顺序拆分为多个不重叠的子数据集。 + - **randomize** (bool, 可选) - 确定是否随机拆分数据。默认值:True,数据集将被随机拆分。否则将按顺序拆分为多个不重叠的子数据集。 .. note:: 1. 如果进行拆分操作的数据集对象为MappableDataset类型,则将自动调用一个优化后的split操作。 diff --git a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.take.rst b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.take.rst index b0f788cdadd..5fdf2b7c9ee 100644 --- a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.take.rst +++ b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.take.rst @@ -10,7 +10,7 @@ mindspore.dataset.Dataset.take 2. 
take和batch操作顺序很重要,如果take在batch操作之前,则取给定条数,否则取给定batch数。 参数: - - **count** (int, 可选) - 要从数据集对象中获取的数据条数,默认值:-1,获取所有数据。 + - **count** (int, 可选) - 要从数据集对象中获取的数据条数。默认值:-1,获取所有数据。 返回: TakeDataset,take操作后的数据集对象。 diff --git a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.zip.rst b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.zip.rst index 61838b9ddac..01f5aa7c4f2 100644 --- a/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.zip.rst +++ b/docs/api/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.zip.rst @@ -6,10 +6,10 @@ mindspore.dataset.Dataset.zip 将多个dataset对象按列进行合并压缩,多个dataset对象不能有相同的列名。 参数: - - **datasets** (tuple[Dataset]) - 要合并的(多个)dataset对象。 + - **datasets** (Union[Dataset, tuple[Dataset]]) - 要合并的(多个)dataset对象。 返回: ZipDataset,合并后的dataset对象。 异常: - - **TypeError** - `datasets` 参数不是dataset对象/tuple(dataset)。 + - **TypeError** - `datasets` 参数不是dataset对象/tuple[dataset]。 diff --git a/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.device_que.rst b/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.device_que.rst index 1545731d09a..ad3309f212d 100644 --- a/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.device_que.rst +++ b/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.device_que.rst @@ -6,7 +6,7 @@ mindspore.dataset.Dataset.device_que 将数据异步传输到Ascend/GPU设备上。 参数: - - **send_epoch_end** (bool, 可选) - 数据发送完成后是否发送结束标识到设备上,默认值:True。 + - **send_epoch_end** (bool, 可选) - 数据发送完成后是否发送结束标识到设备上。默认值:True。 - **create_data_info_queue** (bool, 可选) - 是否创建一个队列,用于存储每条数据的数据类型和shape。默认值:False,不创建。 .. note:: diff --git a/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.sync_update.rst b/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.sync_update.rst index 28f32d8f099..c905c20ca1c 100644 --- a/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.sync_update.rst +++ b/docs/api/api_python/dataset/dataset_method/others/mindspore.dataset.Dataset.sync_update.rst @@ -7,5 +7,5 @@ mindspore.dataset.Dataset.sync_update 参数: - **condition_name** (str) - 用于触发发送下一个数据行的条件名称。 - - **num_batch** (Union[int, None]) - 释放的batch(row)数。当 `num_batch` 为None时,将默认为 `sync_wait` 操作指定的值,默认值:None。 - - **data** (Any) - 用户自定义传递给回调函数的数据,默认值:None。 + - **num_batch** (Union[int, None]) - 释放的batch(row)数。当 `num_batch` 为None时,将默认为 `sync_wait` 操作指定的值。默认值:None。 + - **data** (Any) - 用户自定义传递给回调函数的数据。默认值:None。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.ArgoverseDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.ArgoverseDataset.rst index 04f1776e2dc..ab6ff014707 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.ArgoverseDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.ArgoverseDataset.rst @@ -10,10 +10,10 @@ 参数: - **data_dir** (str) - 加载数据集的目录,这里包含原始格式的数据,并将在 `process` 方法中被加载。 - - **column_names** (Union[str, list[str]],可选) - dataset包含的单个列名或多个列名组成的列表,默认值:'Graph'。当实现类似 `__getitem__` 等方法时,列名的数量应该等于该方法中返回数据的条数,如下述示例,建议初始化时明确它的取值如:`column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"]`。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式),默认值:1。 + - **column_names** (Union[str, list[str]],可选) - dataset包含的单个列名或多个列名组成的列表。默认值:'Graph'。当实现类似 `__getitem__` 等方法时,列名的数量应该等于该方法中返回数据的条数,如下述示例,建议初始化时明确它的取值如:`column_names=["edge_index", "x", "y", "cluster", 
"valid_len", "time_step_len"]`。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式)。默认值:1。 - **shuffle** (bool,可选) - 是否混洗数据集。当实现的Dataset带有可随机访问属性( `__getitem__` )时,才可以指定该参数。默认值:None。 - - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算,默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 + - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算。默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 - **perf_mode** (bool,可选) - 遍历创建的dataset对象时获得更高性能的模式(在此过程中将调用 `__getitem__` 方法)。默认值:True,将Graph的所有数据(如边的索引、节点特征和图的特征)都作为图特征进行存储。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.CLUEDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.CLUEDataset.rst index ea0ccafb2ab..5011e688a0f 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.CLUEDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.CLUEDataset.rst @@ -10,18 +10,18 @@ mindspore.dataset.CLUEDataset 参数: - **dataset_files** (Union[str, list[str]]) - 数据集文件路径,支持单文件路径字符串、多文件路径字符串列表或可被glob库模式匹配的字符串,文件列表将在内部进行字典排序。 - **task** (str, 可选) - 任务类型,可取值为 'AFQMC' 、'TNEWS'、'IFLYTEK'、'CMNLI'、'WSC' 或 'CSL'。默认值:'AFQMC'。 - - **usage** (str, 可选) - 指定数据集的子集,可取值为'train','test'或'eval',默认值:'train'。 + - **usage** (str, 可选) - 指定数据集的子集,可取值为'train','test'或'eval'。默认值:'train'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 根据给定的 `task` 参数 和 `usage` 配置,数据集会生成不同的输出列: diff --git a/docs/api/api_python/dataset/mindspore.dataset.CSVDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.CSVDataset.rst index 7bf45556325..08edf54d989 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.CSVDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.CSVDataset.rst @@ -7,20 +7,20 @@ 参数: - **dataset_files** (Union[str, list[str]]) - 数据集文件路径,支持单文件路径字符串、多文件路径字符串列表或可被glob库模式匹配的字符串,文件列表将在内部进行字典排序。 - - **field_delim** (str, 可选) - 指定用于分隔字段的分隔符,默认值:','。 + - **field_delim** (str, 可选) - 指定用于分隔字段的分隔符。默认值:','。 - **column_defaults** (list, 可选) - 指定每个数据列的数据类型,有效的类型包括float、int或string。默认值:None,不指定。如果未指定该参数,则所有列的数据类型将被视为string。 - **column_names** (list[str], 可选) - 指定数据集生成的列名。默认值:None,不指定。如果未指定该列表,则将CSV文件首行提供的字段作为列名生成。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 
`shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和文件中的数据。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.Caltech101Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Caltech101Dataset.rst index c78cd775f45..47f8e4336db 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Caltech101Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Caltech101Dataset.rst @@ -21,11 +21,11 @@ mindspore.dataset.Caltech101Dataset 取值为'all'时将同时输出图像的类别标注和轮廓标注。默认值:None,表示'category'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 异常: - **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.Caltech256Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Caltech256Dataset.rst index e428291f493..3c106173b3f 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Caltech256Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Caltech256Dataset.rst @@ -11,11 +11,11 @@ mindspore.dataset.Caltech256Dataset - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git 
a/docs/api/api_python/dataset/mindspore.dataset.CelebADataset.rst b/docs/api/api_python/dataset/mindspore.dataset.CelebADataset.rst index 81dabb39685..42ae0ac0faa 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.CelebADataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.CelebADataset.rst @@ -10,14 +10,14 @@ mindspore.dataset.CelebADataset 参数: - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 - **usage** (str, 可选) - 指定数据集的子集,可取值为'train','valid','test'或'all'。默认值:'all',全部样本图片。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **extensions** (list[str], 可选) - 指定文件的扩展名,仅读取与指定扩展名匹配的文件到数据集中,默认值:None。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **extensions** (list[str], 可选) - 指定文件的扩展名,仅读取与指定扩展名匹配的文件到数据集中。默认值:None。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 - **decrypt** (callable, 可选) - 图像解密函数,接受加密的图片路径并返回bytes类型的解密数据。默认值:None,不进行解密。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.Cifar100Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Cifar100Dataset.rst index 42ec9e78670..fa032ebc8c4 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Cifar100Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Cifar100Dataset.rst @@ -13,10 +13,10 @@ mindspore.dataset.Cifar100Dataset 取值为'train'时将会读取50,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部60,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.Cifar10Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Cifar10Dataset.rst index e3685bea6ee..21a9d0e6d46 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Cifar10Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Cifar10Dataset.rst @@ -13,10 +13,10 @@ mindspore.dataset.Cifar10Dataset 
取值为'train'时将会读取50,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部60,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.CityscapesDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.CityscapesDataset.rst index 24a808b1e74..ccad6fcfda1 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.CityscapesDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.CityscapesDataset.rst @@ -16,11 +16,11 @@ mindspore.dataset.CityscapesDataset - **task** (str, 可选) - 指定数据集的任务类型,可取值为'instance'、'semantic'、'polygon'或'color'。默认值:'instance'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.CocoDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.CocoDataset.rst index 5eeef89f147..015cd5dbf46 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.CocoDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.CocoDataset.rst @@ -10,14 +10,14 @@ - **annotation_file** (str) - 数据集标注JSON文件的路径。 - **task** (str, 可选) - 指定COCO数据的任务类型。支持的任务类型包括:'Detection'、'Stuff' 、'Panoptic'和'Keypoint'。默认值:'Detection'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数,默认值:使用mindspore.dataset.config中配置的线程数。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,表2中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,表2中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - 
**shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,表2中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 - - **extra_metadata** (bool, 可选) - 用于指定是否额外输出一个数据列用于表示图片元信息。如果为True,则将额外输出一个名为 `[_meta-filename, dtype=string]` 的数据列,默认值:False。 + - **extra_metadata** (bool, 可选) - 用于指定是否额外输出一个数据列用于表示图片元信息。如果为True,则将额外输出一个名为 `[_meta-filename, dtype=string]` 的数据列。默认值:False。 - **decrypt** (callable, 可选) - 图像解密函数,接受加密的图片路径并返回bytes类型的解密数据。默认值:None,不进行解密。 [表1] 根据不同 `task` 参数设置,生成数据集具有不同的输出列: diff --git a/docs/api/api_python/dataset/mindspore.dataset.DIV2KDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.DIV2KDataset.rst index 43ec0b334aa..0ef4ca1996c 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.DIV2KDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.DIV2KDataset.rst @@ -15,11 +15,11 @@ mindspore.dataset.DIV2KDataset 当参数 `downgrade` 取值为'unknown'时,此参数可以取值为2、3、4。当参数 `downgrade` 取值为'mild'、'difficult'、'wild'时,此参数仅可以取值为4。默认值:2。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.DSCallback.rst b/docs/api/api_python/dataset/mindspore.dataset.DSCallback.rst index 28e5cf6f457..7b03856a0f5 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.DSCallback.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.DSCallback.rst @@ -8,7 +8,7 @@ mindspore.dataset.DSCallback 用户可通过 `ds_run_context` 获取数据处理管道相关信息,包括 `cur_epoch_num` (当前epoch数)、 `cur_step_num_in_epoch` (当前epoch的step数)、 `cur_step_num` (当前step数)。 参数: - - **step_size** (int, 可选) - 定义相邻的 `ds_step_begin`/`ds_step_end` 调用之间相隔的step数,默认值:1,表示每个step都会调用。 + - **step_size** (int, 可选) - 定义相邻的 `ds_step_begin`/`ds_step_end` 调用之间相隔的step数。默认值:1,表示每个step都会调用。 .. 
py:method:: ds_begin(ds_run_context) diff --git a/docs/api/api_python/dataset/mindspore.dataset.DistributedSampler.rst b/docs/api/api_python/dataset/mindspore.dataset.DistributedSampler.rst index c18c713fac7..b17e35d3ff0 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.DistributedSampler.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.DistributedSampler.rst @@ -8,8 +8,8 @@ mindspore.dataset.DistributedSampler 参数: - **num_shards** (int) - 数据集分片数量。 - **shard_id** (int) - 当前分片的分片ID,应在[0, num_shards-1]范围内。 - - **shuffle** (bool, 可选) - 是否混洗采样得到的样本,默认值:True,混洗样本。 - - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本,默认值:None,获取采样到的所有样本。 + - **shuffle** (bool, 可选) - 是否混洗采样得到的样本。默认值:True,混洗样本。 + - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本。默认值:None,获取采样到的所有样本。 - **offset** (int, 可选) - 分布式采样结果进行分配时的起始分片ID号,值不能大于参数 `num_shards` 。从不同的分片ID开始分配数据可能会影响每个分片的最终样本数。仅当ConcatDataset以DistributedSampler为采样器时,此参数才有效。默认值:-1,每个分片具有相同的样本数。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.EMnistDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.EMnistDataset.rst index df28951d637..f71a9bcaccc 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.EMnistDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.EMnistDataset.rst @@ -14,8 +14,8 @@ mindspore.dataset.EMnistDataset 取值为'train'时将会读取60,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部70,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.FakeImageDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.FakeImageDataset.rst index 3598639b665..a38458dd4b1 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.FakeImageDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.FakeImageDataset.rst @@ -14,8 +14,8 @@ mindspore.dataset.FakeImageDataset - **base_seed** (int, 可选) - 生成随机图像的随机种子。默认值:0。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.FashionMnistDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.FashionMnistDataset.rst index 6704fc3ca98..43e5f6594f5 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.FashionMnistDataset.rst +++ 
b/docs/api/api_python/dataset/mindspore.dataset.FashionMnistDataset.rst @@ -13,8 +13,8 @@ mindspore.dataset.FashionMnistDataset 取值为'train'时将会读取60,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部70,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.FlickrDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.FlickrDataset.rst index dc137c99842..99d47bfa0bc 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.FlickrDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.FlickrDataset.rst @@ -11,12 +11,12 @@ - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - **annotation_file** (str) - 数据集标注JSON文件的路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数,默认值:使用mindspore.dataset.config中配置的线程数。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,表2中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:None,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,表2中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:None,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,表2中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.Flowers102Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Flowers102Dataset.rst index 141dd344dbb..886a5b99227 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Flowers102Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Flowers102Dataset.rst @@ -16,9 +16,9 @@ mindspore.dataset.Flowers102Dataset - **usage** (str, 可选) - 指定数据集的子集,可取值为'train','valid','test'或'all'。默认值:'all',读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 - - **sampler** (Union[Sampler, Iterable], 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **sampler** (Union[Sampler, Iterable], 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 diff 
--git a/docs/api/api_python/dataset/mindspore.dataset.GeneratorDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.GeneratorDataset.rst index ee7f7dd23bb..1f2a1114087 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.GeneratorDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.GeneratorDataset.rst @@ -11,20 +11,20 @@ - 如果 `source` 是可调用对象,要求 `source` 对象可以通过 `source().next()` 的方式返回一个由NumPy数组构成的元组。 - 如果 `source` 是可迭代对象,要求 `source` 对象通过 `iter(source).next()` 的方式返回一个由NumPy数组构成的元组。 - 如果 `source` 是支持随机访问的对象,要求 `source` 对象通过 `source[idx]` 的方式返回一个由NumPy数组构成的元组。 - - **column_names** (Union[str, list[str]],可选) - 指定数据集生成的列名,默认值:None,不指定。用户可以通过此参数或 `schema` 参数指定列名。 - - **column_types** (list[mindspore.dtype],可选) - 指定生成数据集各个数据列的数据类型,默认值:None,不指定。 + - **column_names** (Union[str, list[str]],可选) - 指定数据集生成的列名。默认值:None,不指定。用户可以通过此参数或 `schema` 参数指定列名。 + - **column_types** (list[mindspore.dtype],可选) - 指定生成数据集各个数据列的数据类型。默认值:None,不指定。 如果未指定该参数,则自动推断类型;如果指定了该参数,将在数据输出时做类型匹配检查。 - **schema** (Union[Schema, str],可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。 支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。 用户可以通过提供 `column_names` 或 `schema` 指定数据集的列名,但如果同时指定两者,则将优先从 `schema` 中获取列名信息。 - - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,默认值:None,读取全部样本。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式),默认值:1。 - - **shuffle** (bool,可选) - 是否混洗数据集。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None,下表中会展示不同配置的预期行为。 - - **sampler** (Union[Sampler, Iterable],可选) - 指定从数据集中选取样本的采样器。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算,默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 - - **max_rowsize** (int, 可选) - 指定在多进程之间复制数据时,共享内存分配的最大空间,默认值:6,单位为MB。仅当参数 `python_multiprocessing` 设为True时,此参数才会生效。 + - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式)。默认值:1。 + - **shuffle** (bool,可选) - 是否混洗数据集。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None。下表中会展示不同配置的预期行为。 + - **sampler** (Union[Sampler, Iterable],可选) - 指定从数据集中选取样本的采样器。只有输入的 `source` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算。默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 + - **max_rowsize** (int, 可选) - 指定在多进程之间复制数据时,共享内存分配的最大空间。默认值:6,单位为MB。仅当参数 `python_multiprocessing` 设为True时,此参数才会生效。 异常: - **RuntimeError** - Python对象 `source` 在执行期间引发异常。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.Graph.rst b/docs/api/api_python/dataset/mindspore.dataset.Graph.rst index 7a47d3f7944..37489c27469 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Graph.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Graph.rst @@ -14,17 +14,17 @@ mindspore.dataset.Graph - **graph_feat** (dict, 可选) - 附加特征,不能分配给 `node_feat` 或者 `edge_feat` ,输入数据格式应该是dict,key是特征的类型,用字符串表示; value应该是NumPy数组,其shape可以不受限制。 - **node_type** (Union[list, numpy.ndarray], 可选) - 节点的类型,每个元素都是字符串,表示每个节点的类型。如果未提供,则每个节点的默认类型为“0”。 - **edge_type** (Union[list, 
numpy.ndarray], 可选) - 边的类型,每个元素都是字符串,表示每条边的类型。如果未提供,则每条边的默认类型为“0”。 - - **num_parallel_workers** (int, 可选) - 读取数据的工作线程数,默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **working_mode** (str, 可选) - 设置工作模式,目前支持'local'/'client'/'server',默认值:'local'。 + - **num_parallel_workers** (int, 可选) - 读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 + - **working_mode** (str, 可选) - 设置工作模式,目前支持'local'/'client'/'server'。默认值:'local'。 - **local**:用于非分布式训练场景。 - **client**:用于分布式训练场景。客户端不加载数据,而是从服务器获取数据。 - **server**:用于分布式训练场景。服务器加载数据并可供客户端使用。 - - **hostname** (str, 可选) - 图数据集服务器的主机名。该参数仅在工作模式设置为 'client' 或 'server' 时有效,默认值:'127.0.0.1'。 - - **port** (int, 可选) - 图数据服务器的端口,取值范围为1024-65535。此参数仅当工作模式设置为 'client' 或 'server' 时有效,默认值:50051。 - - **num_client** (int, 可选) - 期望连接到服务器的最大客户端数。服务器将根据该参数分配资源。该参数仅在工作模式设置为 'server' 时有效,默认值:1。 - - **auto_shutdown** (bool, 可选) - 当工作模式设置为 'server' 时有效。当连接的客户端数量达到 `num_client` ,且没有客户端正在连接时,服务器将自动退出,默认值:True。 + - **hostname** (str, 可选) - 图数据集服务器的主机名。该参数仅在工作模式设置为 'client' 或 'server' 时有效。默认值:'127.0.0.1'。 + - **port** (int, 可选) - 图数据服务器的端口,取值范围为1024-65535。此参数仅当工作模式设置为 'client' 或 'server' 时有效。默认值:50051。 + - **num_client** (int, 可选) - 期望连接到服务器的最大客户端数。服务器将根据该参数分配资源。该参数仅在工作模式设置为 'server' 时有效。默认值:1。 + - **auto_shutdown** (bool, 可选) - 当工作模式设置为 'server' 时有效。当连接的客户端数量达到 `num_client` ,且没有客户端正在连接时,服务器将自动退出。默认值:True。 异常: - **TypeError** - 如果 `edges` 不是list或NumPy array类型。 @@ -44,7 +44,7 @@ mindspore.dataset.Graph 获取图的所有边。 参数: - - **edge_type** (str) - 指定边的类型,Graph初始化未指定 `edge_type` 时,默认值为'0'。 + - **edge_type** (str) - 指定边的类型。默认值:'0'。 返回: numpy.ndarray,包含边的数组。 @@ -143,7 +143,7 @@ mindspore.dataset.Graph 参数: - **node_list** (Union[list, numpy.ndarray]) - 给定的节点列表。 - **neighbor_type** (str) - 指定相邻节点的类型。 - - **output_format** (OutputFormat, 可选) - 输出存储格式,默认值:mindspore.dataset.OutputFormat.NORMAL,取值范围:[OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]。 + - **output_format** (OutputFormat, 可选) - 输出存储格式。默认值:mindspore.dataset.OutputFormat.NORMAL,取值范围:[OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]。 返回: 对于普通格式或COO格式,将返回numpy.ndarray类型的数组表示相邻节点。如果指定了CSR格式,将返回两个numpy.ndarray数组,第一个表示偏移表,第二个表示相邻节点。 @@ -157,7 +157,7 @@ mindspore.dataset.Graph 获取图中的所有节点。 参数: - - **node_type** (str) - 指定节点的类型。Graph初始化未指定 `node_type` 时,默认值为'0'。 + - **node_type** (str) - 指定节点的类型。默认值:'0'。 返回: numpy.ndarray,包含节点的数组。 @@ -259,7 +259,7 @@ mindspore.dataset.Graph - **node_list** (Union[list, numpy.ndarray]) - 包含节点的列表。 - **neighbor_nums** (Union[list, numpy.ndarray]) - 每跳采样的相邻节点数。 - **neighbor_types** (Union[list, numpy.ndarray]) - 每跳采样的相邻节点类型,列表或数组中每个元素都应该是字符串类型。 - - **strategy** (SamplingStrategy, 可选) - 采样策略,默认值:mindspore.dataset.SamplingStrategy.RANDOM。取值范围:[SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]。 + - **strategy** (SamplingStrategy, 可选) - 采样策略。默认值:mindspore.dataset.SamplingStrategy.RANDOM。取值范围:[SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]。 - **SamplingStrategy.RANDOM**:随机抽样,带放回采样。 - **SamplingStrategy.EDGE_WEIGHT**:以边缘权重为概率进行采样。 @@ -286,9 +286,9 @@ mindspore.dataset.Graph 参数: - **target_nodes** (list[int]) - 随机游走中的起始节点列表。 - **meta_path** (list[int]) - 每个步长的节点类型。 - - **step_home_param** (float, 可选) - 返回 `node2vec算法 `_ 中的超参,默认值:1.0。 - - **step_away_param** (float, 可选) - `node2vec算法 `_ 中的in和out超参,默认值:1.0。 - - **default_node** (int, 可选) - 如果找不到更多相邻节点,则为默认节点,默认值:-1,表示不给定节点。 + - **step_home_param** (float, 可选) - 返回 `node2vec算法 `_ 中的超参。默认值:1.0。 + - **step_away_param** (float, 可选) - `node2vec算法 `_ 中的in和out超参。默认值:1.0。 + - **default_node** (int, 可选) - 如果找不到更多相邻节点,则为默认节点。默认值:-1,表示不给定节点。 返回: 
numpy.ndarray,包含节点的数组。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.GraphData.rst b/docs/api/api_python/dataset/mindspore.dataset.GraphData.rst index a6585558f28..18b10836961 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.GraphData.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.GraphData.rst @@ -8,17 +8,17 @@ mindspore.dataset.GraphData 参数: - **dataset_file** (str) - 数据集文件路径。 - - **num_parallel_workers** (int, 可选) - 读取数据的工作线程数,默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **working_mode** (str, 可选) - 设置工作模式,目前支持'local'/'client'/'server',默认值:'local'。 + - **num_parallel_workers** (int, 可选) - 读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 + - **working_mode** (str, 可选) - 设置工作模式,目前支持'local'/'client'/'server'。默认值:'local'。 - **local**:用于非分布式训练场景。 - **client**:用于分布式训练场景。客户端不加载数据,而是从服务器获取数据。 - **server**:用于分布式训练场景。服务器加载数据并可供客户端使用。 - - **hostname** (str, 可选) - 图数据集服务器的主机名。该参数仅在工作模式设置为 'client' 或 'server' 时有效,默认值:'127.0.0.1'。 - - **port** (int, 可选) - 图数据服务器的端口,取值范围为1024-65535。此参数仅当工作模式设置为 'client' 或 'server' 时有效,默认值:50051。 - - **num_client** (int, 可选) - 期望连接到服务器的最大客户端数。服务器将根据该参数分配资源。该参数仅在工作模式设置为 'server' 时有效,默认值:1。 - - **auto_shutdown** (bool, 可选) - 当工作模式设置为 'server' 时有效。当连接的客户端数量达到 `num_client` ,且没有客户端正在连接时,服务器将自动退出,默认值:True。 + - **hostname** (str, 可选) - 图数据集服务器的主机名。该参数仅在工作模式设置为 'client' 或 'server' 时有效。默认值:'127.0.0.1'。 + - **port** (int, 可选) - 图数据服务器的端口,取值范围为1024-65535。此参数仅当工作模式设置为 'client' 或 'server' 时有效。默认值:50051。 + - **num_client** (int, 可选) - 期望连接到服务器的最大客户端数。服务器将根据该参数分配资源。该参数仅在工作模式设置为 'server' 时有效。默认值:1。 + - **auto_shutdown** (bool, 可选) - 当工作模式设置为 'server' 时有效。当连接的客户端数量达到 `num_client` ,且没有客户端正在连接时,服务器将自动退出。默认值:True。 异常: - **ValueError** - `dataset_file` 路径下数据文件不存在或无效。 @@ -132,7 +132,7 @@ mindspore.dataset.GraphData 参数: - **node_list** (Union[list, numpy.ndarray]) - 给定的节点列表。 - **neighbor_type** (int) - 指定相邻节点的类型。 - - **output_format** (OutputFormat, 可选) - 输出存储格式,默认值:mindspore.dataset.OutputFormat.NORMAL,取值范围:[OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]。 + - **output_format** (OutputFormat, 可选) - 输出存储格式。默认值:mindspore.dataset.OutputFormat.NORMAL,取值范围:[OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]。 返回: 对于普通格式或COO格式,将返回numpy.ndarray类型的数组表示相邻节点。如果指定了CSR格式,将返回两个numpy.ndarray数组,第一个表示偏移表,第二个表示相邻节点。 @@ -236,7 +236,7 @@ mindspore.dataset.GraphData - **node_list** (Union[list, numpy.ndarray]) - 包含节点的列表。 - **neighbor_nums** (Union[list, numpy.ndarray]) - 每跳采样的相邻节点数。 - **neighbor_types** (Union[list, numpy.ndarray]) - 每跳采样的相邻节点类型,列表或数组中每个元素都应该是int类型。 - - **strategy** (SamplingStrategy, 可选) - 采样策略,默认值:mindspore.dataset.SamplingStrategy.RANDOM。取值范围:[SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]。 + - **strategy** (SamplingStrategy, 可选) - 采样策略。默认值:mindspore.dataset.SamplingStrategy.RANDOM。取值范围:[SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]。 - **SamplingStrategy.RANDOM**:随机抽样,带放回采样。 - **SamplingStrategy.EDGE_WEIGHT**:以边缘权重为概率进行采样。 @@ -263,9 +263,9 @@ mindspore.dataset.GraphData 参数: - **target_nodes** (list[int]) - 随机游走中的起始节点列表。 - **meta_path** (list[int]) - 每个步长的节点类型。 - - **step_home_param** (float, 可选) - 返回 `node2vec算法 `_ 中的超参,默认值:1.0。 - - **step_away_param** (float, 可选) - `node2vec算法 `_ 中的in和out超参,默认值:1.0。 - - **default_node** (int, 可选) - 如果找不到更多相邻节点,则为默认节点,默认值:-1,表示不给定节点。 + - **step_home_param** (float, 可选) - 返回 `node2vec算法 `_ 中的超参。默认值:1.0。 + - **step_away_param** (float, 可选) - `node2vec算法 `_ 中的in和out超参。默认值:1.0。 + - **default_node** (int, 可选) - 如果找不到更多相邻节点,则为默认节点。默认值:-1,表示不给定节点。 返回: numpy.ndarray,包含节点的数组。 diff --git 
a/docs/api/api_python/dataset/mindspore.dataset.IMDBDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.IMDBDataset.rst index b89e58835e0..293899f5fc2 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.IMDBDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.IMDBDataset.rst @@ -14,8 +14,8 @@ mindspore.dataset.IMDBDataset 对于Polarity数据集,'train'将读取360万个训练样本,'test'将读取40万个测试样本,'all'将读取所有400万个样本。 对于Full数据集,'train'将读取300万个训练样本,'test'将读取65万个测试样本,'all'将读取所有365万个样本。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.ImageFolderDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.ImageFolderDataset.rst index 80b35ce888c..26401284430 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.ImageFolderDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.ImageFolderDataset.rst @@ -11,13 +11,13 @@ mindspore.dataset.ImageFolderDataset - **dataset_dir** (str) - 包含数据集文件的根目录的路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **extensions** (list[str], 可选) - 指定文件的扩展名,仅读取与指定扩展名匹配的文件到数据集中,默认值:None。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **extensions** (list[str], 可选) - 指定文件的扩展名,仅读取与指定扩展名匹配的文件到数据集中。默认值:None。 - **class_indexing** (dict, 可选) - 指定文件夹名称到label索引的映射,要求映射规则为string到int。文件夹名称将按字母顺序排列,索引值从0开始,并且要求每个文件夹名称对应的索引值唯一。默认值:None,不指定。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 - **decrypt** (callable, 可选) - 图像解密函数,接受加密的图片路径并返回bytes类型的解密数据。默认值:None,不进行解密。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.InMemoryGraphDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.InMemoryGraphDataset.rst index 0b70e1453b3..b0638c01ac9 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.InMemoryGraphDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.InMemoryGraphDataset.rst @@ -11,15 +11,15 @@ 参数: - **data_dir** (str) - 加载数据集的目录,这里包含原始格式的数据,并将在 `process` 方法中被加载。 - - **save_dir** (str) - 保存处理后得到的数据集的相对目录,该目录位于 `data_dir` 下面,默认值:"./processed"。 - - **column_names** (Union[str, list[str]],可选) - 
dataset包含的单个列名或多个列名组成的列表,默认值:'Graph'。当实现类似 `__getitem__` 等方法时,列名的数量应该等于该方法中返回数据的条数。 - - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,默认值:None,读取全部样本。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式),默认值:1。 + - **save_dir** (str) - 保存处理后得到的数据集的相对目录,该目录位于 `data_dir` 下面。默认值:"./processed"。 + - **column_names** (Union[str, list[str]],可选) - dataset包含的单个列名或多个列名组成的列表。默认值:'Graph'。当实现类似 `__getitem__` 等方法时,列名的数量应该等于该方法中返回数据的条数。 + - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作进程数/线程数(由参数 `python_multiprocessing` 决定当前为多进程模式或多线程模式)。默认值:1。 - **shuffle** (bool,可选) - 是否混洗数据集。当实现的Dataset带有可随机访问属性( `__getitem__` )时,才可以指定该参数。默认值:None。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算,默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 - - **max_rowsize** (int, 可选) - 指定在多进程之间复制数据时,共享内存分配的最大空间,默认值:6,单位为MB。仅当参数 `python_multiprocessing` 设为True时,此参数才会生效。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **python_multiprocessing** (bool,可选) - 启用Python多进程模式加速运算。默认值:True。当传入 `source` 的Python对象的计算量很大时,开启此选项可能会有较好效果。 + - **max_rowsize** (int, 可选) - 指定在多进程之间复制数据时,共享内存分配的最大空间。默认值:6,单位为MB。仅当参数 `python_multiprocessing` 设为True时,此参数才会生效。 .. py:method:: load() diff --git a/docs/api/api_python/dataset/mindspore.dataset.KMnistDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.KMnistDataset.rst index 4eaa6a05ea2..7c6fd8f87f6 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.KMnistDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.KMnistDataset.rst @@ -13,8 +13,8 @@ mindspore.dataset.KMnistDataset 取值为'train'时将会读取60,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部70,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.LJSpeechDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.LJSpeechDataset.rst index b1efd026c58..17f7f7c2440 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.LJSpeechDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.LJSpeechDataset.rst @@ -12,8 +12,8 @@ mindspore.dataset.LJSpeechDataset - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本音频。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 
是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.ManifestDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.ManifestDataset.rst index 7cb1d58a6f6..fa2dd5de45e 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.ManifestDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.ManifestDataset.rst @@ -9,15 +9,15 @@ 参数: - **dataset_file** (str) - 数据集文件的目录路径。 - - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'eval' 或 'inference',默认值:'train'。 + - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'eval' 或 'inference'。默认值:'train'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **class_indexing** (dict, 可选) - 指定一个从label名称到label索引的映射,要求映射规则为string到int。索引值从0开始,并且要求每个label名称对应的索引值唯一。默认值:None,不指定。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.MindDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.MindDataset.rst index fb8117397fc..093eb3623d9 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.MindDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.MindDataset.rst @@ -9,7 +9,7 @@ - **dataset_files** (Union[str, list[str]]) - MindRecord文件路径,支持单文件路径字符串、多文件路径字符串列表。如果 `dataset_files` 的类型是字符串,则它代表一组具有相同前缀名的MindRecord文件,同一路径下具有相同前缀名的其他MindRecord文件将会被自动寻找并加载。如果 `dataset_files` 的类型是列表,则它表示所需读取的MindRecord数据文件。 - **columns_list** (list[str],可选) - 指定从MindRecord文件中读取的数据列。默认值:None,读取所有列。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: @@ -17,9 +17,9 @@ - **Shuffle.FILES**:仅混洗文件。 - **Shuffle.INFILE**:保持读入文件的序列,仅混洗每个文件中的数据。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - - **sampler** (Sampler, 可选) - 
指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。当前此数据集仅支持以下采样器:SubsetRandomSampler、PkSampler、RandomSampler、SequentialSampler和DistributedSampler。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。当前此数据集仅支持以下采样器:SubsetRandomSampler、PkSampler、RandomSampler、SequentialSampler和DistributedSampler。 - **padded_sample** (dict, 可选) - 指定额外添加到数据集的样本,可用于在分布式训练时补齐分片数据,注意字典的键名需要与 `column_list` 指定的列名相同。默认值:None,不添加样本。需要与 `num_padded` 参数同时使用。 - **num_padded** (int, 可选) - 指定额外添加的数据集样本的数量。在分布式训练时可用于为数据集补齐样本,使得总样本数量可被 `num_shards` 整除。默认值:None,不添加样本。需要与 `padded_sample` 参数同时使用。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.MnistDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.MnistDataset.rst index 7801dda7478..6f616221ec5 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.MnistDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.MnistDataset.rst @@ -13,10 +13,10 @@ mindspore.dataset.MnistDataset 取值为'train'时将会读取60,000个训练样本,取值为'test'时将会读取10,000个测试样本,取值为'all'时将会读取全部70,000个样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.NumpySlicesDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.NumpySlicesDataset.rst index c80da0d4397..3a20d028284 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.NumpySlicesDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.NumpySlicesDataset.rst @@ -8,16 +8,16 @@ mindspore.dataset.NumpySlicesDataset 参数: - **data** (Union[list, tuple, dict]) - 输入的Python数据。支持的数据类型包括:list、tuple、dict和其他NumPy格式。 输入数据将沿着第一个维度切片,并生成额外的行。如果输入是单个list,则将生成一个数据列,若是嵌套多个list,则生成多个数据列。不建议通过这种方式加载大量的数据,因为可能会在数据加载到内存时等待较长时间。 - - **column_names** (list[str], 可选) - 指定数据集生成的列名,默认值:None,不指定。 + - **column_names** (list[str], 可选) - 指定数据集生成的列名。默认值:None,不指定。 如果未指定该参数,且当输入数据的类型是dict时,输出列名称将被命名为dict的键名,否则它们将被统一命名为column_0,column_1...。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有样本。 - - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数,默认值:1。 + - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:1。 - **shuffle** (bool, 可选) - 是否混洗数据集。 - 只有输入的 `data` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None,下表中会展示不同配置的预期行为。 + 只有输入的 `data` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None。下表中会展示不同配置的预期行为。 - **sampler** (Union[Sampler, Iterable], 可选) - 指定从数据集中选取样本的采样器。 - 只有输入的 `data` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 
指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + 只有输入的 `data` 参数带有可随机访问属性(`__getitem__`)时,才可以指定该参数。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 .. note:: 此数据集可以指定参数 `sampler` ,但参数 `sampler` 和参数 `shuffle` 的行为是互斥的。下表展示了几种合法的输入参数组合及预期的行为。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.OBSMindDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.OBSMindDataset.rst index 80871534537..e99de51d1d3 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.OBSMindDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.OBSMindDataset.rst @@ -14,7 +14,7 @@ - **sk** (str) - 访问密钥中的SK。 - **sync_obs_path** (str) - 用于同步操作云存储上的路径,用户需要提前创建,目录路径的格式为s3://bucketName/objectKey。 - **columns_list** (list[str],可选) - 指定从MindRecord文件中读取的数据列。默认值:None,读取所有列。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: @@ -22,8 +22,8 @@ - **Shuffle.FILES**:仅混洗文件。 - **Shuffle.INFILE**:保持读入文件的序列,仅混洗每个文件中的数据。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **shard_equal_rows** (bool, 可选) - 分布式训练时,为所有分片获取等量的数据行数。默认值:True。 如果 `shard_equal_rows` 为False,则可能会使得每个分片的数据条目不相等,从而导致分布式训练失败。 因此当每个TFRecord文件的数据数量不相等时,建议将此参数设置为True。注意,只有当指定了 `num_shards` 时才能指定此参数。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.PKSampler.rst b/docs/api/api_python/dataset/mindspore.dataset.PKSampler.rst index 3bf2f0c7707..654c9fdb768 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.PKSampler.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.PKSampler.rst @@ -7,9 +7,9 @@ mindspore.dataset.PKSampler 参数: - **num_val** (int) - 每个类要采样的元素数量。 - - **num_class** (int, 可选) - 要采样的类数量,默认值:为None,采样所有类。当前不支持指定该参数。 - - **shuffle** (bool, 可选) - 是否混洗采样得到的样本,默认值:False,不混洗样本。 - - **class_column** (str, 可选) - 指定label所属数据列的名称,将基于此列作为数据标签进行采样,默认值:'label'。 + - **num_class** (int, 可选) - 要采样的类数量。默认值:None,采样所有类。当前不支持指定该参数。 + - **shuffle** (bool, 可选) - 是否混洗采样得到的样本。默认值:False,不混洗样本。 + - **class_column** (str, 可选) - 指定label所属数据列的名称,将基于此列作为数据标签进行采样。默认值:'label'。 - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本。默认值:None,获取采样到的所有样本。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.PhotoTourDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.PhotoTourDataset.rst index 0e066f988ef..be664fc85cb 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.PhotoTourDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.PhotoTourDataset.rst @@ -17,8 +17,8 @@ mindspore.dataset.PhotoTourDataset 取值为'test'时,将读取100,000个测试样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - 
**shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.Places365Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.Places365Dataset.rst index a0dd59d55cd..f39a2e77ae4 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Places365Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Places365Dataset.rst @@ -10,15 +10,15 @@ mindspore.dataset.Places365Dataset 参数: - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - - **usage** (str, 可选) - 'train-standard'、'train-challenge'或'val',默认值:'train-standard'。 + - **usage** (str, 可选) - 'train-standard'、'train-challenge'或'val'。默认值:'train-standard'。 - **small** (bool, 可选) - 是否使用256*256的低分辨率图像(True)或高分辨率图像(False)。默认值:False,使用高分辨率图像。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.QMnistDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.QMnistDataset.rst index 4d2d34d2721..0b39a446592 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.QMnistDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.QMnistDataset.rst @@ -9,14 +9,14 @@ mindspore.dataset.QMnistDataset 参数: - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'test10k'、'test50k'、'nist'或'all',默认值:None,读取所有子集。 + - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'test10k'、'test50k'、'nist'或'all'。默认值:None,读取所有子集。 - **compat** (bool, 可选) - 指定每个样本的标签是类别号(compat=True)还是完整的QMNIST信息(compat=False)。默认值:True,标签为类别号。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** 
(int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.RandomDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.RandomDataset.rst index 3294ff1bc0f..93e811dd195 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.RandomDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.RandomDataset.rst @@ -9,13 +9,13 @@ mindspore.dataset.RandomDataset - **total_rows** (int, 可选) - 随机生成样本数据的数量。默认值:None,生成随机数量的样本。 - **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。 支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。 - - **columns_list** (list[str], 可选) - 指定生成数据集的列名,默认值:None,生成的数据列将以"c0","c1","c2" ... "cn"的规则命名。 + - **columns_list** (list[str], 可选) - 指定生成数据集的列名。默认值:None,生成的数据列将以"c0","c1","c2" ... "cn"的规则命名。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 .. include:: mindspore.dataset.api_list_nlp.rst diff --git a/docs/api/api_python/dataset/mindspore.dataset.RandomSampler.rst b/docs/api/api_python/dataset/mindspore.dataset.RandomSampler.rst index 6d7db1eaaea..5af7dec48d5 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.RandomSampler.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.RandomSampler.rst @@ -6,7 +6,7 @@ mindspore.dataset.RandomSampler 随机采样器。 参数: - - **replacement** (bool, 可选) - 是否将样本ID放回下一次采样,默认值:False,无放回采样。 + - **replacement** (bool, 可选) - 是否将样本ID放回下一次采样。默认值:False,无放回采样。 - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本。默认值:None,获取采样到的所有样本。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.SBDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SBDataset.rst index a0ce8812746..d53e6a00da5 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SBDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SBDataset.rst @@ -16,11 +16,11 @@ mindspore.dataset.SBDataset - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'val'、'train_noval'和'all'。默认值:'train'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 
表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 异常: - **RuntimeError** - `dataset_dir` 路径下不包含任何数据文件。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.SBUDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SBUDataset.rst index 3fe2e7c54ff..5f6a0803290 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SBUDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SBUDataset.rst @@ -9,13 +9,13 @@ mindspore.dataset.SBUDataset 参数: - **dataset_dir** (str) - 包含数据集文件的根目录的路径。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.STL10Dataset.rst b/docs/api/api_python/dataset/mindspore.dataset.STL10Dataset.rst index d729b3d346a..f58c3b8e56a 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.STL10Dataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.STL10Dataset.rst @@ -14,10 +14,10 @@ mindspore.dataset.STL10Dataset 取值为'all'时将会读取全部类型的样本。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.SVHNDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SVHNDataset.rst index 1e368fcbc0d..e6e49c5eec7 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SVHNDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SVHNDataset.rst @@ -12,10 +12,10 @@ mindspore.dataset.SVHNDataset - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'extra'或'all'。默认值:None,读取全部样本图片。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,可以小于数据集总数。默认值:None,读取全部样本图片。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 
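这里补充一段最小用法示意(非官方示例,`./svhn_data` 为假设的本地数据目录,样本数为演示用占位),展示可映射数据集中 `shuffle` 与 `sampler` 互斥时的两种合法写法::

    import mindspore.dataset as ds

    # 写法一:仅指定 shuffle,由数据集负责全局随机混洗
    dataset = ds.SVHNDataset(dataset_dir="./svhn_data", usage="train", shuffle=True)

    # 写法二:仅指定 sampler,此时不应再传入 shuffle
    sampler = ds.RandomSampler(replacement=False, num_samples=1000)  # 无放回采样1000条(占位取值)
    dataset = ds.SVHNDataset(dataset_dir="./svhn_data", usage="train", sampler=sampler)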
- - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 异常: - **RuntimeError** - `dataset_dir` 路径下不包含数据文件。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.Schema.rst b/docs/api/api_python/dataset/mindspore.dataset.Schema.rst index c97ddcd9885..06f91138dd6 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.Schema.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.Schema.rst @@ -6,7 +6,7 @@ mindspore.dataset.Schema 用于解析和存储数据列属性的类。 参数: - - **schema_file** (str) - schema文件的路径,默认值:None。 + - **schema_file** (str) - schema文件的路径。默认值:None。 返回: schema对象,关于数据集的行列配置的策略信息。 @@ -21,7 +21,7 @@ mindspore.dataset.Schema 参数: - **name** (str) - 列的新名称。 - **de_type** (str) - 列的数据类型。 - - **shape** (list[int], 可选) - 列shape,默认值:None,-1表示该维度的shape是未知的。 + - **shape** (list[int], 可选) - 列shape。默认值:None,-1表示该维度的shape是未知的。 异常: - **ValueError** - 列类型未知。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.SemeionDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SemeionDataset.rst index 8ee993134c8..46364ce7e2f 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SemeionDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SemeionDataset.rst @@ -11,10 +11,10 @@ mindspore.dataset.SemeionDataset - **dataset_dir** (str) - 包含数据集文件的根目录的路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.SequentialSampler.rst b/docs/api/api_python/dataset/mindspore.dataset.SequentialSampler.rst index bd80034eef5..c4dd1968004 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SequentialSampler.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SequentialSampler.rst @@ -6,7 +6,7 @@ mindspore.dataset.SequentialSampler 按数据集的读取顺序采样数据集样本,相当于不使用采样器。 参数: - - **start_index** (int, 可选) - 采样的起始样本ID,默认值:None,从数据集第一个样本开始采样。 + - **start_index** (int, 可选) - 采样的起始样本ID。默认值:None,从数据集第一个样本开始采样。 - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本。默认值:None,获取采样到的所有样本。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.SogouNewsDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SogouNewsDataset.rst index 9c8c4c47299..926b22395b1 100644 --- 
a/docs/api/api_python/dataset/mindspore.dataset.SogouNewsDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SogouNewsDataset.rst @@ -13,15 +13,15 @@ mindspore.dataset.SogouNewsDataset 取值为'train'时将会读取45万个训练样本,取值为'test'时将会读取6万个测试样本,取值为'all'时将会读取全部51万个样本。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.SpeechCommandsDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.SpeechCommandsDataset.rst index eff20625b9c..5cfd9cd37fc 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.SpeechCommandsDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.SpeechCommandsDataset.rst @@ -14,10 +14,10 @@ mindspore.dataset.SpeechCommandsDataset 取值为'train'时将会读取84,843个训练样本,取值为'test'时将会读取11,005个测试样本,取值为'valid'时将会读取9,981个测试样本,取值为'all'时将会读取全部105,829个样本。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.TFRecordDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.TFRecordDataset.rst index 11b257b2f61..d25b5472cd1 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.TFRecordDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.TFRecordDataset.rst @@ -10,21 +10,21 @@ mindspore.dataset.TFRecordDataset - **schema** (Union[str, Schema], 可选) - 读取模式策略,用于指定读取数据列的数据类型、数据维度等信息。 支持传入JSON文件路径或 mindspore.dataset.Schema 构造的对象。默认值:None,不指定。 - **columns_list** (list[str], 可选) - 指定从TFRecord文件中读取的数据列。默认值:None,读取所有列。 - - **num_samples** (int, 可选) - 指定从数据集中读取的样本数,默认值:None,读取全部样本。 + - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - 如果 `num_samples` 为None,并且numRows字段(由参数 `schema` 定义)不存在,则读取所有数据集; - 如果 `num_samples` 为None,并且numRows字段(由参数 `schema` 
定义)的值大于0,则读取numRows条数据; - 如果 `num_samples` 和numRows字段(由参数 `schema` 定义)的值都大于0,此时仅有参数 `num_samples` 生效且读取给定数量的数据。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后,`num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后,`num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **shard_equal_rows** (bool, 可选) - 分布式训练时,为所有分片获取等量的数据行数。默认值:False。如果 `shard_equal_rows` 为False,则可能会使得每个分片的数据条目不相等,从而导致分布式训练失败。因此当每个TFRecord文件的数据数量不相等时,建议将此参数设置为True。注意,只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.TedliumDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.TedliumDataset.rst index d875c580028..98ba10403ec 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.TedliumDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.TedliumDataset.rst @@ -17,10 +17,10 @@ mindspore.dataset.TedliumDataset - **extensions** (str, 可选) - 指定SPH文件的扩展名。默认值:'.sph'。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.TextFileDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.TextFileDataset.rst index 47f3aeb171d..c90e865b069 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.TextFileDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.TextFileDataset.rst @@ -9,15 +9,15 @@ - **dataset_files** (Union[str, list[str]]) - 数据集文件路径,支持单文件路径字符串、多文件路径字符串列表或可被glob库模式匹配的字符串,文件列表将在内部进行字典排序。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取所有样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: 
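在继续列出枚举取值之前,这里先补充一段 `Shuffle` 的最小用法示意(非官方示例,文件路径与文件名均为假设的占位)::

    import mindspore.dataset as ds

    # 仅混洗文件读取顺序,文件内样本顺序保持不变
    dataset = ds.TFRecordDataset(
        dataset_files=["./data/part-0.tfrecord", "./data/part-1.tfrecord"],
        shuffle=ds.Shuffle.FILES)

    # 传入 True 等同于 Shuffle.GLOBAL:文件与样本全部混洗
    dataset = ds.TextFileDataset(dataset_files="./corpus/*.txt", shuffle=True)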
- **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.UDPOSDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.UDPOSDataset.rst index 2203893b41f..dac63b3ff24 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.UDPOSDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.UDPOSDataset.rst @@ -12,15 +12,15 @@ mindspore.dataset.UDPOSDataset - **usage** (str, 可选) - 指定数据集的子集,可取值为'train'、'test'、'valid'或'all'。 取值为'train'时将会读取12,543个样本,取值为'test'时将会读取2,077个测试样本,取值为'valid'时将会读取2,002个样本,取值为'all'时将会读取全部16,622个样本。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.USPSDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.USPSDataset.rst index 7d9fed19b74..dcb99c74054 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.USPSDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.USPSDataset.rst @@ -13,15 +13,15 @@ mindspore.dataset.USPSDataset 取值为'train'时将会读取7,291个样本,取值为'test'时将会读取2,007个测试样本,取值为'all'时将会读取全部9,298个样本。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 
单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.VOCDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.VOCDataset.rst index a05797b6b3b..b78b7a2a333 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.VOCDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.VOCDataset.rst @@ -13,7 +13,7 @@ mindspore.dataset.VOCDataset 参数: - **dataset_dir** (str) - 包含数据集文件的根目录的路径。 - **task** (str, 可选) - 指定读取VOC数据的任务类型,现在只支持'Segmentation'和'Detection'。默认值:'Segmentation'。 - - **usage** (str, 可选) - 指定数据集的子集,默认值:'train'。 + - **usage** (str, 可选) - 指定数据集的子集。默认值:'train'。 - 如果 'task' 的值为 'Segmentation',则读取 'ImageSets/Segmentation/' 目录下定义的图片和label信息; - 如果 'task' 的值为 'Detection' ,则读取 'ImageSets/Main/' 目录下定义的图片和label信息。 @@ -21,13 +21,13 @@ mindspore.dataset.VOCDataset 仅在 'Detection' 任务中有效。默认值:None,不指定。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,所有图像样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 - - **extra_metadata** (bool, 可选) - 用于指定是否额外输出一个数据列用于表示图片元信息。如果为True,则将额外输出一个名为 `[_meta-filename, dtype=string]` 的数据列,默认值:False。 + - **extra_metadata** (bool, 可选) - 用于指定是否额外输出一个数据列用于表示图片元信息。如果为True,则将额外输出一个名为 `[_meta-filename, dtype=string]` 的数据列。默认值:False。 - **decrypt** (callable, 可选) - 图像解密函数,接受加密的图片路径并返回bytes类型的解密数据。默认值:None,不进行解密。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.WIDERFaceDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.WIDERFaceDataset.rst index ff79bbd2116..a9d1c80f396 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.WIDERFaceDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.WIDERFaceDataset.rst @@ -14,11 +14,11 @@ mindspore.dataset.WIDERFaceDataset 取值为'train'时将会读取12,880个样本,取值为'test'时将会读取16,097个测试样本,取值为'valid'时将会读取3,226个样本,取值为'all'时将会读取全部类别样本。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **decode** (bool, 可选) - 是否对读取的图片进行解码操作,默认值:False,不解码。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **decode** (bool, 可选) - 是否对读取的图片进行解码操作。默认值:False,不解码。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, 
`num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.WaitedDSCallback.rst b/docs/api/api_python/dataset/mindspore.dataset.WaitedDSCallback.rst index 5c6a02b902e..9d3bb4b0a20 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.WaitedDSCallback.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.WaitedDSCallback.rst @@ -14,7 +14,7 @@ mindspore.dataset.WaitedDSCallback .. note:: 注意,第2个step或epoch开始时才会触发该调用。 参数: - - **step_size** (int, 可选) - 每个step包含的数据行数。通常step_size与batch_size一致,默认值:1。 + - **step_size** (int, 可选) - 每个step包含的数据行数。通常step_size与batch_size一致。默认值:1。 .. py:method:: sync_epoch_begin(train_run_context, ds_run_context) diff --git a/docs/api/api_python/dataset/mindspore.dataset.WeightedRandomSampler.rst b/docs/api/api_python/dataset/mindspore.dataset.WeightedRandomSampler.rst index 7b50e0bb452..7d24a4e979a 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.WeightedRandomSampler.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.WeightedRandomSampler.rst @@ -8,7 +8,7 @@ mindspore.dataset.WeightedRandomSampler 参数: - **weights** (list[float, int]) - 权重序列,总和不一定为1。 - **num_samples** (int, 可选) - 获取的样本数,可用于部分获取采样得到的样本。默认值:None,获取采样到的所有样本。 - - **replacement** (bool) - 是否将样本ID放回下一次采样,默认值:True,有放回采样。 + - **replacement** (bool) - 是否将样本ID放回下一次采样。默认值:True,有放回采样。 异常: - **TypeError** - `weights` 元素的类型不是数值类型。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.WikiTextDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.WikiTextDataset.rst index deff4a1d1cd..a54267508cf 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.WikiTextDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.WikiTextDataset.rst @@ -12,15 +12,15 @@ mindspore.dataset.WikiTextDataset - **usage** (str, 可选) - 指定数据集的子集,可取值为'train', 'test', 'valid'或'all'。默认值:None,读取全部样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.YahooAnswersDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.YahooAnswersDataset.rst index e5806082447..d917fc41d95 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.YahooAnswersDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.YahooAnswersDataset.rst @@ -13,15 +13,15 @@ mindspore.dataset.YahooAnswersDataset 取值为'train'时将会读取1,400,000个训练样本,取值为'test'时将会读取60,000个测试样本,取值为'all'时将会读取全部1,460,000个样本。默认值:None,读取全部样本。 - 
**num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.YelpReviewDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.YelpReviewDataset.rst index f8ead16a7b1..4fd553cfad9 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.YelpReviewDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.YelpReviewDataset.rst @@ -13,15 +13,15 @@ mindspore.dataset.YelpReviewDataset 对于Polarity数据集,'train'将读取560,000个训练样本,'test'将读取38,000个测试样本,'all'将读取所有598,000个样本。 对于Full数据集,'train'将读取650,000个训练样本,'test'将读取50,000个测试样本,'all'将读取所有700,000个样本。默认值:None,读取所有样本。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定,默认值:mindspore.dataset.Shuffle.GLOBAL。 + - **shuffle** (Union[bool, Shuffle], 可选) - 每个epoch中数据混洗的模式,支持传入bool类型与枚举类型进行指定。默认值:mindspore.dataset.Shuffle.GLOBAL。 如果 `shuffle` 为False,则不混洗,如果 `shuffle` 为True,等同于将 `shuffle` 设置为mindspore.dataset.Shuffle.GLOBAL。 通过传入枚举变量设置数据混洗的模式: - **Shuffle.GLOBAL**:混洗文件和样本。 - **Shuffle.FILES**:仅混洗文件。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.YesNoDataset.rst b/docs/api/api_python/dataset/mindspore.dataset.YesNoDataset.rst index 27208ac1ba0..bd70a6a4996 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.YesNoDataset.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.YesNoDataset.rst @@ -12,10 +12,10 @@ mindspore.dataset.YesNoDataset - **dataset_dir** (str) - 包含数据集文件的根目录路径。 - **num_samples** (int, 可选) - 指定从数据集中读取的样本数。默认值:None,读取全部样本。 - **num_parallel_workers** (int, 可选) - 指定读取数据的工作线程数。默认值:None,使用mindspore.dataset.config中配置的线程数。 - - **shuffle** (bool, 可选) - 是否混洗数据集。默认值:None,下表中会展示不同参数配置的预期行为。 - - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器,默认值:None,下表中会展示不同配置的预期行为。 - - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数,默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 - - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号,默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 + - **shuffle** (bool, 可选) - 
是否混洗数据集。默认值:None。下表中会展示不同参数配置的预期行为。 + - **sampler** (Sampler, 可选) - 指定从数据集中选取样本的采样器。默认值:None。下表中会展示不同配置的预期行为。 + - **num_shards** (int, 可选) - 指定分布式训练时将数据集进行划分的分片数。默认值:None。指定此参数后, `num_samples` 表示每个分片的最大样本数。 + - **shard_id** (int, 可选) - 指定分布式训练时使用的分片ID号。默认值:None。只有当指定了 `num_shards` 时才能指定此参数。 - **cache** (DatasetCache, 可选) - 单节点数据缓存服务,用于加快数据集处理,详情请阅读 `单节点数据缓存 `_ 。默认值:None,不使用缓存。 异常: diff --git a/docs/api/api_python/dataset/mindspore.dataset.serialize.rst b/docs/api/api_python/dataset/mindspore.dataset.serialize.rst index 30fcf04a454..4fcb054f381 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.serialize.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.serialize.rst @@ -12,7 +12,7 @@ 参数: - **dataset** (Dataset) - 数据处理管道对象。 - - **json_filepath** (str) - 生成序列化JSON文件的路径,默认值:'',不指定JSON路径。 + - **json_filepath** (str) - 生成序列化JSON文件的路径。默认值:'',不指定JSON路径。 返回: Dict,包含序列化数据集图的字典。 diff --git a/docs/api/api_python/dataset/mindspore.dataset.utils.imshow_det_bbox.rst b/docs/api/api_python/dataset/mindspore.dataset.utils.imshow_det_bbox.rst index 2b951e0aead..6c49413f222 100644 --- a/docs/api/api_python/dataset/mindspore.dataset.utils.imshow_det_bbox.rst +++ b/docs/api/api_python/dataset/mindspore.dataset.utils.imshow_det_bbox.rst @@ -9,18 +9,18 @@ - **image** (numpy.ndarray) - 待绘制的图像,shape为(C, H, W)或(H, W, C),通道顺序为RGB。 - **bboxes** (numpy.ndarray) - 边界框(包含类别置信度),shape为(N, 4)或(N, 5),格式为(N,X,Y,W,H)。 - **labels** (numpy.ndarray) - 边界框的类别,shape为(N, 1)。 - - **segm** (numpy.ndarray) - 图像分割掩码,shape为(M, H, W),M表示类别总数,默认值:None,不绘制掩码。 - - **class_names** (list[str], tuple[str], dict) - 类别索引到类别名的映射表,默认值:None,仅显示类别索引。 - - **score_threshold** (float) - 绘制边界框的类别置信度阈值,默认值:0,绘制所有边界框。 - - **bbox_color** (tuple(int)) - 指定绘制边界框时线条的颜色,顺序为BGR,默认值:(0,255,0),表示绿色。 - - **text_color** (tuple(int)) - 指定类别文本的显示颜色,顺序为BGR,默认值:(203, 192, 255),表示粉色。 - - **mask_color** (tuple(int)) - 指定掩码的显示颜色,顺序为BGR,默认值:(128, 0, 128),表示紫色。 - - **thickness** (int) - 指定边界框和类别文本的线条粗细,默认值:2。 - - **font_size** (int, float) - 指定类别文本字体大小,默认值:0.8。 - - **show** (bool) - 是否显示图像,默认值:True。 - - **win_name** (str) - 指定窗口名称,默认值:"win"。 - - **wait_time** (int) - 指定cv2.waitKey的时延,单位为ms,即图像显示的自动切换间隔,默认值:2000,表示间隔为2000ms。 - - **out_file** (str, 可选) - 输出图像的文件路径,用于在绘制后将结果存储到本地,默认值:None,不保存。 + - **segm** (numpy.ndarray) - 图像分割掩码,shape为(M, H, W),M表示类别总数。默认值:None,不绘制掩码。 + - **class_names** (list[str], tuple[str], dict) - 类别索引到类别名的映射表。默认值:None,仅显示类别索引。 + - **score_threshold** (float) - 绘制边界框的类别置信度阈值。默认值:0,绘制所有边界框。 + - **bbox_color** (tuple(int)) - 指定绘制边界框时线条的颜色,顺序为BGR。默认值:(0,255,0),表示绿色。 + - **text_color** (tuple(int)) - 指定类别文本的显示颜色,顺序为BGR。默认值:(203, 192, 255),表示粉色。 + - **mask_color** (tuple(int)) - 指定掩码的显示颜色,顺序为BGR。默认值:(128, 0, 128),表示紫色。 + - **thickness** (int) - 指定边界框和类别文本的线条粗细。默认值:2。 + - **font_size** (int, float) - 指定类别文本字体大小。默认值:0.8。 + - **show** (bool) - 是否显示图像。默认值:True。 + - **win_name** (str) - 指定窗口名称。默认值:"win"。 + - **wait_time** (int) - 指定cv2.waitKey的时延,单位为ms,即图像显示的自动切换间隔。默认值:2000,表示间隔为2000ms。 + - **out_file** (str, 可选) - 输出图像的文件路径,用于在绘制后将结果存储到本地。默认值:None,不保存。 返回: numpy.ndarray,带边界框和类别置信度的图像。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AllpassBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AllpassBiquad.rst index e82b6e834db..d0949f1f8b8 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AllpassBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AllpassBiquad.rst @@ -17,7 +17,7 @@ mindspore.dataset.audio.AllpassBiquad 参数: - **sample_rate** (int) - 
采样频率(单位:Hz),不能为零。 - **central_freq** (float) - 中心频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 异常: - **TypeError** - 当 `sample_rate` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AmplitudeToDB.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AmplitudeToDB.rst index 00b5b726638..3ac077e0bf9 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AmplitudeToDB.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.AmplitudeToDB.rst @@ -8,12 +8,12 @@ mindspore.dataset.audio.AmplitudeToDB .. note:: 待处理音频维度需为(..., freq, time)。 参数: - - **stype** ( :class:`mindspore.dataset.audio.ScaleType` , 可选) - 输入音频的原始标度,取值可为ScaleType.MAGNITUDE或ScaleType.POWER,默认值:ScaleType.POWER。 - - **ref_value** (float, 可选) - 系数参考值,默认值:1.0,用于计算分贝系数 `db_multiplier` ,公式为 + - **stype** ( :class:`mindspore.dataset.audio.ScaleType` , 可选) - 输入音频的原始标度,取值可为ScaleType.MAGNITUDE或ScaleType.POWER。默认值:ScaleType.POWER。 + - **ref_value** (float, 可选) - 系数参考值。默认值:1.0,用于计算分贝系数 `db_multiplier` ,公式为 :math:`db\_multiplier = Log10(max(ref\_value, amin))`。 - - **amin** (float, 可选) - 波形取值下界,低于该值的波形将会被裁切,取值必须大于0,默认值:1e-10。 - - **top_db** (float, 可选) - 最小截止分贝值,取值为非负数,默认值:80.0。 + - **amin** (float, 可选) - 波形取值下界,低于该值的波形将会被裁切,取值必须大于0。默认值:1e-10。 + - **top_db** (float, 可选) - 最小截止分贝值,取值为非负数。默认值:80.0。 异常: - **TypeError** - 当 `stype` 的类型不为 :class:`mindspore.dataset.audio.utils.ScaleType` 。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandBiquad.rst index 0bb3872e3bc..1abdf304dc3 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandBiquad.rst @@ -14,8 +14,8 @@ mindspore.dataset.audio.BandBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **central_freq** (float) - 中心频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 - - **noise** (bool, 可选) - 若为True,则使用非音调音频(如打击乐)模式;若为False,则使用音调音频(如语音、歌曲或器乐)模式,默认值:False。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 + - **noise** (bool, 可选) - 若为True,则使用非音调音频(如打击乐)模式;若为False,则使用音调音频(如语音、歌曲或器乐)模式。默认值:False。 异常: - **TypeError** - 当 `sample_rate` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandpassBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandpassBiquad.rst index 7ae28d89f52..ca76d3d1907 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandpassBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandpassBiquad.rst @@ -22,7 +22,7 @@ mindspore.dataset.audio.BandpassBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **central_freq** (float) - 中心频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 - **const_skirt_gain** (bool, 可选) - 若为True,则使用恒定裙边增益(峰值增益为Q);若为False,则使用恒定的0dB峰值增益。默认值:False。 异常: diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandrejectBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandrejectBiquad.rst index ede8d1aa6ef..b735495ded7 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandrejectBiquad.rst +++ 
b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BandrejectBiquad.rst @@ -19,7 +19,7 @@ mindspore.dataset.audio.BandrejectBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **central_freq** (float) - 中心频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 异常: - **TypeError** - 当 `sample_rate` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BassBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BassBiquad.rst index 4e399066b3d..e2634d62040 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BassBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.BassBiquad.rst @@ -17,8 +17,8 @@ mindspore.dataset.audio.BassBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **gain** (float) - 期望提升(或衰减)的音频增益(单位:dB)。 - - **central_freq** (float, 可选) - 中心频率(单位:Hz),默认值:100.0。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **central_freq** (float, 可选) - 中心频率(单位:Hz)。默认值:100.0。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 异常: - **TypeError** - 当 `sample_rate` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComplexNorm.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComplexNorm.rst index 41b3ed91860..feba2689ed2 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComplexNorm.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComplexNorm.rst @@ -8,7 +8,7 @@ mindspore.dataset.audio.ComplexNorm .. note:: 待处理音频维度需为(..., complex=2)。第0维代表实部,第1维代表虚部。 参数: - - **power** (float, 可选) - 范数的幂,取值必须非负,默认值:1.0。 + - **power** (float, 可选) - 范数的幂,取值必须非负。默认值:1.0。 异常: - **TypeError** - 当 `power` 的类型不为float。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComputeDeltas.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComputeDeltas.rst index 2723b07b916..6969e3fd75a 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComputeDeltas.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.ComputeDeltas.rst @@ -11,7 +11,7 @@ mindspore.dataset.audio.ComputeDeltas 其中, :math:`d_{t}` 是时间 :math:`t` 的增量, :math:`c_{t}` 是时间 :math:`t` 的频谱图系数, :math:`N` 是 :math:`(\text{win_length}-1)//2` 。 参数: - - **win_length** (int, 可选) - 计算窗口长度,长度必须不小于3,默认值:5。 + - **win_length** (int, 可选) - 计算窗口长度,长度必须不小于3。默认值:5。 - **pad_mode** (:class:`mindspore.dataset.audio.BorderType`, 可选) - 边界填充模式,可以是 [BorderType.CONSTANT, BorderType.EDGE, BorderType.REFLECT, BorderType.SYMMETRIC]中任何一个。 默认值:BorderType.EDGE。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Contrast.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Contrast.rst index f8a21e634ca..0813837fdac 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Contrast.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Contrast.rst @@ -12,7 +12,7 @@ mindspore.dataset.audio.Contrast .. 
note:: 待处理音频维度需为(..., time)。 参数: - - **enhancement_amount** (float, 可选) - 控制音频增益的量,取值范围为[0,100],默认值:75.0。请注意当 `enhancement_amount` 等于0时,对比度增强效果仍然会很显著。 + - **enhancement_amount** (float, 可选) - 控制音频增益的量,取值范围为[0,100]。默认值:75.0。请注意当 `enhancement_amount` 等于0时,对比度增强效果仍然会很显著。 异常: - **TypeError** - 当 `enhancement_amount` 的类型不为float。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst index 54f46b3f405..ca66a8eb180 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst @@ -11,4 +11,4 @@ mindspore.dataset.audio.EqualizerBiquad - **sample_rate** (int) - 采样频率(单位:Hz),值不能为零。 - **center_freq** (float) - 中心频率(单位:Hz)。 - **gain** (float) - 期望提升(或衰减)的音频增益(单位:dB)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.FrequencyMasking.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.FrequencyMasking.rst index e245d426bdd..21d412b9b1a 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.FrequencyMasking.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.FrequencyMasking.rst @@ -8,10 +8,10 @@ mindspore.dataset.audio.FrequencyMasking .. note:: 待处理音频维度需为(..., freq, time)。 参数: - - **iid_masks** (bool, 可选) - 是否施加随机掩码,默认值:False。 - - **freq_mask_param** (int, 可选) - 当 `iid_masks` 为True时,掩码长度将从[0, freq_mask_param]中均匀采样;当 `iid_masks` 为False时,直接使用该值作为掩码长度。取值范围为[0, freq_length],其中 `freq_length` 为音频波形在频域的长度,默认值:0。 - - **mask_start** (int, 可选) - 添加掩码的起始位置,只有当 `iid_masks` 为True时,该值才会生效。取值范围为[0, freq_length - frequency_mask_param],其中 `freq_length` 为音频波形在频域的长度,默认值:0。 - - **mask_value** (float, 可选) - 掩码填充值,默认值:0.0。 + - **iid_masks** (bool, 可选) - 是否施加随机掩码。默认值:False。 + - **freq_mask_param** (int, 可选) - 当 `iid_masks` 为True时,掩码长度将从[0, freq_mask_param]中均匀采样;当 `iid_masks` 为False时,直接使用该值作为掩码长度。取值范围为[0, freq_length],其中 `freq_length` 为音频波形在频域的长度。默认值:0。 + - **mask_start** (int, 可选) - 添加掩码的起始位置,只有当 `iid_masks` 为True时,该值才会生效。取值范围为[0, freq_length - frequency_mask_param],其中 `freq_length` 为音频波形在频域的长度。默认值:0。 + - **mask_value** (float, 可选) - 掩码填充值。默认值:0.0。 异常: - **TypeError** - 当 `iid_masks` 的类型不为bool。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst index d29b063da89..6c706e05cbf 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst @@ -11,15 +11,15 @@ mindspore.dataset.audio.GriffinLim 其中w表示窗口函数,y表示每个帧的重建信号,x表示整个信号。 参数: - - **n_fft** (int, 可选) - FFT的长度,默认值:400。 - - **n_iter** (int, 可选) - 相位恢复的迭代次数,默认值:32。 - - **win_length** (int, 可选) - GriffinLim的窗口大小,默认值:None,将设置为 `n_fft` 的值。 - - **hop_length** (int, 可选) - STFT窗口之间的跳数长度,默认值:None,将设置为 `win_length//2` 。 + - **n_fft** (int, 可选) - FFT的长度。默认值:400。 + - **n_iter** (int, 可选) - 相位恢复的迭代次数。默认值:32。 + - **win_length** (int, 可选) - GriffinLim的窗口大小。默认值:None,将设置为 `n_fft` 的值。 + - **hop_length** (int, 可选) - STFT窗口之间的跳数长度。默认值:None,将设置为 `win_length//2` 。 - **window_type** (WindowType, 可选) - GriffinLim的窗口类型,可以是WindowType.BARTLETT, WindowType.BLACKMAN,WindowType.HAMMING,WindowType.HANN或WindowType.KAISER。 默认值:WindowType.HANN,目前macOS上不支持kaiser窗口。 - 
- **power** (float, 可选) - 幅度谱图的指数,默认值:2.0。 - - **momentum** (float, 可选) - 快速Griffin-Lim的动量,默认值:0.99。 + - **power** (float, 可选) - 幅度谱图的指数。默认值:2.0。 + - **momentum** (float, 可选) - 快速Griffin-Lim的动量。默认值:0.99。 - **length** (int, 可选) - 预期输出波形的长度。默认值:None,将设置为stft矩阵的最后一个维度的值。 - - **rand_init** (bool, 可选) - 随机相位初始化或全零相位初始化标志,默认值:True。 + - **rand_init** (bool, 可选) - 随机相位初始化或全零相位初始化标志。默认值:True。 \ No newline at end of file diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst index 0b5cd33e74e..f630ef494af 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst @@ -10,4 +10,4 @@ mindspore.dataset.audio.HighpassBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **cutoff_freq** (float) - 中心频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst index 018a0ec15c2..2ad2a9286ed 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst @@ -7,14 +7,14 @@ mindspore.dataset.audio.InverseMelScale 参数: - **n_stft** (int) - STFT中的滤波器的组数。 - - **n_mels** (int, 可选) - mel滤波器的数量,默认值:128。 - - **sample_rate** (int, 可选) - 音频信号采样频率,默认值:16000。 - - **f_min** (float, 可选) - 最小频率,默认值:0.0。 - - **f_max** (float, 可选) - 最大频率,默认值:None,将设置为 `sample_rate//2` 。 - - **max_iter** (int, 可选) - 最大优化迭代次数,默认值:100000。 - - **tolerance_loss** (float, 可选) - 当达到损失值时停止优化,默认值:1e-5。 - - **tolerance_change** (float, 可选) - 指定损失差异,当达到损失差异时停止优化,默认值:1e-8。 - - **sgdargs** (dict, 可选) - SGD优化器的参数,默认值:None,将设置为{'sgd_lr': 0.1, 'sgd_momentum': 0.9}。 + - **n_mels** (int, 可选) - mel滤波器的数量。默认值:128。 + - **sample_rate** (int, 可选) - 音频信号采样频率。默认值:16000。 + - **f_min** (float, 可选) - 最小频率。默认值:0.0。 + - **f_max** (float, 可选) - 最大频率。默认值:None,将设置为 `sample_rate//2` 。 + - **max_iter** (int, 可选) - 最大优化迭代次数。默认值:100000。 + - **tolerance_loss** (float, 可选) - 当达到损失值时停止优化。默认值:1e-5。 + - **tolerance_change** (float, 可选) - 指定损失差异,当达到损失差异时停止优化。默认值:1e-8。 + - **sgdargs** (dict, 可选) - SGD优化器的参数。默认值:None,将设置为{'sgd_lr': 0.1, 'sgd_momentum': 0.9}。 - **norm** (NormType, 可选) - 标准化方法,可以是NormType.SLANEY或NormType.NONE。默认值:NormType.NONE。 - **mel_type** (MelType, 可选) - 要使用的Mel比例,可以是MelType.SLANEY或MelType.HTK。默认值:MelType.HTK。 \ No newline at end of file diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LFilter.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LFilter.rst index 7c398f006ee..08298d342b8 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LFilter.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LFilter.rst @@ -10,7 +10,7 @@ mindspore.dataset.audio.LFilter 大小必须与 `b_coeffs` 相同(根据需要填充0)。 - **b_coeffs** (sequence) - (n_order + 1)维数差分方程的分子系数。较低的延迟系数是第一位的,例如[b0, b1, b2, ...]。 大小必须与 `a_coeffs` 相同(根据需要填充0)。 - - **clamp** (bool, 可选) - 如果为True,则将输出信号截断在[-1, 1]范围内,默认值:True。 + - **clamp** (bool, 可选) - 如果为True,则将输出信号截断在[-1, 1]范围内。默认值:True。 异常: - **RuntimeError** - 当输入音频的shape不为<..., time>。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LowpassBiquad.rst 
b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LowpassBiquad.rst index 676ccec593a..0d0180a3e61 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LowpassBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.LowpassBiquad.rst @@ -17,7 +17,7 @@ mindspore.dataset.audio.LowpassBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **cutoff_freq** (float) - 滤波器截止频率(单位:Hz)。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围(0, 1],默认值:0.707。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围(0, 1]。默认值:0.707。 异常: - **TypeError** - 当 `sample_rate` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst index e072859a80c..c585759bcd9 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst @@ -6,7 +6,7 @@ mindspore.dataset.audio.Magphase 将具有(..., 2)形状的复值光谱图分离,输出幅度和相位。 参数: - - **power** (float) - 范数的功率,必须是非负的,默认值:1.0。 + - **power** (float) - 范数的功率,必须是非负的。默认值:1.0。 异常: - **RuntimeError** - 当输入音频的shape不为<..., 2>。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst index 3cc475f1a51..2e3ff07e74f 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst @@ -6,11 +6,11 @@ mindspore.dataset.audio.MelScale 将正常STFT转换为梅尔尺度的STFT。 参数: - - **n_mels** (int, 可选) - 梅尔滤波器的数量,默认值:128。 - - **sample_rate** (int, 可选) - 音频信号采样速率,默认值:16000。 - - **f_min** (float, 可选) - 最小频率,默认值:0.0。 - - **f_max** (float, 可选) - 最大频率,默认值:None,将设置为 `sample_rate//2` 。 - - **n_stft** (int, 可选) - STFT中的滤波器的组数,默认值:201。 + - **n_mels** (int, 可选) - 梅尔滤波器的数量。默认值:128。 + - **sample_rate** (int, 可选) - 音频信号采样速率。默认值:16000。 + - **f_min** (float, 可选) - 最小频率。默认值:0.0。 + - **f_max** (float, 可选) - 最大频率。默认值:None,将设置为 `sample_rate//2` 。 + - **n_stft** (int, 可选) - STFT中的滤波器的组数。默认值:201。 - **norm** (NormType, 可选) - 标准化方法,可以是NormType.SLANEY或NormType.NONE。默认值:NormType.NONE。 - **mel_type** (MelType, 可选) - 要使用的Mel比例,可以是MelType.SLANEY或MelType.HTK。默认值:MelType.HTK。 \ No newline at end of file diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Phaser.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Phaser.rst index 22b0e8d0f11..974d2053bce 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Phaser.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Phaser.rst @@ -7,11 +7,11 @@ mindspore.dataset.audio.Phaser 参数: - **sample_rate** (int) - 波形的采样率,例如44100 (Hz)。 - - **gain_in** (float, 可选) - 期望提升(或衰减)所需输入增益,单位为dB。允许的值范围为[0, 1],默认值:0.4。 - - **gain_out** (float, 可选) - 期望提升(或衰减)期望输出增益,单位为dB。允许的值范围为[0, 1e9],默认值:0.74。 - - **delay_ms** (float, 可选) - 延迟数,以毫秒为单位。允许的值范围为[0, 5],默认值:3.0。 - - **decay** (float, 可选) - 增益的期望衰减系数。允许的值范围为[0, 0.99],默认值:0.4。 - - **mod_speed** (float, 可选) - 调制速度,单位为Hz。允许的值范围为[0.1, 2],默认值:0.5。 + - **gain_in** (float, 可选) - 期望提升(或衰减)所需输入增益,单位为dB。允许的值范围为[0, 1]。默认值:0.4。 + - **gain_out** (float, 可选) - 期望提升(或衰减)期望输出增益,单位为dB。允许的值范围为[0, 1e9]。默认值:0.74。 + - **delay_ms** (float, 可选) - 延迟数,以毫秒为单位。允许的值范围为[0, 5]。默认值:3.0。 + - **decay** (float, 可选) - 增益的期望衰减系数。允许的值范围为[0, 0.99]。默认值:0.4。 + - **mod_speed** (float, 可选) - 调制速度,单位为Hz。允许的值范围为[0.1, 2]。默认值:0.5。 - **sinusoidal** (bool, 可选) - 如果为True,则使用正弦调制(对于多个乐器效果最好)。 
如果为False,则使用三角调制(使单个乐器具有更清晰的相位效果)。默认值:True。 \ No newline at end of file diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Resample.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Resample.rst index 65201821f4d..3f815a13463 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Resample.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Resample.rst @@ -6,11 +6,11 @@ mindspore.dataset.audio.Resample 将音频波形从一个频率重新采样到另一个频率。必要时可以指定重采样方法。 参数: - - **orig_freq** (float, 可选) - 音频波形的原始频率,必须为正,默认值:16000。 - - **new_freq** (float, 可选) - 目标音频波形频率,必须为正,默认值:16000。 + - **orig_freq** (float, 可选) - 音频波形的原始频率,必须为正。默认值:16000。 + - **new_freq** (float, 可选) - 目标音频波形频率,必须为正。默认值:16000。 - **resample_method** (ResampleMethod, 可选) - 重采样方法,可以是ResampleMethod.SINC_INTERPOLATION和ResampleMethod.KAISER_WINDOW。 默认值:ResampleMethod.SINC_INTERPOLATION。 - **lowpass_filter_width** (int, 可选) - 控制滤波器的宽度,越多意味着更清晰,但效率越低,必须为正。默认值:6。 - **rolloff** (float, 可选) - 滤波器的滚降频率,作为Nyquist的一小部分。 较低的值减少了抗锯齿,但也减少了一些最高频率,范围:(0, 1]。默认值:0.99。 - - **beta** (float, 可选) - 用于kaiser窗口的形状参数,默认值:None,将使用14.769656459379492。 + - **beta** (float, 可选) - 用于kaiser窗口的形状参数。默认值:None,将使用14.769656459379492。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SlidingWindowCmn.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SlidingWindowCmn.rst index f409b3e8d9d..7213681f80d 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SlidingWindowCmn.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SlidingWindowCmn.rst @@ -6,9 +6,9 @@ mindspore.dataset.audio.SlidingWindowCmn 对每个话语应用滑动窗口倒谱均值(和可选方差)归一化。 参数: - - **cmn_window** (int, 可选) - 用于运行平均CMN计算的帧中窗口,默认值:600。 + - **cmn_window** (int, 可选) - 用于运行平均CMN计算的帧中窗口。默认值:600。 - **min_cmn_window** (int, 可选) - 解码开始时使用的最小CMN窗口(仅在开始时增加延迟)。 - 仅在中心为False时适用,在中心为True时忽略,默认值:100。 + 仅在中心为False时适用,在中心为True时忽略。默认值:100。 - **center** (bool, 可选) - 如果为True,则使用以当前帧为中心的窗口。如果为False,则窗口在左侧。默认值:False。 - **norm_vars** (bool, 可选) - 如果为True,则将方差规范化为1。默认值:False。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SpectralCentroid.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SpectralCentroid.rst index 0a29fba76c3..eba1343aa1a 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SpectralCentroid.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.SpectralCentroid.rst @@ -8,8 +8,8 @@ mindspore.dataset.audio.SpectralCentroid 参数: - **sample_rate** (int) - 波形的采样率,例如44100 (Hz)。 - **n_fft** (int, 可选) - FFT的大小,创建n_fft // 2 + 1 bins。默认值:400。 - - **win_length** (int, 可选) - 窗口大小,默认值:None,将设置为 `n_fft` 的值。 - - **hop_length** (int, 可选) - STFT窗口之间的跳数长度,默认值:None,将设置为 `win_length//2` 。 - - **pad** (int, 可选) - 信号的两侧填充数量,默认值:0。 + - **win_length** (int, 可选) - 窗口大小。默认值:None,将设置为 `n_fft` 的值。 + - **hop_length** (int, 可选) - STFT窗口之间的跳数长度。默认值:None,将设置为 `win_length//2` 。 + - **pad** (int, 可选) - 信号的两侧填充数量。默认值:0。 - **window** (WindowType, 可选) - 窗口函数,可以是WindowType.BARTLETT、WindowType.BLACKMAN、 WindowType.HAMMING、WindowType.HANN或WindowType.KAISER。默认值:WindowType.HANN。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Spectrogram.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Spectrogram.rst index 783c5dfd21e..8647642866b 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Spectrogram.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Spectrogram.rst @@ -6,17 +6,17 @@ 
mindspore.dataset.audio.Spectrogram 从音频信号创建光谱图。 参数: - - **n_fft** (int, 可选) - FFT的大小,创建 `n_fft // 2 + 1` 组滤波器,默认值:400。 - - **win_length** (int, 可选) - 窗口大小,默认值:None,将设置为 `n_fft` 的值。 - - **hop_length** (int, 可选) - STFT窗口之间的跳数长度,默认值:None,将设置为 `win_length//2` 。 - - **pad** (int, 可选) - 信号的双面填充,默认值:0。 + - **n_fft** (int, 可选) - FFT的大小,创建 `n_fft // 2 + 1` 组滤波器。默认值:400。 + - **win_length** (int, 可选) - 窗口大小。默认值:None,将设置为 `n_fft` 的值。 + - **hop_length** (int, 可选) - STFT窗口之间的跳数长度。默认值:None,将设置为 `win_length//2` 。 + - **pad** (int, 可选) - 信号两侧的填充数量。默认值:0。 - **window** (WindowType, 可选) - 窗口函数类型,可以是WindowType.BARTLETT, WindowType.BLACKMAN,WindowType.HAMMING,WindowType.HANN或WindowType.KAISER。 默认值:WindowType.HANN,目前macOS上不支持kaiser窗口。 - - **power** (float, 可选) - 幅度谱图的指数,默认值:2.0。 - **normalized** (bool, 可选) - 是否在stft之后按幅度归一化。默认值:False。 - - **center** (bool, 可选) - 是否在两侧填充波形,默认值:True。 + - **power** (float, 可选) - 幅度谱图的指数。默认值:2.0。 + - **normalized** (bool, 可选) - 是否在stft之后按幅度归一化。默认值:False。 + - **center** (bool, 可选) - 是否在两侧填充波形。默认值:True。 - **pad_mode** (BorderType, 可选) - 控制中心为True时使用的填充方法,可以是BorderType.REFLECT、BorderType.CONSTANT、 - BorderType.EDGE、BorderType.SYMMETRIC,默认值:BorderType.REFLECT。 - - **onesided** (bool, 可选) - 控制是否返回一半结果以避免冗余,默认值:True。 + BorderType.EDGE、BorderType.SYMMETRIC。默认值:BorderType.REFLECT。 + - **onesided** (bool, 可选) - 控制是否返回一半结果以避免冗余。默认值:True。 \ No newline at end of file diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeMasking.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeMasking.rst index a74ffb67d8c..553fecb0dd7 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeMasking.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeMasking.rst @@ -8,10 +8,10 @@ mindspore.dataset.audio.TimeMasking .. note:: 待处理音频维度需为(..., freq, time)。 参数: - - **iid_masks** (bool, 可选) - 是否施加随机掩码,默认值:False。 - - **time_mask_param** (int, 可选) - 当 `iid_masks` 为True时,掩码长度将从[0, time_mask_param]中均匀采样;当 `iid_masks` 为False时,直接使用该值作为掩码的长度。取值范围为[0, time_length],其中 `time_length` 为音频波形在时域的长度,默认值:0。 - - **mask_start** (int, 可选) - 添加掩码的起始位置,只有当 `iid_masks` 为True时,该值才会生效。取值范围为[0, time_length - time_mask_param],其中 `time_length` 为音频波形在时域的长度,默认值:0。 - - **mask_value** (float, 可选) - 掩码填充值,默认值:0.0。 + - **iid_masks** (bool, 可选) - 是否施加随机掩码。默认值:False。 + - **time_mask_param** (int, 可选) - 当 `iid_masks` 为True时,掩码长度将从[0, time_mask_param]中均匀采样;当 `iid_masks` 为False时,直接使用该值作为掩码的长度。取值范围为[0, time_length],其中 `time_length` 为音频波形在时域的长度。默认值:0。 + - **mask_start** (int, 可选) - 添加掩码的起始位置,只有当 `iid_masks` 为True时,该值才会生效。取值范围为[0, time_length - time_mask_param],其中 `time_length` 为音频波形在时域的长度。默认值:0。 + - **mask_value** (float, 可选) - 掩码填充值。默认值:0.0。 异常: - **TypeError** - 当 `iid_masks` 的类型不为bool。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeStretch.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeStretch.rst index aca8b2cf53e..f0424e2ae7e 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeStretch.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TimeStretch.rst @@ -8,9 +8,9 @@ mindspore.dataset.audio.TimeStretch ..
note:: 待处理音频维度需为(..., freq, time, complex=2)。第0维代表实部,第1维代表虚部。 参数: - - **hop_length** (int, 可选) - STFT窗之间每跳的长度,即连续帧之间的样本数,默认值:None,表示取 `n_freq - 1`。 - - **n_freq** (int, 可选) - STFT中的滤波器组数,默认值:201。 - - **fixed_rate** (float, 可选) - 频谱在时域加快或减缓的比例,默认值:None,表示保持原始速率。 + - **hop_length** (int, 可选) - STFT窗之间每跳的长度,即连续帧之间的样本数。默认值:None,表示取 `n_freq - 1`。 + - **n_freq** (int, 可选) - STFT中的滤波器组数。默认值:201。 + - **fixed_rate** (float, 可选) - 频谱在时域加快或减缓的比例。默认值:None,表示保持原始速率。 异常: - **TypeError** - 当 `hop_length` 的类型不为int。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TrebleBiquad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TrebleBiquad.rst index c1396b3eddf..d6ad13aa79a 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TrebleBiquad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.TrebleBiquad.rst @@ -8,5 +8,5 @@ mindspore.dataset.audio.TrebleBiquad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - **gain** (float) - 期望提升(或衰减)的音频增益(单位:dB)。 - - **central_freq** (float, 可选) - 中心频率(单位:Hz),默认值:3000。 - - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1],默认值:0.707。 + - **central_freq** (float, 可选) - 中心频率(单位:Hz)。默认值:3000。 + - **Q** (float, 可选) - `品质因子 `_ ,能够反映带宽与采样频率和中心频率的关系,取值范围为(0, 1]。默认值:0.707。 diff --git a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Vad.rst b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Vad.rst index 78bbee1bf98..50934323037 100644 --- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Vad.rst +++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Vad.rst @@ -7,19 +7,19 @@ mindspore.dataset.audio.Vad 参数: - **sample_rate** (int) - 采样频率(单位:Hz),不能为零。 - - **trigger_level** (float, 可选) - 用于触发活动检测的测量级别,默认值:7.0。 - - **trigger_time** (float, 可选) - 用于帮助忽略短音的时间常数(以秒为单位,默认值:0.25。 - - **search_time** (float, 可选) - 在检测到的触发点之前搜索要包括的更安静/更短声音的音频量(以秒为单位),默认值:1.0。 - - **allowed_gap** (float, 可选) - 包括检测到的触发点之前较短/较短声音之间允许的间隙(以秒为单位),默认值:0.25。 - - **pre_trigger_time** (float, 可选) - 在触发点和任何找到的更安静/更短的声音突发之前,要保留的音频量(以秒为单位),默认值:0.0。 - - **boot_time** (float, 可选) - 初始噪声估计的时间,默认值:0.35。 - - **noise_up_time** (float, 可选) - 当噪音水平增加时,自适应噪音估计器使用的时间常数,默认值:0.1。 - - **noise_down_time** (float, 可选) - 当噪音水平降低时,自适应噪音估计器使用的时间常数,默认值:0.01。 - - **noise_reduction_amount** (float, 可选) - 检测算法中使用的降噪量,默认值:1.35。 - - **measure_freq** (float, 可选) - 算法处理的频率,默认值:20.0。 - - **measure_duration** (float, 可选) - 测量持续时间,默认值:None,使用测量周期的两倍。 - - **measure_smooth_time** (float, 可选) - 用于平滑光谱测量的时间常数,默认值:0.4。 - - **hp_filter_freq** (float, 可选) - 应用于检测器算法输入的高通滤波器的"Brick-wall"频率,默认值:50.0。 - - **lp_filter_freq** (float, 可选) - 应用于检测器算法输入的低通滤波器的"Brick-wall"频率,默认值:6000.0。 - - **hp_lifter_freq** (float, 可选) - 应用于检测器算法输入的高通升降机的"Brick-wall"频率,默认值:150.0。 - - **lp_lifter_freq** (float, 可选) - 应用于检测器算法输入的低通升降机的"Brick-wall"频率,默认值:20000.0。 + - **trigger_level** (float, 可选) - 用于触发活动检测的测量级别。默认值:7.0。 + - **trigger_time** (float, 可选) - 用于帮助忽略短音的时间常数(以秒为单位)。默认值:0.25。 + - **search_time** (float, 可选) - 在检测到的触发点之前搜索要包括的更安静/更短声音的音频量(以秒为单位)。默认值:1.0。 + - **allowed_gap** (float, 可选) - 要包括的检测到的触发点之前更安静/更短声音之间允许的间隙(以秒为单位)。默认值:0.25。 + - **pre_trigger_time** (float, 可选) - 在触发点和任何找到的更安静/更短的声音突发之前,要保留的音频量(以秒为单位)。默认值:0.0。 + - **boot_time** (float, 可选) - 初始噪声估计的时间。默认值:0.35。 + - **noise_up_time** (float, 可选) - 当噪音水平增加时,自适应噪音估计器使用的时间常数。默认值:0.1。 + - **noise_down_time** (float, 可选) - 当噪音水平降低时,自适应噪音估计器使用的时间常数。默认值:0.01。 + - **noise_reduction_amount** (float, 可选) - 检测算法中使用的降噪量。默认值:1.35。 + - **measure_freq** (float, 可选) - 算法处理的频率。默认值:20.0。 + - **measure_duration** (float, 可选) - 测量持续时间。默认值:None,使用测量周期的两倍。 + - **measure_smooth_time** (float, 可选) - 用于平滑光谱测量的时间常数。默认值:0.4。 + - **hp_filter_freq** (float, 可选) - 应用于检测器算法输入的高通滤波器的"Brick-wall"频率。默认值:50.0。 + - **lp_filter_freq** (float, 可选) - 应用于检测器算法输入的低通滤波器的"Brick-wall"频率。默认值:6000.0。 + - **hp_lifter_freq** (float, 可选) - 应用于检测器算法输入的高通倒谱提升器(lifter)的"Brick-wall"频率。默认值:150.0。 + - **lp_lifter_freq** (float, 可选) - 应用于检测器算法输入的低通倒谱提升器(lifter)的"Brick-wall"频率。默认值:20000.0。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.JiebaTokenizer.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.JiebaTokenizer.rst index e12cb80a31c..2dfa21c8dc2 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.JiebaTokenizer.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.JiebaTokenizer.rst @@ -18,7 +18,7 @@ mindspore.dataset.text.JiebaTokenizer - **JiebaMode.HMM**:使用隐马尔可夫模型算法进行分词。 - **JiebaMode.MIX**:使用隐马尔可夫模型分词算法和最大概率法分词算法混合进行分词。 - - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量,默认值:False。 + - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量。默认值:False。 异常: - **ValueError** - 没有提供参数 `hmm_path` 或为None。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.Lookup.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.Lookup.rst index cbe0467812e..86d199c2297 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.Lookup.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.Lookup.rst @@ -10,7 +10,7 @@ mindspore.dataset.text.Lookup - **unknown_token** (str, 可选) - 备用词汇,用于要查找的单词不在词汇表时进行替换。 如果单词不在词汇表中,则查找结果将替换为 `unknown_token` 的值。 如果单词不在词汇表中,且未指定 `unknown_token` ,将抛出运行时错误。默认值:None,不指定该参数。 - - **data_type** (mindspore.dtype, 可选) - Lookup输出的数据类型,默认值:mindspore.int32。 + - **data_type** (mindspore.dtype, 可选) - Lookup输出的数据类型。默认值:mindspore.int32。 异常: - **TypeError** - 参数 `vocab` 类型不为 :class:`mindspore.dataset.text.Vocab` 。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.NormalizeUTF8.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.NormalizeUTF8.rst index d104d84b183..e5ec00e7a02 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.NormalizeUTF8.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.NormalizeUTF8.rst @@ -10,7 +10,7 @@ mindspore.dataset.text.NormalizeUTF8 参数: - **normalize_form** (NormalizeForm, 可选) - 指定不同的规范化形式,可以取值为 NormalizeForm.NONE, NormalizeForm.NFC, NormalizeForm.NFKC、NormalizeForm.NFD、NormalizeForm.NFKD此四种unicode中的 - 任何一种形式,默认值:NormalizeForm.NFKC。 + 任何一种形式。默认值:NormalizeForm.NFKC。 - NormalizeForm.NONE,对输入字符串不做任何处理。 - NormalizeForm.NFC,对输入字符串进行C形式规范化。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.RegexTokenizer.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.RegexTokenizer.rst index 8b5a94ba7cc..b6a9b2e1c9d 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.RegexTokenizer.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.RegexTokenizer.rst @@ -13,7 +13,7 @@ mindspore.dataset.text.RegexTokenizer - **delim_pattern** (str) - 以正则表达式表示的分隔符,字符串将被正则匹配的分隔符分割。 - **keep_delim_pattern** (str, 可选) - 如果被 `delim_pattern` 匹配的字符串也能被 `keep_delim_pattern` 匹配,就可以此分隔符作为标记(token)保存。 默认值:''(空字符),即分隔符不会作为输出标记保留。 - - **with_offsets** (bool, 可选) - 是否输出分词标记(token)的偏移量,默认值:False,不输出。 + - **with_offsets** (bool, 可选) - 是否输出分词标记(token)的偏移量。默认值:False,不输出。 异常: - **TypeError** - 参数 `delim_pattern` 的类型不是str。
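To exercise the tokenizer parameters documented above, a pattern-based tokenizer can be dropped into a small pipeline. A minimal sketch (the sample strings are illustrative, and `RegexTokenizer`, like the other ICU-based tokenizers, may be unavailable on Windows):

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>> dataset = ds.NumpySlicesDataset(["welcome to mindspore", "hello world"],
...                                 column_names=["text"], shuffle=False)
>>> # Split on runs of whitespace; with_offsets=False keeps a single output column.
>>> tokenizer = text.RegexTokenizer(delim_pattern="\\s+", with_offsets=False)
>>> dataset = dataset.map(operations=tokenizer, input_columns=["text"])
>>> for row in dataset.create_dict_iterator(output_numpy=True):
...     print(row["text"])

diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.SentencePieceVocab.rst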
b/docs/api/api_python/dataset_text/mindspore.dataset.text.SentencePieceVocab.rst index d1a75f6b503..6a6250e8f18 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.SentencePieceVocab.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.SentencePieceVocab.rst @@ -13,8 +13,8 @@ - **dataset** (Dataset) - 表示用于构建SentencePiece对象的数据集。 - **col_names** (list) - 表示列名称的列表。 - **vocab_size** (int) - 表示词汇大小。 - - **character_coverage** (float) - 表示模型涵盖的字符数量。推荐的默认值为:0.9995,适用于具有丰富字符集的语言,如日文或中文,1.0适用于具有小字符集的其他语言。 - - **model_type** (SentencePieceModel) - 其值可以是SentencePieceModel.UNIGRAM、SentencePieceModel.BPE、SentencePieceModel.CHAR或SentencePieceModel.WORD,默认值:SentencePieceModel.UNIgram。使用SentencePieceModel.WORD类型时,必须预先标记输入句子。 + - **character_coverage** (float) - 表示模型涵盖的字符数量。推荐的默认值:0.9995,适用于具有丰富字符集的语言,如日文或中文,1.0适用于具有小字符集的其他语言。 + - **model_type** (SentencePieceModel) - 其值可以是SentencePieceModel.UNIGRAM、SentencePieceModel.BPE、SentencePieceModel.CHAR或SentencePieceModel.WORD。默认值:SentencePieceModel.UNIGRAM。使用SentencePieceModel.WORD类型时,必须预先标记输入句子。 - SentencePieceModel.UNIGRAM:Unigram语言模型意味着句子中的下一个单词被假定为独立于模型生成的前一个单词。 - SentencePieceModel.BPE:指字节对编码算法,它用一个未使用的字节替换句子中最频繁出现的字节对。 @@ -33,8 +33,8 @@ 参数: - **file_path** (list) - 表示包含SentencePiece文件路径的一个列表。 - **vocab_size** (int) - 表示词汇大小。 - - **character_coverage** (float) - 表示模型涵盖的字符数量。推荐的默认值为:0.9995,适用于具有丰富字符集的语言,如日文或中文,1.0适用于具有小字符集的其他语言。 - - **model_type** (SentencePieceModel) - 其值可以是SentencePieceModel.UNIGRAM、SentencePieceModel.BPE、SentencePieceModel.CHAR或SentencePieceModel.WORD,默认值为SentencePieceModel.UNIgram。使用SentencePieceModel.WORD类型时,必须预先标记输入句子。 + - **character_coverage** (float) - 表示模型涵盖的字符数量。推荐的默认值:0.9995,适用于具有丰富字符集的语言,如日文或中文,1.0适用于具有小字符集的其他语言。 + - **model_type** (SentencePieceModel) - 其值可以是SentencePieceModel.UNIGRAM、SentencePieceModel.BPE、SentencePieceModel.CHAR或SentencePieceModel.WORD。默认值:SentencePieceModel.UNIGRAM。使用SentencePieceModel.WORD类型时,必须预先标记输入句子。 - SentencePieceModel.UNIGRAM:Unigram语言模型意味着句子中的下一个单词被假定为独立于模型生成的前一个单词。 - SentencePieceModel.BPE:指字节对编码算法,它用一个未使用的字节替换句子中最频繁出现的字节对。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.SlidingWindow.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.SlidingWindow.rst index 44957313729..13a2d064444 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.SlidingWindow.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.SlidingWindow.rst @@ -7,7 +7,7 @@ mindspore.dataset.text.SlidingWindow 参数: - **width** (int) - 窗口的宽度,它必须是整数并且大于零。 - - **axis** (int, 可选) - 计算滑动窗口的轴,默认值:0。 + - **axis** (int, 可选) - 计算滑动窗口的轴。默认值:0。 异常: - **TypeError** - 参数 `width` 的类型不为int。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.ToVectors.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.ToVectors.rst index 14647afaab9..303c5af2f0d 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.ToVectors.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.ToVectors.rst @@ -7,7 +7,7 @@ mindspore.dataset.text.ToVectors 参数: - **vectors** (Vectors) - 向量对象。 - - **unk_init** (sequence, 可选) - 用于初始化向量外(OOV)令牌的序列,默认值:None,用零向量初始化。 + - **unk_init** (sequence, 可选) - 用于初始化词汇表外(OOV)令牌的序列。默认值:None,用零向量初始化。 - **lower_case_backup** (bool, 可选) - 是否查找小写的token。如果为False,则将查找原始大小写中的每个token。 如果为True,则将首先查找原始大小写中的每个token,如果在属性stoi(字符->索引映射)的键中找不到,则将查找小写中的token。默认值:False。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeCharTokenizer.rst
b/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeCharTokenizer.rst index 7cd0b4dbaca..8b765673164 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeCharTokenizer.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeCharTokenizer.rst @@ -6,7 +6,7 @@ mindspore.dataset.text.UnicodeCharTokenizer 使用Unicode分词器将字符串分词为Unicode字符。 参数: - - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量,默认值:False。 + - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量。默认值:False。 异常: - **TypeError** - 参数 `with_offsets` 的类型不为bool。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeScriptTokenizer.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeScriptTokenizer.rst index 4bb01263e76..8e37b6b36fa 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeScriptTokenizer.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.UnicodeScriptTokenizer.rst @@ -8,8 +8,8 @@ mindspore.dataset.text.UnicodeScriptTokenizer .. note:: Windows平台尚不支持 `UnicodeScriptTokenizer` 。 参数: - - **keep_whitespace** (bool, 可选) - 是否输出空白标记(token),默认值:False。 - - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量,默认值:False。 + - **keep_whitespace** (bool, 可选) - 是否输出空白标记(token)。默认值:False。 + - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量。默认值:False。 异常: - **TypeError** - 参数 `keep_whitespace` 的类型不为bool。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.Vocab.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.Vocab.rst index 4a9e1179a9c..e775f17b55a 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.Vocab.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.Vocab.rst @@ -16,9 +16,9 @@ 参数: - **dataset** (Dataset) - 表示要从中构建vocab的数据集。 - - **columns** (list[str],可选) - 表示要从中获取单词的列名。它可以是列名的列表,默认值:None。 - - **freq_range** (tuple,可选) - 表示整数元组(min_frequency,max_frequency)。频率范围内的单词将被保留。0 <= min_frequency <= max_frequency <= total_words。min_frequency=0等同于min_frequency=1。max_frequency > total_words等同于max_frequency = total_words。min_frequency和max_frequency可以为None,分别对应于0和total_words,默认值:None。 - - **top_k** (int,可选) - `top_k` 大于0。要在vocab中 `top_k` 建立的单词数量表示取用最频繁的单词。 `top_k` 在 `freq_range` 之后取用。如果没有足够的 `top_k` ,所有单词都将被取用,默认值:None。 + - **columns** (list[str],可选) - 表示要从中获取单词的列名。它可以是列名的列表。默认值:None。 + - **freq_range** (tuple,可选) - 表示整数元组(min_frequency,max_frequency)。频率范围内的单词将被保留。0 <= min_frequency <= max_frequency <= total_words。min_frequency=0等同于min_frequency=1。max_frequency > total_words等同于max_frequency = total_words。min_frequency和max_frequency可以为None,分别对应于0和total_words。默认值:None。 + - **top_k** (int,可选) - 表示取用词频最高的前 `top_k` 个单词构建词典,须大于0。 `top_k` 在 `freq_range` 筛选之后取用。如果单词数量不足 `top_k` ,则取用全部单词。默认值:None。 - **special_tokens** (list,可选) - 特殊分词列表,如常用的""、""等。默认值:None,表示不添加特殊分词(token)。 - **special_first** (bool,可选) - 表示是否将 `special_tokens` 中的特殊分词添加到词典的最前面。如果为True则将 `special_tokens` 添加到词典的最前,否则添加到词典的最后。默认值:True。 @@ -41,8 +41,8 @@ 参数: - **file_path** (str) - 表示vocab文件的路径。 - - **delimiter** (str,可选) - 表示用来分隔文件中每一行的分隔符。第一个元素被视为单词,默认值:""。 - - **vocab_size** (int,可选) - 表示要从 `file_path` 读取的字数,默认值:None,表示读取所有的字。 + - **delimiter** (str,可选) - 表示用来分隔文件中每一行的分隔符。第一个元素被视为单词。默认值:""。 + - **vocab_size** (int,可选) - 表示要从 `file_path` 读取的字数。默认值:None,表示读取所有的字。 - **special_tokens** (list,可选) - 特殊分词列表,如常用的""、""等。默认值:None,表示不添加特殊分词(token)。 - **special_first** (bool,可选) - 表示是否将 `special_tokens` 中的特殊分词添加到词典的最前面。如果为True则将 `special_tokens` 添加到词典的最前,否则添加到词典的最后。默认值:True。
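The `special_first` and `unknown_token` semantics above are easiest to see on a tiny vocabulary. A minimal sketch (the word list and token names are made up, and eager invocation of `Lookup` is assumed to be supported by the installed version):

>>> import mindspore.dataset.text as text
>>> # special_first=True places <pad> and <unk> at indices 0 and 1.
>>> vocab = text.Vocab.from_list(["behind", "the", "world"],
...                              special_tokens=["<pad>", "<unk>"], special_first=True)
>>> lookup = text.Lookup(vocab, unknown_token="<unk>")
>>> print(lookup("world"))   # a known word maps to its own index
>>> print(lookup("rocket"))  # an out-of-vocabulary word maps to the index of <unk>

diff --git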
a/docs/api/api_python/dataset_text/mindspore.dataset.text.WhitespaceTokenizer.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.WhitespaceTokenizer.rst index 8ba20303c92..86870c398c9 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.WhitespaceTokenizer.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.WhitespaceTokenizer.rst @@ -8,7 +8,7 @@ mindspore.dataset.text.WhitespaceTokenizer .. note:: Windows平台尚不支持 `WhitespaceTokenizer` 。 参数: - - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量,默认值:False。 + - **with_offsets** (bool, 可选) - 是否输出标记(token)的偏移量。默认值:False。 异常: - **TypeError** - 参数 `with_offsets` 的类型不为bool。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.to_bytes.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.to_bytes.rst index 017581ff6ac..e1095c64531 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.to_bytes.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.to_bytes.rst @@ -7,7 +7,7 @@ 参数: - **array** (numpy.ndarray) - 表示 `string` 类型的数组,代表字符串。 - - **encoding** (str) - 表示用于编码的字符集,默认值:'utf8'。 + - **encoding** (str) - 表示用于编码的字符集。默认值:'utf8'。 返回: numpy.ndarray,表示 `bytes` 的NumPy数组。 diff --git a/docs/api/api_python/dataset_text/mindspore.dataset.text.to_str.rst b/docs/api/api_python/dataset_text/mindspore.dataset.text.to_str.rst index f6dc0d9d780..d52552dd21b 100644 --- a/docs/api/api_python/dataset_text/mindspore.dataset.text.to_str.rst +++ b/docs/api/api_python/dataset_text/mindspore.dataset.text.to_str.rst @@ -7,7 +7,7 @@ 参数: - **array** (numpy.ndarray) - 表示 `bytes` 类型的数组,代表字符串。 - - **encoding** (str) - 表示用于解码的字符集,默认值:'utf8'。 + - **encoding** (str) - 表示用于解码的字符集。默认值:'utf8'。 返回: numpy.ndarray,表示 `str` 的NumPy数组。 diff --git a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Concatenate.rst b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Concatenate.rst index 986c2524154..cec095004fe 100644 --- a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Concatenate.rst +++ b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Concatenate.rst @@ -6,9 +6,9 @@ mindspore.dataset.transforms.Concatenate 在Tensor的某一个轴上进行元素拼接,目前仅支持拼接形状为1D的Tensor。 参数: - - **axis** (int, 可选) - 指定一个轴用于拼接Tensor,默认值:0。 - - **prepend** (numpy.ndarray, 可选) - 指定拼接在最前面的Tensor,默认值:None,不指定。 - - **append** (numpy.ndarray, 可选) - 指定拼接在最后面的Tensor,默认值:None,不指定。 + - **axis** (int, 可选) - 指定一个轴用于拼接Tensor。默认值:0。 + - **prepend** (numpy.ndarray, 可选) - 指定拼接在最前面的Tensor。默认值:None,不指定。 + - **append** (numpy.ndarray, 可选) - 指定拼接在最后面的Tensor。默认值:None,不指定。 异常: - **TypeError** - 参数 `axis` 的类型不为int。 diff --git a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Mask.rst b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Mask.rst index afc0ca221a4..89a170a1bf5 100644 --- a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Mask.rst +++ b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.Mask.rst @@ -8,7 +8,7 @@ mindspore.dataset.transforms.Mask 参数: - **operator** (:class:`mindspore.dataset.transforms.Relational`) - 关系操作符,可以取值为Relational.EQ、Relational.NE、Relational.LT、Relational.GT、Relational.LE、Relational.GE。以Relational.EQ为例,将找出Tensor中与 `constant` 相等的元素。 - **constant** (Union[str, int, float, bool]) - 与输入Tensor进行比较的基准值。 - - **dtype** (:class:`mindspore.dtype`, 可选) - 生成的掩码Tensor的数据类型,默认值::class:`mindspore.dtype.bool_` 。 + - **dtype** (:class:`mindspore.dtype`, 可选) - 
生成的掩码Tensor的数据类型。默认值::class:`mindspore.dtype.bool_` 。 异常: - **TypeError** - 参数 `operator` 类型不为 :class:`mindspore.dataset.transforms.Relational` 。 diff --git a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.OneHot.rst b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.OneHot.rst index 9e95a7d6f01..951e080f115 100644 --- a/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.OneHot.rst +++ b/docs/api/api_python/dataset_transforms/mindspore.dataset.transforms.OneHot.rst @@ -7,7 +7,7 @@ mindspore.dataset.transforms.OneHot 参数: - **num_classes** (int) - 数据集的类别数,它应该大于数据集中最大的label编号。 - - **smoothing_rate** (float,可选) - 标签平滑的系数,默认值:0.0。 + - **smoothing_rate** (float,可选) - 标签平滑的系数。默认值:0.0。 异常: - **TypeError** - 参数 `num_classes` 类型不为int。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AdjustGamma.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AdjustGamma.rst index 8f71ca15317..e9047cc78e1 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AdjustGamma.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AdjustGamma.rst @@ -12,7 +12,7 @@ mindspore.dataset.vision.AdjustGamma 参数: - **gamma** (float) - 输出图像像素值与输入图像像素值呈指数相关。 `gamma` 大于1使阴影更暗,而 `gamma` 小于1使黑暗区域更亮。 - - **gain** (float, 可选) - 常数乘数,默认值:1.0。 + - **gain** (float, 可选) - 常数乘数。默认值:1.0。 异常: - **TypeError** - 如果 `gain` 不是浮点类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoAugment.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoAugment.rst index 580a47d51c2..19cf44f9209 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoAugment.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoAugment.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.AutoAugment 此操作仅适用于3通道RGB图像。 参数: - - **policy** (AutoAugmentPolicy, 可选) - 在不同数据集上学习的AutoAugment策略,默认值:AutoAugmentPolicy.IMAGENET。 + - **policy** (AutoAugmentPolicy, 可选) - 在不同数据集上学习的AutoAugment策略。默认值:AutoAugmentPolicy.IMAGENET。 可以是[AutoAugmentPolicy.IMAGENET, AutoAugmentPolicy.CIFAR10, AutoAugmentPolicy.SVHN]中的任何一个。 @@ -15,7 +15,7 @@ mindspore.dataset.vision.AutoAugment - **AutoAugmentPolicy.CIFAR10**:表示应用在Cifar10数据集上学习的AutoAugment。 - **AutoAugmentPolicy.SVHN**:表示应用在SVHN数据集上学习的AutoAugment。 - - **interpolation** (Inter, 可选) - 图像插值方式,默认值:Inter.NEAREST。 + - **interpolation** (Inter, 可选) - 图像插值方式。默认值:Inter.NEAREST。 可以是[Inter.NEAREST, Inter.BILINEAR, Inter.BICUBIC, Inter.AREA]中的任何一个。 @@ -26,7 +26,7 @@ mindspore.dataset.vision.AutoAugment - **fill_value** (Union[int, tuple[int]], 可选) - 填充的像素值。 如果是3元素元组,则分别用于填充R、G、B通道。 - 如果是整数,则用于所有 RGB 通道。 `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 + 如果是整数,则用于所有 RGB 通道。 `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 异常: - **TypeError** - 如果 `policy` 不是 :class:`mindspore.dataset.vision.AutoAugmentPolicy` 类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoContrast.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoContrast.rst index 8e3ae2af868..ca2f8ed2628 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoContrast.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.AutoContrast.rst @@ -6,8 +6,8 @@ mindspore.dataset.vision.AutoContrast 在输入图像上应用自动对比度。首先计算图像的直方图,将直方图中最亮像素的值映射为255,将直方图中最暗像素的值映射为0。 参数: - - **cutoff** (float, 可选) - 输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内,默认值:0.0。 - - **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内,默认值:None。 + - **cutoff** (float, 可选) - 
输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内。默认值:0.0。 + - **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内。默认值:None。 异常: - **TypeError** - 如果 `cutoff` 不是float类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.BoundingBoxAugment.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.BoundingBoxAugment.rst index 3b6a476980a..aee704a16e5 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.BoundingBoxAugment.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.BoundingBoxAugment.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.BoundingBoxAugment 参数: - **transform** (TensorOperation) - 对图像的随机标注边界框区域应用的变换处理。 - - **ratio** (float, 可选) - 要应用变换的边界框的比例。范围:[0.0, 1.0],默认值:0.3。 + - **ratio** (float, 可选) - 要应用变换的边界框的比例。范围:[0.0, 1.0]。默认值:0.3。 异常: - **TypeError** - 如果 `transform` 不是 :class:`mindspore.dataset.vision.transforms` 模块中的图像变换处理。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutMixBatch.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutMixBatch.rst index abca2c9e06c..dd37ed63fe1 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutMixBatch.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutMixBatch.rst @@ -8,8 +8,8 @@ mindspore.dataset.vision.CutMixBatch 参数: - **image_batch_format** (ImageBatchFormat) - 图像批处理输出格式。可以是 [ImageBatchFormat.NHWC、ImageBatchFormat.NCHW] 中的任何一个。 - - **alpha** (float, 可选) - β分布的超参数,必须大于0,默认值:1.0。 - - **prob** (float, 可选) - 对每个图像应用剪切混合处理的概率,取值范围:[0.0, 1.0],默认值:1.0。 + - **alpha** (float, 可选) - β分布的超参数,必须大于0。默认值:1.0。 + - **prob** (float, 可选) - 对每个图像应用剪切混合处理的概率,取值范围:[0.0, 1.0]。默认值:1.0。 异常: - **TypeError** - 如果 `image_batch_format` 不是 :class:`mindspore.dataset.vision.ImageBatchFormat` 的类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutOut.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutOut.rst index 125497bb215..124f8e3bad3 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutOut.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.CutOut.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.CutOut 参数: - **length** (int) - 每个正方形区域的边长,必须大于 0。 - - **num_patches** (int, 可选) - 要从图像中切出的正方形区域数,必须大于0,默认值:1。 + - **num_patches** (int, 可选) - 要从图像中切出的正方形区域数,必须大于0。默认值:1。 - **is_hwc** (bool, 可选) - 表示输入图像是否为HWC格式,True为HWC格式,False为CHW格式。默认值:True。 异常: diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.GaussianBlur.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.GaussianBlur.rst index 3b3293fafdf..35333efe3f5 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.GaussianBlur.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.GaussianBlur.rst @@ -9,7 +9,7 @@ mindspore.dataset.vision.GaussianBlur - **kernel_size** (Union[int, Sequence[int]]) - 要使用的高斯核的大小。该值必须是正数和奇数。 如果只提供一个整数,高斯核大小将为 (kernel_size, kernel_size)。 如果提供了整数序列,则它必须是表示(宽度、高度)的 2 个值的序列。 - - **sigma** (Union[float, Sequence[float]], 可选) - 要使用的高斯核的标准差,该值必须是正数,默认值:None。 + - **sigma** (Union[float, Sequence[float]], 可选) - 要使用的高斯核的标准差,该值必须是正数。默认值:None。 如果仅提供浮点数,则 `sigma` 将为 (sigma, sigma)。 如果提供了浮点序列,则它必须是表示(宽度、高度)的 2 个值的序列。 如果为None, `sigma` 采用的值为 ((kernel_size - 1) * 0.5 - 1) * 0.3 + 0.8。
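The patch-level operators above are plain callables, so a single image is enough to check their defaults. A minimal sketch, assuming eager invocation of vision transforms is available (the image is random noise):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> # A dummy 64x64 RGB image in HWC layout.
>>> image = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
>>> # Cut one 16x16 patch out of the HWC image (num_patches defaults to 1).
>>> cut = vision.CutOut(length=16, is_hwc=True)(image)
>>> # Blur with a 3x3 kernel; sigma=None derives the default described above.
>>> blurred = vision.GaussianBlur(kernel_size=3)(cut)

diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.MixUpBatch.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.MixUpBatch.rst index 410ff516467..2cc35940074 100644 ---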
a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.MixUpBatch.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.MixUpBatch.rst @@ -10,7 +10,7 @@ mindspore.dataset.vision.MixUpBatch 请注意,在调用此处理之前,您需要将标注制作成 one-hot 格式并进行batch操作。 参数: - - **alpha** (float, 可选) - β分布的超参数,该值必须为正,默认值:1.0。 + - **alpha** (float, 可选) - β分布的超参数,该值必须为正。默认值:1.0。 异常: - **TypeError** - 如果 `alpha` 不是float类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.NormalizePad.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.NormalizePad.rst index 1ac87a34e7e..8dcca1d23bf 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.NormalizePad.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.NormalizePad.rst @@ -8,7 +8,7 @@ mindspore.dataset.vision.NormalizePad 参数: - **mean** (sequence) - 图像每个通道的均值组成的列表或元组。平均值必须在 (0.0, 255.0] 范围内。 - **std** (sequence) - 图像每个通道的标准差组成的列表或元组。标准差值必须在 (0.0, 255.0] 范围内。 - - **dtype** (str, 可选) - 输出图像的数据类型,默认值:"float32"。 + - **dtype** (str, 可选) - 输出图像的数据类型。默认值:"float32"。 - **is_hwc** (bool, 可选) - 表示输入图像是否为HWC格式,True为HWC格式,False为CHW格式。默认值:True。 异常: diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Pad.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Pad.rst index b96f0fa84f0..d8fbaf4a0db 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Pad.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Pad.rst @@ -14,8 +14,8 @@ mindspore.dataset.vision.Pad - **fill_value** (Union[int, tuple[int]], 可选) - 填充的像素值,仅在 `padding_mode` 取值为Border.CONSTANT时有效。 如果是3元素元组,则分别用于填充R、G、B通道。 如果是整数,则用于所有 RGB 通道。 - `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 - - **padding_mode** (Border, 可选) - 边界填充方式。可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个,默认值:Border.CONSTANT。 + `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 + - **padding_mode** (Border, 可选) - 边界填充方式。可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个。默认值:Border.CONSTANT。 - **Border.CONSTANT** - 使用常量值进行填充。 - **Border.EDGE** - 使用各边的边界像素值进行填充。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.PadToSize.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.PadToSize.rst index 31e63406475..80070a23ee4 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.PadToSize.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.PadToSize.rst @@ -15,8 +15,8 @@ mindspore.dataset.vision.PadToSize - **fill_value** (Union[int, tuple[int, int, int]], 可选) - 填充的像素值,仅在 `padding_mode` 取值为Border.CONSTANT时有效。 如果是3元素元组,则分别用于填充R、G、B通道。 如果是整数,则用于所有 RGB 通道。 - `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 - - **padding_mode** (Border, 可选) - 边界填充方式。可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个,默认值:Border.CONSTANT。 + `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 + - **padding_mode** (Border, 可选) - 边界填充方式。可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个。默认值:Border.CONSTANT。 - **Border.CONSTANT** - 使用常量值进行填充。 - **Border.EDGE** - 使用各边的边界像素值进行填充。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAffine.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAffine.rst index 30a5e347383..0502038ac3e 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAffine.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAffine.rst @@ -9,25 +9,25 @@ mindspore.dataset.vision.RandomAffine - **degrees** (Union[int, 
float, sequence]) - 旋转度数的范围。 如果 `degrees` 是一个数字,它代表旋转范围是(-degrees, degrees)。 如果 `degrees` 是一个序列,它代表旋转范围是 (min, max)。 - - **translate** (sequence, 可选) - 一个序列(tx_min, tx_max, ty_min, ty_max)用于表示水平(tx)方向和垂直(ty)方向的最小/最大平移范围,取值范围 [-1.0, 1.0],默认值:None。 + - **translate** (sequence, 可选) - 一个序列(tx_min, tx_max, ty_min, ty_max)用于表示水平(tx)方向和垂直(ty)方向的最小/最大平移范围,取值范围 [-1.0, 1.0]。默认值:None。 水平和垂直偏移分别从以下范围中随机选择:(tx_min*width, tx_max*width) 和 (ty_min*height, ty_max*height)。 如果 `translate` 是一个包含2个值的元组或列表,则 (translate[0], translate[1]) 表示水平(X)方向的随机平移范围。 如果 `translate` 是一个包含4个值的元组或列表,则 (translate[0], translate[1]) 表示水平(X)方向的随机平移范围,(translate[2], translate[3])表示垂直(Y)方向的随机平移范围。 如果为None,则不对图像进行任何平移。 - - **scale** (sequence, 可选) - 图像的比例因子的随机范围,必须为非负数,使用原始比例,默认值:None。 - - **shear** (Union[float, Sequence[float, float], Sequence[float, float, float, float]], 可选) - 图像的剪切因子的随机范围,必须为正数,默认值:None。 + - **scale** (sequence, 可选) - 图像的比例因子的随机范围,必须为非负数。默认值:None,使用原始比例。 + - **shear** (Union[float, Sequence[float, float], Sequence[float, float, float, float]], 可选) - 图像的剪切因子的随机范围,必须为正数。默认值:None。 如果是数字,则应用在 (-shear, +shear) 范围内平行于 X 轴的剪切。 如果 `shear` 是一个包含2个值的元组或列表,则在 (shear[0],shear[1]) 范围内进行水平(X)方向的剪切变换。 如果 `shear` 是一个包含4个值的元组或列表,则在 (shear[0],shear[1]) 范围内进行水平(X)方向的剪切变换,并在(shear[2], shear[3])范围内进行垂直(Y)方向的剪切变换。 如果为None,则不应用任何剪切。 - - **resample** (Inter, 可选) - 图像插值方式。它可以是 [Inter.BILINEAR、Inter.NEAREST、Inter.BICUBIC、Inter.AREA] 中的任何一个,默认值:Inter.NEAREST。 + - **resample** (Inter, 可选) - 图像插值方式。它可以是 [Inter.BILINEAR、Inter.NEAREST、Inter.BICUBIC、Inter.AREA] 中的任何一个。默认值:Inter.NEAREST。 - **Inter.BILINEAR**: 双线性插值。 - **Inter.NEAREST**: 最近邻插值。 - **Inter.BICUBIC**: 双三次插值。 - **Inter.AREA**: 像素区域插值。 - - **fill_value** (Union[int, tuple[int]], 可选) - 用于填充输出图像中变换之外的区域。元组中必须有三个值,取值范围是[0, 255],默认值:0。 + - **fill_value** (Union[int, tuple[int]], 可选) - 用于填充输出图像中变换之外的区域。元组中必须有三个值,取值范围是[0, 255]。默认值:0。 异常: - **TypeError** - 如果 `degrees` 不是int、float或sequence类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAutoContrast.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAutoContrast.rst index bd6965ac0ff..737d7bb56e2 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAutoContrast.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomAutoContrast.rst @@ -6,8 +6,8 @@ mindspore.dataset.vision.RandomAutoContrast 以给定的概率自动调整图像的对比度。 参数: - - **cutoff** (float, 可选) - 输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内,默认值:0.0。 - - **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内,默认值:None。 + - **cutoff** (float, 可选) - 输入图像直方图中最亮和最暗像素的百分比。该值必须在 [0.0, 50.0) 范围内。默认值:0.0。 + - **ignore** (Union[int, sequence], 可选) - 要忽略的背景像素值,忽略值必须在 [0, 255] 范围内。默认值:None。 - **prob** (float, 可选) - 图像被调整对比度的概率,取值范围:[0.0, 1.0]。默认值:0.5。 异常: diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColor.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColor.rst index fff46a97e8c..e6039af5bf5 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColor.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColor.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.RandomColor 参数: - **degrees** (Sequence[float], 可选) - 色彩调节系数的范围,必须为非负数。它应该是(min, max)格式。 - 如果min与max相等,则代表色彩变化步长固定,默认值:(0.1, 1.9)。 + 如果min与max相等,则代表色彩变化步长固定。默认值:(0.1, 1.9)。 异常: - **TypeError** - 如果 `degrees` 不是Sequence[float]类型。
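Since `degrees` is the only required argument of `RandomAffine` above, a typical configuration opts into just the distortions it needs; a minimal pipeline sketch (parameter values and shapes are illustrative):

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.vision as vision
>>> from mindspore.dataset.vision import Inter
>>> images = np.random.randint(0, 255, size=(4, 32, 32, 3), dtype=np.uint8)
>>> dataset = ds.NumpySlicesDataset(images, column_names=["image"], shuffle=False)
>>> # Rotate within +/-15 degrees and shift up to 10% along each axis;
>>> # regions uncovered by the transform keep the default fill_value=0.
>>> affine = vision.RandomAffine(degrees=15, translate=(-0.1, 0.1, -0.1, 0.1),
...                              resample=Inter.NEAREST)
>>> dataset = dataset.map(operations=affine, input_columns=["image"])

diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColorAdjust.rst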
b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColorAdjust.rst index cb0f806be86..834827a09ab 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColorAdjust.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomColorAdjust.rst @@ -8,16 +8,16 @@ mindspore.dataset.vision.RandomColorAdjust .. note:: 此操作支持通过 Offload 在 Ascend 或 GPU 平台上运行。 参数: - - **brightness** (Union[float, Sequence[float]], 可选) - 亮度调整因子。不能为负,默认值:(1, 1)。 + - **brightness** (Union[float, Sequence[float]], 可选) - 亮度调整因子。不能为负。默认值:(1, 1)。 如果是浮点数,则从 [max(0, 1-brightness), 1+brightness] 范围内统一选择因子。 如果它是一个序列,则代表是范围 [min, max],从此范围中选择调整因子。 - - **contrast** (Union[float, Sequence[float]], 可选) - 对比度调整因子。不能为负,默认值:(1, 1)。 + - **contrast** (Union[float, Sequence[float]], 可选) - 对比度调整因子。不能为负。默认值:(1, 1)。 如果是浮点数,则从 [max(0, 1-contrast), 1+contrast] 范围内统一选择因子。 如果它是一个序列,则代表是范围 [min, max],从此范围中选择调整因子。 - - **saturation** (Union[float, Sequence[float]], 可选) - 饱和度调整因子。不能为负,默认值:(1, 1)。 + - **saturation** (Union[float, Sequence[float]], 可选) - 饱和度调整因子。不能为负。默认值:(1, 1)。 如果是浮点数,则从 [max(0, 1-saturation), 1+saturation] 范围内统一选择因子。 如果它是一个序列,则代表是范围 [min, max],从此范围中选择调整因子。 - - **hue** (Union[float, Sequence[float]], 可选) - 色调调整因子,默认值:(0, 0)。 + - **hue** (Union[float, Sequence[float]], 可选) - 色调调整因子。默认值:(0, 0)。 如果是浮点数,则代表是范围 [-hue, hue],从此范围中选择调整因子。注意 `hue` 取值应为[0, 0.5]。 如果它是一个序列,则代表是范围 [min, max],从此范围中选择调整因子。注意取值范围min和max是 [-0.5, 0.5] 范围内的浮点数,并且min小于等于max。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCrop.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCrop.rst index ec4a528c284..fc7db7ba242 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCrop.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCrop.rst @@ -11,7 +11,7 @@ mindspore.dataset.vision.RandomCrop - **size** (Union[int, Sequence[int]]) - 裁剪图像的输出尺寸大小。值必须为正。 如果 size 是整数,则返回一个裁剪尺寸大小为 (size, size) 的正方形。 如果 size 是一个长度为 2 的序列,则以2个元素分别为高和宽放缩至(高度, 宽度)大小。 - - **padding** (Union[int, Sequence[int]], 可选) - 图像各边填充的像素数。填充值必须为非负值,默认值:None。 + - **padding** (Union[int, Sequence[int]], 可选) - 图像各边填充的像素数。填充值必须为非负值。默认值:None。 如果 `padding` 不为 None,则首先使用 `padding` 填充图像。 如果 `padding` 是一个整数,代表为图像的所有方向填充该值大小的像素。 如果 `padding` 是一个包含2个值的元组或列表,第一个值会用于填充图像的左侧和右侧,第二个值会用于填充图像的上侧和下侧。 @@ -20,8 +20,8 @@ mindspore.dataset.vision.RandomCrop - **fill_value** (Union[int, tuple[int]], 可选) - 边框的像素强度,仅当 `padding_mode` 为 Border.CONSTANT 时有效。 如果是3元素元组,则分别用于填充R、G、B通道。 如果是整数,则用于所有RGB通道。 - `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 - - **padding_mode** (Border, 可选) - 边界填充方式。它可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个,默认值:Border.CONSTANT。 + `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 + - **padding_mode** (Border, 可选) - 边界填充方式。它可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个。默认值:Border.CONSTANT。 - **Border.CONSTANT** - 使用常量值进行填充。 - **Border.EDGE** - 使用各边的边界像素值进行填充。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropDecodeResize.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropDecodeResize.rst index 49c02e6a1d3..dd88e7c527e 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropDecodeResize.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropDecodeResize.rst @@ -9,9 +9,9 @@ mindspore.dataset.vision.RandomCropDecodeResize - **size** (Union[int, Sequence[int]]) - 调整后图像的输出尺寸大小。大小值必须为正。 如果 size 是整数,则返回一个裁剪尺寸大小为 (size, size) 
的正方形。 如果 size 是一个长度为 2 的序列,则以2个元素分别为高和宽放缩至(高度, 宽度)大小。 - - **scale** (Union[list, tuple], 可选) - 要裁剪的原始尺寸大小的各个尺寸的范围[min, max),必须为非负数,默认值:(0.08, 1.0)。 - - **ratio** (Union[list, tuple], 可选) - 宽高比的范围 [min, max) 裁剪,必须为非负数,默认值:(3. / 4., 4. / 3.)。 - - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.BILINEAR、Inter.NEAREST、Inter.BICUBIC、Inter.AREA、Inter.PILCUBIC] 中的任何一个,默认值:Inter.BILINEAR。 + - **scale** (Union[list, tuple], 可选) - 裁剪子图的尺寸大小相对原图比例的随机选取范围[min, max),必须为非负数。默认值:(0.08, 1.0)。 + - **ratio** (Union[list, tuple], 可选) - 裁剪的宽高比范围 [min, max),必须为非负数。默认值:(3. / 4., 4. / 3.)。 + - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.BILINEAR、Inter.NEAREST、Inter.BICUBIC、Inter.AREA、Inter.PILCUBIC] 中的任何一个。默认值:Inter.BILINEAR。 - **Inter.BILINEAR**: 双线性插值。 - **Inter.NEAREST**: 最近邻插值。 @@ -19,7 +19,7 @@ mindspore.dataset.vision.RandomCropDecodeResize - **Inter.AREA**: 像素区域插值。 - **Inter.PILCUBIC**: Pillow库中实现的双三次插值,输入需为3通道格式。 - - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪, `max_attempts` 值必须为正数,默认值:10。 + - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪, `max_attempts` 值必须为正数。默认值:10。 异常: - **TypeError** - 如果 `size` 不是int或Sequence[int]类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropWithBBox.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropWithBBox.rst index dde9d93bbfc..b9c6c0a5430 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropWithBBox.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomCropWithBBox.rst @@ -9,7 +9,7 @@ mindspore.dataset.vision.RandomCropWithBBox - **size** (Union[int, Sequence[int]]) - 裁剪图像的输出尺寸大小。大小值必须为正。 如果 size 是整数,则返回一个裁剪尺寸大小为 (size, size) 的正方形。 如果 size 是一个长度为 2 的序列,则以2个元素分别为高和宽放缩至(高度, 宽度)大小。 - - **padding** (Union[int, Sequence[int]], 可选) - 填充图像的像素数。填充值必须非负值,默认值:None。 + - **padding** (Union[int, Sequence[int]], 可选) - 填充图像的像素数。填充值必须非负值。默认值:None。 如果 `padding` 不为 None,则首先使用 `padding` 填充图像。 如果 `padding` 是一个整数,代表为图像的所有方向填充该值大小的像素。 如果 `padding` 是一个包含2个值的元组或列表,第一个值会用于填充图像的左侧和右侧,第二个值会用于填充图像的上侧和下侧。 @@ -18,8 +18,8 @@ mindspore.dataset.vision.RandomCropWithBBox - **fill_value** (Union[int, tuple[int]], 可选) - 边框的像素强度,仅当 `padding_mode` 为 Border.CONSTANT 时有效。 如果是3元素元组,则分别用于填充R、G、B通道。 如果是整数,则用于所有 RGB 通道。 - `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 - - **padding_mode** (Border, 可选) - 边界填充方式。它可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个,默认值:Border.CONSTANT。 + `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 + - **padding_mode** (Border, 可选) - 边界填充方式。它可以是 [Border.CONSTANT、Border.EDGE、Border.REFLECT、Border.SYMMETRIC] 中的任何一个。默认值:Border.CONSTANT。 - **Border.CONSTANT** - 使用常量值进行填充。 - **Border.EDGE** - 使用各边的边界像素值进行填充。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomErasing.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomErasing.rst index d5f063a4fcb..b2c72efe10e 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomErasing.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomErasing.rst @@ -9,10 +9,10 @@ mindspore.dataset.vision.RandomErasing 参数: - **prob** (float,可选) - 执行随机擦除的概率,取值范围:[0.0, 1.0]。默认值:0.5。 - - **scale** (Sequence[float, float],可选) - 擦除区域面积相对原图比例的随机选取范围,按照(min, max)顺序排列,默认值:(0.02, 0.33)。 - - **ratio** (Sequence[float, float],可选) - 擦除区域宽高比的随机选取范围,按照(min, max)顺序排列,默认值:(0.3, 3.3)。 + - **scale** (Sequence[float, float],可选) - 擦除区域面积相对原图比例的随机选取范围,按照(min, max)顺序排列。默认值:(0.02, 0.33)。 + - **ratio** (Sequence[float, float],可选) - 擦除区域宽高比的随机选取范围,按照(min, max)顺序排列。默认值:(0.3, 3.3)。 - **value** (Union[int, str, Sequence[int, int, int]]) - 擦除区域的像素填充值。若输入int,将以该值填充RGB通道;若输入Sequence[int, int, int],将分别用于填充R、G、B通道;若输入字符串'random',将以从标准正态分布获得的随机值擦除各个像素。默认值:0。 - - **inplace** (bool,可选) - 是否直接在原图上执行擦除,默认值:False。 + - **inplace** (bool,可选) - 是否直接在原图上执行擦除。默认值:False。 - **max_attempts** (int,可选) - 生成随机擦除区域的最大尝试次数,超过该次数时将返回原始图像。默认值:10。 异常: diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPerspective.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPerspective.rst index 7d31d28e6a6..7b36712ec22 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPerspective.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPerspective.rst @@ -6,7 +6,7 @@ mindspore.dataset.vision.RandomPerspective 按照指定的概率对输入PIL图像进行透视变换。 参数: - - **distortion_scale** (float,可选) - 失真程度,取值范围为[0.0, 1.0],默认值:0.5。 + - **distortion_scale** (float,可选) - 失真程度,取值范围为[0.0, 1.0]。默认值:0.5。 - **prob** (float,可选) - 执行透视变换的概率,取值范围:[0.0, 1.0]。默认值:0.5。 - **interpolation** (Inter,可选) - 插值方式,取值可为 Inter.BILINEAR、Inter.NEAREST 或 Inter.BICUBIC。默认值:Inter.BICUBIC。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPosterize.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPosterize.rst index 5f2859ceef3..10a027156f0 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPosterize.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomPosterize.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.RandomPosterize 参数: - **bits** (Union[int, Sequence[int]], 可选) - 随机位数压缩的范围。位值必须在 [1,8] 范围内,并且在给定范围内至少包含一个整数值。它必须是 (min, max) 或整数格式。 - 如果min与max相等,那么它是一个单一的位数压缩操作,默认值:(8, 8)。 + 如果min与max相等,那么它是一个单一的位数压缩操作。默认值:(8, 8)。 异常: - **TypeError** - 如果 `bits` 不是int或Sequence[int]类型。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCrop.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCrop.rst index af3c1738694..ee333ccc51b 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCrop.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCrop.rst @@ -9,9 +9,9 @@ mindspore.dataset.vision.RandomResizedCrop 参数: - **size** (Union[int, Sequence[int]]) - 图像的输出尺寸大小。若输入整型,则放缩至(size, size)大小;若输入2元素序列,则以2个元素分别为高和宽放缩至(高度, 宽度)大小。 - - **scale** (Union[list, tuple], 可选) - 裁剪子图的尺寸大小相对原图比例的随机选取范围,需要在[min, max)区间,默认值:(0.08, 1.0)。 - - **ratio** (Union[list, tuple], 可选) - 裁剪子图的宽高比的随机选取范围,需要在[min, max)区间,默认值:(3./4., 4./3.)。 - - **interpolation** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个,默认值:Inter.BILINEAR。 + - **scale** (Union[list, tuple], 可选) - 裁剪子图的尺寸大小相对原图比例的随机选取范围,需要在[min, max)区间。默认值:(0.08, 1.0)。 + - **ratio** (Union[list, tuple], 可选) - 裁剪子图的宽高比的随机选取范围,需要在[min, max)区间。默认值:(3./4., 4./3.)。 + - **interpolation** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个。默认值:Inter.BILINEAR。 - Inter.BILINEAR,双线性插值。 - Inter.NEAREST,最近邻插值。 @@ -19,7 +19,7 @@ mindspore.dataset.vision.RandomResizedCrop - Inter.AREA,像素区域插值。 - Inter.PILCUBIC,Pillow库中实现的双三次插值,输入应为3通道格式。 - - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪,默认值:10。 + - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪。默认值:10。 异常: - **TypeError** - 当 `size` 的类型不为int或Sequence[int]。
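`RandomErasing` (documented above) works on a CHW tensor, while the crop operators accept HWC images, so a conversion step usually sits between them. A minimal eager sketch under that assumption (`prob=1.0` only to force the erase on a random test image):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> image = np.random.randint(0, 255, size=(48, 48, 3), dtype=np.uint8)
>>> # Keep 8%-100% of the area and resize the crop back to 32x32.
>>> crop = vision.RandomResizedCrop(size=32, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.))
>>> to_chw = vision.ToTensor()  # HWC uint8 -> CHW float32 in [0.0, 1.0]
>>> erase = vision.RandomErasing(prob=1.0, value=0)
>>> out = erase(to_chw(crop(image)))  # shape (3, 32, 32)

diff --git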
a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCropWithBBox.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCropWithBBox.rst index 402c759f1e2..d30ec726a34 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCropWithBBox.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomResizedCropWithBBox.rst @@ -7,15 +7,15 @@ mindspore.dataset.vision.RandomResizedCropWithBBox 参数: - **size** (Union[int, Sequence[int]]) - 图像的输出尺寸大小。若输入整型,则放缩至(size, size)大小;若输入2元素序列,则以2个元素分别为高和宽放缩至(高度, 宽度)大小。 - - **scale** (Union[list, tuple], 可选) - 裁剪子图的尺寸大小相对原图比例的随机选取范围,需要在[min, max)区间,默认值:(0.08, 1.0)。 - - **ratio** (Union[list, tuple], 可选) - 裁剪子图的宽高比的随机选取范围,需要在[min, max)区间,默认值:(3./4., 4./3.)。 - - **interpolation** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC] 中的任何一个,默认值:Inter.BILINEAR。 + - **scale** (Union[list, tuple], 可选) - 裁剪子图的尺寸大小相对原图比例的随机选取范围,需要在[min, max)区间。默认值:(0.08, 1.0)。 + - **ratio** (Union[list, tuple], 可选) - 裁剪子图的宽高比的随机选取范围,需要在[min, max)区间。默认值:(3./4., 4./3.)。 + - **interpolation** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC] 中的任何一个。默认值:Inter.BILINEAR。 - Inter.BILINEAR,双线性插值。 - Inter.NEAREST,最近邻插值。 - Inter.BICUBIC,双三次插值。 - - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪,默认值:10。 + - **max_attempts** (int, 可选) - 生成随机裁剪位置的最大尝试次数,超过该次数时将使用中心裁剪。默认值:10。 异常: - **TypeError** - 当 `size` 的类型不为int或Sequence[int]。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomRotation.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomRotation.rst index a849b5de7de..7005ed41df7 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomRotation.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomRotation.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.RandomRotation 参数: - **degrees** (Union[int, float, sequence]) - 旋转角度的随机选取范围。若输入单个数字,则从(-degrees, degrees)中随机生成旋转角度;若输入2元素序列,需按(min, max)顺序排列。 - - **resample** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA] 中的任何一个,默认值:Inter.NEAREST。 + - **resample** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA] 中的任何一个。默认值:Inter.NEAREST。 - Inter.BILINEAR,双线性插值。 - Inter.NEAREST,最近邻插值。 @@ -16,7 +16,7 @@ mindspore.dataset.vision.RandomRotation - **expand** (bool, 可选) - 若为True,将扩展图像尺寸大小使其足以容纳整个旋转图像;若为False,则保持图像尺寸大小不变。请注意,扩展时将假设图像为中心旋转且未进行平移。默认值:False。 - **center** (tuple, 可选) - 可选的旋转中心,以图像左上角为原点,旋转中心的位置按照 (宽度, 高度) 格式指定。默认值:None,表示中心旋转。 - - **fill_value** (Union[int, tuple[int]], 可选) - 旋转图像之外区域的像素填充值。若输入3元素元组,将分别用于填充R、G、B通道;若输入整型,将以该值填充RGB通道。`fill_value` 值必须在 [0, 255] 范围内,默认值:0。 + - **fill_value** (Union[int, tuple[int]], 可选) - 旋转图像之外区域的像素填充值。若输入3元素元组,将分别用于填充R、G、B通道;若输入整型,将以该值填充RGB通道。`fill_value` 值必须在 [0, 255] 范围内。默认值:0。 异常: - **TypeError** - 当 `degrees` 的类型不为int、float或sequence。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSharpness.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSharpness.rst index cfbaaa5b131..03facad04d8 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSharpness.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSharpness.rst @@ -6,7 +6,7 @@ mindspore.dataset.vision.RandomSharpness 在固定或随机的范围调整输入图像的锐度。度数为0.0时将返回模糊图像;度数为1.0时将返回原始图像;度数为2.0时将返回锐化图像。 参数: - - **degrees** (Union[list, tuple], 可选) - 锐度调节系数的随机选取范围,需为非负数,按照(min, 
max)顺序排列。如果min与max相等,将使用固定的调节系数进行处理,默认值:(0.1, 1.9)。 + - **degrees** (Union[list, tuple], 可选) - 锐度调节系数的随机选取范围,需为非负数,按照(min, max)顺序排列。如果min与max相等,将使用固定的调节系数进行处理。默认值:(0.1, 1.9)。 异常: - **TypeError** - 如果 `degree` 的类型不为list或tuple。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSolarize.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSolarize.rst index 472433e1460..6f55b200daa 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSolarize.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.RandomSolarize.rst @@ -6,7 +6,7 @@ mindspore.dataset.vision.RandomSolarize 从给定阈值范围内随机选择一个子范围,对位于给定子范围内的像素,将其像素值设置为(255 - 原本像素值)。 参数: - - **threshold** (tuple, 可选) - 随机曝光的阈值范围,默认值:(0, 255)。 `threshold` 输入格式应该为 (min, max),其中min和max是 (0, 255) 范围内的整数,并且min小于等于max。如果min与max相等,则反转所有高于 min(或max) 的像素值。 + - **threshold** (tuple, 可选) - 随机曝光的阈值范围。默认值:(0, 255)。 `threshold` 输入格式应该为 (min, max),其中min和max是 (0, 255) 范围内的整数,并且min小于等于max。如果min与max相等,则反转所有高于 min(或max) 的像素值。 异常: - **TypeError** - 当 `threshold` 的类型不为tuple。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Resize.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Resize.rst index bcf680b130b..5dddf05cb7a 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Resize.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Resize.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.Resize 参数: - **size** (Union[int, Sequence[int]]) - 图像的输出尺寸大小。若输入整型,将调整图像的较短边长度为 `size`,且保持图像的宽高比不变;若输入是2元素组成的序列,其输入格式需要是 (高度, 宽度) 。 - - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个,默认值:Inter.LINEAR。 + - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个。默认值:Inter.LINEAR。 - Inter.BILINEAR,双线性插值。 - Inter.LINEAR,双线性插值,同 Inter.BILINEAR 。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ResizeWithBBox.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ResizeWithBBox.rst index 30e6717a6a9..176e826fe1d 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ResizeWithBBox.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ResizeWithBBox.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.ResizeWithBBox 参数: - **size** (Union[int, Sequence[int]]) - 图像的输出尺寸大小。若输入整型,将调整图像的较短边长度为 `size`,且保持图像的宽高比不变;若输入是2元素组成的序列,其输入格式需要是 (高度, 宽度) 。 - - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个,默认值:Inter.LINEAR。 + - **interpolation** (Inter, 可选) - 图像插值方式。它可以是 [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.PILCUBIC] 中的任何一个。默认值:Inter.LINEAR。 - Inter.LINEAR,双线性插值。 - Inter.NEAREST,最近邻插值。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Rotate.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Rotate.rst index 34dad0e7f74..fce4ae17b08 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Rotate.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.Rotate.rst @@ -7,7 +7,7 @@ mindspore.dataset.vision.Rotate 参数: - **degrees** (Union[int, float]) - 旋转角度。 - - **resample** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC] 中的任何一个,默认值:Inter.NEAREST。 + - **resample** (Inter, 可选) - 插值方式。它可以是 [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC] 中的任何一个。默认值:Inter.NEAREST。 - Inter.BILINEAR,双线性插值。 - Inter.NEAREST,最近邻插值。 @@ -15,7 
+15,7 @@ mindspore.dataset.vision.Rotate - **expand** (bool, 可选) - 若为True,将扩展图像尺寸大小使其足以容纳整个旋转图像;若为False,则保持图像尺寸大小不变。请注意,扩展时将假设图像为中心旋转且未进行平移。默认值:False。 - **center** (tuple, 可选) - 可选的旋转中心,以图像左上角为原点,旋转中心的位置按照 (宽度, 高度) 格式指定。默认值:None,表示中心旋转。 - - **fill_value** (Union[int, tuple[int]], 可选) - 旋转图像之外区域的像素填充值。若输入3元素元组,将分别用于填充R、G、B通道;若输入整型,将以该值填充RGB通道。 `fill_value` 值必须在 [0, 255] 范围内,默认值:0。 + - **fill_value** (Union[int, tuple[int]], 可选) - 旋转图像之外区域的像素填充值。若输入3元素元组,将分别用于填充R、G、B通道;若输入整型,将以该值填充RGB通道。 `fill_value` 值必须在 [0, 255] 范围内。默认值:0。 异常: - **TypeError** - 当 `degrees` 的类型不为int或float。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.SlicePatches.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.SlicePatches.rst index a3f936da93b..d537c190c1f 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.SlicePatches.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.SlicePatches.rst @@ -6,10 +6,10 @@ mindspore.dataset.vision.SlicePatches 在水平和垂直方向上将Tensor切片为多个块。适合于Tensor高宽较大的使用场景。如果将 `num_height` 和 `num_width` 都设置为 1,则Tensor将保持不变。输出Tensor的数量等于 num_height*num_width。 参数: - - **num_height** (int, 可选) - 垂直方向的切块数量,默认值:1。 - - **num_width** (int, 可选) - 水平方向的切块数量,默认值:1。 - - **slice_mode** (Inter, 可选) - 表示填充或丢弃,它可以是 [SliceMode.PAD, SliceMode.DROP] 中的任何一个,默认值:SliceMode.PAD。 - - **fill_value** (int, 可选) - 如果 `slice_mode` 取值为 SliceMode.PAD,该值表示在右侧和底部方向上的填充的边界宽度(以像素数计),默认值:0。 + - **num_height** (int, 可选) - 垂直方向的切块数量。默认值:1。 + - **num_width** (int, 可选) - 水平方向的切块数量。默认值:1。 + - **slice_mode** (SliceMode, 可选) - 表示填充或丢弃,它可以是 [SliceMode.PAD, SliceMode.DROP] 中的任何一个。默认值:SliceMode.PAD。 + - **fill_value** (int, 可选) - 如果 `slice_mode` 取值为 SliceMode.PAD,该值表示在右侧和底部方向上的填充的边界宽度(以像素数计)。默认值:0。 异常: - **TypeError** - 当 `num_height` 不是int。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ToTensor.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ToTensor.rst index f6378ac1784..a4190bafa78 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ToTensor.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.ToTensor.rst @@ -6,7 +6,7 @@ mindspore.dataset.vision.ToTensor 将输入PIL图像或numpy.ndarray图像转换为指定类型的numpy.ndarray图像,图像的像素值范围将从[0, 255]放缩为[0.0, 1.0],shape将从(H, W, C)调整为(C, H, W)。 参数: - - **output_type** (Union[mindspore.dtype, numpy.dtype],可选) - 输出图像的数据类型,默认值::class:`numpy.float32`。 + - **output_type** (Union[mindspore.dtype, numpy.dtype],可选) - 输出图像的数据类型。默认值::class:`numpy.float32`。 异常: - **TypeError** - 当输入图像的类型不为 :class:`PIL.Image.Image` 或 :class:`numpy.ndarray` 。 diff --git a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.UniformAugment.rst b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.UniformAugment.rst index c9979c9afb2..136cccc5816 100644 --- a/docs/api/api_python/dataset_vision/mindspore.dataset.vision.UniformAugment.rst +++ b/docs/api/api_python/dataset_vision/mindspore.dataset.vision.UniformAugment.rst @@ -9,7 +9,7 @@ mindspore.dataset.vision.UniformAugment 参数: - **transforms** (Sequence) - 数据处理操作序列。 - - **num_ops** (int,可选) - 均匀采样的数据处理操作数,默认值:2。 + - **num_ops** (int,可选) - 均匀采样的数据处理操作数。默认值:2。 异常: - **TypeError** - 当 `transforms` 的类型不为数据处理操作序列。 diff --git a/mindspore/python/mindspore/dataset/__init__.py b/mindspore/python/mindspore/dataset/__init__.py index fcd82ac9376..bf91de4bdda 100644 --- a/mindspore/python/mindspore/dataset/__init__.py +++ b/mindspore/python/mindspore/dataset/__init__.py @@ -21,7 +21,7 @@ Besides, this module provides APIs to sample data while loading. We can enable cache in most of the dataset with its key arguments 'cache'. Please notice that cache is not supported on Windows platform yet. Do not use it while loading and processing data on Windows. More introductions and limitations -can refer `Single-Node Tensor Cache `_. +can be found in `Single-Node Tensor Cache `_ . Common imported modules in corresponding API examples are as follows: diff --git a/mindspore/python/mindspore/dataset/audio/transforms.py b/mindspore/python/mindspore/dataset/audio/transforms.py index 6d65bd67439..650a75c50e5 100644 --- a/mindspore/python/mindspore/dataset/audio/transforms.py +++ b/mindspore/python/mindspore/dataset/audio/transforms.py @@ -508,8 +508,8 @@ class ComputeDeltas(AudioTensorOperation): at time :math:`t` , :math:`N` is :math:`(\text{win_length}-1)//2` . Args: - win_length (int, optional): The window length used for computing delta, must be no less than 3 (default=5). - pad_mode (BorderType, optional): Mode parameter passed to padding (default=BorderType.EDGE).It can be any of + win_length (int, optional): The window length used for computing delta, must be no less than 3. Default: 5. + pad_mode (BorderType, optional): Mode parameter passed to padding. Default: BorderType.EDGE. It can be any of [BorderType.CONSTANT, BorderType.EDGE, BorderType.REFLECT, BorderType.SYMMETRIC]. - BorderType.CONSTANT, means it fills the border with constant values. @@ -677,13 +677,13 @@ class DetectPitchFrequency(AudioTensorOperation): Args: sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz), the value can't be zero. - frame_time (float, optional): Duration of a frame, the value must be greater than zero (default=0.01). + frame_time (float, optional): Duration of a frame, the value must be greater than zero. Default: 0.01. win_length (int, optional): The window length for median smoothing (in number of frames), the value must be - greater than zero (default=30). - freq_low (int, optional): Lowest frequency that can be detected (Hz), the value must be greater than zero - (default=85). - freq_high (int, optional): Highest frequency that can be detected (Hz), the value must be greater than zero - (default=3400). + greater than zero. Default: 30. + freq_low (int, optional): Lowest frequency that can be detected (Hz), the value must be greater than zero. + Default: 85. + freq_high (int, optional): Highest frequency that can be detected (Hz), the value must be greater than zero. + Default: 3400. Examples: >>> import numpy as np @@ -723,10 +723,10 @@ class Dither(AudioTensorOperation): density_function (DensityFunction, optional): The density function of a continuous random variable. Can be one of DensityFunction.TPDF (Triangular Probability Density Function), DensityFunction.RPDF (Rectangular Probability Density Function) or - DensityFunction.GPDF (Gaussian Probability Density Function) - (default=DensityFunction.TPDF). + DensityFunction.GPDF (Gaussian Probability Density Function). + Default: DensityFunction.TPDF. noise_shaping (bool, optional): A filtering process that shapes the spectral - energy of quantisation error (default=False). + energy of quantisation error. Default: False. Examples: >>> import numpy as np
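For reference, the delta coefficients that `ComputeDeltas` describes above follow the standard regression formula over a window of `win_length` frames; restated here (with :math:`c_t` the input feature at frame :math:`t`, and :math:`N` as defined in the docstring):

.. math::

    d_t = \frac{\sum_{n=1}^{N} n (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{N} n^2},
    \qquad N = (\text{win\_length} - 1) // 2

@@ -755,7 +755,7 @@ class EqualizerBiquad(AudioTensorOperation): sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz), the value can't be zero. center_freq (float): Central frequency (in Hz). gain (float): Desired gain at the boost (or attenuation) in dB.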
-        Q (float, optional): https://en.wikipedia.org/wiki/Q_factor, range: (0, 1] (default=0.707).
+        Q (float, optional): https://en.wikipedia.org/wiki/Q_factor, range: (0, 1]. Default: 0.707.

    Examples:
        >>> import numpy as np
@@ -790,9 +790,9 @@ class Fade(AudioTensorOperation):
    Add a fade in and/or fade out to a waveform.

    Args:
-        fade_in_len (int, optional): Length of fade-in (time frames), which must be non-negative (default=0).
-        fade_out_len (int, optional): Length of fade-out (time frames), which must be non-negative (default=0).
-        fade_shape (FadeShape, optional): Shape of fade (default=FadeShape.LINEAR). Can be one of
+        fade_in_len (int, optional): Length of fade-in (time frames), which must be non-negative. Default: 0.
+        fade_out_len (int, optional): Length of fade-out (time frames), which must be non-negative. Default: 0.
+        fade_shape (FadeShape, optional): Shape of fade. Default: FadeShape.LINEAR. Can be one of
            FadeShape.QUARTER_SINE, FadeShape.HALF_SINE, FadeShape.LINEAR, FadeShape.LOGARITHMIC or
            FadeShape.EXPONENTIAL.
@@ -842,7 +842,7 @@ class Filtfilt(AudioTensorOperation):
        b_coeffs (Sequence): numerator coefficients of difference equation of dimension of (n_order + 1).
            Lower delays coefficients are first, e.g. [b0, b1, b2, ...].
            Must be same size as a_coeffs (pad with 0's as necessary).
-        clamp (bool, optional): If True, clamp the output signal to be in the range [-1, 1]. Default=True.
+        clamp (bool, optional): If True, clamp the output signal to be in the range [-1, 1]. Default: True.

    Raises:
        RuntimeError: If the shape of input audio waveform does not match <..., time>.
@@ -882,15 +882,15 @@ class Flanger(AudioTensorOperation):
    Args:
        sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz).
-        delay (float, optional): Desired delay in milliseconds (ms), range: [0, 30] (default=0.0).
-        depth (float, optional): Desired delay depth in milliseconds (ms), range: [0, 10] (default=2.0).
-        regen (float, optional): Desired regen (feedback gain) in dB, range: [-95, 95] (default=0.0).
-        width (float, optional): Desired width (delay gain) in dB, range: [0, 100] (default=71.0).
-        speed (float, optional): Modulation speed in Hz, range: [0.1, 10] (default=0.5).
-        phase (float, optional): Percentage phase-shift for multi-channel, range: [0, 100] (default=25.0).
-        modulation (Modulation, optional): Modulation of the input tensor (default=Modulation.SINUSOIDAL).
+        delay (float, optional): Desired delay in milliseconds (ms), range: [0, 30]. Default: 0.0.
+        depth (float, optional): Desired delay depth in milliseconds (ms), range: [0, 10]. Default: 2.0.
+        regen (float, optional): Desired regen (feedback gain) in dB, range: [-95, 95]. Default: 0.0.
+        width (float, optional): Desired width (delay gain) in dB, range: [0, 100]. Default: 71.0.
+        speed (float, optional): Modulation speed in Hz, range: [0.1, 10]. Default: 0.5.
+        phase (float, optional): Percentage phase-shift for multi-channel, range: [0, 100]. Default: 25.0.
+        modulation (Modulation, optional): Modulation of the input tensor. Default: Modulation.SINUSOIDAL.
            It can be one of Modulation.SINUSOIDAL or Modulation.TRIANGULAR.
-        interpolation (Interpolation, optional): Interpolation of the input tensor (default=Interpolation.LINEAR).
+        interpolation (Interpolation, optional): Interpolation of the input tensor. Default: Interpolation.LINEAR.
            It can be one of Interpolation.LINEAR or Interpolation.QUADRATIC.

    Examples:
@@ -984,7 +984,7 @@ class Gain(AudioTensorOperation):
    Apply amplification or attenuation to the whole waveform.
    Args:
-        gain_db (float): Gain adjustment in decibels (dB) (default=1.0).
+        gain_db (float): Gain adjustment in decibels (dB). Default: 1.0.

    Examples:
        >>> import numpy as np
@@ -1015,19 +1015,19 @@ class GriffinLim(AudioTensorOperation):
        whole signal.

    Args:
-        n_fft (int, optional): Size of FFT (default=400).
-        n_iter (int, optional): Number of iteration for phase recovery (default=32).
-        win_length (int, optional): Window size for GriffinLim (default=None, will be set to n_fft).
-        hop_length (int, optional): Length of hop between STFT windows (default=None, will be set to win_length // 2).
+        n_fft (int, optional): Size of FFT. Default: 400.
+        n_iter (int, optional): Number of iterations for phase recovery. Default: 32.
+        win_length (int, optional): Window size for GriffinLim. Default: None, will be set to n_fft.
+        hop_length (int, optional): Length of hop between STFT windows. Default: None, will be set to win_length // 2.
        window_type (WindowType, optional): Window type for GriffinLim, which can be WindowType.BARTLETT,
-            WindowType.BLACKMAN, WindowType.HAMMING, WindowType.HANN or WindowType.KAISER (default=WindowType.HANN).
+            WindowType.BLACKMAN, WindowType.HAMMING, WindowType.HANN or WindowType.KAISER. Default: WindowType.HANN.
            Currently kaiser window is not supported on macOS.
-        power (float, optional): Exponent for the magnitude spectrogram (default=2.0).
-        momentum (float, optional): The momentum for fast Griffin-Lim (default=0.99).
-        length (int, optional): Length of the expected output waveform (default=None, will be set to the value of last
-            dimension of the stft matrix).
-        rand_init (bool, optional): Flag for random phase initialization or all-zero phase initialization
-            (default=True).
+        power (float, optional): Exponent for the magnitude spectrogram. Default: 2.0.
+        momentum (float, optional): The momentum for fast Griffin-Lim. Default: 0.99.
+        length (int, optional): Length of the expected output waveform. Default: None, will be set to the value of last
+            dimension of the stft matrix.
+        rand_init (bool, optional): Flag for random phase initialization or all-zero phase initialization.
+            Default: True.

    Examples:
        >>> import numpy as np
@@ -1065,7 +1065,7 @@ class HighpassBiquad(AudioTensorOperation):
    Args:
        sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz), the value can't be zero.
        cutoff_freq (float): Filter cutoff frequency (in Hz).
-        Q (float, optional): Quality factor, https://en.wikipedia.org/wiki/Q_factor, range: (0, 1] (default=0.707).
+        Q (float, optional): Quality factor, https://en.wikipedia.org/wiki/Q_factor, range: (0, 1]. Default: 0.707.

    Examples:
        >>> import numpy as np
@@ -1093,18 +1093,18 @@ class InverseMelScale(AudioTensorOperation):
    Args:
        n_stft (int): Number of bins in STFT.
-        n_mels (int, optional): Number of mel filterbanks (default=128).
-        sample_rate (int, optional): Sample rate of audio signal (default=16000).
-        f_min (float, optional): Minimum frequency (default=0.0).
-        f_max (float, optional): Maximum frequency (default=None, will be set to sample_rate // 2).
-        max_iter (int, optional): Maximum number of optimization iterations (default=100000).
-        tolerance_loss (float, optional): Value of loss to stop optimization at (default=1e-5).
-        tolerance_change (float, optional): Difference in losses to stop optimization at (default=1e-8).
-        sgdargs (dict, optional): Arguments for the SGD optimizer (default=None, will be set to
-            {'sgd_lr': 0.1, 'sgd_momentum': 0.9}).
-        norm (NormType, optional): Normalization method, can be NormType.SLANEY or NormType.NONE
-            (default=NormType.NONE).
-        mel_type (MelType, optional): Mel scale to use, can be MelType.SLANEY or MelType.HTK (default=MelType.HTK).
+        n_mels (int, optional): Number of mel filterbanks. Default: 128.
+        sample_rate (int, optional): Sample rate of audio signal. Default: 16000.
+        f_min (float, optional): Minimum frequency. Default: 0.0.
+        f_max (float, optional): Maximum frequency. Default: None, will be set to sample_rate // 2.
+        max_iter (int, optional): Maximum number of optimization iterations. Default: 100000.
+        tolerance_loss (float, optional): Value of loss to stop optimization at. Default: 1e-5.
+        tolerance_change (float, optional): Difference in losses to stop optimization at. Default: 1e-8.
+        sgdargs (dict, optional): Arguments for the SGD optimizer. Default: None, will be set to
+            {'sgd_lr': 0.1, 'sgd_momentum': 0.9}.
+        norm (NormType, optional): Normalization method, can be NormType.SLANEY or NormType.NONE.
+            Default: NormType.NONE.
+        mel_type (MelType, optional): Mel scale to use, can be MelType.SLANEY or MelType.HTK. Default: MelType.HTK.

    Examples:
        >>> import numpy as np
@@ -1151,7 +1151,7 @@ class LFilter(AudioTensorOperation):
        b_coeffs (sequence): numerator coefficients of difference equation of dimension of (n_order + 1).
            Lower delays coefficients are first, e.g. [b0, b1, b2, ...].
            Must be same size as a_coeffs (pad with 0's as necessary).
-        clamp (bool, optional): If True, clamp the output signal to be in the range [-1, 1] (default=True).
+        clamp (bool, optional): If True, clamp the output signal to be in the range [-1, 1]. Default: True.

    Raises:
        RuntimeError: If the shape of input audio waveform does not match <..., time>.
@@ -1236,7 +1236,7 @@ class Magphase(AudioTensorOperation):
    Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.

    Args:
-        power (float): Power of the norm, which must be non-negative (default=1.0).
+        power (float): Power of the norm, which must be non-negative. Default: 1.0.

    Raises:
        RuntimeError: If the shape of input audio waveform does not match <..., 2>.
@@ -1343,15 +1343,15 @@ class MelScale(AudioTensorOperation):
    Convert normal STFT to STFT at the Mel scale.

    Args:
-        n_mels (int, optional): Number of mel filterbanks (default=128).
-        sample_rate (int, optional): Sample rate of audio signal (default=16000).
-        f_min (float, optional): Minimum frequency (default=0).
-        f_max (float, optional): Maximum frequency (default=None, will be set to sample_rate // 2).
-        n_stft (int, optional): Number of bins in STFT (default=201).
+        n_mels (int, optional): Number of mel filterbanks. Default: 128.
+        sample_rate (int, optional): Sample rate of audio signal. Default: 16000.
+        f_min (float, optional): Minimum frequency. Default: 0.
+        f_max (float, optional): Maximum frequency. Default: None, will be set to sample_rate // 2.
+        n_stft (int, optional): Number of bins in STFT. Default: 201.
        norm (NormType, optional): Type of norm, value should be NormType.SLANEY or NormType.NONE.
            If norm is NormType.SLANEY, divide the triangular mel weight by the width of the mel band.
-            (default=NormType.NONE).
-        mel_type (MelType, optional): Type to use, value should be MelType.SLANEY or MelType.HTK (default=MelType.HTK).
+            Default: NormType.NONE.
+        mel_type (MelType, optional): Type to use, value should be MelType.SLANEY or MelType.HTK. Default: MelType.HTK.
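As a quick orientation for the MelScale defaults documented above, here is a minimal usage sketch. It assumes the audio transforms are importable as mindspore.dataset.audio (older releases expose them under mindspore.dataset.audio.transforms), and all shapes below are illustrative only:

    >>> import numpy as np
    >>> import mindspore.dataset as ds
    >>> import mindspore.dataset.audio as audio
    >>> # Illustrative magnitude spectrogram: 201 frequency bins (matching n_stft) by 30 frames.
    >>> spec = np.random.random([1, 201, 30]).astype(np.float32)
    >>> dataset = ds.NumpySlicesDataset(data=spec, column_names=["spectrogram"])
    >>> # Keyword values other than n_mels are the documented defaults.
    >>> mel_scale = audio.MelScale(n_mels=64, sample_rate=16000, n_stft=201)
    >>> dataset = dataset.map(operations=mel_scale, input_columns=["spectrogram"])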
    Examples:
        >>> import numpy as np
@@ -1385,7 +1385,7 @@ class MuLawDecoding(AudioTensorOperation):
    Decode mu-law encoded signal.

    Args:
-        quantization_channels (int, optional): Number of channels, which must be positive (Default: 256).
+        quantization_channels (int, optional): Number of channels, which must be positive. Default: 256.

    Examples:
        >>> import numpy as np
@@ -1410,7 +1410,7 @@ class MuLawEncoding(AudioTensorOperation):
    Encode signal based on mu-law companding.

    Args:
-        quantization_channels (int, optional): Number of channels, which must be positive (Default: 256).
+        quantization_channels (int, optional): Number of channels, which must be positive. Default: 256.

    Examples:
        >>> import numpy as np
@@ -1435,9 +1435,9 @@ class Overdrive(AudioTensorOperation):
    Apply overdrive on input audio.

    Args:
-        gain (float, optional): Desired gain at the boost (or attenuation) in dB, in range of [0, 100] (default=20.0).
+        gain (float, optional): Desired gain at the boost (or attenuation) in dB, in range of [0, 100]. Default: 20.0.
        color (float, optional): Controls the amount of even harmonic content in the over-driven output,
-            in range of [0, 100] (default=20.0).
+            in range of [0, 100]. Default: 20.0.

    Examples:
        >>> import numpy as np
@@ -1465,15 +1465,15 @@ class Phaser(AudioTensorOperation):
    Args:
        sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz).
        gain_in (float, optional): Desired input gain at the boost (or attenuation) in dB.
-            Allowed range of values is [0, 1] (default=0.4).
+            Allowed range of values is [0, 1]. Default: 0.4.
        gain_out (float, optional): Desired output gain at the boost (or attenuation) in dB.
-            Allowed range of values is [0, 1e9] (default=0.74).
-        delay_ms (float, optional): Desired delay in milli seconds. Allowed range of values is [0, 5] (default=3.0).
-        decay (float, optional): Desired decay relative to gain-in. Allowed range of values is [0, 0.99] (default=0.4).
-        mod_speed (float, optional): Modulation speed in Hz. Allowed range of values is [0.1, 2] (default=0.5).
+            Allowed range of values is [0, 1e9]. Default: 0.74.
+        delay_ms (float, optional): Desired delay in milliseconds. Allowed range of values is [0, 5]. Default: 3.0.
+        decay (float, optional): Desired decay relative to gain-in. Allowed range of values is [0, 0.99]. Default: 0.4.
+        mod_speed (float, optional): Modulation speed in Hz. Allowed range of values is [0.1, 2]. Default: 0.5.
        sinusoidal (bool, optional): If True, use sinusoidal modulation (preferable for multiple
            instruments). If False, use triangular modulation (gives single instruments a sharper
-            phasing effect) (default=True).
+            phasing effect). Default: True.

    Examples:
        >>> import numpy as np
@@ -1538,16 +1538,16 @@ class Resample(AudioTensorOperation):
    Resample a signal from one frequency to another. A resample method can be given.

    Args:
-        orig_freq (float, optional): The original frequency of the signal, which must be positive (default=16000).
-        new_freq (float, optional): The desired frequency, which must be positive (default=16000).
+        orig_freq (float, optional): The original frequency of the signal, which must be positive. Default: 16000.
+        new_freq (float, optional): The desired frequency, which must be positive. Default: 16000.
        resample_method (ResampleMethod, optional): The resample method, which can be
-            ResampleMethod.SINC_INTERPOLATION and ResampleMethod.KAISER_WINDOW
-            (default=ResampleMethod.SINC_INTERPOLATION).
+            ResampleMethod.SINC_INTERPOLATION or ResampleMethod.KAISER_WINDOW.
+            Default: ResampleMethod.SINC_INTERPOLATION.
        lowpass_filter_width (int, optional): Controls the sharpness of the filter, more means sharper but less
-            efficient, which must be positive (default=6).
+            efficient, which must be positive. Default: 6.
        rolloff (float, optional): The roll-off frequency of the filter, as a fraction of the Nyquist. Lower values
-            reduce anti-aliasing, but also reduce some of the highest frequencies, range: (0, 1] (default=0.99).
-        beta (float, optional): The shape parameter used for kaiser window (default=None, will use 14.769656459379492).
+            reduce anti-aliasing, but also reduce some of the highest frequencies, range: (0, 1]. Default: 0.99.
+        beta (float, optional): The shape parameter used for kaiser window. Default: None, will use 14.769656459379492.

    Examples:
        >>> import numpy as np
@@ -1609,12 +1609,12 @@ class SlidingWindowCmn(AudioTensorOperation):
    Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

    Args:
-        cmn_window (int, optional): Window in frames for running average CMN computation (default=600).
+        cmn_window (int, optional): Window in frames for running average CMN computation. Default: 600.
        min_cmn_window (int, optional): Minimum CMN window used at start of decoding (adds latency only at start).
-            Only applicable if center is False, ignored if center is True (default=100).
+            Only applicable if center is False, ignored if center is True. Default: 100.
        center (bool, optional): If True, use a window centered on the current frame. If False, window is
-            to the left. (default=False).
-        norm_vars (bool, optional): If True, normalize variance to one. (default=False).
+            to the left. Default: False.
+        norm_vars (bool, optional): If True, normalize variance to one. Default: False.

    Examples:
        >>> import numpy as np
@@ -1650,13 +1650,13 @@ class SpectralCentroid(TensorOperation):
    Args:
        sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz).
-        n_fft (int, optional): Size of FFT, creates n_fft // 2 + 1 bins (default=400).
-        win_length (int, optional): Window size (default=None, will use n_fft).
-        hop_length (int, optional): Length of hop between STFT windows (default=None, will use win_length // 2).
-        pad (int, optional): Two sided padding of signal (default=0).
+        n_fft (int, optional): Size of FFT, creates n_fft // 2 + 1 bins. Default: 400.
+        win_length (int, optional): Window size. Default: None, will use n_fft.
+        hop_length (int, optional): Length of hop between STFT windows. Default: None, will use win_length // 2.
+        pad (int, optional): Two sided padding of signal. Default: 0.
        window (WindowType, optional): Window function that is applied/multiplied to each frame/window,
            which can be WindowType.BARTLETT, WindowType.BLACKMAN, WindowType.HAMMING, WindowType.HANN
-            or WindowType.KAISER (default=WindowType.HANN).
+            or WindowType.KAISER. Default: WindowType.HANN.

    Examples:
        >>> import numpy as np
@@ -1687,21 +1687,21 @@ class Spectrogram(TensorOperation):
    Create a spectrogram from an audio signal.

    Args:
-        n_fft (int, optional): Size of FFT, creates n_fft // 2 + 1 bins (default=400).
-        win_length (int, optional): Window size (default=None, will use n_fft).
-        hop_length (int, optional): Length of hop between STFT windows (default=None, will use win_length // 2).
-        pad (int, optional): Two sided padding of signal (default=0).
+        n_fft (int, optional): Size of FFT, creates n_fft // 2 + 1 bins. Default: 400.
+        win_length (int, optional): Window size. Default: None, will use n_fft.
+        hop_length (int, optional): Length of hop between STFT windows.
Default: None, will use win_length // 2.
+        pad (int, optional): Two sided padding of signal. Default: 0.
        window (WindowType, optional): Window function that is applied/multiplied to each frame/window,
            which can be WindowType.BARTLETT, WindowType.BLACKMAN, WindowType.HAMMING, WindowType.HANN
-            or WindowType.KAISER (default=WindowType.HANN). Currently kaiser window is not supported on macOS.
+            or WindowType.KAISER. Default: WindowType.HANN. Currently kaiser window is not supported on macOS.
        power (float, optional): Exponent for the magnitude spectrogram, which must be greater
-            than or equal to 0, e.g., 1 for energy, 2 for power, etc. (default=2.0).
-        normalized (bool, optional): Whether to normalize by magnitude after stft (default=False).
-        center (bool, optional): Whether to pad waveform on both sides (default=True).
+            than or equal to 0, e.g., 1 for energy, 2 for power, etc. Default: 2.0.
+        normalized (bool, optional): Whether to normalize by magnitude after stft. Default: False.
+        center (bool, optional): Whether to pad waveform on both sides. Default: True.
        pad_mode (BorderType, optional): Controls the padding method used when center is True,
-            which can be BorderType.REFLECT, BorderType.CONSTANT, BorderType.EDGE, BorderType.SYMMETRIC
-            (default=BorderType.REFLECT).
-        onesided (bool, optional): Controls whether to return half of results to avoid redundancy (default=True).
+            which can be BorderType.REFLECT, BorderType.CONSTANT, BorderType.EDGE, BorderType.SYMMETRIC.
+            Default: BorderType.REFLECT.
+        onesided (bool, optional): Controls whether to return half of results to avoid redundancy. Default: True.

    Examples:
        >>> import numpy as np
@@ -1852,8 +1852,8 @@ class TrebleBiquad(AudioTensorOperation):
    Args:
        sample_rate (int): Sampling rate of the waveform, e.g. 44100 (Hz), the value can't be zero.
        gain (float): Desired gain at the boost (or attenuation) in dB.
-        central_freq (float, optional): Central frequency (in Hz) (default=3000).
-        Q(float, optional): Quality factor, https://en.wikipedia.org/wiki/Q_factor, range: (0, 1] (default=0.707).
+        central_freq (float, optional): Central frequency (in Hz). Default: 3000.
+        Q (float, optional): Quality factor, https://en.wikipedia.org/wiki/Q_factor, range: (0, 1]. Default: 0.707.

    Examples:
        >>> import numpy as np
@@ -1882,33 +1882,33 @@ class Vad(AudioTensorOperation):
    Args:
        sample_rate (int): Sample rate of audio signal.
-        trigger_level (float, optional): The measurement level used to trigger activity detection (default=7.0).
-        trigger_time (float, optional): The time constant (in seconds) used to help ignore short sounds (default=0.25).
+        trigger_level (float, optional): The measurement level used to trigger activity detection. Default: 7.0.
+        trigger_time (float, optional): The time constant (in seconds) used to help ignore short sounds. Default: 0.25.
        search_time (float, optional): The amount of audio (in seconds) to search for quieter/shorter sounds to include
-            prior to the detected trigger point (default=1.0).
+            prior to the detected trigger point. Default: 1.0.
        allowed_gap (float, optional): The allowed gap (in seconds) between quieter/shorter sounds to include prior to
-            the detected trigger point (default=0.25).
+            the detected trigger point. Default: 0.25.
        pre_trigger_time (float, optional): The amount of audio (in seconds) to preserve before the trigger point and
-            any found quieter/shorter bursts (default=0.0).
-        boot_time (float, optional): The time for the initial noise estimate (default=0.35).
+            any found quieter/shorter bursts.
Default: 0.0. + boot_time (float, optional): The time for the initial noise estimate. Default: 0.35. noise_up_time (float, optional): Time constant used by the adaptive noise estimator, when the noise level is - increasing (default=0.1). + increasing. Default: 0.1. noise_down_time (float, optional): Time constant used by the adaptive noise estimator, when the noise level is - decreasing (default=0.01). - noise_reduction_amount (float, optional): The amount of noise reduction used in the detection algorithm - (default=1.35). - measure_freq (float, optional): The frequency of the algorithm’s processing (default=20.0). - measure_duration (float, optional): The duration of measurement (default=None, use twice the measurement - period). - measure_smooth_time (float, optional): The time constant used to smooth spectral measurements (default=0.4). + decreasing. Default: 0.01. + noise_reduction_amount (float, optional): The amount of noise reduction used in the detection algorithm. + Default: 1.35. + measure_freq (float, optional): The frequency of the algorithm’s processing. Default: 20.0. + measure_duration (float, optional): The duration of measurement. Default: None, use twice the measurement + period. + measure_smooth_time (float, optional): The time constant used to smooth spectral measurements. Default: 0.4. hp_filter_freq (float, optional): The "Brick-wall" frequency of high-pass filter applied at the input to the - detector algorithm (default=50.0). + detector algorithm. Default: 50.0. lp_filter_freq (float, optional): The "Brick-wall" frequency of low-pass filter applied at the input to the - detector algorithm (default=6000.0). + detector algorithm. Default: 6000.0. hp_lifter_freq (float, optional): The "Brick-wall" frequency of high-pass lifter applied at the input to the - detector algorithm (default=150.0). + detector algorithm. Default: 150.0. lp_lifter_freq (float, optional): The "Brick-wall" frequency of low-pass lifter applied at the input to the - detector algorithm (default=2000.0). + detector algorithm. Default: 2000.0. Examples: >>> import numpy as np @@ -1966,7 +1966,7 @@ class Vol(AudioTensorOperation): If gain_type = power, gain stands for power. If gain_type = db, gain stands for decibels. gain_type (GainType, optional): Type of gain, contains the following three enumeration values - GainType.AMPLITUDE, GainType.POWER and GainType.DB (default=GainType.AMPLITUDE). + GainType.AMPLITUDE, GainType.POWER and GainType.DB. Default: GainType.AMPLITUDE. Examples: >>> import numpy as np diff --git a/mindspore/python/mindspore/dataset/audio/utils.py b/mindspore/python/mindspore/dataset/audio/utils.py index b263ef0502c..fdd35af39bc 100644 --- a/mindspore/python/mindspore/dataset/audio/utils.py +++ b/mindspore/python/mindspore/dataset/audio/utils.py @@ -221,7 +221,7 @@ def create_dct(n_mfcc, n_mels, norm=NormMode.NONE): Args: n_mfcc (int): Number of mfc coefficients to retain, the value must be greater than 0. n_mels (int): Number of mel filterbanks, the value must be greater than 0. - norm (NormMode, optional): Normalization mode, can be NormMode.NONE or NormMode.ORTHO (default=NormMode.NONE). + norm (NormMode, optional): Normalization mode, can be NormMode.NONE or NormMode.ORTHO. Default: NormMode.NONE. Returns: numpy.ndarray, the transformation matrix, to be right-multiplied to row-wise data of size (n_mels, n_mfcc). @@ -306,8 +306,8 @@ def melscale_fbanks(n_freqs, f_min, f_max, n_mels, sample_rate, norm=NormType.NO f_max (float): Maximum of frequency in Hz. 
        n_mels (int): Number of mel filterbanks.
        sample_rate (int): Sample rate.
-        norm (NormType, optional): Norm to use, can be NormType.NONE or NormType.SLANEY (Default: NormType.NONE).
-        mel_type (MelType, optional): Scale to use, can be MelType.HTK or MelType.SLANEY (Default: NormType.SLANEY).
+        norm (NormType, optional): Norm to use, can be NormType.NONE or NormType.SLANEY. Default: NormType.NONE.
+        mel_type (MelType, optional): Scale to use, can be MelType.HTK or MelType.SLANEY. Default: MelType.SLANEY.

    Returns:
        numpy.ndarray, the frequency transformation matrix.
diff --git a/mindspore/python/mindspore/dataset/core/config.py b/mindspore/python/mindspore/dataset/core/config.py
index 0e94fe629e2..80dd8b06cb9 100644
--- a/mindspore/python/mindspore/dataset/core/config.py
+++ b/mindspore/python/mindspore/dataset/core/config.py
@@ -761,7 +761,7 @@ def get_multiprocessing_timeout_interval():
    Returns:
        int, interval (in seconds) for multiprocessing/multithreading timeout when main process/thread gets data from
-        subprocesses/child threads (default is 300s).
+        subprocesses/child threads. Default: 300s.

    Examples:
        >>> # Get the global configuration of multiprocessing/multithreading timeout when main process/thread gets data
diff --git a/mindspore/python/mindspore/dataset/engine/cache_client.py b/mindspore/python/mindspore/dataset/engine/cache_client.py
index cc9c9e4a436..c69b8f338e6 100644
--- a/mindspore/python/mindspore/dataset/engine/cache_client.py
+++ b/mindspore/python/mindspore/dataset/engine/cache_client.py
@@ -27,18 +27,18 @@ class DatasetCache:
    A client to interface with tensor caching service.

    For details, please check `Tutorial `_.
+        tutorials/experts/en/master/dataset/cache.html>`_ .

    Args:
        session_id (int): A user assigned session id for the current pipeline.
-        size (int, optional): Size of the memory set aside for the row caching (default=0, which means unlimited,
-            note that it might bring in the risk of running out of memory on the machine).
-        spilling (bool, optional): Whether or not spilling to disk if out of memory (default=False).
-        hostname (str, optional): Host name (default=None, use default hostname '127.0.0.1').
-        port (int, optional): Port to connect to server (default=None, use default port 50052).
-        num_connections (int, optional): Number of tcp/ip connections (default=None, use default value 12).
-        prefetch_size (int, optional): The size of the cache queue between operations
-            (default=None, use default value 20).
+        size (int, optional): Size of the memory set aside for the row caching. Default: 0, which means unlimited,
+            note that it might bring in the risk of running out of memory on the machine.
+        spilling (bool, optional): Whether or not spilling to disk if out of memory. Default: False.
+        hostname (str, optional): Host name. Default: None, use default hostname '127.0.0.1'.
+        port (int, optional): Port to connect to server. Default: None, use default port 50052.
+        num_connections (int, optional): Number of tcp/ip connections. Default: None, use default value 12.
+        prefetch_size (int, optional): The size of the cache queue between operations.
+            Default: None, use default value 20.
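To complement the rewritten DatasetCache argument docs, a hedged end-to-end sketch follows. The session id and dataset directory are placeholders; a real session id comes from the cache_admin tool after the cache server has been started:

    >>> import mindspore.dataset as ds
    >>> # Placeholder session id; obtain a real one via `cache_admin -g`
    >>> # once the server is running (`cache_admin --start`).
    >>> some_cache = ds.DatasetCache(session_id=1, size=0, spilling=False)
    >>> # "/path/to/image_folder" is a hypothetical dataset directory.
    >>> dataset = ds.ImageFolderDataset("/path/to/image_folder", cache=some_cache)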
    Examples:
        >>> import mindspore.dataset as ds
diff --git a/mindspore/python/mindspore/dataset/engine/datasets.py b/mindspore/python/mindspore/dataset/engine/datasets.py
index 05a183bcd37..200f6d75238 100644
--- a/mindspore/python/mindspore/dataset/engine/datasets.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets.py
@@ -310,8 +310,8 @@ class Dataset:
        NumpySlicesDataset(GeneratorDataset)

    Args:
-        num_parallel_workers (int, optional): Number of workers to process the dataset in parallel
-            (default=None).
+        num_parallel_workers (int, optional): Number of workers to process the dataset in parallel.
+            Default: None.
    """

    def __init__(self, children=None, num_parallel_workers=None, cache=None):
@@ -448,7 +448,7 @@ class Dataset:
        Serialize a pipeline into JSON string and dump into file if filename is provided.

        Args:
-            filename (str): filename of JSON file to be saved as (default="").
+            filename (str): filename of JSON file to be saved as. Default: ''.

        Returns:
            str, JSON string of the pipeline.
@@ -484,7 +484,7 @@ class Dataset:
            element_length_function (Callable, optional): A function that takes in
                M arguments where M = len(column_names) and returns an integer. If no value
                is provided, parameter M (the len(column_names)) must be 1, and the size of the first
-                dimension of that column will be taken as the length (default=None).
+                dimension of that column will be taken as the length. Default: None.
            pad_info (dict, optional): The information about how to batch each column. The key
                corresponds to the column name, and the value must be a tuple of 2 elements.
                The first element corresponds to the shape to pad to, and the second
@@ -493,13 +493,13 @@ class Dataset:
                batch, and 0 will be used as the padding value. Any None dimensions will be padded to the longest
                in the current batch, unless pad_to_bucket_boundary is True. If no padding is wanted, set pad_info
-                to None (default=None).
+                to None. Default: None.
            pad_to_bucket_boundary (bool, optional): If True, will pad each None
                dimension in pad_info to the bucket_boundary minus 1. If there are any
-                elements that fall into the last bucket, an error will occur
-                (default=False).
+                elements that fall into the last bucket, an error will occur.
+                Default: False.
            drop_remainder (bool, optional): If True, will drop the last batch for each
-                bucket if it is not a full batch (default=False).
+                bucket if it is not a full batch. Default: False.

        Returns:
            Dataset, dataset bucketed and batched by length.
@@ -549,15 +549,15 @@ class Dataset:
            batch_size (Union[int, Callable]): The number of rows each batch is created with. An
                int or callable object which takes exactly 1 parameter, BatchInfo.
            drop_remainder (bool, optional): Determines whether or not to drop the last block
-                whose data row number is less than batch size (default=False). If True, and if there are less
+                whose data row number is less than batch size. Default: False. If True, and if there are less
                than batch_size rows available to make the last batch, then those rows will
                be dropped and not propagated to the child node.
-            num_parallel_workers (int, optional): Number of workers(threads) to process the dataset in parallel
-                (default=None).
+            num_parallel_workers (int, optional): Number of workers(threads) to process the dataset in parallel.
+                Default: None.

            **kwargs:

                - per_batch_map (Callable[[List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo], \
-                    (List[numpy.ndarray], ..., List[numpy.ndarray])], optional): Per batch map callable (default=None).
+                    (List[numpy.ndarray], ..., List[numpy.ndarray])], optional): Per batch map callable. Default: None.
                    A callable which takes (List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo) as input
                    parameters. Each list[numpy.ndarray] represents a batch of numpy.ndarray on a given column.
                    The number of lists should match the number of entries in input_columns. The last parameter
                    of the callable should
@@ -566,20 +566,20 @@
                    as the input. output_columns is required if the number of output lists is different from input.

                - input_columns (Union[str, list[str]], optional): List of names of the input columns. The size of
-                    the list should match with signature of per_batch_map callable (default=None).
+                    the list should match the signature of the per_batch_map callable. Default: None.

                - output_columns (Union[str, list[str]], optional): List of names assigned to the columns
                    outputted by the last operation. This parameter is mandatory if len(input_columns) !=
                    len(output_columns). The size of this list must match the number of output
-                    columns of the last operation. (default=None, output columns will have the same
-                    name as the input columns, i.e., the columns will be replaced).
+                    columns of the last operation. Default: None, output columns will have the same
+                    name as the input columns, i.e., the columns will be replaced.

                - python_multiprocessing (bool, optional): Parallelize Python function per_batch_map with
-                    multi-processing. This option could be beneficial if the function is computational heavy
-                    (default=False).
+                    multi-processing. This option could be beneficial if the function is computationally heavy.
+                    Default: False.

                - max_rowsize (int, optional): Maximum size of row in MB that is used for shared memory allocation to
-                    copy data between processes. This is only used if python_multiprocessing is set to True (default=16).
+                    copy data between processes. This is only used if python_multiprocessing is set to True. Default: 16.

        Returns:
            BatchDataset, dataset batched.
@@ -627,11 +627,11 @@
            batch_size (Union[int, Callable]): The number of rows each batch is created with. An
                int or callable object which takes exactly 1 parameter, BatchInfo.
            drop_remainder (bool, optional): Determines whether or not to drop the last block
-                whose data row number is less than batch size (default=False). If True, and if there are less
+                whose data row number is less than batch size. Default: False. If True, and if there are less
                than batch_size rows available to make the last batch, then those rows will
                be dropped and not propagated to the child node.
-            num_parallel_workers (int, optional): Number of workers(threads) to process the dataset in parallel
-                (default=None).
+            num_parallel_workers (int, optional): Number of workers(threads) to process the dataset in parallel.
+                Default: None.
            pad_info (dict, optional): The information about how to batch each column. The key
                corresponds to the column name, and the value must be a tuple of 2 elements.
                The first element corresponds to the shape to pad to, and the second
@@ -640,7 +640,7 @@
                batch, and 0 will be used as the padding value. Any None dimensions will be padded to the longest
                in the current batch, unless pad_to_bucket_boundary is True. If no padding is wanted, set pad_info
-                to None (default=None).
+                to None. Default: None.

        Returns:
            PaddedBatchDataset, dataset batched.
@@ -668,8 +668,8 @@

        Args:
            condition_name (str): The condition name that is used to toggle sending next row.
-            num_batch (int): the number of batches without blocking at the start of each epoch (default=1).
-            callback (function): The callback function that will be invoked when sync_update is called (default=None).
+            num_batch (int): The number of batches without blocking at the start of each epoch. Default: 1.
+            callback (function): The callback function that will be invoked when sync_update is called. Default: None.

        Returns:
            SyncWaitDataset, dataset added a blocking condition.
@@ -830,31 +830,31 @@
                applied on the dataset. Operations are applied in the order they appear in this list.
            input_columns (Union[str, list[str]], optional): List of the names of the columns that will be passed to
                the first operation as input. The size of this list must match the number of
-                input columns expected by the first operation. (default=None, the first
+                input columns expected by the first operation. Default: None, the first
                operation will be passed however many columns that are required, starting from
-                the first column).
+                the first column.
            output_columns (Union[str, list[str]], optional): List of names assigned to the columns
                outputted by the last operation. This parameter is mandatory if len(input_columns) !=
                len(output_columns). The size of this list must match the number of output
-                columns of the last operation. (default=None, output columns will have the same
-                name as the input columns, i.e., the columns will be replaced).
+                columns of the last operation. Default: None, output columns will have the same
+                name as the input columns, i.e., the columns will be replaced.
            num_parallel_workers (int, optional): Number of threads used to process the dataset in
-                parallel (default=None, the value from the configuration will be used).
+                parallel. Default: None, the value from the configuration will be used.

            **kwargs:

                - python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker processes.
-                  This option could be beneficial if the Python operation is computational heavy (default=False).
+                  This option could be beneficial if the Python operation is computationally heavy. Default: False.

                - max_rowsize (int, optional): Maximum size of row in MB that is used for shared memory allocation to
-                  copy data between processes. This is only used if python_multiprocessing is set to True (Default=16).
+                  copy data between processes. This is only used if python_multiprocessing is set to True. Default: 16.

                - cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing.
-                  (default=None, which means no cache is used).
+                  Default: None, which means no cache is used.

-                - callbacks (DSCallback, list[DSCallback], optional): List of Dataset callbacks to be called
-                  (Default=None).
+                - callbacks (DSCallback, list[DSCallback], optional): List of Dataset callbacks to be called.
+                  Default: None.

-                - offload (bool, optional): Flag to indicate whether offload is used (Default=None).
+                - offload (bool, optional): Flag to indicate whether offload is used. Default: None.

        Note:
            - Input `operations` accepts TensorOperations defined in mindspore.dataset part, plus user-defined
@@ -939,9 +939,9 @@ class Dataset:
        Args:
            predicate (callable): Python callable which returns a boolean value. If False then filter the element.
            input_columns (Union[str, list[str]], optional): List of names of the input columns. If not provided
-                or provided with None, the predicate will be applied on all columns in the dataset (default=None).
+ or provided with None, the predicate will be applied on all columns in the dataset. Default: None. num_parallel_workers (int, optional): Number of workers to process the dataset - in parallel (default=None). + in parallel. Default: None. Returns: Dataset, dataset filtered. @@ -963,7 +963,7 @@ class Dataset: the repeat operation is used after the batch operation. Args: - count (int): Number of times the dataset is going to be repeated (default=None). + count (int): Number of times the dataset is going to be repeated. Default: None. Returns: Dataset, dataset repeated. @@ -1016,7 +1016,7 @@ class Dataset: then take the given number of rows; otherwise take the given number of batches. Args: - count (int, optional): Number of elements to be taken from the dataset (default=-1). + count (int, optional): Number of elements to be taken from the dataset. Default: -1. Returns: Dataset, dataset taken. @@ -1104,7 +1104,7 @@ class Dataset: - The sum of split sizes > K, the difference of sigma(round(fi * K)) - K will be removed from the first large enough split such that it will have at least 1 row after removing the difference. - randomize (bool, optional): Determines whether or not to split the data randomly (default=True). + randomize (bool, optional): Determines whether or not to split the data randomly. Default: True. If True, the data will be randomly split. Otherwise, each split will be created with consecutive rows from the dataset. @@ -1166,7 +1166,7 @@ class Dataset: name. Args: - datasets (Union[tuple, class Dataset]): A tuple of datasets or a single class Dataset + datasets (Union[Dataset, tuple[Dataset]]): A tuple of datasets or a single class Dataset to be zipped together with this dataset. Returns: @@ -1306,9 +1306,9 @@ class Dataset: Return a transferred Dataset that transfers data through a device. Args: - send_epoch_end (bool, optional): Whether to send end of sequence to device or not (default=True). + send_epoch_end (bool, optional): Whether to send end of sequence to device or not. Default: True. create_data_info_queue (bool, optional): Whether to create queue which stores - types and shapes of data or not(default=False). + types and shapes of data or not. Default: False. Note: If device is Ascend, features of data will be transferred one by one. The limitation @@ -1385,8 +1385,8 @@ class Dataset: Args: file_name (str): Path to dataset file. - num_files (int, optional): Number of dataset files (default=1). - file_type (str, optional): Dataset format (default='mindrecord'). + num_files (int, optional): Number of dataset files. Default: 1. + file_type (str, optional): Dataset format. Default: 'mindrecord'. """ ir_tree, api_tree = self.create_ir_tree() @@ -1410,14 +1410,14 @@ class Dataset: is not provided, the order of the columns will remain unchanged. Args: - columns (list[str], optional): List of columns to be used to specify the order of columns - (default=None, means all columns). + columns (list[str], optional): List of columns to be used to specify the order of columns. + Default: None, means all columns. num_epochs (int, optional): Maximum number of epochs that iterator can be iterated. - (default=-1, iterator can be iterated infinite number of epochs) + Default: -1, iterator can be iterated infinite number of epochs. output_numpy (bool, optional): Whether or not to output NumPy datatype. - If output_numpy=False, iterator will output MSTensor (default=False). + If output_numpy=False, iterator will output MSTensor. Default: False. 
do_copy (bool, optional): when output data type is mindspore.Tensor,
-                use this param to select the conversion method, only take False for better performance (default=True).
+                use this parameter to select the conversion method; set it to False for better performance. Default: True.

        Returns:
            Iterator, tuple iterator over the dataset.
@@ -1444,10 +1444,10 @@ class Dataset:
        Create an iterator over the dataset. The data retrieved will be a dictionary datatype.

        Args:
-            num_epochs (int, optional): Maximum number of epochs that iterator can be iterated
-                (default=-1, iterator can be iterated infinite number of epochs).
+            num_epochs (int, optional): Maximum number of epochs that iterator can be iterated.
+                Default: -1, iterator can be iterated infinite number of epochs.
            output_numpy (bool, optional): Whether or not to output NumPy datatype,
-                if output_numpy=False, iterator will output MSTensor (default=False).
+                if output_numpy=False, iterator will output MSTensor. Default: False.

        Returns:
            Iterator, dictionary iterator over the dataset.
@@ -1703,8 +1703,8 @@ class Dataset:
            condition_name (str): The condition name that is used to toggle sending next row.
            num_batch (Union[int, None]): The number of batches (rows) that are released.
                When num_batch is None, it will default to the number specified by the
-                sync_wait operation (default=None).
-            data (Any): The data passed to the callback, user defined (default=None).
+                sync_wait operation. Default: None.
+            data (Any): The data passed to the callback, user defined. Default: None.
        """
        if (not isinstance(num_batch, int) and num_batch is not None) or \
                (isinstance(num_batch, int) and num_batch <= 0):
@@ -1743,7 +1743,7 @@ class Dataset:

    def get_repeat_count(self):
        """
-        Get the replication times in RepeatDataset (default is 1).
+        Get the replication times in RepeatDataset. Default: 1.

        Returns:
            int, the count of repeat.
@@ -2147,7 +2147,7 @@ class MappableDataset(SourceDataset):
                - The sum of split sizes > K, the difference will be removed from the first large
                  enough split such that it will have at least 1 row after removing the difference.

-            randomize (bool, optional): Determines whether or not to split the data randomly (default=True).
+            randomize (bool, optional): Determines whether or not to split the data randomly. Default: True.
                If True, the data will be randomly split. Otherwise, each split will be created with
                consecutive rows from the dataset.
@@ -2281,10 +2281,10 @@ class BatchDataset(UnionBaseDataset):
        batch_size (Union[int, function]): The number of rows each batch is created with. An
            int or callable which takes exactly 1 parameter, BatchInfo.
        drop_remainder (bool, optional): Determines whether or not to drop the last
-            possibly incomplete batch (default=False). If True, and if there are less
+            possibly incomplete batch. Default: False. If True, and if there are less
            than batch_size rows available to make the last batch, then those rows will
            be dropped and not propagated to the child node.
-        num_parallel_workers (int, optional): Number of workers to process the dataset in parallel (default=None).
+        num_parallel_workers (int, optional): Number of workers to process the dataset in parallel. Default: None.
        per_batch_map (callable, optional): Per batch map callable. A callable which takes
            (list[Tensor], list[Tensor], ..., BatchInfo) as input parameters. Each list[Tensor] represents a batch of
            Tensors on a given column. The number of lists should match the number of entries in input_columns.
The @@ -2294,10 +2294,10 @@ class BatchDataset(UnionBaseDataset): output_columns (Union[str, list[str]], optional): List of names assigned to the columns outputted by the last operation. This parameter is mandatory if len(input_columns) != len(output_columns). The size of this list must match the number of output - columns of the last operation. (default=None, output columns will have the same - name as the input columns, i.e., the columns will be replaced). + columns of the last operation. Default: None, output columns will have the same + name as the input columns, i.e., the columns will be replaced. max_rowsize(int, optional): Maximum size of row in MB that is used for shared memory allocation to copy - data between processes. This is only used if python_multiprocessing is set to True (default=16). + data between processes. This is only used if python_multiprocessing is set to True. Default: 16. """ @@ -2422,7 +2422,7 @@ class BlockReleasePair: Args: init_release_rows (int): Number of lines to allow through the pipeline. - callback (function): The callback function that will be called when release is called (default=None). + callback (function): The callback function that will be called when release is called. Default: None. """ def __init__(self, init_release_rows, callback=None): @@ -2494,10 +2494,10 @@ class PaddedBatchDataset(UnionBaseDataset): batch_size (Union[int, function]): The number of rows each batch is created with. An int or callable which takes exactly 1 parameter, BatchInfo. drop_remainder (bool, optional): Determines whether or not to drop the last - possibly incomplete batch (default=False). If True, and if there are less + possibly incomplete batch. Default: False. If True, and if there are less than batch_size rows available to make the last batch, then those rows will be dropped and not propagated to the child node. - num_parallel_workers (int, optional): Number of workers to process the dataset in parallel (default=None). + num_parallel_workers (int, optional): Number of workers to process the dataset in parallel. Default: None. pad_info (dict, optional): Whether to perform padding on selected columns. pad_info={"col1":([224,224],0)} will pad column with name "col1" to a tensor of size [224,224] and fill the missing with 0. """ @@ -2567,7 +2567,7 @@ class SyncWaitDataset(UnionBaseDataset): input_dataset (Dataset): Input dataset to apply flow control. num_batch (int): Number of batches without blocking at the start of each epoch. condition_name (str): Condition name that is used to toggle sending next row. - callback (function): Callback function that will be invoked when sync_update is called (default=None). + callback (function): Callback function that will be invoked when sync_update is called. Default: None. Raises: RuntimeError: If condition name already exists. @@ -3169,24 +3169,24 @@ class MapDataset(UnionBaseDataset): Args: input_dataset (Dataset): Input Dataset to be mapped. operations (Union[list[TensorOperation], list[functions]]): A function mapping a nested structure of tensors - to another nested structure of tensor (default=None). - input_columns (Union[str, list[str]]): List of names of the input columns - (default=None, the operations will be applied on the first columns in the dataset). + to another nested structure of tensor. Default: None. + input_columns (Union[str, list[str]]): List of names of the input columns. + Default: None, the operations will be applied on the first columns in the dataset. 
The size of the list should match the number of inputs of the first operation.
        output_columns (Union[str, list[str]], optional): List of names of the output columns.
-            The size of the list should match the number of outputs of the last operation
-            (default=None, output columns will be the input columns, i.e., the columns will
-            be replaced).
+            The size of the list should match the number of outputs of the last operation.
+            Default: None, output columns will be the input columns, i.e., the columns will
+            be replaced.
        num_parallel_workers (int, optional): Number of workers to process the dataset
-            in parallel (default=None).
+            in parallel. Default: None.
        python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker processes. This
-            option could be beneficial if the Python operation is computational heavy (default=False).
+            option could be beneficial if the Python operation is computationally heavy. Default: False.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing.
-            (default=None, which means no cache is used).
-        callbacks (DSCallback, list[DSCallback], optional): List of Dataset callbacks to be called (Default=None)
+            Default: None, which means no cache is used.
+        callbacks (DSCallback, list[DSCallback], optional): List of Dataset callbacks to be called. Default: None.
        max_rowsize (int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
-            data between processes. This is only used if python_multiprocessing is set to True (default=16).
-        offload (bool, optional): Flag to indicate whether offload is used (Default=None).
+            data between processes. This is only used if python_multiprocessing is set to True. Default: 16.
+        offload (bool, optional): Flag to indicate whether offload is used. Default: None.
    """

    def __init__(self, input_dataset, operations=None, input_columns=None, output_columns=None,
@@ -3372,10 +3372,10 @@ class FilterDataset(UnionBaseDataset):
    Args:
        input_dataset (Dataset): Input Dataset to be mapped.
        predicate (callable): Python callable which returns a boolean value. If False then filter the element.
-        input_columns (Union[str, list[str]], optional): List of names of the input columns
-            (default=None, the predicate will be applied to all columns in the dataset).
+        input_columns (Union[str, list[str]], optional): List of names of the input columns.
+            Default: None, the predicate will be applied to all columns in the dataset.
        num_parallel_workers (int, optional): Number of workers to process the dataset
-            in parallel (default=None).
+            in parallel. Default: None.
    """

    def __init__(self, input_dataset, predicate, input_columns=None, num_parallel_workers=None):
@@ -3393,7 +3393,7 @@ class RepeatDataset(UnionBaseDataset):
    Args:
        input_dataset (Dataset): Input Dataset to be repeated.
-        count (int): Number of times the dataset will be repeated (default=-1, repeat indefinitely).
+        count (int): Number of times the dataset will be repeated. Default: -1, repeat indefinitely.
    """

    def __init__(self, input_dataset, count):
@@ -3684,9 +3684,9 @@ class TransferDataset(Dataset):
    Args:
        input_dataset (Dataset): Input Dataset to be transferred.
-        send_epoch_end (bool, optional): Whether to send end of sequence to device or not (default=True).
+        send_epoch_end (bool, optional): Whether to send end of sequence to device or not. Default: True.
        create_data_info_queue (bool, optional): Whether to create queue which stores
-            types and shapes of data or not (default=False).
+            types and shapes of data or not.
Default: False. Raises: TypeError: If device_type is empty. @@ -3779,7 +3779,7 @@ class Schema: Class to represent a schema of a dataset. Args: - schema_file(str): Path of the schema file (default=None). + schema_file(str): Path of the schema file. Default: None. Returns: Schema object, schema info about dataset. @@ -3808,8 +3808,8 @@ class Schema: Args: name (str): The new name of the column. de_type (str): Data type of the column. - shape (list[int], optional): Shape of the column - (default=None, [-1] which is an unknown shape of rank 1). + shape (list[int], optional): Shape of the column. + Default: None, [-1] which is an unknown shape of rank 1. Raises: ValueError: If column type is unknown. diff --git a/mindspore/python/mindspore/dataset/engine/datasets_audio.py b/mindspore/python/mindspore/dataset/engine/datasets_audio.py index 24c0d0ee349..dff5e697bba 100644 --- a/mindspore/python/mindspore/dataset/engine/datasets_audio.py +++ b/mindspore/python/mindspore/dataset/engine/datasets_audio.py @@ -45,23 +45,23 @@ class CMUArcticDataset(MappableDataset, AudioBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. name (str, optional): Part of this dataset, can be 'aew', 'ahw', 'aup', 'awb', 'axb', 'bdl', - 'clb', 'eey', 'fem', 'gka', 'jmk', 'ksp', 'ljm', 'lnh', 'rms', 'rxr', 'slp' or 'slt' - (default=None, equal 'aew'). - num_samples (int, optional): The number of audio to be included in the dataset - (default=None, will read all audio). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + 'clb', 'eey', 'fem', 'gka', 'jmk', 'ksp', 'ljm', 'lnh', 'rms', 'rxr', 'slp' or 'slt'. + Default: None, equal 'aew'. + num_samples (int, optional): The number of audio to be included in the dataset. + Default: None, will read all audio. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + dataset. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If source raises an exception during execution. @@ -181,23 +181,23 @@ class GTZANDataset(MappableDataset, AudioBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. 
-        usage (str, optional): Usage of this dataset, can be 'train', 'valid', 'test' or 'all'
-            (default=None, all samples).
-        num_samples (int, optional): The number of audio to be included in the dataset
-            (default=None, will read all audio).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, will use value set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
+        usage (str, optional): Usage of this dataset, can be 'train', 'valid', 'test' or 'all'.
+            Default: None, all samples.
+        num_samples (int, optional): The number of audio samples to be included in the dataset.
+            Default: None, will read all audio.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, will use value set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            dataset. Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If source raises an exception during execution.
@@ -318,22 +318,22 @@ class LibriTTSDataset(MappableDataset, AudioBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Part of this dataset, can be 'dev-clean', 'dev-other', 'test-clean', 'test-other',
-            'train-clean-100', 'train-clean-360', 'train-other-500', or 'all' (default=None, equal 'all').
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, will read all audio).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, will use value set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
+            'train-clean-100', 'train-clean-360', 'train-other-500', or 'all'. Default: None, equivalent to 'all'.
+        num_samples (int, optional): The number of audio samples to be included in the dataset.
+            Default: None, will read all audio.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, will use value set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
@@ -318,22 +318,22 @@ class LibriTTSDataset(MappableDataset, AudioBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Part of this dataset, can be 'dev-clean', 'dev-other', 'test-clean', 'test-other',
-            'train-clean-100', 'train-clean-360', 'train-other-500', or 'all' (default=None, equal 'all').
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, will read all audio).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, will use value set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
+            'train-clean-100', 'train-clean-360', 'train-other-500', or 'all'. Default: None, equivalent to 'all'.
+        num_samples (int, optional): The number of audio to be included in the dataset.
+            Default: None, will read all audio.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, will use value set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            dataset. Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If source raises an exception during execution.
@@ -604,22 +604,22 @@ class SpeechCommandsDataset(MappableDataset, AudioBaseDataset):
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Usage of this dataset, can be 'train', 'test', 'valid' or 'all'. 'train'
            will read from 84,843 samples, 'test' will read from 11,005 samples, 'valid' will read from 9,981
-            test samples and 'all' will read from all 105,829 samples (default=None, will read all samples).
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will read all samples).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, will use value set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
-        sampler (Sampler, optional): Object used to choose samples from the dataset
-            (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            validation samples and 'all' will read from all 105,829 samples. Default: None, will read all samples.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will read all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, will use value set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
+        sampler (Sampler, optional): Object used to choose samples from the dataset.
+            Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument can only be specified
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified
            when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
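The `num_shards`/`shard_id` pairing documented above works the same across these loaders; a sketch for distributed reads (4 shards, this process reading shard 0; the path is hypothetical):

    >>> import mindspore.dataset as ds
    >>> # shard_id is only legal together with num_shards;
    >>> # num_samples would then cap the per-shard count
    >>> dataset = ds.SpeechCommandsDataset(dataset_dir="/path/to/speech_commands",
    ...                                    num_shards=4, shard_id=0)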
@@ -735,25 +735,25 @@ class TedliumDataset(MappableDataset, AudioBaseDataset):
            'test' will read from test samples, 'dev' will read from dev samples,
            'all' will read from all samples.
-            For release3, can only be 'all', it will read from data samples (default=None, all samples).
+            For release3, can only be 'all', it will read from data samples. Default: None, all samples.
        extensions (str, optional): Extensions of the SPH files, only '.sph' is valid.
-            (default=None, ".sph").
-        num_samples (int, optional): The number of audio samples to be included in the dataset
-            (default=None, all samples).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected
-            order behavior shown in the table).
+            Default: None, ".sph".
+        num_samples (int, optional): The number of audio samples to be included in the dataset.
+            Default: None, all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
+            order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
+            dataset. Default: None, expected order behavior shown in the table.
        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain stm files.
@@ -937,21 +937,21 @@ class YesNoDataset(MappableDataset, AudioBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, will read all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, will use value set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
+        num_samples (int, optional): The number of audio to be included in the dataset.
+            Default: None, will read all audio.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, will use value set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            dataset. Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument can only
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only
            be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
diff --git a/mindspore/python/mindspore/dataset/engine/datasets_standard_format.py b/mindspore/python/mindspore/dataset/engine/datasets_standard_format.py
index a5bae752bbc..316912bb346 100644
--- a/mindspore/python/mindspore/dataset/engine/datasets_standard_format.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets_standard_format.py
@@ -48,18 +48,18 @@ class CSVDataset(SourceDataset, UnionBaseDataset):
    Args:
        dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search
            for a pattern of files. The list will be sorted in a lexicographical order.
-        field_delim (str, optional): A string that indicates the char delimiter to separate fields (default=',').
-        column_defaults (list, optional): List of default values for the CSV field (default=None). Each item
+        field_delim (str, optional): A string that indicates the char delimiter to separate fields. Default: ','.
+        column_defaults (list, optional): List of default values for the CSV field. Default: None. Each item
            in the list is either a valid type (float, int, or string). If this is not provided, treats all
            columns as string type.
-        column_names (list[str], optional): List of column names of the dataset (default=None). If this
+        column_names (list[str], optional): List of column names of the dataset. Default: None. If this
            is not provided, infers the column_names from the first row of CSV file.
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will include all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will include all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -68,13 +68,13 @@ class CSVDataset(SourceDataset, UnionBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If dataset_files are not valid or do not exist.
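A minimal `CSVDataset` sketch under the defaults documented above (the file path, names and defaults are illustrative):

    >>> import mindspore.dataset as ds
    >>> # field_delim defaults to ','; omitting column_names would infer them
    >>> # from the first row of the CSV file
    >>> dataset = ds.CSVDataset(dataset_files="/path/to/data.csv",
    ...                         column_defaults=["", 0, 0.0],
    ...                         column_names=["name", "count", "score"])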
@@ -116,10 +116,10 @@ class MindDataset(MappableDataset, UnionBaseDataset):
            a file name of one component of a mindrecord source, other files with identical source
            in the same path will be found and loaded automatically. If dataset_file is a list,
            it represents for a list of dataset files to be read directly.
-        columns_list (list[str], optional): List of columns to be read (default=None).
-        num_parallel_workers (int, optional): The number of readers (default=None).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=None, performs global shuffle). Bool type and Shuffle enum are both supported to pass in.
+        columns_list (list[str], optional): List of columns to be read. Default: None.
+        num_parallel_workers (int, optional): The number of readers. Default: None.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: None, performs global shuffle. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -130,23 +130,23 @@ class MindDataset(MappableDataset, UnionBaseDataset):

            - Shuffle.INFILE: Keep the file sequence the same but shuffle the data within each file.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, 'num_samples' reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, sampler is exclusive
-            with shuffle and block_reader). Support list: SubsetRandomSampler,
+            dataset. Default: None, sampler is exclusive
+            with shuffle and block_reader. Support list: SubsetRandomSampler,
            PkSampler, RandomSampler, SequentialSampler, DistributedSampler.
        padded_sample (dict, optional): Samples will be appended to dataset, where
            keys are the same as column_list.
        num_padded (int, optional): Number of padding samples. Dataset size
            plus num_padded should be divisible by num_shards.
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, all samples).
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, all samples.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        ValueError: If dataset_files are not valid or do not exist.
@@ -248,17 +248,17 @@ class TFRecordDataset(SourceDataset, UnionBaseDataset):
    Args:
        dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search
            for a pattern of files. The list will be sorted in a lexicographical order.
-        schema (Union[str, Schema], optional): Path to the JSON schema file or schema object (default=None).
+        schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None.
            If the schema is not provided, the meta data from the TFData file is considered the schema.
-        columns_list (list[str], optional): List of columns to be read (default=None, read all columns).
-        num_samples (int, optional): The number of samples (rows) to be included in the dataset (default=None).
+        columns_list (list[str], optional): List of columns to be read. Default: None, read all columns.
+        num_samples (int, optional): The number of samples (rows) to be included in the dataset. Default: None.
            If num_samples is None and numRows(parsed from schema) does not exist, read the full dataset;
            If num_samples is None and numRows(parsed from schema) is greater than 0, read numRows rows;
            If both num_samples and numRows(parsed from schema) are greater than 0, read num_samples rows.
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -268,17 +268,17 @@ class TFRecordDataset(SourceDataset, UnionBaseDataset):

            - Shuffle.FILES: Shuffle files only.

        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
-        shard_equal_rows (bool, optional): Get equal rows for all shards(default=False). If shard_equal_rows
+        shard_equal_rows (bool, optional): Get equal rows for all shards. Default: False. If shard_equal_rows
            is false, number of rows of each shard may be not equal, and may lead to a failure in distributed training.
            When the number of samples of per TFRecord file are not equal, it is suggested to set to true.
            This argument should only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        ValueError: If dataset_files are not valid or do not exist.
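A sketch of the `shuffle` choices above using `TFRecordDataset`; the file list is hypothetical:

    >>> import mindspore.dataset as ds
    >>> # bool and mindspore.dataset.Shuffle enum are both accepted;
    >>> # Shuffle.FILES shuffles the file order only
    >>> dataset = ds.TFRecordDataset(dataset_files=["/path/a.tfrecord", "/path/b.tfrecord"],
    ...                              shuffle=ds.Shuffle.FILES,
    ...                              num_shards=2, shard_id=0, shard_equal_rows=True)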
@@ -346,9 +346,9 @@ class OBSMindDataset(GeneratorDataset):
        sk (str): Secret key ID of cloud storage.
        sync_obs_path (str): Remote dir path used for synchronization, users need to create it on cloud storage
            in advance. Path is in the format of s3://bucketName/objectKey.
-        columns_list (list[str], optional): List of columns to be read (default=None, read all columns).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=None, performs global shuffle). Bool type and Shuffle enum are both supported to pass in.
+        columns_list (list[str], optional): List of columns to be read. Default: None, read all columns.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: None, performs global shuffle. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -360,10 +360,10 @@ class OBSMindDataset(GeneratorDataset):

            - Shuffle.INFILE: Keep the file sequence the same but shuffle the data within each file.

        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None).
-        shard_id (int, optional): The shard ID within num_shards (default=None). This
+            into. Default: None.
+        shard_id (int, optional): The shard ID within num_shards. Default: None. This
            argument can only be specified when num_shards is also specified.
-        shard_equal_rows (bool, optional): Get equal rows for all shards(default=True). If shard_equal_rows
+        shard_equal_rows (bool, optional): Get equal rows for all shards. Default: True. If shard_equal_rows
            is false, number of rows of each shard may be not equal, and may lead to a failure in
            distributed training. When the number of samples of per MindRecord file are not equal, it is suggested
            to set to true. This argument should only be specified when num_shards is also specified.
diff --git a/mindspore/python/mindspore/dataset/engine/datasets_text.py b/mindspore/python/mindspore/dataset/engine/datasets_text.py
index 160288f49d6..032e81f123a 100644
--- a/mindspore/python/mindspore/dataset/engine/datasets_text.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets_text.py
@@ -231,14 +231,14 @@ class CLUEDataset(SourceDataset, TextBaseDataset):
        dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search for
            a pattern of files. The list will be sorted in a lexicographical order.
        task (str, optional): The kind of task, one of 'AFQMC', 'TNEWS', 'IFLYTEK', 'CMNLI', 'WSC' and 'CSL'.
-            (default=AFQMC).
-        usage (str, optional): Specify the 'train', 'test' or 'eval' part of dataset (default='train').
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will include all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+            Default: 'AFQMC'.
+        usage (str, optional): Specify the 'train', 'test' or 'eval' part of dataset. Default: 'train'.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will include all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -247,13 +247,13 @@ class CLUEDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    The generated dataset with different task setting has different output columns:
@@ -473,7 +473,7 @@ class CoNLL2000Dataset(SourceDataset, TextBaseDataset):
            'all' will read from all 10,948 samples. Default: None, read all samples.
        num_samples (int, optional): Number of samples (rows) to be read. Default: None, read the full dataset.
        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
-            Default: mindspore.dataset.Shuffle.GLOBAL.
+            Default: `mindspore.dataset.Shuffle.GLOBAL`.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -488,7 +488,7 @@
        num_parallel_workers (int, optional): Number of workers to read the data.
            Default: None, number set in the config.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_.
+            `Single-Node Data Cache `_ .
            Default: None, which means no cache is used.

    Raises:
@@ -908,7 +908,7 @@ class IWSLT2016Dataset(SourceDataset, TextBaseDataset):
        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` .Default: None. This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        num_parallel_workers (int, optional): Number of workers to read the data.
            Default: None, number set in the mindspore.dataset.config.
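A minimal `CLUEDataset` sketch with the defaults above made explicit (the JSON path is hypothetical):

    >>> import mindspore.dataset as ds
    >>> # task defaults to 'AFQMC' and usage to 'train'
    >>> dataset = ds.CLUEDataset(dataset_files="/path/to/afqmc/train.json",
    ...                          task="AFQMC", usage="train")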
@@ -1110,15 +1110,15 @@ class Multi30kDataset(SourceDataset, TextBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        usage (str, optional): Acceptable usages include 'train', 'test, 'valid' or 'all' (default='all').
-        language_pair (str, optional): Acceptable language_pair include ['en', 'de'], ['de', 'en']
-            (default=['en', 'de']).
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, all samples).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        usage (str, optional): Acceptable usages include 'train', 'test', 'valid' or 'all'. Default: 'all'.
+        language_pair (str, optional): Acceptable language_pair include ['en', 'de'], ['de', 'en'].
+            Default: ['en', 'de'].
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1128,13 +1128,13 @@ class Multi30kDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
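A sketch of the `language_pair` option above (the directory is hypothetical; the default pair is ['en', 'de']):

    >>> import mindspore.dataset as ds
    >>> dataset = ds.Multi30kDataset(dataset_dir="/path/to/multi30k",
    ...                              usage="train", language_pair=["de", "en"])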
@@ -1309,10 +1309,10 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all' .
            'train' will read from 450,000 train samples, 'test' will read from 60,000 test samples,
-            'all' will read from all 510,000 samples (default=None, all samples).
-        num_samples (int, optional): Number of samples (rows) to read (default=None, read all samples).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+            'all' will read from all 510,000 samples. Default: None, all samples.
+        num_samples (int, optional): Number of samples (rows) to read. Default: None, read all samples.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -1320,15 +1320,15 @@ class SogouNewsDataset(SourceDataset, TextBaseDataset):

            - Shuffle.GLOBAL: Shuffle both the files and samples, same as setting shuffle to True.

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
@@ -1398,13 +1398,13 @@ class SQuADDataset(SourceDataset, TextBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        usage (str, optional): Specify the `train`, `dev` or `all` part of dataset (default=None, all samples).
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will include all samples).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        usage (str, optional): Specify the 'train', 'dev' or 'all' part of dataset. Default: None, all samples.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will include all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1413,13 +1413,13 @@ class SQuADDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
@@ -1504,12 +1504,12 @@ class TextFileDataset(SourceDataset, TextBaseDataset):
    Args:
        dataset_files (Union[str, list[str]]): String or list of files to be read or glob strings to search for
            a pattern of files. The list will be sorted in a lexicographical order.
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will include all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will include all samples.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed.
            If shuffle is True, performs global shuffle.
            There are three levels of shuffling, desired shuffle enum defined by mindspore.dataset.Shuffle.
@@ -1518,13 +1518,13 @@ class TextFileDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        ValueError: If dataset_files are not valid or do not exist.
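A minimal `TextFileDataset` sketch; with `shuffle` left at its `Shuffle.GLOBAL` default both files and lines are shuffled (paths are hypothetical):

    >>> import mindspore.dataset as ds
    >>> dataset = ds.TextFileDataset(dataset_files=["/path/1.txt", "/path/2.txt"],
    ...                              num_samples=100)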
@@ -1562,10 +1562,10 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Usage of this dataset, can be 'train', 'test', 'valid' or 'all'. 'train' will read
            from 12,543 train samples, 'test' will read from 2,077 test samples, 'valid' will read from 2,002 test samples,
-            'all' will read from all 16,622 samples (default=None, all samples).
-        num_samples (int, optional): Number of samples (rows) to read (default=None, reads the full dataset).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+            'all' will read from all 16,622 samples. Default: None, all samples.
+        num_samples (int, optional): Number of samples (rows) to read. Default: None, reads the full dataset.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1574,15 +1574,15 @@ class UDPOSDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
@@ -1617,12 +1617,12 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        usage (str, optional): Acceptable usages include 'train', 'test', 'valid' and 'all' (default=None, all samples).
-        num_samples (int, optional): Number of samples (rows) to read (default=None, reads the full dataset).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+        usage (str, optional): Acceptable usages include 'train', 'test', 'valid' and 'all'. Default: None, all samples.
+        num_samples (int, optional): Number of samples (rows) to read. Default: None, reads the full dataset.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1631,13 +1631,13 @@ class WikiTextDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, 'num_samples' reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files or invalid.
@@ -1705,13 +1705,13 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all'. 'train' will read
            from 1,400,000 train samples, 'test' will read from 60,000 test samples, 'all' will read from
-            all 1,460,000 samples (default=None, all samples).
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, will include all text).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+            all 1,460,000 samples. Default: None, all samples.
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, will include all text.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1720,13 +1720,13 @@ class YahooAnswersDataset(SourceDataset, TextBaseDataset):

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
@@ -1801,10 +1801,10 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):
            For Polarity, 'train' will read from 560,000 train samples, 'test' will read from 38,000 test samples,
            'all' will read from all 598,000 samples.
            For Full, 'train' will read from 650,000 train samples, 'test' will read from 50,000 test samples,
-            'all' will read from all 700,000 samples (default=None, all samples).
-        num_samples (int, optional): Number of samples (rows) to read (default=None, reads all samples).
-        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch
-            (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in.
+            'all' will read from all 700,000 samples. Default: None, all samples.
+        num_samples (int, optional): Number of samples (rows) to read. Default: None, reads all samples.
+        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
+            Default: `Shuffle.GLOBAL`. Bool type and Shuffle enum are both supported to pass in.
            If shuffle is False, no shuffling will be performed;
            If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL
            Otherwise, there are two levels of shuffling:
@@ -1812,15 +1812,15 @@ class YelpReviewDataset(SourceDataset, TextBaseDataset):

            - Shuffle.GLOBAL: Shuffle both the files and samples.

            - Shuffle.FILES: Shuffle files only.

-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
diff --git a/mindspore/python/mindspore/dataset/engine/datasets_user_defined.py b/mindspore/python/mindspore/dataset/engine/datasets_user_defined.py
index d2c9264743b..ac583f5fd34 100644
--- a/mindspore/python/mindspore/dataset/engine/datasets_user_defined.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets_user_defined.py
@@ -506,28 +506,28 @@ class GeneratorDataset(MappableDataset, UnionBaseDataset):
            iter(source).next(). Random accessible source is required to return a tuple of NumPy arrays as a row
            of the dataset on source[idx].
-        column_names (Union[str, list[str]], optional): List of column names of the dataset (default=None). Users are
+        column_names (Union[str, list[str]], optional): List of column names of the dataset. Default: None. Users are
            required to provide either column_names or schema.
-        column_types (list[mindspore.dtype], optional): List of column data types of the dataset (default=None).
+        column_types (list[mindspore.dtype], optional): List of column data types of the dataset. Default: None.
            If provided, sanity check will be performed on generator output.
-        schema (Union[Schema, str], optional): Path to the JSON schema file or schema object (default=None). Users are
+        schema (Union[Schema, str], optional): Path to the JSON schema file or schema object. Default: None. Users are
            required to provide either column_names or schema. If both are provided, schema will be used.
-        num_samples (int, optional): The number of samples to be included in the dataset
-            (default=None, all images).
-        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel (default=1).
+        num_samples (int, optional): The number of samples to be included in the dataset.
+            Default: None, all samples.
+        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
        shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
-            (default=None, expected order behavior shown in the table).
+            Default: None, expected order behavior shown in the table.
        sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
-            input is required (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            input is required. Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            Random accessible input is required. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument must be specified only
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
            when num_shards is also specified. Random accessible input is required.
        python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
-            option could be beneficial if the Python operation is computational heavy (default=True).
+            option could be beneficial if the Python operation is computationally heavy. Default: True.
        max_rowsize(int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
-            data between processes. This is only used if python_multiprocessing is set to True (default 6 MB).
+            data between processes. This is only used if python_multiprocessing is set to True. Default: 6 MB.

    Raises:
        RuntimeError: If source raises an exception during execution.
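A minimal `GeneratorDataset` sketch for the contract above — the user supplies either `column_names` or a schema; the generator here is an illustrative assumption:

    >>> import numpy as np
    >>> import mindspore.dataset as ds
    >>> def gen():
    ...     for i in range(10):
    ...         yield (np.array([i], dtype=np.int32),)  # one tuple per row
    >>> dataset = ds.GeneratorDataset(source=gen, column_names=["data"])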
@@ -839,19 +839,19 @@ class NumpySlicesDataset(GeneratorDataset):
            NumPy formats. Input data will be sliced along the first dimension and generate additional rows,
            if input is list, there will be one column in each row, otherwise there tends to be multi columns.
            Large data is not recommended to be loaded in this way as data is loading into memory.
-        column_names (list[str], optional): List of column names of the dataset (default=None). If column_names is not
+        column_names (list[str], optional): List of column names of the dataset. Default: None. If column_names is not
            provided, the output column names will be named as the keys of dict when the input data is a dict,
            otherwise they will be named like column_0, column_1 ...
-        num_samples (int, optional): The number of samples to be included in the dataset (default=None, all samples).
-        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel (default=1).
+        num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
+        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
        shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
-            (default=None, expected order behavior shown in the table).
+            Default: None, expected order behavior shown in the table.
        sampler (Union[Sampler, Iterable], optional): Object used to choose samples from the dataset. Random accessible
-            input is required (default=None, expected order behavior shown in the table).
-        num_shards (int, optional): Number of shards that the dataset will be divided into (default=None).
+            input is required. Default: None, expected order behavior shown in the table.
+        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            Random accessible input is required. When this argument is specified, `num_samples` reflects
            the max sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument must be specified only
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
            when num_shards is also specified. Random accessible input is required.

    Note:
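A minimal `NumpySlicesDataset` sketch showing the dict-input naming rule above (the data values are illustrative):

    >>> import numpy as np
    >>> import mindspore.dataset as ds
    >>> # with dict input, output columns take the dict keys as names
    >>> data = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
    >>> dataset = ds.NumpySlicesDataset(data=data, shuffle=False)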
diff --git a/mindspore/python/mindspore/dataset/engine/datasets_vision.py b/mindspore/python/mindspore/dataset/engine/datasets_vision.py
index 519d2aacb91..12526463c69 100644
--- a/mindspore/python/mindspore/dataset/engine/datasets_vision.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets_vision.py
@@ -127,19 +127,19 @@ class Caltech101Dataset(GeneratorDataset):
            and the other is called Annotations, which stores annotations.
        target_type (str, optional): Target of the image. If `target_type` is 'category', return category
            represents the target class. If `target_type` is 'annotation', return annotation.
-            If `target_type` is 'all', return category and annotation (default=None, means 'category').
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, all images).
-        num_parallel_workers (int, optional): Number of workers to read the data (default=1).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
-        decode (bool, optional): Whether or not to decode the images after reading (default=False).
+            If `target_type` is 'all', return category and annotation. Default: None, means 'category'.
+        num_samples (int, optional): The number of images to be included in the dataset.
+            Default: None, all images.
+        num_parallel_workers (int, optional): Number of workers to read the data. Default: 1.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
+        decode (bool, optional): Whether or not to decode the images after reading. Default: False.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
+            dataset. Default: None, expected order behavior shown in the table.
        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.

    Raises:
@@ -288,23 +288,23 @@ class Caltech256Dataset(MappableDataset, VisionBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, set in the config).
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset
-            (default=None, expected order behavior shown in the table).
-        decode (bool, optional): Whether or not to decode the images after reading (default=False).
+        num_samples (int, optional): The number of images to be included in the dataset.
+            Default: None, all images.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, set in the config.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
+            Default: None, expected order behavior shown in the table.
+        decode (bool, optional): Whether or not to decode the images after reading. Default: False.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
+            dataset. Default: None, expected order behavior shown in the table.
        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
@@ -417,24 +417,24 @@ class CelebADataset(MappableDataset, VisionBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
-        num_parallel_workers (int, optional): Number of workers to read the data (default=None, will use value set in
-            the config).
-        shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None).
-        usage (str, optional): Specify the 'train', 'valid', 'test' part or 'all' parts of dataset
-            (default= 'all', will read all samples).
-        sampler (Sampler, optional): Object used to choose samples from the dataset (default=None).
-        decode (bool, optional): Whether to decode the images after reading (default=False).
-        extensions (list[str], optional): List of file extensions to be included in the dataset (default=None).
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, will include all images).
+        num_parallel_workers (int, optional): Number of workers to read the data. Default: None, will use value set in
+            the config.
+        shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None.
+        usage (str, optional): Specify the 'train', 'valid', 'test' part or 'all' parts of dataset.
+            Default: 'all', will read all samples.
+        sampler (Sampler, optional): Object used to choose samples from the dataset. Default: None.
+        decode (bool, optional): Whether to decode the images after reading. Default: False.
+        extensions (list[str], optional): List of file extensions to be included in the dataset. Default: None.
+        num_samples (int, optional): The number of images to be included in the dataset.
+            Default: None, will include all images.
        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.
        decrypt (callable, optional): Image decryption function, which accepts the path of the encrypted image file
            and returns the decrypted bytes data. Default: None, no decryption.
@@ -588,24 +588,24 @@ class Cifar10Dataset(MappableDataset, VisionBaseDataset):
    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
        usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all' . 'train' will read from 50,000
-            train samples, 'test' will read from 10,000 test samples, 'all' will read from all 60,000 samples
-            (default=None, all samples).
-        num_samples (int, optional): The number of images to be included in the dataset
-            (default=None, all images).
-        num_parallel_workers (int, optional): Number of workers to read the data
-            (default=None, number set in the config).
-        shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected
-            order behavior shown in the table).
+            train samples, 'test' will read from 10,000 test samples, 'all' will read from all 60,000 samples.
+            Default: None, all samples.
+        num_samples (int, optional): The number of images to be included in the dataset.
+            Default: None, all images.
+        num_parallel_workers (int, optional): Number of workers to read the data.
+            Default: None, number set in the config.
+        shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected
+            order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset (default=None, expected order behavior shown in the table).
+            dataset. Default: None, expected order behavior shown in the table.
        num_shards (int, optional): Number of shards that the dataset will be divided
-            into (default=None). When this argument is specified, `num_samples` reflects
+            into. Default: None. When this argument is specified, `num_samples` reflects
            the maximum sample number of per shard.
-        shard_id (int, optional): The shard ID within `num_shards` (default=None). This
+        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This
            argument can only be specified when `num_shards` is also specified.
        cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details:
-            `Single-Node Data Cache `_
-            (default=None, which means no cache is used).
+            `Single-Node Data Cache `_ .
+            Default: None, which means no cache is used.

    Raises:
        RuntimeError: If `dataset_dir` does not contain data files.
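A sketch of the sampler-versus-shuffle choice documented above for `Cifar10Dataset` (the directory is hypothetical):

    >>> import mindspore.dataset as ds
    >>> # passing a sampler replaces the shuffle/ordering defaults
    >>> sampler = ds.RandomSampler(num_samples=64)
    >>> dataset = ds.Cifar10Dataset(dataset_dir="/path/to/cifar-10-batches-bin",
    ...                             sampler=sampler)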
- num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). + train samples, 'test' will read from 10,000 test samples, 'all' will read from all 60,000 samples. + Default: None, all samples. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -847,27 +847,27 @@ class CityscapesDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Acceptable usages include 'train', 'test', 'val' or 'all' if quality_mode is 'fine' - otherwise 'train', 'train_extra', 'val' or 'all' (default= 'train'). - quality_mode (str, optional): Acceptable quality_modes include 'fine' or 'coarse' (default= 'fine'). + otherwise 'train', 'train_extra', 'val' or 'all'. Default: 'train'. + quality_mode (str, optional): Acceptable quality_modes include 'fine' or 'coarse'. Default: 'fine'. task (str, optional): Acceptable tasks include 'instance', - 'semantic', 'polygon' or 'color' (default= 'instance'). + 'semantic', 'polygon' or 'color'. Default: 'instance'. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. 
Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` is invalid or does not contain data files. @@ -1018,26 +1018,26 @@ class CocoDataset(MappableDataset, VisionBaseDataset): dataset_dir (str): Path to the root directory that contains the dataset. annotation_file (str): Path to the annotation JSON file. task (str, optional): Set the task type for reading COCO data. Supported task types: - 'Detection', 'Stuff', 'Panoptic', 'Keypoint' and 'Captioning' (default='Detection'). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the configuration file). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). - sampler (Sampler, optional): Object used to choose samples from the dataset - (default=None, expected order behavior shown in the table). + 'Detection', 'Stuff', 'Panoptic', 'Keypoint' and 'Captioning'. Default: 'Detection'. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the configuration file. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. + sampler (Sampler, optional): Object used to choose samples from the dataset. + Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. extra_metadata(bool, optional): Flag to add extra meta-data to row. If True, an additional column will be - output at the end :py:obj:`[_meta-filename, dtype=string]` (default=False). 
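For orientation between the hunks: the `usage`/`quality_mode`/`task` triple documented for CityscapesDataset combines as in this sketch (dataset root hypothetical):

    >>> import mindspore.dataset as ds
    >>> cityscapes = ds.CityscapesDataset("/data/cityscapes", usage='train',
    ...                                   quality_mode='fine', task='semantic',
    ...                                   decode=True)
    >>> print(cityscapes.get_col_names())  # ['image', 'task']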
+ output at the end :py:obj:`[_meta-filename, dtype=string]`. Default: False. decrypt (callable, optional): Image decryption function, which accepts the path of the encrypted image file and returns the decrypted bytes data. Default: None, no decryption. @@ -1261,30 +1261,30 @@ class DIV2KDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - usage (str, optional): Acceptable usages include 'train', 'valid' or 'all' (default= 'train'). + usage (str, optional): Acceptable usages include 'train', 'valid' or 'all'. Default: 'train'. downgrade (str, optional): Acceptable downgrades include 'bicubic', 'unknown', 'mild', 'difficult' or - 'wild' (default= 'bicubic'). - scale (str, optional): Acceptable scales include 2, 3, 4 or 8 (default=2). + 'wild'. Default: 'bicubic'. + scale (str, optional): Acceptable scales include 2, 3, 4 or 8. Default: 2. When `downgrade` is 'bicubic', scale can be 2, 3, 4, 8. When `downgrade` is 'unknown', scale can only be 2, 3, 4. When `downgrade` is 'mild', 'difficult' or 'wild', scale can only be 4. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` is invalid or does not contain data files. @@ -1799,22 +1799,22 @@ class FlickrDataset(MappableDataset, VisionBaseDataset): dataset_dir (str): Path to the root directory that contains the dataset. annotation_file (str): Path to the root directory that contains the annotation. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). 
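The `downgrade`/`scale` coupling spelled out in the DIV2K hunk above is the one real constraint when constructing the dataset; a minimal sketch, root path hypothetical:

    >>> import mindspore.dataset as ds
    >>> # 'bicubic' permits scale 2, 3, 4 or 8; 'mild', 'difficult' and 'wild'
    >>> # would force scale=4, per the notes above.
    >>> div2k = ds.DIV2KDataset("/data/DIV2K", usage='train',
    ...                         downgrade='bicubic', scale=2, decode=True)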
- decode (bool, optional): Decode the images after reading (default=None). + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: None. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` is not valid or does not contain data files. @@ -2203,29 +2203,29 @@ class ImageFolderDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. extensions (list[str], optional): List of file extensions to be - included in the dataset (default=None). + included in the dataset. Default: None. class_indexing (dict, optional): A str-to-int mapping from folder name to index - (default=None, the folder names will be sorted + Default: None, the folder names will be sorted alphabetically and each class will be given a - unique index starting from 0). - decode (bool, optional): Decode the images after reading (default=False). + unique index starting from 0. + decode (bool, optional): Decode the images after reading. Default: False. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). 
This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. decrypt (callable, optional): Image decryption function, which accepts the path of the encrypted image file and returns the decrypted bytes data. Default: None, no decryption. @@ -2348,24 +2348,24 @@ class KITTIDataset(MappableDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Usage of this dataset, can be `train` or `test`. `train` will read 7481 - train samples, `test` will read from 7518 test samples without label (default=None, will use `train`). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will include all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). - sampler (Sampler, optional): Object used to choose samples from the dataset - (default=None, expected order behavior shown in the table). + train samples, `test` will read from 7518 test samples without label. Default: None, will use `train`. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will include all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. + sampler (Sampler, optional): Object used to choose samples from the dataset. + Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within num_shards (default=None). This + shard_id (int, optional): The shard ID within num_shards. Default: None. This argument can only be specified when num_shards is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `sampler` and `shuffle` are specified at the same time. @@ -2601,8 +2601,8 @@ class LFWDataset(MappableDataset, VisionBaseDataset): """ A source dataset that reads and parses the LFW dataset. - When task is "people", the generated dataset has two columns: :py:obj:`[image, label]`; - When task is "pairs", the generated dataset has three columns: :py:obj:`[image1, image2, label]`. 
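Since ImageFolderDataset (above) infers labels from folder names unless `class_indexing` overrides them, a typical call looks like the sketch below; the layout and class names are hypothetical:

    >>> import mindspore.dataset as ds
    >>> # Hypothetical layout: /data/pets/cat/*.jpg, /data/pets/dog/*.jpg
    >>> pets = ds.ImageFolderDataset("/data/pets",
    ...                              extensions=[".jpg", ".png"],
    ...                              class_indexing={"cat": 0, "dog": 1},
    ...                              decode=True)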
+ When task is 'people', the generated dataset has two columns: :py:obj:`[image, label]`; + When task is 'pairs', the generated dataset has three columns: :py:obj:`[image1, image2, label]`. The tensor of column :py:obj:`image` is of the uint8 type. The tensor of column :py:obj:`image1` is of the uint8 type. The tensor of column :py:obj:`image2` is of the uint8 type. @@ -2610,29 +2610,29 @@ class LFWDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - task (str, optional): Set the task type of reading lfw data, support "people" and "pairs" - (default="people"). - usage (str, optional): The image split to use, support "10fold", "train", "test" and "all" - (default="all", will read samples including train and test). - image_set (str, optional): Image set of image funneling to use, support "original", "funneled" or - "deepfunneled" (default="funneled", will read "funneled" set). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). + task (str, optional): Set the task type of reading lfw data, support 'people' and 'pairs'. + Default: 'people'. + usage (str, optional): The image split to use, support '10fold', 'train', 'test' and 'all'. + Default: 'all', will read samples including train and test. + image_set (str, optional): Image set of image funneling to use, support 'original', 'funneled' or + 'deepfunneled'. Default: 'funneled', will read 'funneled' set. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within num_shards (default=None). This + shard_id (int, optional): The shard ID within num_shards. Default: None. This argument can only be specified when num_shards is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If sampler and shuffle are specified at the same time. @@ -2763,26 +2763,26 @@ class LSUNDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. 
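The 'people' versus 'pairs' column layouts quoted above for LFWDataset can be verified directly; a sketch with a hypothetical root:

    >>> import mindspore.dataset as ds
    >>> lfw_people = ds.LFWDataset("/data/lfw", task='people', usage='all')
    >>> print(lfw_people.get_col_names())  # ['image', 'label']
    >>> lfw_pairs = ds.LFWDataset("/data/lfw", task='pairs', usage='test')
    >>> print(lfw_pairs.get_col_names())   # ['image1', 'image2', 'label']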
usage (str, optional): Usage of this dataset, can be `train`, `test`, `valid` or `all` - (default=None, will be set to `all`). - classes(Union[str, list[str]], optional): Choose the specific classes to load (default=None, means loading - all classes in root directory). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). + Default: None, will be set to `all`. + classes(Union[str, list[str]], optional): Choose the specific classes to load. Default: None, means loading + all classes in root directory. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within num_shards (default=None). This + shard_id (int, optional): The shard ID within num_shards. Default: None. This argument can only be specified when num_shards is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If 'sampler' and 'shuffle' are specified at the same time. @@ -2893,27 +2893,27 @@ class ManifestDataset(MappableDataset, VisionBaseDataset): Args: dataset_file (str): File to be read. - usage (str, optional): Acceptable usages include 'train', 'eval' and 'inference' (default= 'train'). + usage (str, optional): Acceptable usages include 'train', 'eval' and 'inference'. Default: 'train'. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, will include all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). + Default: None, will include all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). 
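Per the `classes` note above, LSUNDataset can restrict loading to a subset of scene categories instead of the whole root directory; a sketch (root and class list hypothetical):

    >>> import mindspore.dataset as ds
    >>> lsun = ds.LSUNDataset("/data/lsun", usage='train',
    ...                       classes=['bedroom', 'classroom'], decode=True)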
- class_indexing (dict, optional): A str-to-int mapping from label name to index - (default=None, the folder names will be sorted alphabetically and each - class will be given a unique index starting from 0). - decode (bool, optional): decode the images after reading (default=False). + dataset. Default: None, expected order behavior shown in the table. + class_indexing (dict, optional): A str-to-int mapping from label name to index. + Default: None, the folder names will be sorted alphabetically and each + class will be given a unique index starting from 0. + decode (bool, optional): decode the images after reading. Default: False. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the max number of samples per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If dataset_files are not valid or do not exist. @@ -3015,22 +3015,22 @@ class MnistDataset(MappableDataset, VisionBaseDataset): dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all' . 'train' will read from 60,000 train samples, 'test' will read from 10,000 test samples, 'all' will read from all 70,000 samples. - (default=None, will read all samples) - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + Default: None, will read all samples. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + dataset. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. 
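The `class_indexing` fallback just documented (alphabetical folder order, indices from 0) applies only when no mapping is passed to ManifestDataset; a sketch, with a hypothetical manifest file and label names:

    >>> import mindspore.dataset as ds
    >>> manifest = ds.ManifestDataset("/data/train.manifest", usage='train',
    ...                               class_indexing={'cat': 0, 'dog': 1},
    ...                               decode=True)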
More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -3135,25 +3135,25 @@ class OmniglotDataset(MappableDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - background(bool, optional): Use the background dataset or the evaluation dataset - (default=None, will use the background dataset). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). + background(bool, optional): Use the background dataset or the evaluation dataset. + Default: None, will use the background dataset. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within num_shards (default=None). This + shard_id (int, optional): The shard ID within num_shards. Default: None. This argument can only be specified when num_shards is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `sampler` and `shuffle` are specified at the same time. @@ -3418,25 +3418,25 @@ class Places365Dataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - usage (str, optional): Usage of this dataset, can be 'train-standard', 'train-challenge' or 'val' - (default=None, will be set to 'train-standard'). - small (bool, optional): Use 256 * 256 images (True) or high resolution images (False) (default=False). - decode (bool, optional): Decode the images after reading (default=True). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). 
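MnistDataset above is the simplest of these loaders and exercises only the shared arguments; a minimal sketch, path hypothetical:

    >>> import mindspore.dataset as ds
    >>> mnist = ds.MnistDataset("/data/mnist", usage='train', shuffle=True)
    >>> mnist = mnist.batch(64)
    >>> print(mnist.get_dataset_size())  # number of 64-sample batches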
+ usage (str, optional): Usage of this dataset, can be 'train-standard', 'train-challenge' or 'val'. + Default: None, will be set to 'train-standard'. + small (bool, optional): Use 256 * 256 images (True) or high resolution images (False). Default: False. + decode (bool, optional): Decode the images after reading. Default: True. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + dataset. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -3561,24 +3561,24 @@ class QMnistDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Usage of this dataset, can be 'train', 'test', 'test10k', 'test50k', 'nist' - or 'all' (default=None, will read all samples). + or 'all'. Default: None, will read all samples. compat (bool, optional): Whether the label for each example is class number (compat=True) or the full QMNIST - information (compat=False) (default=True). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + information (compat=False). Default: True. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + dataset. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. 
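Note that Places365Dataset above is one of the few loaders in this file whose `decode` defaults to True; a construction sketch with a hypothetical root:

    >>> import mindspore.dataset as ds
    >>> places = ds.Places365Dataset("/data/places365", usage='train-standard',
    ...                              small=True, decode=True)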
When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -3681,25 +3681,25 @@ class RandomDataset(SourceDataset, VisionBaseDataset): A source dataset that generates random data. Args: - total_rows (int, optional): Number of samples for the dataset to generate - (default=None, number of samples is random). - schema (Union[str, Schema], optional): Path to the JSON schema file or schema object (default=None). + total_rows (int, optional): Number of samples for the dataset to generate. + Default: None, number of samples is random. + schema (Union[str, Schema], optional): Path to the JSON schema file or schema object. Default: None. If the schema is not provided, the random dataset generates a random schema. - columns_list (list[str], optional): List of column names of the dataset - (default=None, the columns will be named like this "c0", "c1", "c2" etc). - num_samples (int, optional): The number of samples to be included in the dataset - (default=None, all samples). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). + columns_list (list[str], optional): List of column names of the dataset. + Default: None, the columns will be named like this "c0", "c1", "c2" etc. + num_samples (int, optional): The number of samples to be included in the dataset. + Default: None, all samples. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. """ @@ -3796,21 +3796,21 @@ class SBDataset(GeneratorDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - task (str, optional): Acceptable tasks include 'Boundaries' or 'Segmentation' (default= 'Boundaries'). - usage (str, optional): Acceptable usages include 'train', 'val', 'train_noval' and 'all' (default= 'all'). 
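When no schema is given, RandomDataset (above) invents one with columns named "c0", "c1", and so on; pinning the layout down with an explicit Schema looks like this sketch:

    >>> import mindspore.dataset as ds
    >>> from mindspore import dtype as mstype
    >>> schema = ds.Schema()
    >>> schema.add_column('image', de_type=mstype.uint8, shape=[28, 28, 1])
    >>> schema.add_column('label', de_type=mstype.int32)
    >>> rand_ds = ds.RandomDataset(schema=schema, total_rows=16)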
+ task (str, optional): Acceptable tasks include 'Boundaries' or 'Segmentation'. Default: 'Boundaries'. + usage (str, optional): Acceptable usages include 'train', 'val', 'train_noval' and 'all'. Default: 'all'. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=None). + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: None. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. Raises: @@ -3928,22 +3928,22 @@ class SBUDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - decode (bool, optional): Decode the images after reading (default=False). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). + decode (bool, optional): Decode the images after reading. Default: False. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + dataset. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. 
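SBDataset above selects between boundary and segmentation targets purely through `task`; a construction sketch, root path hypothetical:

    >>> import mindspore.dataset as ds
    >>> sbd = ds.SBDataset("/data/benchmark_RELEASE", task='Segmentation',
    ...                    usage='train')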
cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -4043,22 +4043,22 @@ class SemeionDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - num_samples (int, optional): The number of samples to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). + num_samples (int, optional): The number of samples to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. @@ -4170,23 +4170,23 @@ class STL10Dataset(MappableDataset, VisionBaseDataset): train samples, 'test' will read from 8,000 test samples, 'unlabeled' will read from all 100,000 samples, and 'train+unlabeled' - will read from 105000 samples, 'all' will read all the samples - (default=None, all samples). + will read from 105,000 samples, 'all' will read all the samples. + Default: None, all samples. num_samples (int, optional): The number of images to be included in the dataset. - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the - dataset (default=None, expected order behavior shown in the table). + dataset. Default: None, expected order behavior shown in the table.
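The STL10Dataset usage combinations above mostly matter for semi-supervised work, where the unlabeled split is mixed in; a sketch (root hypothetical):

    >>> import mindspore.dataset as ds
    >>> # 'train+unlabeled' combines the labeled train split with the 100,000
    >>> # unlabeled samples, 105,000 rows in total per the counts above.
    >>> stl10 = ds.STL10Dataset("/data/stl10_binary", usage='train+unlabeled')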
num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, 'num_samples' reflects + into. Default: None. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files. @@ -4337,18 +4337,18 @@ class SVHNDataset(GeneratorDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - usage (str, optional): Specify the 'train', 'test', 'extra' or 'all' parts of dataset - (default=None, will read all samples). - num_samples (int, optional): The number of samples to be included in the dataset (default=None, all images). - num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel (default=1). + usage (str, optional): Specify the 'train', 'test', 'extra' or 'all' parts of dataset. + Default: None, will read all samples. + num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all images. + num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1. shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required. - (default=None, expected order behavior shown in the table). + Default: None, expected order behavior shown in the table. sampler (Sampler, optional): Object used to choose samples from the dataset. Random accessible - input is required (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + input is required. Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. Random accessible input is required. When this argument is specified, 'num_samples' reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument must be specified only + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only when num_shards is also specified. Random accessible input is required. Raises: @@ -4449,13 +4449,13 @@ class USPSDataset(SourceDataset, VisionBaseDataset): dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all'. 'train' will read from 7,291 train samples, 'test' will read from 2,007 test samples, 'all' will read from all 9,298 samples. - (default=None, will read all samples) - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). 
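SVHNDataset above is GeneratorDataset-based, hence the repeated "random accessible input is required" notes on its sampler arguments; a construction sketch (a root containing the SVHN .mat files, hypothetical):

    >>> import mindspore.dataset as ds
    >>> svhn = ds.SVHNDataset("/data/svhn", usage='train')
    >>> print(svhn.get_col_names())  # ['image', 'label']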
- shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch - (default=Shuffle.GLOBAL). Bool type and Shuffle enum are both supported to pass in. + Default: None, will read all samples. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, will use value set in the config. + shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch. + Default: Shuffle.GLOBAL. Bool type and Shuffle enum are both supported to pass in. If shuffle is False, no shuffling will be performed; If shuffle is True, the behavior is the same as setting shuffle to be Shuffle.GLOBAL Otherwise, there are two levels of shuffling: @@ -4464,13 +4464,13 @@ class USPSDataset(SourceDataset, VisionBaseDataset): - Shuffle.FILES: Shuffle files only. - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the max sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` is not valid or does not exist or does not contain data files. @@ -4546,34 +4546,34 @@ class VOCDataset(MappableDataset, VisionBaseDataset): Args: dataset_dir (str): Path to the root directory that contains the dataset. - task (str, optional): Set the task type of reading voc data, now only support 'Segmentation' or 'Detection' - (default= 'Segmentation'). - usage (str, optional): Set the task type of ImageSets(default= 'train'). If task is 'Segmentation', image and + task (str, optional): Set the task type of reading voc data, now only support 'Segmentation' or 'Detection'. + Default: 'Segmentation'. + usage (str, optional): Set the task type of ImageSets. Default: 'train'. If task is 'Segmentation', image and annotation list will be loaded in ./ImageSets/Segmentation/usage + ".txt"; If task is 'Detection', image and annotation list will be loaded in ./ImageSets/Main/usage + ".txt"; if task and usage are not set, image and annotation list will be loaded in ./ImageSets/Segmentation/train.txt as default. class_indexing (dict, optional): A str-to-int mapping from label name to index, only valid in - 'Detection' task (default=None, the folder names will be sorted alphabetically and each - class will be given a unique index starting from 0). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, number set in the config). - shuffle (bool, optional): Whether to perform shuffle on the dataset (default=None, expected - order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). 
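The Union[bool, Shuffle] behavior documented for USPSDataset above means the enum can be passed where a plain bool is too coarse; a sketch:

    >>> import mindspore.dataset as ds
    >>> # Shuffle.GLOBAL shuffles files and samples; Shuffle.FILES shuffles
    >>> # only file order, per the bullet list above.
    >>> usps = ds.USPSDataset("/data/usps", usage='test',
    ...                       shuffle=ds.Shuffle.GLOBAL)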
- sampler (Sampler, optional): Object used to choose samples from the dataset - (default=None, expected order behavior shown in the table). + 'Detection' task. Default: None, the folder names will be sorted alphabetically and each + class will be given a unique index starting from 0. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, all images. + num_parallel_workers (int, optional): Number of workers to read the data. + Default: None, number set in the config. + shuffle (bool, optional): Whether to perform shuffle on the dataset. Default: None, expected + order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. + sampler (Sampler, optional): Object used to choose samples from the dataset. + Default: None, expected order behavior shown in the table. num_shards (int, optional): Number of shards that the dataset will be divided - into (default=None). When this argument is specified, `num_samples` reflects + into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. extra_metadata(bool, optional): Flag to add extra meta-data to row. If True, an additional column named - :py:obj:`[_meta-filename, dtype=string]` will be output at the end (default=False). + :py:obj:`[_meta-filename, dtype=string]` will be output at the end. Default: False. decrypt (callable, optional): Image decryption function, which accepts the path of the encrypted image file and returns the decrypted bytes data. Default: None, no decryption. @@ -4753,23 +4753,23 @@ class WIDERFaceDataset(MappableDataset, VisionBaseDataset): dataset_dir (str): Path to the root directory that contains the dataset. usage (str, optional): Usage of this dataset, can be 'train', 'test', 'valid' or 'all'. 'train' will read from 12,880 samples, 'test' will read from 16,097 samples, 'valid' will read from 3,226 test samples - and 'all' will read all 'train' and 'valid' samples (default=None, will be set to 'all'). - num_samples (int, optional): The number of images to be included in the dataset - (default=None, will read all images). - num_parallel_workers (int, optional): Number of workers to read the data - (default=None, will use value set in the config). - shuffle (bool, optional): Whether or not to perform shuffle on the dataset - (default=None, expected order behavior shown in the table). - decode (bool, optional): Decode the images after reading (default=False). - sampler (Sampler, optional): Object used to choose samples from the dataset - (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + and 'all' will read all 'train' and 'valid' samples. Default: None, will be set to 'all'. + num_samples (int, optional): The number of images to be included in the dataset. + Default: None, will read all images. + num_parallel_workers (int, optional): Number of workers to read the data. 
+ Default: None, will use value set in the config. + shuffle (bool, optional): Whether or not to perform shuffle on the dataset. + Default: None, expected order behavior shown in the table. + decode (bool, optional): Decode the images after reading. Default: False. + sampler (Sampler, optional): Object used to choose samples from the dataset. + Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. When this argument is specified, `num_samples` reflects the maximum sample number of per shard. - shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument can only be specified + shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only be specified when `num_shards` is also specified. cache (DatasetCache, optional): Use tensor caching service to speed up dataset processing. More details: - `Single-Node Data Cache `_ - (default=None, which means no cache is used). + `Single-Node Data Cache `_ . + Default: None, which means no cache is used. Raises: RuntimeError: If `dataset_dir` does not contain data files. diff --git a/mindspore/python/mindspore/dataset/engine/graphdata.py b/mindspore/python/mindspore/dataset/engine/graphdata.py index 297ebf85ae2..343e1ee796b 100644 --- a/mindspore/python/mindspore/dataset/engine/graphdata.py +++ b/mindspore/python/mindspore/dataset/engine/graphdata.py @@ -80,9 +80,9 @@ class GraphData: Args: dataset_file (str): One of file names in the dataset. - num_parallel_workers (int, optional): Number of workers to process the dataset in parallel - (default=None). - working_mode (str, optional): Set working mode, now supports 'local'/'client'/'server' (default='local'). + num_parallel_workers (int, optional): Number of workers to process the dataset in parallel. + Default: None. + working_mode (str, optional): Set working mode, now supports 'local'/'client'/'server'. Default: 'local'. - 'local', used in non-distributed training scenarios. @@ -93,15 +93,15 @@ class GraphData: and is available to the client. hostname (str, optional): Hostname of the graph data server. This parameter is only valid when - working_mode is set to 'client' or 'server' (default='127.0.0.1'). + `working_mode` is set to 'client' or 'server'. Default: '127.0.0.1'. port (int, optional): Port of the graph data server. The range is 1024-65535. This parameter is - only valid when working_mode is set to 'client' or 'server' (default=50051). + only valid when `working_mode` is set to 'client' or 'server'. Default: 50051. num_client (int, optional): Maximum number of clients expected to connect to the server. The server will - allocate resources according to this parameter. This parameter is only valid when working_mode - is set to 'server' (default=1). - auto_shutdown (bool, optional): Valid when working_mode is set to 'server', - when the number of connected clients reaches num_client and no client is being connected, - the server automatically exits (default=True). + allocate resources according to this parameter. This parameter is only valid when `working_mode` + is set to 'server'. Default: 1. + auto_shutdown (bool, optional): Valid when `working_mode` is set to 'server', + when the number of connected clients reaches `num_client` and no client is being connected, + the server automatically exits. Default: True. Raises: ValueError: If `dataset_file` does not exist or permission denied. 
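The `working_mode` machinery above only matters for distributed serving; in the default 'local' mode GraphData is queried in-process. A sketch, assuming a hypothetical MindRecord graph file whose nodes use integer type 1:

    >>> import mindspore.dataset as ds
    >>> graph = ds.GraphData("/data/cora.mindrecord", working_mode='local')
    >>> nodes = graph.get_all_nodes(node_type=1)
    >>> neighbors = graph.get_all_neighbors(nodes[:4].tolist(), neighbor_type=1)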
@@ -320,7 +320,7 @@ class GraphData: Args: node_list (Union[list, numpy.ndarray]): The given list of nodes. neighbor_type (int): Specify the type of neighbor node. - output_format (OutputFormat, optional): Output storage format (default=OutputFormat.NORMAL) + output_format (OutputFormat, optional): Output storage format. Default: OutputFormat.NORMAL. It can be any of [OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]. Returns: @@ -368,7 +368,7 @@ class GraphData: neighbor_nums (Union[list, numpy.ndarray]): Number of neighbors sampled per hop. neighbor_types (Union[list, numpy.ndarray]): Neighbor type sampled per hop, type of each element in neighbor_types should be int. - strategy (SamplingStrategy, optional): Sampling strategy (default=SamplingStrategy.RANDOM). + strategy (SamplingStrategy, optional): Sampling strategy. Default: SamplingStrategy.RANDOM. It can be any of [SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]. - SamplingStrategy.RANDOM, random sampling with replacement. @@ -501,9 +501,9 @@ class GraphData: Args: target_nodes (list[int]): Start node list in random walk meta_path (list[int]): node type for each walk step - step_home_param (float, optional): return hyper parameter in node2vec algorithm (Default = 1.0). - step_away_param (float, optional): in out hyper parameter in node2vec algorithm (Default = 1.0). - default_node (int, optional): default node if no more neighbors found (Default = -1). + step_home_param (float, optional): return hyper parameter in node2vec algorithm. Default: 1.0. + step_away_param (float, optional): in out hyper parameter in node2vec algorithm. Default: 1.0. + default_node (int, optional): default node if no more neighbors found. Default: -1. A default value of -1 indicates that no node is given. Returns: @@ -550,8 +550,8 @@ class Graph(GraphData): type of corresponding node. If not provided, default type for each node is "0". edge_type(Union[list, numpy.ndarray], optional): type of edges, each element should be string which represent type of corresponding edge. If not provided, default type for each edge is "0". - num_parallel_workers (int, optional): Number of workers to process the dataset in parallel (default=None). - working_mode (str, optional): Set working mode, now supports 'local'/'client'/'server' (default='local'). + num_parallel_workers (int, optional): Number of workers to process the dataset in parallel. Default: None. + working_mode (str, optional): Set working mode, now supports 'local'/'client'/'server'. Default: 'local'. - 'local', used in non-distributed training scenarios. @@ -562,15 +562,15 @@ class Graph(GraphData): and is available to the client. hostname (str, optional): Hostname of the graph data server. This parameter is only valid when - working_mode is set to 'client' or 'server' (default='127.0.0.1'). + `working_mode` is set to 'client' or 'server'. Default: '127.0.0.1'. port (int, optional): Port of the graph data server. The range is 1024-65535. This parameter is - only valid when working_mode is set to 'client' or 'server' (default=50051). + only valid when `working_mode` is set to 'client' or 'server'. Default: 50051. num_client (int, optional): Maximum number of clients expected to connect to the server. The server will - allocate resources according to this parameter. This parameter is only valid when working_mode - is set to 'server' (default=1). 
- auto_shutdown (bool, optional): Valid when working_mode is set to 'server', - when the number of connected clients reaches num_client and no client is being connected, - the server automatically exits (default=True). + allocate resources according to this parameter. This parameter is only valid when `working_mode` + is set to 'server'. Default: 1. + auto_shutdown (bool, optional): Valid when `working_mode` is set to 'server', + when the number of connected clients reaches `num_client` and no client is being connected, + the server automatically exits. Default: True. Raises: TypeError: If `edges` not list or NumPy array. @@ -813,7 +813,7 @@ class Graph(GraphData): Args: node_list (Union[list, numpy.ndarray]): The given list of nodes. neighbor_type (str): Specify the type of neighbor node. - output_format (OutputFormat, optional): Output storage format (default=OutputFormat.NORMAL) + output_format (OutputFormat, optional): Output storage format. Default: OutputFormat.NORMAL. It can be any of [OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR]. Returns: @@ -865,7 +865,7 @@ class Graph(GraphData): neighbor_nums (Union[list, numpy.ndarray]): Number of neighbors sampled per hop. neighbor_types (Union[list, numpy.ndarray]): Neighbor type sampled per hop, type of each element in neighbor_types should be str. - strategy (SamplingStrategy, optional): Sampling strategy (default=SamplingStrategy.RANDOM). + strategy (SamplingStrategy, optional): Sampling strategy. Default: SamplingStrategy.RANDOM. It can be any of [SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT]. - SamplingStrategy.RANDOM, random sampling with replacement. @@ -1275,24 +1275,24 @@ class InMemoryGraphDataset(GeneratorDataset): Args: data_dir (str): directory for loading dataset, here contains origin format data and will be loaded in `process` method. - save_dir (str): relative directory for saving processed dataset, this directory is under `data_dir` - (default="./processed"). + save_dir (str): relative directory for saving processed dataset, this directory is under `data_dir`. + Default: './processed'. column_names (Union[str, list[str]], optional): single column name or list of column names of the dataset, - num of column name should be equal to num of item in return data when implement method like `__getitem__`, - (default="graph"). - num_samples (int, optional): The number of samples to be included in the dataset (default=None, all samples). - num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel (default=1). + num of column name should be equal to num of item in return data when implement method like `__getitem__`. + Default: 'graph'. + num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples. + num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1. shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required. - (default=None, expected order behavior shown in the table). - num_shards (int, optional): Number of shards that the dataset will be divided into (default=None). + Default: None, expected order behavior shown in the table. + num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None. Random accessible input is required. When this argument is specified, `num_samples` reflects the max sample number of per shard. 
- shard_id (int, optional): The shard ID within `num_shards` (default=None). This argument must be specified only
+ shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument must be specified only
when num_shards is also specified. Random accessible input is required.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
- option could be beneficial if the Python operation is computational heavy (default=True).
+ option could be beneficial if the Python operation is computational heavy. Default: True.
max_rowsize(int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
- data between processes. This is only used if python_multiprocessing is set to True (default 6 MB).
+ data between processes. This is only used if python_multiprocessing is set to True. Default: 6 MB.
Examples:
>>> from mindspore.dataset import InMemoryGraphDataset, Graph
@@ -1311,7 +1311,7 @@ class InMemoryGraphDataset(GeneratorDataset):
... def __getitem__(self, index):
... # this method and '__len__' method are required when iterating created dataset
... graph = self.graphs[index]
- ... return graph.get_all_edges("0")
+ ... return graph.get_all_edges('0')
...
... def __len__(self):
... return len(self.graphs)
@@ -1385,11 +1385,11 @@ class ArgoverseDataset(InMemoryGraphDataset):
num of column name should be equal to num of item in return data when implement method like `__getitem__`,
recommend to specify it with
`column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"]` like the following example.
- num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel (default=1).
+ num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
shuffle (bool, optional): Whether or not to perform shuffle on the dataset. Random accessible input is required.
- (default=None, expected order behavior shown in the table).
+ Default: None, expected order behavior shown in the table.
python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker process. This
- option could be beneficial if the Python operation is computational heavy (default=True).
+ option could be beneficial if the Python operation is computational heavy. Default: True.
perf_mode(bool, optional): mode for obtaining higher performance when iterating over the created dataset (will call
`__getitem__` method in this process). Default: True, will save all the data in graph
(like edge index, node feature and graph feature) into graph feature.
diff --git a/mindspore/python/mindspore/dataset/engine/samplers.py b/mindspore/python/mindspore/dataset/engine/samplers.py
index 8c56dd834da..eb1b8dfced9 100644
--- a/mindspore/python/mindspore/dataset/engine/samplers.py
+++ b/mindspore/python/mindspore/dataset/engine/samplers.py
@@ -161,7 +161,7 @@ class BuiltinSampler:
    def get_num_samples(self):
        """
        Get num_samples value of the current sampler instance.
-        This parameter can be optionally passed in when defining the Sampler (default is None).
+        This parameter can be optionally passed in when defining the Sampler. Default: None.
        This method will return the num_samples value. If the current sampler has child samplers, it
        will continue to access the child samplers and process the obtained value according to certain rules.
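As a quick illustration of the `get_num_samples` behavior described above, a sketch using one of the built-in samplers (the values are illustrative):
>>> import mindspore.dataset as ds
>>> sampler = ds.RandomSampler(replacement=True, num_samples=10)
>>> # with no child samplers attached, this returns the value passed at construction
>>> sampler.get_num_samples()
10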
@@ -329,12 +329,12 @@ class DistributedSampler(BuiltinSampler): Args: num_shards (int): Number of shards to divide the dataset into. shard_id (int): Shard ID of the current shard, which should within the range of [0, `num_shards`-1]. - shuffle (bool, optional): If True, the indices are shuffled, otherwise it will not be shuffled(default=True). - num_samples (int, optional): The number of samples to draw (default=None, which means sample all elements). + shuffle (bool, optional): If True, the indices are shuffled, otherwise it will not be shuffled. Default: True. + num_samples (int, optional): The number of samples to draw. Default: None, which means sample all elements. offset(int, optional): The starting shard ID where the elements in the dataset are sent to, which should be no more than `num_shards`. This parameter is only valid when a ConcatDataset takes - a DistributedSampler as its sampler. It will affect the number of samples of per shard - (default=-1, which means each shard has the same number of samples). + a DistributedSampler as its sampler. It will affect the number of samples of per shard. + Default: -1, which means each shard has the same number of samples. Raises: TypeError: If `num_shards` is not of type int. @@ -432,12 +432,12 @@ class PKSampler(BuiltinSampler): Args: num_val (int): Number of elements to sample for each class. - num_class (int, optional): Number of classes to sample (default=None, sample all classes). + num_class (int, optional): Number of classes to sample. Default: None, sample all classes. The parameter does not support to specify currently. shuffle (bool, optional): If True, the class IDs are shuffled, otherwise it will not be - shuffled (default=False). - class_column (str, optional): Name of column with class labels for MindDataset (default='label'). - num_samples (int, optional): The number of samples to draw (default=None, which means sample all elements). + shuffled. Default: False. + class_column (str, optional): Name of column with class labels for MindDataset. Default: 'label'. + num_samples (int, optional): The number of samples to draw. Default: None, which means sample all elements. Raises: TypeError: If `shuffle` is not of type bool. @@ -519,8 +519,8 @@ class RandomSampler(BuiltinSampler): Samples the elements randomly. Args: - replacement (bool, optional): If True, put the sample ID back for the next draw (default=False). - num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). + replacement (bool, optional): If True, put the sample ID back for the next draw. Default: False. + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. Raises: TypeError: If `replacement` is not of type bool. @@ -584,8 +584,8 @@ class SequentialSampler(BuiltinSampler): Samples the dataset elements sequentially that is equivalent to not using a sampler. Args: - start_index (int, optional): Index to start sampling at. (default=None, start at first ID) - num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). + start_index (int, optional): Index to start sampling at. Default: None, start at first ID. + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. Raises: TypeError: If `start_index` is not of type int. @@ -653,7 +653,7 @@ class SubsetSampler(BuiltinSampler): Args: indices (Iterable): A sequence of indices (Any iterable Python object but string). 
- num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. Raises: TypeError: If elements of `indices` are not of type number. @@ -741,7 +741,7 @@ class SubsetRandomSampler(SubsetSampler): Args: indices (Iterable): A sequence of indices (Any iterable Python object but string). - num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. Raises: TypeError: If elements of `indices` are not of type number. @@ -786,7 +786,7 @@ class IterSampler(Sampler): Args: sampler (iterable object): an user defined iterable object. - num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. Examples: >>> class MySampler: @@ -817,8 +817,8 @@ class WeightedRandomSampler(BuiltinSampler): Args: weights (list[float, int]): A sequence of weights, not necessarily summing up to 1. - num_samples (int, optional): Number of elements to sample (default=None, which means sample all elements). - replacement (bool): If True, put the sample ID back for the next draw (default=True). + num_samples (int, optional): Number of elements to sample. Default: None, which means sample all elements. + replacement (bool): If True, put the sample ID back for the next draw. Default: True. Raises: TypeError: If elements of `weights` are not of type number. diff --git a/mindspore/python/mindspore/dataset/engine/serializer_deserializer.py b/mindspore/python/mindspore/dataset/engine/serializer_deserializer.py index a3b47f17824..793d7f2982f 100644 --- a/mindspore/python/mindspore/dataset/engine/serializer_deserializer.py +++ b/mindspore/python/mindspore/dataset/engine/serializer_deserializer.py @@ -38,7 +38,7 @@ def serialize(dataset, json_filepath=""): Args: dataset (Dataset): The starting node. - json_filepath (str): The filepath where a serialized JSON file will be generated (default=""). + json_filepath (str): The filepath where a serialized JSON file will be generated. Default: ''. Returns: Dict, the dictionary contains the serialized dataset graph. @@ -62,9 +62,9 @@ def deserialize(input_dict=None, json_filepath=None): Construct dataset pipeline from a JSON file produced by dataset serialize function. Args: - input_dict (dict): A Python dictionary containing a serialized dataset graph (default=None). + input_dict (dict): A Python dictionary containing a serialized dataset graph. Default: None. json_filepath (str): A path to the JSON file containing dataset graph. - User can obtain this file by calling API `mindspore.dataset.serialize()` (default=None). + User can obtain this file by calling API `mindspore.dataset.serialize()`. Default: None. Returns: de.Dataset or None if error occurs. @@ -109,7 +109,7 @@ def show(dataset, indentation=2): Args: dataset (Dataset): The starting node. indentation (int, optional): The indentation used by the JSON print. - Do not indent if indentation is None (default=2). + Do not indent if indentation is None. Default: 2. 
Examples: >>> dataset = ds.MnistDataset(mnist_dataset_dir, num_samples=100) diff --git a/mindspore/python/mindspore/dataset/text/__init__.py b/mindspore/python/mindspore/dataset/text/__init__.py index 21c551caaa0..2ea26bd65c4 100644 --- a/mindspore/python/mindspore/dataset/text/__init__.py +++ b/mindspore/python/mindspore/dataset/text/__init__.py @@ -41,7 +41,7 @@ The data transform operation can be executed in the data processing pipeline or .. code-block:: - from mindspore.dataset import text + import mindspore.dataset.text as text from mindspore.dataset.text import NormalizeForm # construct vocab diff --git a/mindspore/python/mindspore/dataset/text/transforms.py b/mindspore/python/mindspore/dataset/text/transforms.py index afce4ab48b9..fd7f0d3b6a6 100644 --- a/mindspore/python/mindspore/dataset/text/transforms.py +++ b/mindspore/python/mindspore/dataset/text/transforms.py @@ -104,12 +104,15 @@ class JiebaTokenizer(TextTensorOperation): mp_path (str): Dictionary file is used by MPSegment algorithm. The dictionary can be obtained on the official website of cppjieba. mode (JiebaMode, optional): Valid values can be any of [JiebaMode.MP, JiebaMode.HMM, - JiebaMode.MIX](default=JiebaMode.MIX). + JiebaMode.MIX]. Default: JiebaMode.MIX. - JiebaMode.MP, tokenize with MPSegment algorithm. + - JiebaMode.HMM, tokenize with Hidden Markov Model Segment algorithm. + - JiebaMode.MIX, tokenize with a mix of MPSegment and HMMSegment algorithm. - with_offsets (bool, optional): Whether or not output offsets of tokens (default=False). + + with_offsets (bool, optional): Whether or not output offsets of tokens. Default: False. Raises: ValueError: If path of HMMSegment dict is not provided. @@ -173,7 +176,7 @@ class JiebaTokenizer(TextTensorOperation): word (str): The word to be added to the JiebaTokenizer instance. The added word will not be written into the built-in dictionary on disk. freq (int, optional): The frequency of the word to be added. The higher the frequency, - the better chance the word will be tokenized (default=None, use default frequency). + the better chance the word will be tokenized. Default: None, use default frequency. Examples: >>> import mindspore.dataset.text as text @@ -281,9 +284,9 @@ class Lookup(TextTensorOperation): vocab (Vocab): A vocabulary object. unknown_token (str, optional): Word is used for lookup. In case of the word is out of vocabulary (OOV), the result of lookup will be replaced with unknown_token. If the unknown_token is not specified or - it is OOV, runtime error will be thrown (default=None, means no unknown_token is specified). + it is OOV, runtime error will be thrown. Default: None, means no unknown_token is specified. data_type (mindspore.dtype, optional): The data type that lookup operation maps - string to(default=mindspore.int32). + string to. Default: mindspore.int32. Raises: TypeError: If `vocab` is not of type text.Vocab. @@ -327,13 +330,13 @@ class Ngram(TextTensorOperation): an empty string produced. left_pad (tuple, optional): Padding performed on left side of the sequence shaped like ("pad_token", pad_width). `pad_width` will be capped at n-1. For example, specifying left_pad=("_", 2) would pad left side of the - sequence with "__" (default=("", 0)). + sequence with "__". Default: ('', 0). right_pad (tuple, optional): Padding performed on right side of the sequence shaped like ("pad_token", pad_width). `pad_width` will be capped at n-1. For example, specifying right_pad=("_", 2) - would pad right side of the sequence with "__" (default=("", 0)). 
+ would pad right side of the sequence with "__". Default: ('', 0). separator (str, optional): Symbol used to join strings together. For example, if 2-gram is - ["mindspore", "amazing"] with separator="-", the result would be ["mindspore-amazing"] - (default=" ", which will use whitespace as separator). + ["mindspore", "amazing"] with separator="-", the result would be ["mindspore-amazing"]. + Default: ' ', which will use whitespace as separator. Raises: TypeError: If values of `n` not positive is not of type int. @@ -459,7 +462,7 @@ class SlidingWindow(TextTensorOperation): Args: width (int): The width of the window. It must be an integer and greater than zero. - axis (int, optional): The axis along which the sliding window is computed (default=0). + axis (int, optional): The axis along which the sliding window is computed. Default: 0. Raises: TypeError: If `width` is not of type int. @@ -542,11 +545,11 @@ class ToVectors(TextTensorOperation): Args: vectors (Vectors): A vectors object. - unk_init (sequence, optional): Sequence used to initialize out-of-vectors (OOV) token - (default=None, initialize with zero vectors). + unk_init (sequence, optional): Sequence used to initialize out-of-vectors (OOV) token. + Default: None, initialize with zero vectors. lower_case_backup (bool, optional): Whether to look up the token in the lower case. If False, each token in the original case will be looked up; if True, each token in the original case will be looked up first, if not - found in the keys of the property stoi, the token in the lower case will be looked up (default=False). + found in the keys of the property stoi, the token in the lower case will be looked up. Default: False. Raises: TypeError: If `unk_init` is not of type sequence. @@ -622,7 +625,7 @@ class UnicodeCharTokenizer(TextTensorOperation): Tokenize a scalar tensor of UTF-8 string to Unicode characters. Args: - with_offsets (bool, optional): Whether or not output offsets of tokens (default=False). + with_offsets (bool, optional): Whether or not output offsets of tokens. Default: False. Raises: TypeError: If `with_offsets` is not of type bool. @@ -952,7 +955,7 @@ if platform.system().lower() != 'windows': Args: normalize_form (NormalizeForm, optional): Valid values can be [NormalizeForm.NONE, NormalizeForm.NFC, NormalizeForm.NFKC, NormalizeForm.NFD, NormalizeForm.NFKD] any of the four unicode - normalized forms(default=NormalizeForm.NFKC). + normalized forms. Default: NormalizeForm.NFKC. See http://unicode.org/reports/tr15/ for details. - NormalizeForm.NONE, do nothing for input string tensor. @@ -999,7 +1002,7 @@ if platform.system().lower() != 'windows': pattern (str): the regex expression patterns. replace (str): the string to replace matched element. replace_all (bool, optional): If False, only replace first matched element; - if True, replace all matched elements (default=True). + if True, replace all matched elements. Default: True. Raises: TypeError: If `pattern` is not of type string. @@ -1042,8 +1045,8 @@ if platform.system().lower() != 'windows': The original string will be split by matched elements. keep_delim_pattern (str, optional): The string matched by 'delim_pattern' can be kept as a token if it can be matched by 'keep_delim_pattern'. The default value is an empty str - which means that delimiters will not be kept as an output token (default=''). - with_offsets (bool, optional): Whether or not output offsets of tokens(default=False). + which means that delimiters will not be kept as an output token. Default: ''. 
+ with_offsets (bool, optional): Whether or not output offsets of tokens. Default: False. Raises: TypeError: If `delim_pattern` is not of type string. @@ -1087,8 +1090,8 @@ if platform.system().lower() != 'windows': UnicodeScriptTokenizer is not supported on Windows platform yet. Args: - keep_whitespace (bool, optional): Whether or not emit whitespace tokens (default=False). - with_offsets (bool, optional): Whether or not output offsets of tokens (default=False). + keep_whitespace (bool, optional): Whether or not emit whitespace tokens. Default: False. + with_offsets (bool, optional): Whether or not output offsets of tokens. Default: False. Raises: TypeError: If `keep_whitespace` is not of type bool. @@ -1131,7 +1134,7 @@ if platform.system().lower() != 'windows': WhitespaceTokenizer is not supported on Windows platform yet. Args: - with_offsets (bool, optional): Whether or not output offsets of tokens (default=False). + with_offsets (bool, optional): Whether or not output offsets of tokens. Default: False. Raises: TypeError: If `with_offsets` is not of type bool. diff --git a/mindspore/python/mindspore/dataset/text/utils.py b/mindspore/python/mindspore/dataset/text/utils.py index 8e59e653fe4..f6f5be09819 100644 --- a/mindspore/python/mindspore/dataset/text/utils.py +++ b/mindspore/python/mindspore/dataset/text/utils.py @@ -43,7 +43,7 @@ class CharNGram(cde.CharNGram): max_vectors (int, optional): This can be used to limit the number of pre-trained vectors loaded. Most pre-trained vector sets are sorted in the descending order of word frequency. Thus, in situations where the entire set doesn't fit in memory, or is not needed for another reason, - passing `max_vectors` can limit the size of the loaded set (default=None, no limit). + passing `max_vectors` can limit the size of the loaded set. Default: None, no limit. Returns: CharNGram, CharNGram vector build from a file. @@ -54,7 +54,7 @@ class CharNGram(cde.CharNGram): TypeError: If `max_vectors` is not type of integer. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> char_n_gram = text.CharNGram.from_file("/path/to/char_n_gram/file", max_vectors=None) """ @@ -79,7 +79,7 @@ class FastText(cde.FastText): max_vectors (int, optional): This can be used to limit the number of pre-trained vectors loaded. Most pre-trained vector sets are sorted in the descending order of word frequency. Thus, in situations where the entire set doesn't fit in memory, or is not needed for another reason, - passing `max_vectors` can limit the size of the loaded set (default=None, no limit). + passing `max_vectors` can limit the size of the loaded set. Default: None, no limit. Returns: FastText, FastText vector build from a file. @@ -90,7 +90,7 @@ class FastText(cde.FastText): TypeError: If `max_vectors` is not type of integer. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> fast_text = text.FastText.from_file("/path/to/fast_text/file", max_vectors=None) """ @@ -115,7 +115,7 @@ class GloVe(cde.GloVe): max_vectors (int, optional): This can be used to limit the number of pre-trained vectors loaded. Most pre-trained vector sets are sorted in the descending order of word frequency. Thus, in situations where the entire set doesn't fit in memory, or is not needed for another reason, - passing `max_vectors` can limit the size of the loaded set (default=None, no limit). + passing `max_vectors` can limit the size of the loaded set. Default: None, no limit. 
Returns: GloVe, GloVe vector build from a file. @@ -126,7 +126,7 @@ class GloVe(cde.GloVe): TypeError: If `max_vectors` is not type of integer. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> glove = text.GloVe.from_file("/path/to/glove/file", max_vectors=None) """ @@ -356,7 +356,7 @@ class Vectors(cde.Vectors): max_vectors (int, optional): This can be used to limit the number of pre-trained vectors loaded. Most pre-trained vector sets are sorted in the descending order of word frequency. Thus, in situations where the entire set doesn't fit in memory, or is not needed for another reason, - passing `max_vectors` can limit the size of the loaded set (default=None, no limit). + passing `max_vectors` can limit the size of the loaded set. Default: None, no limit. Returns: Vectors, Vectors build from a file. @@ -367,7 +367,7 @@ class Vectors(cde.Vectors): TypeError: If `max_vectors` is not type of integer. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vector = text.Vectors.from_file("/path/to/vectors/file", max_vectors=None) """ @@ -399,27 +399,27 @@ class Vocab: Args: dataset (Dataset): dataset to build vocab from. columns (list[str], optional): column names to get words from. It can be a list of column names. - (default=None). + Default: None. freq_range (tuple, optional): A tuple of integers (min_frequency, max_frequency). Words within the frequency range would be kept. 0 <= min_frequency <= max_frequency <= total_words. min_frequency=0 is the same as min_frequency=1. max_frequency > total_words is the same as max_frequency = total_words. - min_frequency/max_frequency can be None, which corresponds to 0/total_words separately - (default=None, all words are included). + min_frequency/max_frequency can be None, which corresponds to 0/total_words separately. + Default: None, all words are included. top_k (int, optional): top_k is greater than 0. Number of words to be built into vocab. top_k means most - frequent words are taken. top_k is taken after freq_range. If not enough top_k, all words will be taken - (default=None, all words are included). + frequent words are taken. top_k is taken after freq_range. If not enough top_k, all words will be taken. + Default: None, all words are included. special_tokens (list, optional): A list of strings, each one is a special token. For example - special_tokens=["",""] (default=None, no special tokens will be added). + special_tokens=["",""]. Default: None, no special tokens will be added. special_first (bool, optional): Whether special_tokens will be prepended/appended to vocab. If - special_tokens is specified and special_first is set to True, special_tokens will be prepended - (default=True). + special_tokens is specified and special_first is set to True, special_tokens will be prepended. + Default: True. Returns: Vocab, Vocab object built from the dataset. Examples: >>> import mindspore.dataset as ds - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> dataset = ds.TextFileDataset("/path/to/sentence/piece/vocab/file", shuffle=False) >>> vocab = text.Vocab.from_dataset(dataset, "text", freq_range=None, top_k=None, ... special_tokens=["", ""], @@ -440,15 +440,15 @@ class Vocab: Args: word_list (list): A list of string where each element is a word of type string. special_tokens (list, optional): A list of strings, each one is a special token. 
For example - special_tokens=["",""] (default=None, no special tokens will be added). + special_tokens=["",""]. Default: None, no special tokens will be added. special_first (bool, optional): Whether special_tokens is prepended or appended to vocab. If special_tokens - is specified and special_first is set to True, special_tokens will be prepended (default=True). + is specified and special_first is set to True, special_tokens will be prepended. Default: True. Returns: Vocab, Vocab object built from the list. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vocab = text.Vocab.from_list(["w1", "w2", "w3"], special_tokens=[""], special_first=True) """ @@ -467,19 +467,19 @@ class Vocab: Args: file_path (str): Path to the file which contains the vocab list. delimiter (str, optional): A delimiter to break up each line in file, the first element is taken to be - the word (default="", the whole line will be treated as a word). - vocab_size (int, optional): Number of words to read from file_path (default=None, all words are taken). + the word. Default: '', the whole line will be treated as a word. + vocab_size (int, optional): Number of words to read from file_path. Default: None, all words are taken. special_tokens (list, optional): A list of strings, each one is a special token. For example - special_tokens=["",""] (default=None, no special tokens will be added). + special_tokens=["",""]. Default: None, no special tokens will be added. special_first (bool, optional): Whether special_tokens will be prepended/appended to vocab, If special_tokens is specified and special_first is set to True, - special_tokens will be prepended (default=True). + special_tokens will be prepended. Default: True. Returns: Vocab, Vocab object built from the file. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> # Assume vocab file contains the following content: >>> # --- begin of file --- >>> # apple,apple2 @@ -517,7 +517,7 @@ class Vocab: Vocab, Vocab object built from the dict. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vocab = text.Vocab.from_dict({"home": 3, "behind": 2, "the": 4, "world": 5, "": 6}) """ @@ -533,7 +533,7 @@ class Vocab: A vocabulary consisting of word and id pairs. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vocab = text.Vocab.from_list(["word_1", "word_2", "word_3", "word_4"]) >>> vocabory_dict = vocab.vocab() """ @@ -553,7 +553,7 @@ class Vocab: The token id or list of token ids. Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vocab = text.Vocab.from_list(["w1", "w2", "w3"], special_tokens=[""], special_first=True) >>> ids = vocab.tokens_to_ids(["w1", "w3"]) """ @@ -577,7 +577,7 @@ class Vocab: The decoded token(s). Examples: - >>> from mindspore.dataset import text + >>> import mindspore.dataset.text as text >>> vocab = text.Vocab.from_list(["w1", "w2", "w3"], special_tokens=[""], special_first=True) >>> token = vocab.ids_to_tokens(0) """ @@ -595,7 +595,7 @@ def to_bytes(array, encoding='utf8'): Args: array (numpy.ndarray): Array of `str` type representing strings. - encoding (str): Indicating the charset for encoding (default='utf8'). + encoding (str): Indicating the charset for encoding. Default: 'utf8'. Returns: numpy.ndarray, NumPy array of `bytes`. 
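A short sketch tying together the `Vocab` construction and lookup methods touched in the hunks above; the token strings and the `<unk>` special token are illustrative:
>>> import mindspore.dataset.text as text
>>> vocab = text.Vocab.from_list(["w1", "w2", "w3"], special_tokens=["<unk>"], special_first=True)
>>> ids = vocab.tokens_to_ids(["w1", "w3"])
>>> tokens = vocab.ids_to_tokens(ids)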
@@ -622,7 +622,7 @@ def to_str(array, encoding='utf8'):
Args:
array (numpy.ndarray): Array of `bytes` type representing strings.
- encoding (str): Indicating the charset for decoding (default='utf8').
+ encoding (str): Indicating the charset for decoding. Default: 'utf8'.
Returns:
numpy.ndarray, NumPy array of `str`.
diff --git a/mindspore/python/mindspore/dataset/transforms/c_transforms.py b/mindspore/python/mindspore/dataset/transforms/c_transforms.py
index 4cc4906cb9c..31ace38e556 100644
--- a/mindspore/python/mindspore/dataset/transforms/c_transforms.py
+++ b/mindspore/python/mindspore/dataset/transforms/c_transforms.py
@@ -375,10 +375,10 @@ class Concatenate(TensorOperation):
Tensor operation that concatenates all columns into a single tensor.
Args:
- axis (int, optional): Concatenate the tensors along given axis (Default=0).
- prepend (numpy.array, optional): NumPy array to be prepended to the already concatenated tensors
- (Default=None).
- append (numpy.array, optional): NumPy array to be appended to the already concatenated tensors (Default=None).
+ axis (int, optional): Concatenate the tensors along given axis. Default: 0.
+ prepend (numpy.array, optional): NumPy array to be prepended to the already concatenated tensors.
+ Default: None.
+ append (numpy.array, optional): NumPy array to be appended to the already concatenated tensors. Default: None.
Raises:
TypeError: If `axis` is not of type int.
@@ -534,7 +534,7 @@ class RandomApply(TensorOperation):
Args:
transforms (list): List of transformations to be applied.
- prob (float, optional): The probability to apply the transformation list (default=0.5).
+ prob (float, optional): The probability to apply the transformation list. Default: 0.5.
Raises:
TypeError: If `transforms` is not of type list.
diff --git a/mindspore/python/mindspore/dataset/transforms/py_transforms.py b/mindspore/python/mindspore/dataset/transforms/py_transforms.py
index 79c6e5f60eb..c58d15f750f 100644
--- a/mindspore/python/mindspore/dataset/transforms/py_transforms.py
+++ b/mindspore/python/mindspore/dataset/transforms/py_transforms.py
@@ -94,7 +94,7 @@ class OneHotOp(PyTensorOperation):
num_classes (int): Number of classes of objects in dataset.
It should be larger than the largest label number in the dataset.
smoothing_rate (float, optional): Adjustable hyperparameter for label smoothing level.
- (Default=0.0 means no smoothing is applied.)
+ Default: 0.0, means no smoothing is applied.
Raises:
TypeError: `num_classes` is not of type int.
@@ -260,7 +260,7 @@ class RandomApply(PyTensorOperation):
Args:
transforms (list): List of transformations to apply.
- prob (float, optional): The probability to apply the transformation list (default=0.5).
+ prob (float, optional): The probability to apply the transformation list. Default: 0.5.
Raises:
TypeError: If `transforms` is not of type list.
diff --git a/mindspore/python/mindspore/dataset/transforms/transforms.py b/mindspore/python/mindspore/dataset/transforms/transforms.py
index 91b0794fbcb..9ebcecd7293 100644
--- a/mindspore/python/mindspore/dataset/transforms/transforms.py
+++ b/mindspore/python/mindspore/dataset/transforms/transforms.py
@@ -346,10 +346,10 @@ class Concatenate(TensorOperation):
Tensor operation that concatenates all columns into a single tensor, only 1D tensor is supported.
Args:
- axis (int, optional): Concatenate the tensors along given axis (Default=0).
- prepend (numpy.ndarray, optional): NumPy array to be prepended to the already concatenated tensors
- append (numpy.ndarray, optional): NumPy array to be appended to the already concatenated tensors (Default=None).
+ axis (int, optional): Concatenate the tensors along given axis. Default: 0.
+ prepend (numpy.ndarray, optional): NumPy array to be prepended to the already concatenated tensors.
+ Default: None.
+ append (numpy.ndarray, optional): NumPy array to be appended to the already concatenated tensors. Default: None.
Raises:
TypeError: If `axis` is not of type int.
@@ -513,7 +513,7 @@ class OneHot(TensorOperation):
num_classes (int): Number of classes of objects in dataset.
It should be larger than the largest label number in the dataset.
smoothing_rate (float, optional): Adjustable hyperparameter for label smoothing level.
- (Default=0.0 means no smoothing is applied.)
+ Default: 0.0, means no smoothing is applied.
Raises:
TypeError: `num_classes` is not of type int.
@@ -629,7 +629,7 @@ class RandomApply(CompoundOperation):
Args:
transforms (list): List of transformations to be applied.
- prob (float, optional): The probability to apply the transformation list (default=0.5).
+ prob (float, optional): The probability to apply the transformation list. Default: 0.5.
Raises:
TypeError: If `transforms` is not of type list.
diff --git a/mindspore/python/mindspore/dataset/utils/browse_dataset.py b/mindspore/python/mindspore/dataset/utils/browse_dataset.py
index 1eb7996eee4..27bf3b70b33 100644
--- a/mindspore/python/mindspore/dataset/utils/browse_dataset.py
+++ b/mindspore/python/mindspore/dataset/utils/browse_dataset.py
@@ -32,22 +32,22 @@ def imshow_det_bbox(image, bboxes, labels, segm=None, class_names=None, score_th
bboxes (numpy.ndarray): Bounding boxes (with scores), shaped (N, 4) or
(N, 5), data should be ordered with (N, x, y, w, h).
labels (numpy.ndarray): Labels of bboxes, shaped (N, 1).
- segm (numpy.ndarray): The segmentation masks of image in M classes, shaped (M, H, W) (Default=None).
- class_names (list[str], tuple[str], dict): Names of each class to map label to class name
- (Default=None, only display label).
- score_threshold (float): Minimum score of bboxes to be shown (Default=0).
+ segm (numpy.ndarray): The segmentation masks of image in M classes, shaped (M, H, W). Default: None.
+ class_names (list[str], tuple[str], dict): Names of each class to map label to class name.
+ Default: None, only display label.
+ score_threshold (float): Minimum score of bboxes to be shown. Default: 0.
bbox_color (tuple(int)): Color of bbox lines.
- The tuple of color should be in BGR order (Default=(0, 255 ,0), means 'green').
+ The tuple of color should be in BGR order. Default: (0, 255, 0), means 'green'.
text_color (tuple(int)): Color of texts.
- The tuple of color should be in BGR order (Default=(203, 192, 255), means 'pink').
+ The tuple of color should be in BGR order. Default: (203, 192, 255), means 'pink'.
mask_color (tuple(int)): Color of mask.
- The tuple of color should be in BGR order (Default=(128, 0, 128), means 'purple').
- thickness (int): Thickness of lines (Default=2).
- font_size (int, float): Font size of texts (Default=0.8).
- show (bool): Whether to show the image (Default=True).
- win_name (str): The window name (Default="win").
- wait_time (int): Value of waitKey param (Default=2000, means display interval is 2000ms).
- out_file (str, optional): The filename to write the imagee (Default=None). File extension name
+ The tuple of color should be in BGR order. Default: (128, 0, 128), means 'purple'.
+ thickness (int): Thickness of lines. Default: 2.
+ font_size (int, float): Font size of texts. Default: 0.8.
+ show (bool): Whether to show the image. Default: True.
+ win_name (str): The window name. Default: "win".
+ wait_time (int): Value of waitKey param. Default: 2000, means display interval is 2000ms.
+ out_file (str, optional): The filename to write the image. Default: None. File extension name
is required to indicate the image compression type, e.g. 'jpg', 'png'.
Returns:
diff --git a/mindspore/python/mindspore/dataset/vision/c_transforms.py b/mindspore/python/mindspore/dataset/vision/c_transforms.py
index 871ee0b27bd..bd206247c53 100644
--- a/mindspore/python/mindspore/dataset/vision/c_transforms.py
+++ b/mindspore/python/mindspore/dataset/vision/c_transforms.py
@@ -151,7 +151,7 @@ class AdjustGamma(ImageTensorOperation):
The output image pixel value is exponentially related to the input image pixel value.
gamma larger than 1 make the shadows darker,
while gamma smaller than 1 make dark regions lighter.
- gain (float, optional): The constant multiplier (default=1.0).
+ gain (float, optional): The constant multiplier. Default: 1.0.
Raises:
TypeError: If `gain` is not of type float.
@@ -185,8 +185,8 @@ class AutoAugment(ImageTensorOperation):
This operation works only with 3-channel RGB images.
Args:
- policy (AutoAugmentPolicy, optional): AutoAugment policies learned on different datasets
- (default=AutoAugmentPolicy.IMAGENET).
+ policy (AutoAugmentPolicy, optional): AutoAugment policies learned on different datasets.
+ Default: AutoAugmentPolicy.IMAGENET.
It can be any of [AutoAugmentPolicy.IMAGENET, AutoAugmentPolicy.CIFAR10, AutoAugmentPolicy.SVHN].
Randomly apply 2 operations from a candidate set. See auto augmentation details in AutoAugmentPolicy.
@@ -196,7 +196,7 @@ class AutoAugment(ImageTensorOperation):
- AutoAugmentPolicy.SVHN, means to apply AutoAugment learned on SVHN dataset.
- interpolation (Inter, optional): Image interpolation mode for Resize operation (default=Inter.NEAREST).
+ interpolation (Inter, optional): Image interpolation mode for Resize operation. Default: Inter.NEAREST.
It can be any of [Inter.NEAREST, Inter.BILINEAR, Inter.BICUBIC, Inter.AREA].
- Inter.NEAREST: means interpolation method is nearest-neighbor interpolation.
@@ -209,8 +209,8 @@ class AutoAugment(ImageTensorOperation):
fill_value (Union[int, tuple], optional): Pixel fill value for the area outside the transformed image.
It can be an int or a 3-tuple. If it is a 3-tuple, it is used to fill R, G, B channels respectively.
- If it is an integer, it is used for all RGB channels. The fill_value values must be in range [0, 255]
- (default=0).
+ If it is an integer, it is used for all RGB channels. The fill_value values must be in range [0, 255].
+ Default: 0.
Raises:
TypeError: If `policy` is not of type AutoAugmentPolicy.
@@ -252,9 +252,9 @@ class AutoContrast(ImageTensorOperation):
Args:
cutoff (float, optional): Percent of lightest and darkest pixels to cut off from
- the histogram of input image. The value must be in the range [0.0, 50.0) (default=0.0).
+ the histogram of input image. The value must be in the range [0.0, 50.0). Default: 0.0.
ignore (Union[int, sequence], optional): The background pixel values to ignore,
- The ignore values must be in range [0, 255] (default=None).
+ The ignore values must be in range [0, 255]. Default: None.
Raises:
TypeError: If `cutoff` is not of type float.
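To ground the vision hunks above, a typical pipeline sketch using the `AutoContrast` operation just documented; `image_folder_dataset` stands for any dataset with an "image" column:
>>> import mindspore.dataset.vision.c_transforms as c_vision
>>> transforms_list = [c_vision.Decode(), c_vision.AutoContrast(cutoff=10.0, ignore=[10, 20])]
>>> image_folder_dataset = image_folder_dataset.map(operations=transforms_list, input_columns=["image"])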
@@ -294,7 +294,7 @@ class BoundingBoxAugment(ImageTensorOperation): transform (TensorOperation): C++ transformation operation to be applied on random selection of bounding box regions of a given image. ratio (float, optional): Ratio of bounding boxes to apply augmentation on. - Range: [0.0, 1.0] (default=0.3). + Range: [0.0, 1.0]. Default: 0.3. Raises: TypeError: If `transform` is not an image processing operation @@ -495,8 +495,8 @@ class CutMixBatch(ImageTensorOperation): Args: image_batch_format (ImageBatchFormat): The method of padding. Can be any of [ImageBatchFormat.NHWC, ImageBatchFormat.NCHW]. - alpha (float, optional): Hyperparameter of beta distribution, must be larger than 0 (default = 1.0). - prob (float, optional): The probability by which CutMix is applied to each image, range: [0, 1] (default = 1.0). + alpha (float, optional): Hyperparameter of beta distribution, must be larger than 0. Default: 1.0. + prob (float, optional): The probability by which CutMix is applied to each image, range: [0, 1]. Default: 1.0. Raises: TypeError: If `image_batch_format` is not of type :class:`mindspore.dataset.vision.ImageBatchFormat`. @@ -537,7 +537,7 @@ class CutOut(ImageTensorOperation): Args: length (int): The side length of each square patch, must be larger than 0. - num_patches (int, optional): Number of patches to be cut out of an image, must be larger than 0. (default=1). + num_patches (int, optional): Number of patches to be cut out of an image, must be larger than 0. Default: 1. Raises: TypeError: If `length` is not of type int. @@ -570,7 +570,7 @@ class Decode(ImageTensorOperation): Decode the input image. Args: - rgb (bool, optional): Mode of decoding input image (default=True). + rgb (bool, optional): Mode of decoding input image. Default: True. If True means format of decoded image is RGB else BGR (deprecated). Raises: @@ -643,8 +643,8 @@ class GaussianBlur(ImageTensorOperation): kernel_size (Union[int, Sequence[int]]): Size of the Gaussian kernel to use. The value must be positive and odd. If only an integer is provided, the kernel size will be (kernel_size, kernel_size). If a sequence of integer is provided, it must be a sequence of 2 values which represents (width, height). - sigma (Union[float, Sequence[float]], optional): Standard deviation of the Gaussian kernel to use - (default=None). The value must be positive. If only a float is provided, the sigma will be (sigma, sigma). + sigma (Union[float, Sequence[float]], optional): Standard deviation of the Gaussian kernel to use. + Default: None. The value must be positive. If only a float is provided, the sigma will be (sigma, sigma). If a sequence of float is provided, it must be a sequence of 2 values which represents (width, height). If None is provided, the sigma will be calculated as ((kernel_size - 1) * 0.5 - 1) * 0.3 + 0.8. @@ -771,7 +771,7 @@ class MixUpBatch(ImageTensorOperation): Note that you need to make labels into one-hot format and batched before calling this operation. Args: - alpha (float, optional): Hyperparameter of beta distribution. The value must be positive (default = 1.0). + alpha (float, optional): Hyperparameter of beta distribution. The value must be positive. Default: 1.0. Raises: TypeError: If `alpha` is not of type float. @@ -851,7 +851,7 @@ class NormalizePad(ImageTensorOperation): The mean values must be in range (0.0, 255.0]. std (sequence): List or tuple of standard deviations for each channel, with respect to channel order. The standard deviation values must be in range (0.0, 255.0]. 
- dtype (str, optional): Set the dtype of the output image (default is "float32"). + dtype (str, optional): Set the dtype of the output image. Default: "float32". Raises: TypeError: If `mean` is not of type sequence. @@ -899,8 +899,8 @@ class Pad(ImageTensorOperation): fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders, only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). - padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). Can be any of + The fill_value values must be in range [0, 255]. Default: 0. + padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. Can be any of [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC]. - Border.CONSTANT, means it fills the border with constant values. @@ -959,7 +959,7 @@ class RandomAdjustSharpness(ImageTensorOperation): Degree of 0.0 gives a blurred image, degree of 1.0 gives the original image, and degree of 2.0 increases the sharpness by a factor of 2. prob (float, optional): Probability of the image being sharpness adjusted, which - must be in range of [0, 1] (default=0.5). + must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `degree` is not of type float. @@ -996,7 +996,7 @@ class RandomAffine(ImageTensorOperation): If `degrees` is a number, the range will be (-degrees, degrees). If `degrees` is a sequence, it should be (min, max). translate (sequence, optional): Sequence (tx_min, tx_max, ty_min, ty_max) of minimum/maximum translation in - x(horizontal) and y(vertical) directions, range [-1.0, 1.0] (default=None). + x(horizontal) and y(vertical) directions, range [-1.0, 1.0]. Default: None. The horizontal and vertical shift is selected randomly from the range: (tx_min*width, tx_max*width) and (ty_min*height, ty_max*height), respectively. If a tuple or list of size 2, then a translate parallel to the X axis in the range of @@ -1005,16 +1005,16 @@ class RandomAffine(ImageTensorOperation): (translate[0], translate[1]) and a translate parallel to the Y axis in the range of (translate[2], translate[3]) are applied. If None, no translation is applied. - scale (sequence, optional): Scaling factor interval, which must be non negative - (default=None, original scale is used). - shear (Union[int, float, sequence], optional): Range of shear factor, which must be positive (default=None). + scale (sequence, optional): Scaling factor interval, which must be non negative. + Default: None, original scale is used. + shear (Union[int, float, sequence], optional): Range of shear factor, which must be positive. Default: None. If a number, then a shear parallel to the X axis in the range of (-shear, +shear) is applied. If a tuple or list of size 2, then a shear parallel to the X axis in the range of (shear[0], shear[1]) is applied. If a tuple or list of size 4, then a shear parallel to X axis in the range of (shear[0], shear[1]) and a shear parallel to Y axis in the range of (shear[2], shear[3]) is applied. If None, no shear is applied. - resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST). + resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA]. - Inter.BILINEAR, means resample method is bilinear interpolation. 
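A hedged sketch of the `RandomAffine` parameters documented above; the degree, translate and scale values are illustrative:
>>> import mindspore.dataset.vision.c_transforms as c_vision
>>> from mindspore.dataset.vision import Inter
>>> random_affine_op = c_vision.RandomAffine(degrees=15, translate=(-0.1, 0.1, 0, 0), scale=(0.9, 1.1),
...                                          resample=Inter.NEAREST)
>>> image_folder_dataset = image_folder_dataset.map(operations=[c_vision.Decode(), random_affine_op],
...                                                 input_columns=["image"])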
@@ -1027,7 +1027,7 @@ class RandomAffine(ImageTensorOperation): fill_value (Union[int, tuple[int]], optional): Optional fill_value to fill the area outside the transform in the output image. There must be three elements in tuple and the value of single element is [0, 255]. - (default=0, filling is performed). + Default: 0, filling is performed. Raises: TypeError: If `degrees` is not of type int, float or sequence. @@ -1106,11 +1106,11 @@ class RandomAutoContrast(ImageTensorOperation): Args: cutoff (float, optional): Percent of the lightest and darkest pixels to be cut off from - the histogram of the input image. The value must be in range of [0.0, 50.0) (default=0.0). + the histogram of the input image. The value must be in range of [0.0, 50.0). Default: 0.0. ignore (Union[int, sequence], optional): The background pixel values to be ignored, each of - which must be in range of [0, 255] (default=None). + which must be in range of [0, 255]. Default: None. prob (float, optional): Probability of the image being automatically contrasted, which - must be in range of [0, 1] (default=0.5). + must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `cutoff` is not of type float. @@ -1153,7 +1153,7 @@ class RandomColor(ImageTensorOperation): Args: degrees (Sequence[float], optional): Range of random color adjustment degrees, which must be non-negative. It should be in (min, max) format. If min=max, then it is a - single fixed magnitude operation (default=(0.1, 1.9)). + single fixed magnitude operation. Default: (0.1, 1.9). Raises: TypeError: If `degrees` is not of type Sequence[float]. @@ -1186,19 +1186,19 @@ class RandomColorAdjust(ImageTensorOperation): This operation supports running on Ascend or GPU platforms by Offload. Args: - brightness (Union[float, Sequence[float]], optional): Brightness adjustment factor (default=(1, 1)). + brightness (Union[float, Sequence[float]], optional): Brightness adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-brightness), 1+brightness]. If it is a sequence, it should be [min, max] for the range. - contrast (Union[float, Sequence[float]], optional): Contrast adjustment factor (default=(1, 1)). + contrast (Union[float, Sequence[float]], optional): Contrast adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-contrast), 1+contrast]. If it is a sequence, it should be [min, max] for the range. - saturation (Union[float, Sequence[float]], optional): Saturation adjustment factor (default=(1, 1)). + saturation (Union[float, Sequence[float]], optional): Saturation adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-saturation), 1+saturation]. If it is a sequence, it should be [min, max] for the range. - hue (Union[float, Sequence[float]], optional): Hue adjustment factor (default=(0, 0)). + hue (Union[float, Sequence[float]], optional): Hue adjustment factor. Default: (0, 0). If it is a float, the range will be [-hue, hue]. Value should be 0 <= hue <= 0.5. If it is a sequence, it should be [min, max] where -0.5 <= min <= max <= 0.5. @@ -1266,7 +1266,7 @@ class RandomCrop(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped. 
padding (Union[int, Sequence[int]], optional): The number of pixels to pad each border of the image. - The padding value(s) must be non-negative (default=None). + The padding value(s) must be non-negative. Default: None. If padding is not None, pad image first with padding values. If a single number is provided, pad all borders with this value. If a tuple or lists of 2 values are provided, pad the (left and top) @@ -1274,12 +1274,12 @@ class RandomCrop(ImageTensorOperation): If 4 values are provided as a list or tuple, pad the left, top, right and bottom respectively. pad_if_needed (bool, optional): Pad the image if either side is smaller than - the given output size (default=False). + the given output size. Default: False. fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders, only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). - padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). It can be any of + The fill_value values must be in range [0, 255]. Default: 0. + padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. It can be any of [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC]. - Border.CONSTANT, means it fills the border with constant values. @@ -1354,10 +1354,10 @@ class RandomCropDecodeResize(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped. scale (Union[list, tuple], optional): Range [min, max) of respective size of the - original size to be cropped, which must be non-negative (default=(0.08, 1.0)). + original size to be cropped, which must be non-negative. Default: (0.08, 1.0). ratio (Union[list, tuple], optional): Range [min, max) of aspect ratio to be - cropped, which must be non-negative (default=(3. / 4., 4. / 3.)). - interpolation (Inter, optional): Image interpolation mode for resize operation (default=Inter.BILINEAR). + cropped, which must be non-negative. Default: (3. / 4., 4. / 3.). + interpolation (Inter, optional): Image interpolation mode for resize operation. Default: Inter.BILINEAR. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC]. - Inter.BILINEAR, means interpolation method is bilinear interpolation. @@ -1371,7 +1371,7 @@ class RandomCropDecodeResize(ImageTensorOperation): - Inter.PILCUBIC, means interpolation method is bicubic interpolation like implemented in pillow, input should be in 3 channels format. - max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area (default=10). + max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area. Default: 10. If exceeded, fall back to use center_crop instead. The max_attempts value must be positive. Raises: @@ -1436,19 +1436,19 @@ class RandomCropWithBBox(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped. padding (Union[int, Sequence[int]], optional): The number of pixels to pad the image - The padding value(s) must be non-negative (default=None). + The padding value(s) must be non-negative. Default: None. If padding is not None, first pad image with padding values. 
If a single number is provided, pad all borders with this value. If a tuple or lists of 2 values are provided, pad the (left and top) with the first value and (right and bottom) with the second value. If 4 values are provided as a list or tuple, pad the left, top, right and bottom respectively. pad_if_needed (bool, optional): Pad the image if either side is smaller than - the given output size (default=False). + the given output size. Default: False. fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders, only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). - padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). It can be any of + The fill_value values must be in range [0, 255]. Default: 0. + padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. It can be any of [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC]. - Border.CONSTANT, means it fills the border with constant values. @@ -1520,7 +1520,7 @@ class RandomEqualize(ImageTensorOperation): Args: prob (float, optional): Probability of the image being equalized, which - must be in range of [0, 1] (default=0.5). + must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -1553,7 +1553,7 @@ class RandomHorizontalFlip(ImageTensorOperation): This operation supports running on Ascend or GPU platforms by Offload. Args: - prob (float, optional): Probability of the image being flipped, which must be in range of [0, 1] (default=0.5). + prob (float, optional): Probability of the image being flipped, which must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -1583,7 +1583,7 @@ class RandomHorizontalFlipWithBBox(ImageTensorOperation): Flip the input image horizontally randomly with a given probability and adjust bounding boxes accordingly. Args: - prob (float, optional): Probability of the image being flipped, which must be in range of [0, 1] (default=0.5). + prob (float, optional): Probability of the image being flipped, which must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -1613,7 +1613,7 @@ class RandomInvert(ImageTensorOperation): Randomly invert the colors of image with a given probability. Args: - prob (float, optional): Probability of the image being inverted, which must be in range of [0, 1] (default=0.5). + prob (float, optional): Probability of the image being inverted, which must be in range of [0, 1]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -1644,7 +1644,7 @@ class RandomLighting(ImageTensorOperation): calculated from the imagenet dataset. Args: - alpha (float, optional): Intensity of the image, which must be non-negative (default=0.05). + alpha (float, optional): Intensity of the image, which must be non-negative. Default: 0.05. Raises: TypeError: If `alpha` is not of type float. @@ -1678,7 +1678,7 @@ class RandomPosterize(ImageTensorOperation): Bits values must be in range of [1,8], and include at least one integer value in the given range. It must be in (min, max) or integer format. If min=max, then it is a single fixed - magnitude operation (default=(8, 8)). + magnitude operation. Default: (8, 8). Raises: TypeError: If `bits` is not of type int or sequence of int. 
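For a concrete sense of how the probability-style defaults above behave in a pipeline, here is a minimal sketch using the unified `mindspore.dataset.vision` API; it assumes an existing `dataset` object with a decoded "image" column and is illustrative only, not part of the patch:

>>> import mindspore.dataset.vision as vision
>>> # Each op relies on its documented default: prob=0.5 for the flip and
>>> # invert ops, bits=(8, 8) for RandomPosterize.
>>> transforms_list = [vision.RandomHorizontalFlip(),
...                    vision.RandomInvert(),
...                    vision.RandomPosterize()]
>>> dataset = dataset.map(operations=transforms_list, input_columns=["image"])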
@@ -1718,10 +1718,10 @@ class RandomResizedCrop(ImageTensorOperation):
             If size is an integer, a square of size (size, size) will be cropped with this value.
             If size is a sequence of length 2, an image of size (height, width) will be cropped.
         scale (Union[list, tuple], optional): Range [min, max) of respective size of the original
-            size to be cropped, which must be non-negative (default=(0.08, 1.0)).
+            size to be cropped, which must be non-negative. Default: (0.08, 1.0).
         ratio (Union[list, tuple], optional): Range [min, max) of aspect ratio to be
-            cropped, which must be non-negative (default=(3. / 4., 4. / 3.)).
-        interpolation (Inter, optional): Method of interpolation (default=Inter.BILINEAR).
+            cropped, which must be non-negative. Default: (3. / 4., 4. / 3.).
+        interpolation (Inter, optional): Method of interpolation. Default: Inter.BILINEAR.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC].

             - Inter.BILINEAR, means interpolation method is bilinear interpolation.
@@ -1736,7 +1736,7 @@ class RandomResizedCrop(ImageTensorOperation):
               should be in 3 channels format.

         max_attempts (int, optional): The maximum number of attempts to propose a valid
-            crop_area (default=10). If exceeded, fall back to use center_crop instead.
+            crop_area. Default: 10. If exceeded, fall back to use center_crop instead.

     Raises:
         TypeError: If `size` is not of type int or Sequence[int].
@@ -1789,10 +1789,10 @@ class RandomResizedCropWithBBox(ImageTensorOperation):
             If size is an integer, a square of size (size, size) will be cropped with this value.
             If size is a sequence of length 2, an image of size (height, width) will be cropped.
         scale (Union[list, tuple], optional): Range (min, max) of respective size of the original
-            size to be cropped, which must be non-negative (default=(0.08, 1.0)).
+            size to be cropped, which must be non-negative. Default: (0.08, 1.0).
         ratio (Union[list, tuple], optional): Range (min, max) of aspect ratio to be
-            cropped, which must be non-negative (default=(3. / 4., 4. / 3.)).
-        interpolation (Inter mode, optional): Method of interpolation (default=Inter.BILINEAR).
+            cropped, which must be non-negative. Default: (3. / 4., 4. / 3.).
+        interpolation (Inter, optional): Method of interpolation. Default: Inter.BILINEAR.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC].

             - Inter.BILINEAR, means interpolation method is bilinear interpolation.
@@ -1802,7 +1802,7 @@ class RandomResizedCropWithBBox(ImageTensorOperation):
             - Inter.BICUBIC, means interpolation method is bicubic interpolation.

         max_attempts (int, optional): The maximum number of attempts to propose a valid
-            crop area (default=10). If exceeded, fall back to use center crop instead.
+            crop area. Default: 10. If exceeded, fall back to use center crop instead.

     Raises:
         TypeError: If `size` is not of type int or Sequence[int].
@@ -1934,7 +1934,7 @@ class RandomRotation(ImageTensorOperation):
         degrees (Union[int, float, sequence]): Range of random rotation degrees.
             If `degrees` is a number, the range will be converted to (-degrees, degrees).
             If `degrees` is a sequence, it should be (min, max).
-        resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST).
+        resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA].

             - Inter.BILINEAR, means resample method is bilinear interpolation.
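To make the crop defaults documented above concrete, a minimal sketch (unified API assumed; `Inter` is importable from `mindspore.dataset.vision`):

>>> import mindspore.dataset.vision as vision
>>> from mindspore.dataset.vision import Inter
>>> # All keyword values below simply restate the documented defaults of
>>> # RandomResizedCrop; only size is required.
>>> crop_op = vision.RandomResizedCrop(size=224, scale=(0.08, 1.0),
...                                    ratio=(3. / 4., 4. / 3.),
...                                    interpolation=Inter.BILINEAR,
...                                    max_attempts=10)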
@@ -1945,16 +1945,16 @@ class RandomRotation(ImageTensorOperation): - Inter.AREA: means the interpolation method is pixel area interpolation. - expand (bool, optional): Optional expansion flag (default=False). If set to True, expand the output + expand (bool, optional): Optional expansion flag. Default: False. If set to True, expand the output image to make it large enough to hold the entire rotated image. If set to False or omitted, make the output image the same size as the input. Note that the expand flag assumes rotation around the center and no translation. - center (tuple, optional): Optional center of rotation (a 2-tuple) (default=None). + center (tuple, optional): Optional center of rotation (a 2-tuple). Default: None. Origin is the top left corner. None sets to the center of the image. fill_value (Union[int, tuple[int]], optional): Optional fill color for the area outside the rotated image. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). + The fill_value values must be in range [0, 255]. Default: 0. Raises: TypeError: If `degrees` is not of type int, float or sequence. @@ -2062,7 +2062,7 @@ class RandomSharpness(ImageTensorOperation): Args: degrees (Union[list, tuple], optional): Range of random sharpness adjustment degrees, which must be non-negative. It should be in (min, max) format. If min=max, then - it is a single fixed magnitude operation (default = (0.1, 1.9)). + it is a single fixed magnitude operation. Default: (0.1, 1.9). Raises: TypeError : If `degrees` is not of type list or tuple. @@ -2093,7 +2093,7 @@ class RandomSolarize(ImageTensorOperation): the subrange to (255 - pixel). Args: - threshold (tuple, optional): Range of random solarize threshold (default=(0, 255)). + threshold (tuple, optional): Range of random solarize threshold. Default: (0, 255). Threshold values should always be in (min, max) format, where min and max are integers in the range [0, 255], and min <= max. If min=max, then invert all pixel values above min(max). @@ -2128,7 +2128,7 @@ class RandomVerticalFlip(ImageTensorOperation): This operation supports running on Ascend or GPU platforms by Offload. Args: - prob (float, optional): Probability of the image being flipped (default=0.5). + prob (float, optional): Probability of the image being flipped. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2158,7 +2158,7 @@ class RandomVerticalFlipWithBBox(ImageTensorOperation): Flip the input image vertically, randomly with a given probability and adjust bounding boxes accordingly. Args: - prob (float, optional): Probability of the image being flipped (default=0.5). + prob (float, optional): Probability of the image being flipped. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2227,7 +2227,7 @@ class Resize(ImageTensorOperation): If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). - interpolation (Inter, optional): Image interpolation mode (default=Inter.LINEAR). + interpolation (Inter, optional): Image interpolation mode. Default: Inter.LINEAR. It can be any of [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC]. - Inter.LINEAR, means interpolation method is bilinear interpolation. 
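A brief sketch of the rotation parameters documented above (illustrative only; unified API assumed):

>>> import mindspore.dataset.vision as vision
>>> from mindspore.dataset.vision import Inter
>>> # expand=True grows the canvas so the rotated image is not clipped;
>>> # fill_value=128 paints the exposed corners mid-gray on all channels.
>>> rotate_op = vision.RandomRotation(degrees=30, resample=Inter.NEAREST,
...                                   expand=True, center=None, fill_value=128)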
@@ -2280,7 +2280,7 @@ class ResizeWithBBox(ImageTensorOperation):
             If size is an integer, smaller edge of the image will be resized to this value with
             the same image aspect ratio.
             If size is a sequence of length 2, it should be (height, width).
-        interpolation (Inter, optional): Image interpolation mode (default=Inter.LINEAR).
+        interpolation (Inter, optional): Image interpolation mode. Default: Inter.LINEAR.
             It can be any of [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC].

             - Inter.LINEAR, means interpolation method is bilinear interpolation.
@@ -2355,23 +2355,23 @@ class Rotate(ImageTensorOperation):

     Args:
         degrees (Union[int, float]): Rotation degrees.
-        resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST).
+        resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC].

             - Inter.BILINEAR, means resample method is bilinear interpolation.
             - Inter.NEAREST, means resample method is nearest-neighbor interpolation.
             - Inter.BICUBIC, means resample method is bicubic interpolation.

-        expand (bool, optional): Optional expansion flag (default=False). If set to True, expand the output
+        expand (bool, optional): Optional expansion flag. Default: False. If set to True, expand the output
             image to make it large enough to hold the entire rotated image.
             If set to False or omitted, make the output image the same size as the input.
             Note that the expand flag assumes rotation around the center and no translation.
-        center (tuple, optional): Optional center of rotation (a 2-tuple) (default=None).
+        center (tuple, optional): Optional center of rotation (a 2-tuple). Default: None.
             Origin is the top left corner. None sets to the center of the image.
         fill_value (Union[int, tuple[int]], optional): Optional fill color for the area outside the rotated image.
             If it is a 3-tuple, it is used to fill R, G, B channels respectively.
             If it is an integer, it is used for all RGB channels.
-            The fill_value values must be in range [0, 255] (default=0).
+            The fill_value values must be in range [0, 255]. Default: 0.

     Raises:
         TypeError: If `degrees` is not of type int or float.
@@ -2424,13 +2424,13 @@ class SlicePatches(ImageTensorOperation):
     number of output tensors is equal to num_height*num_width.

     Args:
-        num_height (int, optional): The number of patches in vertical direction, which must be positive (default=1).
-        num_width (int, optional): The number of patches in horizontal direction, which must be positive (default=1).
-        slice_mode (Inter, optional): A mode represents pad or drop (default=SliceMode.PAD).
+        num_height (int, optional): The number of patches in vertical direction, which must be positive. Default: 1.
+        num_width (int, optional): The number of patches in horizontal direction, which must be positive. Default: 1.
+        slice_mode (SliceMode, optional): A mode representing pad or drop. Default: SliceMode.PAD.
             It can be any of [SliceMode.PAD, SliceMode.DROP].
         fill_value (int, optional): The border width in number of pixels in
             right and bottom direction if slice_mode is set to be SliceMode.PAD.
-            The fill_value must be in range [0, 255] (default=0).
+            The fill_value must be in range [0, 255]. Default: 0.

     Raises:
         TypeError: If `num_height` is not of type int.
@@ -2488,10 +2488,10 @@ class SoftDvppDecodeRandomCropResizeJpeg(ImageTensorOperation):
             If size is an integer, a square crop of size (size, size) is returned.
             If size is a sequence of length 2, an image of size (height, width) will be cropped.
        scale (Union[list, tuple], optional): Range [min, max) of respective size of the
-            original size to be cropped, which must be non-negative (default=(0.08, 1.0)).
+            original size to be cropped, which must be non-negative. Default: (0.08, 1.0).
         ratio (Union[list, tuple], optional): Range [min, max) of aspect ratio to be
-            cropped, which must be non-negative (default=(3. / 4., 4. / 3.)).
-        max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area (default=10).
+            cropped, which must be non-negative. Default: (3. / 4., 4. / 3.).
+        max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area. Default: 10.
             If exceeded, fall back to use center_crop instead. The max_attempts value must be positive.

     Raises:
@@ -2564,7 +2564,7 @@ class UniformAugment(ImageTensorOperation):
     Args:
         transforms (TensorOperation): C++ transformation operation to be applied on random selection
             of bounding box regions of a given image (Python operations are not accepted).
-        num_ops (int, optional): Number of operations to be selected and applied, which must be positive (default=2).
+        num_ops (int, optional): Number of operations to be selected and applied, which must be positive. Default: 2.

     Raises:
         TypeError: If `transform` is not an image processing operation
diff --git a/mindspore/python/mindspore/dataset/vision/py_transforms.py b/mindspore/python/mindspore/dataset/vision/py_transforms.py
index ca3415849a6..c7d99bed85a 100644
--- a/mindspore/python/mindspore/dataset/vision/py_transforms.py
+++ b/mindspore/python/mindspore/dataset/vision/py_transforms.py
@@ -113,7 +113,7 @@ class AutoContrast(py_transforms.PyTensorOperation):

     Args:
         cutoff (float, optional): Percent to cut off from the histogram on the low and
-            high ends, must be in range of [0.0, 50.0). Default: 0.0.
+            high ends, must be in range of [0.0, 50.0]. Default: 0.0.
         ignore (Union[int, Sequence[int]], optional): Background pixel value, which
             will be directly remapped to white. Default: None, means no background.
diff --git a/mindspore/python/mindspore/dataset/vision/py_transforms_util.py b/mindspore/python/mindspore/dataset/vision/py_transforms_util.py
index a07af9ad40e..a286acd5682 100644
--- a/mindspore/python/mindspore/dataset/vision/py_transforms_util.py
+++ b/mindspore/python/mindspore/dataset/vision/py_transforms_util.py
@@ -51,7 +51,7 @@ def normalize(img, mean, std, pad_channel=False, dtype="float32"):
         mean (list): List of mean values for each channel, w.r.t channel order.
         std (list): List of standard deviations for each channel, w.r.t. channel order.
         pad_channel (bool): Whether to pad a extra channel with value zero.
-        dtype (str): Output datatype of normalize, only worked when pad_channel is True. (default is "float32")
+        dtype (str): Output datatype of normalize, only works when pad_channel is True. Default: "float32".

     Returns:
         img (numpy.ndarray), Normalized image.
@@ -359,7 +359,7 @@ def random_resize_crop(img, size, scale, ratio, interpolation=Inter.BILINEAR, ma
         scale (tuple): Range (min, max) of respective size of the original size to be cropped.
         ratio (tuple): Range (min, max) of aspect ratio to be cropped.
         interpolation (interpolation mode): Image interpolation mode. Default is Inter.BILINEAR = 2.
-        max_attempts (int): The maximum number of attempts to propose a valid crop_area. Default 10.
+        max_attempts (int): The maximum number of attempts to propose a valid crop_area. Default: 10.
             If exceeded, fall back to use center_crop instead.
    Returns:
@@ -432,9 +432,9 @@ def random_crop(img, size, padding, pad_if_needed, fill_value, padding_mode):
             with the first value and (right and bottom) with the second value.
             If 4 values are provided as a list or tuple, it pads the left, top, right and bottom respectively.
-            Default is None.
+            Default: None.
         pad_if_needed (bool): Pad the image if either side is smaller than
-            the given output size. Default is False.
+            the given output size. Default: False.
         fill_value (Union[int, tuple]): The pixel intensity of the borders if
             the padding_mode is 'constant'. If it is a 3-tuple, it is used to fill R, G, B channels respectively.
@@ -912,7 +912,7 @@ def pad(img, padding, fill_value, padding_mode):
             with the first value and (right and bottom) with the second value.
             If 4 values are provided as a list or tuple, it pads the left, top, right and bottom respectively.
-            Default is None.
+            Default: None.
         fill_value (Union[int, tuple]): The pixel intensity of the borders if
             the padding_mode is "constant". If it is a 3-tuple, it is used to fill R, G, B channels respectively.
@@ -1100,7 +1100,7 @@ def erase(np_img, i, j, height, width, erase_value, inplace=False):
         height (int): Height of the erased region.
         width (int): Width of the erased region.
         erase_value: Erase value return from helper function get_erase_params().
-        inplace (bool, optional): Apply this transform inplace. Default is False.
+        inplace (bool, optional): Apply this transform inplace. Default: False.

     Returns:
         np_img (numpy.ndarray), Erased NumPy image array.
@@ -1514,7 +1514,7 @@ def random_color(img, degrees):
     Args:
         img (PIL.Image.Image): Image to be color adjusted.
         degrees (sequence): Range of random color adjustment degrees.
-            It should be in (min, max) format (default=(0.1,1.9)).
+            It should be in (min, max) format. Default: (0.1, 1.9).

     Returns:
         PIL.Image.Image, color adjusted image.
@@ -1534,7 +1534,7 @@ def random_sharpness(img, degrees):
     Args:
         img (PIL.Image.Image): Image to be sharpness adjusted.
         degrees (sequence): Range of random sharpness adjustment degrees.
-            It should be in (min, max) format (default=(0.1,1.9)).
+            It should be in (min, max) format. Default: (0.1, 1.9).

     Returns:
         PIL.Image.Image, sharpness adjusted image.
@@ -1579,8 +1579,8 @@ def auto_contrast(img, cutoff, ignore):

     Args:
         img (PIL.Image): Image to be augmented with AutoContrast.
-        cutoff (float, optional): Percent of pixels to cut off from the histogram (default=0.0).
-        ignore (Union[int, Sequence[int]], optional): Pixel values to ignore (default=None).
+        cutoff (float, optional): Percent of pixels to cut off from the histogram. Default: 0.0.
+        ignore (Union[int, Sequence[int]], optional): Pixel values to ignore. Default: None.

     Returns:
         PIL.Image, augmented image.
diff --git a/mindspore/python/mindspore/dataset/vision/transforms.py b/mindspore/python/mindspore/dataset/vision/transforms.py
index 0055a2a8fab..adb7c735d79 100644
--- a/mindspore/python/mindspore/dataset/vision/transforms.py
+++ b/mindspore/python/mindspore/dataset/vision/transforms.py
@@ -201,7 +201,7 @@ class AdjustGamma(ImageTensorOperation, PyTensorOperation):
         The output image pixel value is exponentially related to the input image pixel value.
         gamma larger than 1 make the shadows darker,
         while gamma smaller than 1 make dark regions lighter.
-        gain (float, optional): The constant multiplier (default=1.0).
+        gain (float, optional): The constant multiplier. Default: 1.0.

     Raises:
         TypeError: If `gain` is not of type float.
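For AdjustGamma, the "exponentially related" relationship is, roughly, output = 255 * gain * (input / 255) ** gamma for uint8 input, so a minimal eager-mode sketch looks like this (illustrative; random input):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> image = np.random.randint(256, size=(64, 64, 3), dtype=np.uint8)
>>> # gamma > 1 darkens shadows; gain is the documented constant multiplier.
>>> darker = vision.AdjustGamma(gamma=2.0, gain=1.0)(image)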
@@ -441,8 +441,8 @@ class AutoAugment(ImageTensorOperation): This operation works only with 3-channel RGB images. Args: - policy (AutoAugmentPolicy, optional): AutoAugment policies learned on different datasets - (default=AutoAugmentPolicy.IMAGENET). + policy (AutoAugmentPolicy, optional): AutoAugment policies learned on different datasets. + Default: AutoAugmentPolicy.IMAGENET. It can be any of [AutoAugmentPolicy.IMAGENET, AutoAugmentPolicy.CIFAR10, AutoAugmentPolicy.SVHN]. Randomly apply 2 operations from a candidate set. See auto augmentation details in AutoAugmentPolicy. @@ -452,7 +452,7 @@ class AutoAugment(ImageTensorOperation): - AutoAugmentPolicy.SVHN, means to apply AutoAugment learned on SVHN dataset. - interpolation (Inter, optional): Image interpolation mode for Resize operation (default=Inter.NEAREST). + interpolation (Inter, optional): Image interpolation mode for Resize operation. Default: Inter.NEAREST. It can be any of [Inter.NEAREST, Inter.BILINEAR, Inter.BICUBIC, Inter.AREA]. - Inter.NEAREST: means interpolation method is nearest-neighbor interpolation. @@ -465,8 +465,8 @@ class AutoAugment(ImageTensorOperation): fill_value (Union[int, tuple[int]], optional): Pixel fill value for the area outside the transformed image. It can be an int or a 3-tuple. If it is a 3-tuple, it is used to fill R, G, B channels respectively. - If it is an integer, it is used for all RGB channels. The fill_value values must be in range [0, 255] - (default=0). + If it is an integer, it is used for all RGB channels. The fill_value values must be in range [0, 255]. + Default: 0. Raises: TypeError: If `policy` is not of type :class:`mindspore.dataset.vision.AutoAugmentPolicy`. @@ -509,9 +509,9 @@ class AutoContrast(ImageTensorOperation, PyTensorOperation): Args: cutoff (float, optional): Percent of lightest and darkest pixels to cut off from - the histogram of input image. The value must be in the range [0.0, 50.0) (default=0.0). + the histogram of input image. The value must be in the range [0.0, 50.0]. Default: 0.0. ignore (Union[int, sequence], optional): The background pixel values to ignore, - The ignore values must be in range [0, 255] (default=None). + The ignore values must be in range [0, 255]. Default: None. Raises: TypeError: If `cutoff` is not of type float. @@ -564,7 +564,7 @@ class BoundingBoxAugment(ImageTensorOperation): transform (TensorOperation): Transformation operation to be applied on random selection of bounding box regions of a given image. ratio (float, optional): Ratio of bounding boxes to apply augmentation on. - Range: [0.0, 1.0] (default=0.3). + Range: [0.0, 1.0]. Default: 0.3. Raises: TypeError: If `transform` is an image processing operation in :class:`mindspore.dataset.vision.transforms`. @@ -780,9 +780,9 @@ class CutMixBatch(ImageTensorOperation): Args: image_batch_format (ImageBatchFormat): The method of padding. Can be any of [ImageBatchFormat.NHWC, ImageBatchFormat.NCHW]. - alpha (float, optional): Hyperparameter of beta distribution, must be larger than 0 (default = 1.0). + alpha (float, optional): Hyperparameter of beta distribution, must be larger than 0. Default: 1.0. prob (float, optional): The probability by which CutMix is applied to each image, - which must be in range: [0.0, 1.0] (default = 1.0). + which must be in range: [0.0, 1.0]. Default: 1.0. Raises: TypeError: If `image_batch_format` is not of type :class:`mindspore.dataset.vision.ImageBatchFormat`. 
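Since CutMixBatch requires one-hot labels and batched input, a minimal pipeline sketch (assumes an existing `dataset` with "image" and "label" columns and 10 classes; illustrative only):

>>> import mindspore.dataset.transforms as transforms
>>> import mindspore.dataset.vision as vision
>>> dataset = dataset.map(operations=transforms.OneHot(num_classes=10),
...                       input_columns=["label"])
>>> dataset = dataset.batch(batch_size=32)
>>> # alpha=1.0 and prob=1.0 restate the documented defaults.
>>> cutmix = vision.CutMixBatch(vision.ImageBatchFormat.NHWC, alpha=1.0, prob=1.0)
>>> dataset = dataset.map(operations=cutmix, input_columns=["image", "label"])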
@@ -824,7 +824,7 @@ class CutOut(ImageTensorOperation):

     Args:
         length (int): The side length of each square patch, must be larger than 0.
-        num_patches (int, optional): Number of patches to be cut out of an image, must be larger than 0. (default=1).
+        num_patches (int, optional): Number of patches to be cut out of an image, must be larger than 0. Default: 1.
         is_hwc (bool, optional): Whether the input image is in HWC format.
             True - HWC format, False - CHW format. Default: True.

@@ -864,7 +864,7 @@ class Decode(ImageTensorOperation, PyTensorOperation):
     Supported image formats: JPEG, BMP, PNG, TIFF, GIF(need `to_pil=True`), WEBP(need `to_pil=True`).

     Args:
-        to_pil (bool, optional): decode to PIL Image (default=False).
+        to_pil (bool, optional): Whether to decode to PIL Image. Default: False.

     Raises:
         RuntimeError: If given tensor is not a 1D sequence.
@@ -1069,8 +1069,8 @@ class GaussianBlur(ImageTensorOperation):
         kernel_size (Union[int, Sequence[int]]): Size of the Gaussian kernel to use. The value must be positive and odd.
             If only an integer is provided, the kernel size will be (kernel_size, kernel_size). If a sequence of integer
             is provided, it must be a sequence of 2 values which represents (width, height).
-        sigma (Union[float, Sequence[float]], optional): Standard deviation of the Gaussian kernel to use
-            (default=None). The value must be positive. If only a float is provided, the sigma will be (sigma, sigma).
+        sigma (Union[float, Sequence[float]], optional): Standard deviation of the Gaussian kernel to use.
+            Default: None. The value must be positive. If only a float is provided, the sigma will be (sigma, sigma).
             If a sequence of float is provided, it must be a sequence of 2 values which represents (width, height).
             If None is provided, the sigma will be calculated as ((kernel_size - 1) * 0.5 - 1) * 0.3 + 0.8.
@@ -1433,7 +1433,7 @@ class MixUpBatch(ImageTensorOperation):
     Note that you need to make labels into one-hot format and batched before calling this operation.

     Args:
-        alpha (float, optional): Hyperparameter of beta distribution. The value must be positive (default = 1.0).
+        alpha (float, optional): Hyperparameter of beta distribution. The value must be positive. Default: 1.0.

     Raises:
         TypeError: If `alpha` is not of type float.
@@ -1520,7 +1520,7 @@ class NormalizePad(ImageTensorOperation):
         The mean values must be in range (0.0, 255.0].
         std (sequence): List or tuple of standard deviations for each channel, with respect to channel order.
             The standard deviation values must be in range (0.0, 255.0].
-        dtype (str, optional): Set the output data type of normalized image (default is "float32").
+        dtype (str, optional): Set the output data type of normalized image. Default: "float32".
         is_hwc (bool, optional): Whether the input image is HWC.
             True - HWC format, False - CHW format. Default: True.

@@ -1575,8 +1575,8 @@ class Pad(ImageTensorOperation, PyTensorOperation):
         fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders,
             only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels
             respectively. If it is an integer, it is used for all RGB channels.
-            The fill_value values must be in range [0, 255] (default=0).
-        padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). Can be any of
+            The fill_value values must be in range [0, 255]. Default: 0.
+        padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. Can be any of
             [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC].
            - Border.CONSTANT, means it fills the border with constant values.
@@ -1650,7 +1650,7 @@ class PadToSize(ImageTensorOperation):
             If int is provided, it will be used for all RGB channels.
             If tuple[int, int, int] is provided, it will be used for R, G, B channels respectively. Default: 0.
         padding_mode (Border, optional): Method of padding. It can be Border.CONSTANT, Border.EDGE, Border.REFLECT
-            or Border.SYMMETRIC. Default: Border.CONSTANT. Default: Border.CONSTANT.
+            or Border.SYMMETRIC. Default: Border.CONSTANT.

             - Border.CONSTANT, pads with a constant value.
             - Border.EDGE, pads with the last value at the edge of the image.
@@ -1862,7 +1862,7 @@ class RandomAdjustSharpness(ImageTensorOperation):
         Degree of 0.0 gives a blurred image, degree of 1.0 gives the original image,
         and degree of 2.0 increases the sharpness by a factor of 2.
         prob (float, optional): Probability of the image being sharpness adjusted, which
-            must be in range of [0.0, 1.0] (default=0.5).
+            must be in range of [0.0, 1.0]. Default: 0.5.

     Raises:
         TypeError: If `degree` is not of type float.
@@ -1900,7 +1900,7 @@ class RandomAffine(ImageTensorOperation, PyTensorOperation):
         If `degrees` is a number, the range will be (-degrees, degrees).
         If `degrees` is a sequence, it should be (min, max).
         translate (sequence, optional): Sequence (tx_min, tx_max, ty_min, ty_max) of minimum/maximum translation in
-            x(horizontal) and y(vertical) directions, range [-1.0, 1.0] (default=None).
+            x(horizontal) and y(vertical) directions, range [-1.0, 1.0]. Default: None.
             The horizontal and vertical shift is selected randomly from the range:
             (tx_min*width, tx_max*width) and (ty_min*height, ty_max*height), respectively.
             If a tuple or list of size 2, then a translate parallel to the X axis in the range of
@@ -1909,8 +1909,8 @@ class RandomAffine(ImageTensorOperation, PyTensorOperation):
             (translate[0], translate[1]) and a translate parallel to the Y axis in the range of
             (translate[2], translate[3]) are applied.
             If None, no translation is applied.
-        scale (sequence, optional): Scaling factor interval, which must be non negative
-            (default=None, original scale is used).
+        scale (sequence, optional): Scaling factor interval, which must be non-negative.
+            Default: None, original scale is used.
         shear (Union[float, Sequence[float, float], Sequence[float, float, float, float]], optional):
             Range of shear factor to select from.
             If float is provided, a shearing parallel to X axis with a factor selected from
@@ -1920,7 +1920,7 @@ class RandomAffine(ImageTensorOperation, PyTensorOperation):
             If Sequence[float, float, float, float] is provided, a shearing parallel to X axis with a factor selected
             from ( `shear` [0], `shear` [1]) and a shearing parallel to Y axis with a factor selected from
             ( `shear` [2], `shear` [3]) will be applied. Default: None, means no shearing.
-        resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST).
+        resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA].

             - Inter.BILINEAR, means resample method is bilinear interpolation.
@@ -1933,7 +1933,7 @@ class RandomAffine(ImageTensorOperation, PyTensorOperation):
         fill_value (Union[int, tuple[int]], optional): Optional fill_value to fill the area outside the transform
             in the output image. There must be three elements in tuple and the value of single element is [0, 255].
-            (default=0, filling is performed).
+            Default: 0, filling is performed.
Raises: TypeError: If `degrees` is not of type int, float or sequence. @@ -2038,11 +2038,11 @@ class RandomAutoContrast(ImageTensorOperation): Args: cutoff (float, optional): Percent of the lightest and darkest pixels to be cut off from - the histogram of the input image. The value must be in range of [0.0, 50.0) (default=0.0). + the histogram of the input image. The value must be in range of [0.0, 50.0]. Default: 0.0. ignore (Union[int, sequence], optional): The background pixel values to be ignored, each of - which must be in range of [0, 255] (default=None). + which must be in range of [0, 255]. Default: None. prob (float, optional): Probability of the image being automatically contrasted, which - must be in range of [0.0, 1.0] (default=0.5). + must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `cutoff` is not of type float. @@ -2086,7 +2086,7 @@ class RandomColor(ImageTensorOperation, PyTensorOperation): Args: degrees (Sequence[float], optional): Range of random color adjustment degrees, which must be non-negative. It should be in (min, max) format. If min=max, then it is a - single fixed magnitude operation (default=(0.1, 1.9)). + single fixed magnitude operation. Default: (0.1, 1.9). Raises: TypeError: If `degrees` is not of type Sequence[float]. @@ -2132,19 +2132,19 @@ class RandomColorAdjust(ImageTensorOperation, PyTensorOperation): This operation supports running on Ascend or GPU platforms by Offload. Args: - brightness (Union[float, Sequence[float]], optional): Brightness adjustment factor (default=(1, 1)). + brightness (Union[float, Sequence[float]], optional): Brightness adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-brightness), 1+brightness]. If it is a sequence, it should be [min, max] for the range. - contrast (Union[float, Sequence[float]], optional): Contrast adjustment factor (default=(1, 1)). + contrast (Union[float, Sequence[float]], optional): Contrast adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-contrast), 1+contrast]. If it is a sequence, it should be [min, max] for the range. - saturation (Union[float, Sequence[float]], optional): Saturation adjustment factor (default=(1, 1)). + saturation (Union[float, Sequence[float]], optional): Saturation adjustment factor. Default: (1, 1). Cannot be negative. If it is a float, the factor is uniformly chosen from the range [max(0, 1-saturation), 1+saturation]. If it is a sequence, it should be [min, max] for the range. - hue (Union[float, Sequence[float]], optional): Hue adjustment factor (default=(0, 0)). + hue (Union[float, Sequence[float]], optional): Hue adjustment factor. Default: (0, 0). If it is a float, the range will be [-hue, hue]. Value should be 0 <= hue <= 0.5. If it is a sequence, it should be [min, max] where -0.5 <= min <= max <= 0.5. @@ -2225,7 +2225,7 @@ class RandomCrop(ImageTensorOperation, PyTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped. padding (Union[int, Sequence[int]], optional): The number of pixels to pad each border of the image. - The padding value(s) must be non-negative (default=None). + The padding value(s) must be non-negative. Default: None. If padding is not None, pad image first with padding values. If a single number is provided, pad all borders with this value. 
If a tuple or lists of 2 values are provided, pad the (left and right) @@ -2233,12 +2233,12 @@ class RandomCrop(ImageTensorOperation, PyTensorOperation): If 4 values are provided as a list or tuple, pad the left, top, right and bottom respectively. pad_if_needed (bool, optional): Pad the image if either side is smaller than - the given output size (default=False). + the given output size. Default: False. fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders, only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). - padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). It can be any of + The fill_value values must be in range [0, 255]. Default: 0. + padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. It can be any of [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC]. - Border.CONSTANT, means it fills the border with constant values. @@ -2320,10 +2320,10 @@ class RandomCropDecodeResize(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, it should be (height, width). scale (Union[list, tuple], optional): Range [min, max) of respective size of the - original size to be cropped, which must be non-negative (default=(0.08, 1.0)). + original size to be cropped, which must be non-negative. Default: (0.08, 1.0). ratio (Union[list, tuple], optional): Range [min, max) of aspect ratio to be - cropped, which must be non-negative (default=(3. / 4., 4. / 3.)). - interpolation (Inter, optional): Image interpolation mode for resize operation (default=Inter.BILINEAR). + cropped, which must be non-negative. Default: (3. / 4., 4. / 3.). + interpolation (Inter, optional): Image interpolation mode for resize operation. Default: Inter.BILINEAR. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC]. - Inter.BILINEAR, means interpolation method is bilinear interpolation. @@ -2337,7 +2337,7 @@ class RandomCropDecodeResize(ImageTensorOperation): - Inter.PILCUBIC, means interpolation method is bicubic interpolation like implemented in pillow, input should be in 3 channels format. - max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area (default=10). + max_attempts (int, optional): The maximum number of attempts to propose a valid crop_area. Default: 10. If exceeded, fall back to use center_crop instead. The max_attempts value must be positive. Raises: @@ -2403,19 +2403,19 @@ class RandomCropWithBBox(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, an image of size (height, width) will be cropped. padding (Union[int, Sequence[int]], optional): The number of pixels to pad the image - The padding value(s) must be non-negative (default=None). + The padding value(s) must be non-negative. Default: None. If padding is not None, first pad image with padding values. If a single number is provided, pad all borders with this value. If a tuple or lists of 2 values are provided, pad the (left and right) with the first value and (top and bottom) with the second value. If 4 values are provided as a list or tuple, pad the left, top, right and bottom respectively. 
pad_if_needed (bool, optional): Pad the image if either side is smaller than - the given output size (default=False). + the given output size. Default: False. fill_value (Union[int, tuple[int]], optional): The pixel intensity of the borders, only valid for padding_mode Border.CONSTANT. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). - padding_mode (Border, optional): The method of padding (default=Border.CONSTANT). It can be any of + The fill_value values must be in range [0, 255]. Default: 0. + padding_mode (Border, optional): The method of padding. Default: Border.CONSTANT. It can be any of [Border.CONSTANT, Border.EDGE, Border.REFLECT, Border.SYMMETRIC]. - Border.CONSTANT, means it fills the border with constant values. @@ -2483,7 +2483,7 @@ class RandomEqualize(ImageTensorOperation): Args: prob (float, optional): Probability of the image being equalized, which - must be in range of [0.0, 1.0] (default=0.5). + must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2517,7 +2517,7 @@ class RandomErasing(PyTensorOperation): Args: prob (float, optional): Probability of performing erasing, which - must be in range of [0.0, 1.0] (default: 0.5). + must be in range of [0.0, 1.0]. Default: 0.5. scale (Sequence[float, float], optional): Range of area scale of the erased area relative to the original image to select from, arranged in order of (min, max). Default: (0.02, 0.33). @@ -2594,7 +2594,7 @@ class RandomGrayscale(PyTensorOperation): Args: prob (float, optional): Probability of performing grayscale conversion, - which must be in range of [0.0, 1.0] (default: 0.1). + which must be in range of [0.0, 1.0]. Default: 0.1. Raises: TypeError: If `prob` is not of type float. @@ -2648,7 +2648,7 @@ class RandomHorizontalFlip(ImageTensorOperation, PyTensorOperation): Args: prob (float, optional): Probability of the image being flipped, - which must be in range of [0.0, 1.0] (default=0.5). + which must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2691,7 +2691,7 @@ class RandomHorizontalFlipWithBBox(ImageTensorOperation): Args: prob (float, optional): Probability of the image being flipped, - which must be in range of [0.0, 1.0] (default=0.5). + which must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2723,7 +2723,7 @@ class RandomInvert(ImageTensorOperation): Args: prob (float, optional): Probability of the image being inverted, - which must be in range of [0.0, 1.0] (default=0.5). + which must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -2755,7 +2755,7 @@ class RandomLighting(ImageTensorOperation, PyTensorOperation): calculated from the imagenet dataset. Args: - alpha (float, optional): Intensity of the image, which must be non-negative (default=0.05). + alpha (float, optional): Intensity of the image, which must be non-negative. Default: 0.05. Raises: TypeError: If `alpha` is not of type float. @@ -2800,7 +2800,7 @@ class RandomPerspective(PyTensorOperation): Args: distortion_scale (float, optional): Scale of distortion, in range of [0.0, 1.0]. Default: 0.5. prob (float, optional): Probability of performing perspective transformation, which - must be in range of [0.0, 1.0] (default: 0.5). + must be in range of [0.0, 1.0]. Default: 0.5. 
interpolation (Inter, optional): Method of interpolation. It can be Inter.BILINEAR, Inter.NEAREST or Inter.BICUBIC. Default: Inter.BICUBIC. @@ -2865,7 +2865,7 @@ class RandomPosterize(ImageTensorOperation): Bits values must be in range of [1,8], and include at least one integer value in the given range. It must be in (min, max) or integer format. If min=max, then it is a single fixed - magnitude operation (default=(8, 8)). + magnitude operation. Default: (8, 8). Raises: TypeError: If `bits` is not of type integer or sequence of integer. @@ -2907,10 +2907,10 @@ class RandomResizedCrop(ImageTensorOperation, PyTensorOperation): If size is an integer, a square of size (size, size) will be cropped with this value. If size is a sequence of length 2, an image of size (height, width) will be cropped. scale (Union[list, tuple], optional): Range [min, max) of respective size of the original - size to be cropped, which must be non-negative (default=(0.08, 1.0)). + size to be cropped, which must be non-negative. Default: (0.08, 1.0). ratio (Union[list, tuple], optional): Range [min, max) of aspect ratio to be - cropped, which must be non-negative (default=(3. / 4., 4. / 3.)). - interpolation (Inter, optional): Method of interpolation (default=Inter.BILINEAR). + cropped, which must be non-negative. Default: (3. / 4., 4. / 3.). + interpolation (Inter, optional): Method of interpolation. Default: Inter.BILINEAR. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC]. - Inter.BILINEAR, means interpolation method is bilinear interpolation. @@ -2927,7 +2927,7 @@ class RandomResizedCrop(ImageTensorOperation, PyTensorOperation): - Inter.ANTIALIAS, means the interpolation method is antialias interpolation. max_attempts (int, optional): The maximum number of attempts to propose a valid - crop_area (default=10). If exceeded, fall back to use center_crop instead. + crop_area. Default: 10. If exceeded, fall back to use center_crop instead. Raises: TypeError: If `size` is not of type int or Sequence[int]. @@ -3001,10 +3001,10 @@ class RandomResizedCropWithBBox(ImageTensorOperation): If size is an integer, a square crop of size (size, size) is returned. If size is a sequence of length 2, it should be (height, width). scale (Union[list, tuple], optional): Range (min, max) of respective size of the original - size to be cropped, which must be non-negative (default=(0.08, 1.0)). + size to be cropped, which must be non-negative. Default: (0.08, 1.0). ratio (Union[list, tuple], optional): Range (min, max) of aspect ratio to be - cropped, which must be non-negative (default=(3. / 4., 4. / 3.)). - interpolation (Inter, optional): Image interpolation mode (default=Inter.BILINEAR). + cropped, which must be non-negative. Default: (3. / 4., 4. / 3.). + interpolation (Inter, optional): Image interpolation mode. Default: Inter.BILINEAR. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC]. - Inter.BILINEAR, means interpolation method is bilinear interpolation. @@ -3014,7 +3014,7 @@ class RandomResizedCropWithBBox(ImageTensorOperation): - Inter.BICUBIC, means interpolation method is bicubic interpolation. max_attempts (int, optional): The maximum number of attempts to propose a valid - crop area (default=10). If exceeded, fall back to use center crop instead. + crop area. Default: 10. If exceeded, fall back to use center crop instead. Raises: TypeError: If `size` is not of type int or Sequence[int]. 
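The unified transforms can also be invoked eagerly on a single image, which is a quick way to check the crop defaults above (illustrative; random input):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> image = np.random.randint(256, size=(240, 320, 3), dtype=np.uint8)
>>> # scale, ratio, interpolation and max_attempts keep their defaults.
>>> patch = vision.RandomResizedCrop(size=(96, 96))(image)
>>> patch.shape
(96, 96, 3)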
@@ -3152,7 +3152,7 @@ class RandomRotation(ImageTensorOperation, PyTensorOperation): degrees (Union[int, float, sequence]): Range of random rotation degrees. If `degrees` is a number, the range will be converted to (-degrees, degrees). If `degrees` is a sequence, it should be (min, max). - resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST). + resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST. It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA]. - Inter.BILINEAR, means resample method is bilinear interpolation. @@ -3163,16 +3163,16 @@ class RandomRotation(ImageTensorOperation, PyTensorOperation): - Inter.AREA, means the interpolation method is pixel area interpolation. - expand (bool, optional): Optional expansion flag (default=False). If set to True, expand the output + expand (bool, optional): Optional expansion flag. Default: False. If set to True, expand the output image to make it large enough to hold the entire rotated image. If set to False or omitted, make the output image the same size as the input. Note that the expand flag assumes rotation around the center and no translation. - center (tuple, optional): Optional center of rotation (a 2-tuple) (default=None). + center (tuple, optional): Optional center of rotation (a 2-tuple). Default: None. Origin is the top left corner. None sets to the center of the image. fill_value (Union[int, tuple[int]], optional): Optional fill color for the area outside the rotated image. If it is a 3-tuple, it is used to fill R, G, B channels respectively. If it is an integer, it is used for all RGB channels. - The fill_value values must be in range [0, 255] (default=0). + The fill_value values must be in range [0, 255]. Default: 0. Raises: TypeError: If `degrees` is not of type integer, float or sequence. @@ -3300,7 +3300,7 @@ class RandomSharpness(ImageTensorOperation, PyTensorOperation): Args: degrees (Union[list, tuple], optional): Range of random sharpness adjustment degrees, which must be non-negative. It should be in (min, max) format. If min=max, then - it is a single fixed magnitude operation (default = (0.1, 1.9)). + it is a single fixed magnitude operation. Default: (0.1, 1.9). Raises: TypeError : If `degrees` is not a list or a tuple. @@ -3344,7 +3344,7 @@ class RandomSolarize(ImageTensorOperation): the subrange to (255 - pixel). Args: - threshold (tuple, optional): Range of random solarize threshold (default=(0, 255)). + threshold (tuple, optional): Range of random solarize threshold. Default: (0, 255). Threshold values should always be in (min, max) format, where min and max are integers in the range [0, 255], and min <= max. If min=max, then invert all pixel values above min(max). @@ -3378,7 +3378,7 @@ class RandomVerticalFlip(ImageTensorOperation, PyTensorOperation): Args: prob (float, optional): Probability of the image being flipped, which - must be in range of [0.0, 1.0] (default=0.5). + must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. @@ -3421,7 +3421,7 @@ class RandomVerticalFlipWithBBox(ImageTensorOperation): Args: prob (float, optional): Probability of the image being flipped, - which must be in range of [0.0, 1.0] (default=0.5). + which must be in range of [0.0, 1.0]. Default: 0.5. Raises: TypeError: If `prob` is not of type float. 
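For RandomSolarize, the behavior documented above (pick a random subrange within `threshold` and map pixels inside it to 255 - pixel) can be tried eagerly (illustrative; random input):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> image = np.random.randint(256, size=(32, 32, 3), dtype=np.uint8)
>>> # With threshold=(128, 255), only bright pixels may be inverted;
>>> # the default (0, 255) makes every pixel a candidate.
>>> solarized = vision.RandomSolarize(threshold=(128, 255))(image)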
@@ -3492,7 +3492,7 @@ class Resize(ImageTensorOperation, PyTensorOperation):
             If size is an integer, the smaller edge of the image will be resized to this value with
             the same image aspect ratio.
             If size is a sequence of length 2, it should be (height, width).
-        interpolation (Inter, optional): Image interpolation mode (default=Inter.BILINEAR).
+        interpolation (Inter, optional): Image interpolation mode. Default: Inter.BILINEAR.
             It can be any of [Inter.BILINEAR, Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC, Inter.AREA, Inter.PILCUBIC,
             Inter.ANTIALIAS].

@@ -3630,7 +3630,7 @@ class ResizeWithBBox(ImageTensorOperation):
             If size is an integer, smaller edge of the image will be resized to this value with
             the same image aspect ratio.
             If size is a sequence of length 2, it should be (height, width).
-        interpolation (Inter, optional): Image interpolation mode (default=Inter.LINEAR).
+        interpolation (Inter, optional): Image interpolation mode. Default: Inter.LINEAR.
             It can be any of [Inter.LINEAR, Inter.NEAREST, Inter.BICUBIC].

             - Inter.LINEAR, means interpolation method is bilinear interpolation.
@@ -3724,23 +3724,23 @@ class Rotate(ImageTensorOperation):

     Args:
         degrees (Union[int, float]): Rotation degrees.
-        resample (Inter, optional): An optional resampling filter (default=Inter.NEAREST).
+        resample (Inter, optional): An optional resampling filter. Default: Inter.NEAREST.
             It can be any of [Inter.BILINEAR, Inter.NEAREST, Inter.BICUBIC].

             - Inter.BILINEAR, means resample method is bilinear interpolation.
             - Inter.NEAREST, means resample method is nearest-neighbor interpolation.
             - Inter.BICUBIC, means resample method is bicubic interpolation.

-        expand (bool, optional): Optional expansion flag (default=False). If set to True, expand the output
+        expand (bool, optional): Optional expansion flag. Default: False. If set to True, expand the output
             image to make it large enough to hold the entire rotated image.
             If set to False or omitted, make the output image the same size as the input.
             Note that the expand flag assumes rotation around the center and no translation.
-        center (tuple, optional): Optional center of rotation (a 2-tuple) (default=None).
+        center (tuple, optional): Optional center of rotation (a 2-tuple). Default: None.
             Origin is the top left corner. None sets to the center of the image.
         fill_value (Union[int, tuple[int]], optional): Optional fill color for the area outside the rotated image.
             If it is a 3-tuple, it is used to fill R, G, B channels respectively.
             If it is an integer, it is used for all RGB channels.
-            The fill_value values must be in range [0, 255] (default=0).
+            The fill_value values must be in range [0, 255]. Default: 0.

     Raises:
         TypeError: If `degrees` is not of type integer, float or sequence.
@@ -3794,13 +3794,13 @@ class SlicePatches(ImageTensorOperation):
     number of output tensors is equal to num_height*num_width.

     Args:
-        num_height (int, optional): The number of patches in vertical direction, which must be positive (default=1).
-        num_width (int, optional): The number of patches in horizontal direction, which must be positive (default=1).
-        slice_mode (Inter, optional): A mode represents pad or drop (default=SliceMode.PAD).
+        num_height (int, optional): The number of patches in vertical direction, which must be positive. Default: 1.
+        num_width (int, optional): The number of patches in horizontal direction, which must be positive. Default: 1.
+        slice_mode (SliceMode, optional): A mode representing pad or drop. Default: SliceMode.PAD.
             It can be any of [SliceMode.PAD, SliceMode.DROP].
fill_value (int, optional): The border width in number of pixels in right and bottom direction if slice_mode is set to be SliceMode.PAD. - The fill_value must be in range [0, 255] (default=0). + The fill_value must be in range [0, 255]. Default: 0. Raises: TypeError: If `num_height` is not of type integer. diff --git a/mindspore/python/mindspore/dataset/vision/utils.py b/mindspore/python/mindspore/dataset/vision/utils.py index 7987ee9850a..09180eb456d 100755 --- a/mindspore/python/mindspore/dataset/vision/utils.py +++ b/mindspore/python/mindspore/dataset/vision/utils.py @@ -368,7 +368,7 @@ def encode_jpeg(image, quality=75): Examples: >>> import numpy as np - >>> from mindspore.dataset import vision + >>> import mindspore.dataset.vision as vision >>> # Generate a random image with height=120, width=340, channels=3 >>> image = np.random.randint(256, size=(120, 340, 3), dtype=np.uint8) >>> jpeg_data = vision.encode_jpeg(image) @@ -471,7 +471,7 @@ def write_file(filename, data): RuntimeError: If the shape of `data` is not a one-dimensional array. Examples: - >>> from mindspore.dataset import vision + >>> import mindspore.dataset.vision as vision >>> vision.write_file("/path/to/file", data) """ if not isinstance(filename, str):
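Putting the two utilities above together, `encode_jpeg` produces the 1-D uint8 buffer that `write_file` expects, which also supplies the otherwise-undefined `data` in the write_file example (the output path is a placeholder):

>>> import numpy as np
>>> import mindspore.dataset.vision as vision
>>> image = np.random.randint(256, size=(120, 340, 3), dtype=np.uint8)
>>> data = vision.encode_jpeg(image, quality=75)  # quality restates the default
>>> vision.write_file("/path/to/file", data)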