!27715 dataset: fix api description error

Merge pull request !27715 from ms_yan/code_docs_api_format
This commit is contained in:
i-robot 2021-12-15 08:08:22 +00:00 committed by Gitee
commit 00da76efd9
3 changed files with 10 additions and 10 deletions


@@ -1,13 +1,13 @@
mindspore.dataset.MindDataset
==============================
-.. py:class:: mindspore.dataset.MindDataset(dataset_file, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)
+.. py:class:: mindspore.dataset.MindDataset(dataset_files, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)
Reads and parses MindRecord data files as a source dataset. The column names and column types of the generated dataset depend on the column names and types saved in the MindRecord files.
**Parameters:**
-- **dataset_file** (Union[str, list[str]]) - Path(s) to the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_file` is a string, it denotes a group of MindRecord files sharing the same prefix: other MindRecord files under the same path with the same prefix will be found and loaded automatically. If `dataset_file` is a list, it lists the MindRecord data files to read.
+- **dataset_files** (Union[str, list[str]]) - Path(s) to the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_files` is a string, it denotes a group of MindRecord files sharing the same prefix: other MindRecord files under the same path with the same prefix will be found and loaded automatically. If `dataset_files` is a list, it lists the MindRecord data files to read.
- **columns_list** (list[str], optional) - Columns to read from the MindRecord files (default=None, read all columns).
- **num_parallel_workers** (int, optional) - Number of worker threads used to read the data (default=None, use the number of threads configured in mindspore.dataset.config).
- **shuffle** (Union[bool, Shuffle level], optional) - The shuffle mode applied to the data in each epoch (default=mindspore.dataset.Shuffle.GLOBAL). If False, no shuffling is performed; if True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL. A Shuffle enum value can also be passed to set the shuffle level.
@@ -63,7 +63,7 @@
**Examples:**
>>> mind_dataset_dir = ["/path/to/mind_dataset_file"]  # this list can contain one or more MindRecord files
->>> dataset = ds.MindDataset(dataset_file=mind_dataset_dir)
+>>> dataset = ds.MindDataset(dataset_files=mind_dataset_dir)
.. include:: mindspore.dataset.Dataset.add_sampler.rst
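The prefix-matching behavior documented for `dataset_files` (a single path string also pulls in sibling MindRecord files with the same prefix) can be illustrated with a plain-Python sketch. The helper `find_same_prefix_files` is hypothetical, for illustration only; it is not part of MindSpore:

```python
import os
import tempfile

def find_same_prefix_files(path):
    # Hypothetical illustration of the documented behavior: given one
    # MindRecord path, files in the same directory that share its prefix
    # (e.g. shard suffixes like "data.mindrecord0") are also picked up.
    dirname, prefix = os.path.split(path)
    return sorted(
        os.path.join(dirname, name)
        for name in os.listdir(dirname)
        if name.startswith(prefix)
    )

# Demo with temporary files standing in for MindRecord shards.
with tempfile.TemporaryDirectory() as d:
    for suffix in ["", "0", "1"]:
        open(os.path.join(d, "data.mindrecord" + suffix), "w").close()
    found = find_same_prefix_files(os.path.join(d, "data.mindrecord"))
    print([os.path.basename(p) for p in found])
    # ['data.mindrecord', 'data.mindrecord0', 'data.mindrecord1']
```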


@@ -46,8 +46,8 @@ class Vocab(cde.Vocab):
This would collect all unique words in a dataset and return a vocab within
the frequency range specified by user in freq_range. User would be warned if no words fall into the frequency.
-Words in vocab are ordered from highest frequency to lowest frequency. Words with the same frequency would be
-ordered lexicographically.
+Words in vocab are ordered from the highest frequency to the lowest frequency. Words with the same frequency
+would be ordered lexicographically.
Args:
dataset(Dataset): dataset to build vocab from.
@@ -86,7 +86,7 @@ class Vocab(cde.Vocab):
Args:
word_list(list): A list of string where each element is a word of type string.
-special_tokens(list, optional): A list of strings, each one is a special token. for example
+special_tokens(list, optional): A list of strings, each one is a special token. For example
special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
special_first(bool, optional): Whether special_tokens is prepended or appended to vocab. If special_tokens
is specified and special_first is set to True, special_tokens will be prepended (default=True).
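The `special_first` ordering documented above can be sketched in plain Python. This is an illustration of the described behavior, not MindSpore's implementation, and `build_vocab` is a hypothetical name:

```python
def build_vocab(word_list, special_tokens=None, special_first=True):
    # Sketch of the documented Vocab.from_list ordering: special tokens
    # are prepended when special_first=True, appended otherwise.
    special_tokens = special_tokens or []
    if special_first:
        ordered = special_tokens + word_list
    else:
        ordered = word_list + special_tokens
    return {word: idx for idx, word in enumerate(ordered)}

vocab = build_vocab(["home", "behind"], special_tokens=["<pad>", "<unk>"])
print(vocab)  # {'<pad>': 0, '<unk>': 1, 'home': 2, 'behind': 3}
```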
@@ -112,7 +112,7 @@ class Vocab(cde.Vocab):
delimiter (str, optional): A delimiter to break up each line in file, the first element is taken to be
the word (default="").
vocab_size (int, optional): Number of words to read from file_path (default=None, all words are taken).
-special_tokens (list, optional): A list of strings, each one is a special token. for example
+special_tokens (list, optional): A list of strings, each one is a special token. For example
special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
special_first (bool, optional): Whether special_tokens will be prepended/appended to vocab,
If special_tokens is specified and special_first is set to True,
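The `delimiter` and `vocab_size` rules documented for reading a vocab file (split each line on the delimiter, take the first field as the word, stop after `vocab_size` words) can be sketched in plain Python. The helper name is hypothetical; this is not MindSpore code:

```python
def parse_vocab_lines(lines, delimiter="", vocab_size=None):
    # Sketch of the documented Vocab.from_file parsing: each line is
    # split on the delimiter and the first field is taken as the word;
    # an empty delimiter means the whole line is the word.
    words = []
    for line in lines:
        line = line.rstrip("\n")
        word = line.split(delimiter)[0] if delimiter else line
        words.append(word)
        if vocab_size is not None and len(words) >= vocab_size:
            break
    return words

result = parse_vocab_lines(["home,1\n", "behind,2\n", "the,3\n"],
                           delimiter=",", vocab_size=2)
print(result)  # ['home', 'behind']
```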
@@ -262,7 +262,7 @@ def to_str(array, encoding='utf8'):
Args:
array (numpy.ndarray): Array of `bytes` type representing strings.
-encoding (str): Indicating the charset for decoding.
+encoding (str): Indicating the charset for decoding (default='utf8').
Returns:
numpy.ndarray, NumPy array of `str`.
@@ -286,7 +286,7 @@ def to_bytes(array, encoding='utf8'):
Args:
array (numpy.ndarray): Array of `str` type representing strings.
-encoding (str): Indicating the charset for encoding.
+encoding (str): Indicating the charset for encoding (default='utf8').
Returns:
numpy.ndarray, NumPy array of `bytes`.
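The round trip between `to_str` and `to_bytes` with the default `'utf8'` charset can be illustrated with a NumPy sketch. These element-wise decode/encode helpers are hypothetical stand-ins for illustration, not the MindSpore implementations:

```python
import numpy as np

def to_str_sketch(array, encoding="utf8"):
    # Element-wise bytes -> str decode, mirroring the documented default.
    return np.array([b.decode(encoding) for b in array.ravel()]).reshape(array.shape)

def to_bytes_sketch(array, encoding="utf8"):
    # Element-wise str -> bytes encode, the inverse operation.
    return np.array([s.encode(encoding) for s in array.ravel()]).reshape(array.shape)

raw = np.array([b"hello", b"world"])
decoded = to_str_sketch(raw)
print(decoded)                   # ['hello' 'world']
print(to_bytes_sketch(decoded))  # [b'hello' b'world']
```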


@@ -286,7 +286,7 @@ class PadEnd(TensorOperation):
Args:
pad_shape (list(int)): List of integers representing the shape needed. Dimensions that set to `None` will
not be padded (i.e., original dim will be used). Shorter dimensions will truncate the values.
-pad_value (Union[str, bytes, int, float, bool]), optional): Value used to pad. Default to 0 or empty
+pad_value (Union[str, bytes, int, float, bool], optional): Value used to pad. Default to 0 or empty
string in case of tensors of strings.
Examples:
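The documented PadEnd rules (a `None` dimension keeps the original size, a larger dimension pads at the end with `pad_value`, a smaller one truncates) can be sketched for the 1-D case with NumPy. This is an illustration under those documented rules, not MindSpore's implementation:

```python
import numpy as np

def pad_end_sketch(tensor, pad_shape, pad_value=0):
    # 1-D sketch of the documented PadEnd behavior: None keeps the
    # original dim, a larger target pads the tail with pad_value,
    # and a smaller target truncates.
    (target,) = pad_shape
    if target is None:
        return tensor.copy()
    if target <= len(tensor):
        return tensor[:target].copy()
    out = np.full(target, pad_value, dtype=tensor.dtype)
    out[: len(tensor)] = tensor
    return out

print(pad_end_sketch(np.array([1, 2, 3]), [5]))     # [1 2 3 0 0]
print(pad_end_sketch(np.array([1, 2, 3]), [2]))     # [1 2]
print(pad_end_sketch(np.array([1, 2, 3]), [None]))  # [1 2 3]
```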