forked from mindspore-Ecosystem/mindspore
!27715 dataset: fix api description error
Merge pull request !27715 from ms_yan/code_docs_api_format
This commit is contained in:
commit 00da76efd9
@@ -1,13 +1,13 @@
 mindspore.dataset.MindDataset
 ==============================

-.. py:class:: mindspore.dataset.MindDataset(dataset_file, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)
+.. py:class:: mindspore.dataset.MindDataset(dataset_files, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)

 Reads and parses MindRecord data files as a source dataset. The column names and column types of the generated dataset depend on the column names and types saved in the MindRecord files.

 **Parameters:**

-    - **dataset_file** (Union[str, list[str]]) - Path of the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_file` is a string, it stands for a group of MindRecord files with the same prefix name; other MindRecord files with the same prefix name under the same path will be found and loaded automatically. If `dataset_file` is a list, it lists the MindRecord data files to be read.
+    - **dataset_files** (Union[str, list[str]]) - Path of the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_files` is a string, it stands for a group of MindRecord files with the same prefix name; other MindRecord files with the same prefix name under the same path will be found and loaded automatically. If `dataset_files` is a list, it lists the MindRecord data files to be read.
     - **columns_list** (list[str], optional) - List of data columns to be read from the MindRecord files (default=None, read all columns).
     - **num_parallel_workers** (int, optional) - Number of worker threads used to read the data (default=None, use the number of threads configured in mindspore.dataset.config).
     - **shuffle** (Union[bool, Shuffle level], optional) - Shuffle mode for the data in each epoch (default=mindspore.dataset.Shuffle.GLOBAL). If False, no shuffling is performed; if True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL. A Shuffle enum value can also be passed in to set the shuffle level:
@@ -63,7 +63,7 @@
 **Examples:**

 >>> mind_dataset_dir = ["/path/to/mind_dataset_file"]  # this list can contain one or more MindRecord files
->>> dataset = ds.MindDataset(dataset_file=mind_dataset_dir)
+>>> dataset = ds.MindDataset(dataset_files=mind_dataset_dir)

 .. include:: mindspore.dataset.Dataset.add_sampler.rst
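Outside the diff, a minimal usage sketch of the renamed parameter, assuming mindspore is installed; the path is a placeholder, and the printed column names depend on how the MindRecord files were written:

>>> import mindspore.dataset as ds
>>> mind_dataset_dir = ["/path/to/mind_dataset_file"]  # placeholder: one or more MindRecord files
>>> dataset = ds.MindDataset(dataset_files=mind_dataset_dir, shuffle=False)
>>> for item in dataset.create_dict_iterator(output_numpy=True):
...     print(item.keys())  # the column names saved in the MindRecord files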
@@ -46,8 +46,8 @@ class Vocab(cde.Vocab):

        This would collect all unique words in a dataset and return a vocab within
        the frequency range specified by user in freq_range. User would be warned if no words fall into the frequency.
-       Words in vocab are ordered from highest frequency to lowest frequency. Words with the same frequency would be
-       ordered lexicographically.
+       Words in vocab are ordered from the highest frequency to the lowest frequency. Words with the same frequency
+       would be ordered lexicographically.

        Args:
            dataset(Dataset): dataset to build vocab from.
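For context, a hedged sketch of Vocab.from_dataset on a whitespace-tokenized text column; the corpus path is a placeholder, and the "text" column name plus the tokenizer step are assumptions about typical usage:

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>> corpus = ds.TextFileDataset("/path/to/corpus.txt", shuffle=False)  # placeholder path
>>> corpus = corpus.map(operations=text.WhitespaceTokenizer(), input_columns=["text"])
>>> # words are ordered by descending frequency, with ties broken lexicographically
>>> vocab = text.Vocab.from_dataset(corpus, columns=["text"], special_tokens=["<pad>", "<unk>"])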
@@ -86,7 +86,7 @@ class Vocab(cde.Vocab):

        Args:
            word_list(list): A list of string where each element is a word of type string.
-           special_tokens(list, optional): A list of strings, each one is a special token. for example
+           special_tokens(list, optional): A list of strings, each one is a special token. For example
                special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
            special_first(bool, optional): Whether special_tokens is prepended or appended to vocab. If special_tokens
                is specified and special_first is set to True, special_tokens will be prepended (default=True).
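A small sketch of the from_list behavior described above; with special_first=True the special tokens are prepended, so "<pad>" receives id 0:

>>> import mindspore.dataset.text as text
>>> vocab = text.Vocab.from_list(["home", "world", "behind"],
...                              special_tokens=["<pad>", "<unk>"], special_first=True)
>>> lookup = text.Lookup(vocab, unknown_token="<unk>")  # maps out-of-vocab words to the id of "<unk>"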
@@ -112,7 +112,7 @@ class Vocab(cde.Vocab):
            delimiter (str, optional): A delimiter to break up each line in file, the first element is taken to be
                the word (default="").
            vocab_size (int, optional): Number of words to read from file_path (default=None, all words are taken).
-           special_tokens (list, optional): A list of strings, each one is a special token. for example
+           special_tokens (list, optional): A list of strings, each one is a special token. For example
                special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
            special_first (bool, optional): Whether special_tokens will be prepended/appended to vocab,
                If special_tokens is specified and special_first is set to True,
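Likewise for from_file, a hedged sketch; the vocab file path and its "word,count" line format are assumptions for illustration:

>>> import mindspore.dataset.text as text
>>> # with delimiter=",", only the text before the first comma on each line is taken as the word
>>> vocab = text.Vocab.from_file("/path/to/vocab.txt", delimiter=",",
...                              special_tokens=["<pad>", "<unk>"], special_first=True)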
@@ -262,7 +262,7 @@ def to_str(array, encoding='utf8'):

    Args:
        array (numpy.ndarray): Array of `bytes` type representing strings.
-       encoding (str): Indicating the charset for decoding.
+       encoding (str): Indicating the charset for decoding (default='utf8').

    Returns:
        numpy.ndarray, NumPy array of `str`.
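The documented default is easy to check; a minimal sketch using a NumPy bytes array as input:

>>> import numpy as np
>>> from mindspore.dataset.text import to_str
>>> data = np.array([b"hello", b"world"])
>>> to_str(data)  # decodes with the default charset 'utf8'
array(['hello', 'world'], dtype='<U5')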
@@ -286,7 +286,7 @@ def to_bytes(array, encoding='utf8'):

    Args:
        array (numpy.ndarray): Array of `str` type representing strings.
-       encoding (str): Indicating the charset for encoding.
+       encoding (str): Indicating the charset for encoding (default='utf8').

    Returns:
        numpy.ndarray, NumPy array of `bytes`.
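And the mirror-image sketch for to_bytes:

>>> import numpy as np
>>> from mindspore.dataset.text import to_bytes
>>> data = np.array(["hello", "world"])
>>> to_bytes(data)  # encodes with the default charset 'utf8'
array([b'hello', b'world'], dtype='|S5')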
@@ -286,7 +286,7 @@ class PadEnd(TensorOperation):
    Args:
        pad_shape (list(int)): List of integers representing the shape needed. Dimensions that set to `None` will
            not be padded (i.e., original dim will be used). Shorter dimensions will truncate the values.
-       pad_value (Union[str, bytes, int, float, bool]), optional): Value used to pad. Default to 0 or empty
+       pad_value (Union[str, bytes, int, float, bool], optional): Value used to pad. Default to 0 or empty
            string in case of tensors of strings.

    Examples:
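Finally, a hedged PadEnd sketch; the c_transforms import path is an assumption based on where this class lived in the 1.x source tree at the time of the commit:

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.transforms.c_transforms as c_transforms
>>> # pad each 1-D sample in column "col" out to length 4 with pad_value 0
>>> pad_op = c_transforms.PadEnd(pad_shape=[4], pad_value=0)
>>> data = ds.NumpySlicesDataset({"col": [[1, 2], [3, 4]]}, shuffle=False)
>>> data = data.map(operations=pad_op, input_columns=["col"])  # rows become [1, 2, 0, 0] and [3, 4, 0, 0]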