forked from mindspore-Ecosystem/mindspore
!45318 Fix issues in the Chinese API documentation
Merge pull request !45318 from 刘勇琪/code_docs_modify_chinese_api
commit c10184ad88
@@ -1,7 +1,7 @@
mindspore.dataset.ArgoverseDataset
====================================

-.. py:class:: mindspore.dataset.ArgoverseDataset(data_dir, column_names="graph", shuffle=None, num_parallel_workers=1, python_multiprocessing=True, perf_mode=True)
+.. py:class:: mindspore.dataset.ArgoverseDataset(data_dir, column_names="graph", num_parallel_workers=1, shuffle=None, python_multiprocessing=True, perf_mode=True)

Load the argoverse dataset and perform graph initialization.
@@ -16,6 +16,45 @@
- **python_multiprocessing** (bool, optional) - Enable Python multiprocessing mode to accelerate computation. Default: True. This option can be beneficial when the Python objects passed to `source` are computationally heavy.
- **perf_mode** (bool, optional) - Mode for obtaining higher performance when iterating over the created dataset (the `__getitem__` method is called during this process). Default: True, stores all the data of the Graph (such as edge indices, node features and graph features) as graph features.

Raises:
- **TypeError** - If `data_dir` is not of type str.
- **TypeError** - If `num_parallel_workers` is not of type int.
- **TypeError** - If `shuffle` is not of type bool.
- **TypeError** - If `python_multiprocessing` is not of type bool.
- **TypeError** - If `perf_mode` is not of type bool.
- **RuntimeError** - If `data_dir` is invalid or does not exist.
- **ValueError** - If `num_parallel_workers` exceeds the maximum number of threads of the system.

**About the Argoverse dataset:**

Argoverse is the first dataset containing high-precision maps; it contains 290 KM of high-precision map data with geometric shape and semantic information.

You can unzip the dataset files into the following structure and read them via MindSpore's API:

.. code-block::

    .
    └── argoversedataset_dir
        ├── train
        │    ├──...
        ├── val
        │    └──...
        ├── test
        │    └──...

**Citation:**

.. code-block::

    @inproceedings{Argoverse,
        author = {Ming-Fang Chang and John W Lambert and Patsorn Sangkloy and Jagjeet Singh
                  and Slawomir Bak and Andrew Hartnett and De Wang and Peter Carr
                  and Simon Lucey and Deva Ramanan and James Hays},
        title = {Argoverse: 3D Tracking and Forecasting with Rich Maps},
        booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
        year = {2019}
    }

.. py:method:: load()
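For reference, a minimal usage sketch consistent with the signature above; the path is a placeholder and the column names follow the recommendation given in the English docstring later in this commit:

.. code-block:: python

    import mindspore.dataset as ds

    # Hypothetical local path; point it at an extracted Argoverse split.
    graph_dataset = ds.ArgoverseDataset("/path/to/argoversedataset_dir/train",
                                        column_names=["edge_index", "x", "y", "cluster",
                                                      "valid_len", "time_step_len"])
    for item in graph_dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
        pass  # each item is a dict keyed by the column names above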
@@ -3,15 +3,13 @@ mindspore.dataset.EnWik9Dataset

.. py:class:: mindspore.dataset.EnWik9Dataset(dataset_dir, num_samples=None, num_parallel_workers=None, shuffle=True, num_shards=None, shard_id=None, cache=None)

-Read and parse the EnWik9 Full and EnWik9 Polarity datasets.
+Read and parse the EnWik9 dataset.

The generated dataset has one column `[text]` with type string.

Parameters:
- **dataset_dir** (str) - Path to the root directory that contains the dataset files.
-- **num_samples** (int, optional) - The number of samples to read from the dataset.
-  For the Polarity dataset, 'train' reads 3,600,000 training samples, 'test' reads 400,000 test samples, and 'all' reads all 4,000,000 samples.
-  For the Full dataset, 'train' reads 3,000,000 training samples, 'test' reads 650,000 test samples, and 'all' reads all 3,650,000 samples. Default: None, read all samples.
+- **num_samples** (int, optional) - The number of samples to read from the dataset. Default: None, read all samples.
- **num_parallel_workers** (int, optional) - Number of worker threads to read the data. Default: None, use the number of threads configured in mindspore.dataset.config.
- **shuffle** (Union[bool, Shuffle], optional) - The mode of shuffling the data each epoch; both bool and enum values are accepted. Default: True.
  If `shuffle` is False, no shuffling is performed; if `shuffle` is True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL.
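A small illustrative sketch of the two documented forms of `shuffle` (the path is a placeholder):

.. code-block:: python

    import mindspore.dataset as ds

    # shuffle=True is equivalent to Shuffle.GLOBAL, as noted above.
    dataset = ds.EnWik9Dataset("/path/to/enwik9_dataset_dir", shuffle=True)
    # The enum form states the same intent explicitly.
    dataset = ds.EnWik9Dataset("/path/to/enwik9_dataset_dir",
                               num_samples=100, shuffle=ds.Shuffle.GLOBAL)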
@@ -10,9 +10,7 @@ mindspore.dataset.IMDBDataset
Parameters:
- **dataset_dir** (str) - Path to the root directory that contains the dataset files.
- **usage** (str, optional) - Subset of the dataset to use; can be 'train', 'test' or 'all'. Default: None, read all samples.
-- **num_samples** (int, optional) - The number of samples to read from the dataset.
-  For the Polarity dataset, 'train' reads 3,600,000 training samples, 'test' reads 400,000 test samples, and 'all' reads all 4,000,000 samples.
-  For the Full dataset, 'train' reads 3,000,000 training samples, 'test' reads 650,000 test samples, and 'all' reads all 3,650,000 samples. Default: None, read all samples.
+- **num_samples** (int, optional) - The number of samples to read from the dataset. Default: None, read all samples.
- **num_parallel_workers** (int, optional) - Number of worker threads to read the data. Default: None, use the number of threads configured in mindspore.dataset.config.
- **shuffle** (bool, optional) - Whether to shuffle the dataset. Default: None. The table below shows the expected behavior under different configurations.
- **sampler** (Sampler, optional) - Sampler used to select samples from the dataset. Default: None. The table below shows the expected behavior under different configurations.
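An illustrative sketch of the two alternative configurations above, `shuffle` versus an explicit `sampler` (per the behavior table the two are not combined; path and sampler settings are placeholders):

.. code-block:: python

    import mindspore.dataset as ds

    # Random order via shuffle ...
    dataset = ds.IMDBDataset("/path/to/imdb_dataset_dir", usage="train", shuffle=True)
    # ... or a deterministic order via a sampler, but not both at once.
    dataset = ds.IMDBDataset("/path/to/imdb_dataset_dir", usage="test",
                             sampler=ds.SequentialSampler(start_index=0, num_samples=64))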
@@ -33,7 +33,7 @@ mindspore.dataset.IWSLT2017Dataset
- **RuntimeError** - The `shard_id` parameter is specified but `num_shards` is not.
- **ValueError** - If `num_parallel_workers` exceeds the maximum number of threads of the system.

-**About the IWSLT2016 dataset:**
+**About the IWSLT2017 dataset:**

IWSLT is a major annual scientific conference dedicated to all aspects of spoken language translation. The MT task of the IWSLT evaluation campaign is organized into a dataset that is publicly available through `wit3 <https://wit3.fbk.eu>`_ .
The IWSLT2017 dataset covers German, English, Italian, Dutch and Romanian, and includes translations between any two of these languages.
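An illustrative sketch; the `language_pair` parameter for picking the two languages is assumed from the public IWSLT2017Dataset API, and the path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds

    # Translation direction German -> English, one of the documented language pairs.
    dataset = ds.IWSLT2017Dataset("/path/to/iwslt2017_dataset_dir",
                                  usage="train", language_pair=["de", "en"])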
@@ -21,6 +21,18 @@
- **python_multiprocessing** (bool, optional) - Enable Python multiprocessing mode to accelerate computation. Default: True. This option can be beneficial when the Python objects passed to `source` are computationally heavy.
- **max_rowsize** (int, optional) - Maximum space in MB allocated as shared memory for copying data between processes. Default: 6. This parameter takes effect only when `python_multiprocessing` is set to True.

Raises:
- **TypeError** - If `data_dir` is not of type str.
- **TypeError** - If `save_dir` is not of type str.
- **TypeError** - If `num_parallel_workers` is not of type int.
- **TypeError** - If `shuffle` is not of type bool.
- **TypeError** - If `python_multiprocessing` is not of type bool.
- **TypeError** - If `perf_mode` is not of type bool.
- **RuntimeError** - If `data_dir` is invalid or does not exist.
- **RuntimeError** - The `num_shards` parameter is specified but `shard_id` is not.
- **RuntimeError** - The `shard_id` parameter is specified but `num_shards` is not.
- **ValueError** - If `num_parallel_workers` exceeds the maximum number of threads of the system.

.. py:method:: load()

    Load data from the given (preprocessed) path; this method can also be implemented in a custom Dataset subclass.
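A minimal sketch of the subclassing pattern described above, overriding `process` and providing random access; the `graphs` attribute and the parsing logic are illustrative assumptions, not the library's prescribed implementation:

.. code-block:: python

    from mindspore.dataset import InMemoryGraphDataset

    class MyGraphDataset(InMemoryGraphDataset):
        def __init__(self, data_dir):
            super().__init__(data_dir)

        def process(self):
            # Parse the raw files under the given data directory into an
            # in-memory list here; `graphs` is a name chosen for this sketch.
            self.graphs = []

        def __getitem__(self, index):
            return self.graphs[index]

        def __len__(self):
            return len(self.graphs)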
@@ -60,5 +60,25 @@
  - False
  - Not allowed

**About the Manifest dataset:**

A Manifest file contains the list of files included in a dataset, with basic file information such as file name and file ID, along with extended file metadata.
Manifest is a data format supported by Huawei ModelArts. For details, see the `Manifest documentation <https://support.huaweicloud.com/engineers-modelarts/modelarts_23_0009.html>`_ .

The original Manifest dataset structure is shown below. You can unzip the dataset files into this directory structure, which can then be read by MindSpore's API.

.. code-block::

    .
    └── manifest_dataset_directory
        ├── train
        │    ├── 1.JPEG
        │    ├── 2.JPEG
        │    ├── ...
        ├── eval
        │    ├── 1.JPEG
        │    ├── 2.JPEG
        │    ├── ...

.. include:: mindspore.dataset.api_list_vision.rst
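For reference, a minimal usage sketch matching the directory layout above; the manifest path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds

    # usage selects the subset recorded in the manifest file ("train" or "eval" above).
    dataset = ds.ManifestDataset("/path/to/manifest_dataset_directory/manifest_file.manifest",
                                 usage="train")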
@@ -26,7 +26,7 @@
- **shard_id** (int, optional) - The shard ID used in distributed training. Default: None. This parameter can only be specified when `num_shards` is also specified.
- **shard_equal_rows** (bool, optional) - Get an equal number of data rows for each shard in distributed training. Default: True.
  If `shard_equal_rows` is False, the number of data entries in each shard may be unequal, which may cause distributed training to fail.
-  Therefore, when the number of data entries differs across TFRecord files, it is recommended to set this parameter to True. Note that this parameter can only be specified when `num_shards` is also specified.
+  Therefore, when the number of data entries differs across MindRecord files, it is recommended to set this parameter to True. Note that this parameter can only be specified when `num_shards` is also specified.

Raises:
- **RuntimeError** - The directory specified by the `sync_obs_path` parameter does not exist.
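An illustrative two-shard sketch; `TFRecordDataset` is used here only because it exposes the same `shard_equal_rows` behavior described above, and the path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds

    # Two-way distributed read: each training process passes its own shard_id.
    dataset = ds.TFRecordDataset("/path/to/data.tfrecord",
                                 num_shards=2, shard_id=0,
                                 shard_equal_rows=True)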
@@ -11,7 +11,7 @@ mindspore.dataset.SVHNDataset
- **dataset_dir** (str) - Path to the root directory that contains the dataset files.
- **usage** (str, optional) - Subset of the dataset to use; can be 'train', 'test', 'extra' or 'all'. Default: None, read all sample images.
- **num_samples** (int, optional) - The number of samples to read from the dataset; can be smaller than the total size of the dataset. Default: None, read all sample images.
-- **num_parallel_workers** (int, optional) - Number of worker threads to read the data. Default: 1, use the number of threads configured in mindspore.dataset.config.
+- **num_parallel_workers** (int, optional) - Number of worker threads to read the data. Default: 1.
- **shuffle** (bool, optional) - Whether to shuffle the dataset. Default: None. The table below shows the expected behavior under different configurations.
- **sampler** (Sampler, optional) - Sampler used to select samples from the dataset. Default: None. The table below shows the expected behavior under different configurations.
- **num_shards** (int, optional) - Number of shards that the dataset will be divided into for distributed training. Default: None. When this parameter is specified, `num_samples` indicates the maximum number of samples per shard.
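A minimal sketch combining `num_samples` with sharding as described above; the path is a placeholder:

.. code-block:: python

    import mindspore.dataset as ds

    # With num_shards specified, num_samples caps the samples per shard.
    dataset = ds.SVHNDataset("/path/to/svhn_dataset_dir", usage="train",
                             num_samples=1000, num_shards=2, shard_id=0)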
@@ -450,6 +450,9 @@ class Dataset:

        Returns:
            str, JSON string of the pipeline.

        Examples:
            >>> dataset_json = dataset.to_json("/path/to/mnist_dataset_pipeline.json")
        """
        ir_tree, _ = self.create_ir_tree()
        return json.loads(ir_tree.to_json(filename))
@@ -1316,6 +1319,14 @@ class Dataset:

        Returns:
            Dataset, dataset for transferring.

        Examples:
            >>> data = ds.TFRecordDataset('/path/to/TF_FILES', '/path/to/TF_SCHEMA_FILE', shuffle=ds.Shuffle.FILES)
            >>>
            >>> data = data.device_que()
            >>> data.send()
            >>> time.sleep(0.1)
            >>> data.stop_send()
        """
        return TransferDataset(self, send_epoch_end, create_data_info_queue)
@@ -1389,6 +1400,17 @@ class Dataset:
            num_files (int, optional): Number of dataset files. Default: 1.
            file_type (str, optional): Dataset format. Default: 'mindrecord'.

        Examples:
            >>> import numpy as np
            >>>
            >>> def generator_1d():
            ...     for i in range(10):
            ...         yield (np.array([i]),)
            >>>
            >>> # apply dataset operations
            >>> d1 = ds.GeneratorDataset(generator_1d, ["data"], shuffle=False)
            >>> d1.save('/path/to/save_file')
        """
        ir_tree, api_tree = self.create_ir_tree()
@@ -1689,6 +1711,39 @@ class Dataset:
                When num_batch is None, it will default to the number specified by the
                sync_wait operation. Default: None.
            data (Any): The data passed to the callback, user defined. Default: None.

        Examples:
            >>> import numpy as np
            >>>
            >>> def gen():
            ...     for i in range(100):
            ...         yield (np.array(i),)
            >>>
            >>> class Augment:
            ...     def __init__(self, loss):
            ...         self.loss = loss
            ...
            ...     def preprocess(self, input_):
            ...         return input_
            ...
            ...     def update(self, data):
            ...         self.loss = data["loss"]
            >>>
            >>> batch_size = 10
            >>> dataset = ds.GeneratorDataset(gen, column_names=["input"])
            >>> aug = Augment(0)
            >>> dataset = dataset.sync_wait(condition_name='', num_batch=1)
            >>> dataset = dataset.map(input_columns=["input"], operations=[aug.preprocess])
            >>> dataset = dataset.batch(batch_size)
            >>>
            >>> count = 0
            >>> for data in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
            ...     count += 1
            ...     data = {"loss": count}
            ...     dataset.sync_update(condition_name="", data=data)
        """
        if (not isinstance(num_batch, int) and num_batch is not None) or \
                (isinstance(num_batch, int) and num_batch <= 0):
@@ -1761,7 +1816,18 @@ class Dataset:
        return {}

    def reset(self):
-        """Reset the dataset for next epoch."""
+        """
+        Reset the dataset for next epoch.
+
+        Examples:
+            >>> mind_dataset_dir = ["/path/to/mind_dataset_file"]
+            >>> data_set = ds.MindDataset(dataset_files=mind_dataset_dir)
+            >>> for _ in range(5):
+            ...     num_iter = 0
+            ...     for data in data_set.create_tuple_iterator(num_epochs=1, output_numpy=True):
+            ...         num_iter += 1
+            ...     data_set.reset()
+        """

    def is_shuffled(self):
        """Returns True if the dataset or its children is shuffled."""
@@ -3797,6 +3863,12 @@ class Schema:

        Raises:
            ValueError: If column type is unknown.

        Examples:
            >>> from mindspore import dtype as mstype
            >>>
            >>> schema = ds.Schema()
            >>> schema.add_column('col_1d', de_type=mstype.int64, shape=[2])
        """
        if isinstance(de_type, typing.Type):
            de_type = mstype_to_detype(de_type)
@@ -3841,6 +3913,12 @@ class Schema:

        Returns:
            str, JSON string of the schema.

        Examples:
            >>> from mindspore.dataset import Schema
            >>>
            >>> schema1 = Schema()
            >>> schema2 = schema1.to_json()
        """
        return self.cpp_schema.to_json()
@@ -3855,6 +3933,16 @@ class Schema:
            RuntimeError: if there is unknown item in the object.
            RuntimeError: if dataset type is missing in the object.
            RuntimeError: if columns are missing in the object.

        Examples:
            >>> import json
            >>>
            >>> from mindspore.dataset import Schema
            >>>
            >>> with open("/path/to/schema_file") as file:
            ...     json_obj = json.load(file)
            ...     schema = Schema()
            ...     schema.from_json(json_obj)
        """
        self.cpp_schema.from_string(json.dumps(json_obj, indent=2))
@@ -647,17 +647,14 @@ class DBpediaDataset(SourceDataset, TextBaseDataset):

class EnWik9Dataset(SourceDataset, TextBaseDataset):
    """
-    A source dataset that reads and parses EnWik9 Polarity and EnWik9 Full datasets.
+    A source dataset that reads and parses EnWik9 datasets.

    The generated dataset has one column :py:obj:`[text]` with type string.

    Args:
        dataset_dir (str): Path to the root directory that contains the dataset.
        num_samples (int, optional): The number of samples to be included in the dataset.
-            For Polarity dataset, 'train' will read from 3,600,000 train samples, 'test' will read from 400,000 test
-            samples, 'all' will read from all 4,000,000 samples.
-            For Full dataset, 'train' will read from 3,000,000 train samples, 'test' will read from 650,000 test
-            samples, 'all' will read from all 3,650,000 samples. Default: None, will include all samples.
+            Default: None, will include all samples.
        num_parallel_workers (int, optional): Number of workers to read the data.
            Default: None, number set in the mindspore.dataset.config.
        shuffle (Union[bool, Shuffle], optional): Perform reshuffling of the data every epoch.
@@ -744,9 +741,6 @@ class IMDBDataset(MappableDataset, TextBaseDataset):
        usage (str, optional): Usage of this dataset, can be 'train', 'test' or 'all'.
            Default: None, will read all samples.
        num_samples (int, optional): The number of images to be included in the dataset.
-            For Polarity dataset, 'train' will read from 3,600,000 train samples, 'test' will read from 400,000 test
-            samples, 'all' will read from all 4,000,000 samples. For Full dataset, 'train' will read from 3,000,000
-            train samples, 'test' will read from 650,000 test samples, 'all' will read from all 3,650,000 samples.
            Default: None, will include all samples.
        num_parallel_workers (int, optional): Number of workers to read the data.
            Default: None, number set in the mindspore.dataset.config.
@@ -3114,6 +3114,26 @@ class ManifestDataset(MappableDataset, VisionBaseDataset):
        >>>
        >>> # 2) Read samples (specified in manifest_file.manifest) for shard 0 in a 2-way distributed training setup
        >>> dataset = ds.ManifestDataset(dataset_file=manifest_dataset_dir, num_shards=2, shard_id=0)

    About Manifest dataset:

    Manifest file contains a list of files included in a dataset, including basic file info such as File name and File
    ID, along with extended file metadata. Manifest is a data format file supported by Huawei ModelArts. For details,
    see `Specifications for Importing the Manifest File <https://support.huaweicloud.com/engineers-modelarts/modelarts_23_0009.html>`_ .

    .. code-block::

        .
        └── manifest_dataset_directory
            ├── train
            │    ├── 1.JPEG
            │    ├── 2.JPEG
            │    ├── ...
            ├── eval
            │    ├── 1.JPEG
            │    ├── 2.JPEG
            │    ├── ...
    """

    @check_manifestdataset
@@ -198,6 +198,13 @@ class GraphData:
        Returns:
            numpy.ndarray, array of nodes.

        Examples:
            >>> from mindspore.dataset import GraphData
            >>>
            >>> g = GraphData("/path/to/testdata", 1)
            >>> edges = g.get_all_edges(0)
            >>> nodes = g.get_nodes_from_edges(edges)

        Raises:
            TypeError: If `edge_list` is not list or ndarray.
        """
@@ -488,6 +495,12 @@ class GraphData:
        Returns:
            dict, meta information of the graph. The key is node_type, edge_type, node_num, edge_num,
            node_feature_type and edge_feature_type.

        Examples:
            >>> from mindspore.dataset import GraphData
            >>>
            >>> g = GraphData("/path/to/testdata", 2)
            >>> graph_info = g.graph_info()
        """
        if self._working_mode == 'server':
            raise Exception("This method is not supported when working mode is server.")
@@ -1282,17 +1295,29 @@ class InMemoryGraphDataset(GeneratorDataset):
            Default: 'graph'.
        num_samples (int, optional): The number of samples to be included in the dataset. Default: None, all samples.
        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
-            Default: None, expected order behavior shown in the table below.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset. This parameter can only be
+            specified when the implemented dataset has a random access attribute ( `__getitem__` ). Default: None.
        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum number of samples per shard.
        shard_id (int, optional): The shard ID within `num_shards` . Default: None. This argument must be specified only
-            when num_shards is also specified.
+            when `num_shards` is also specified.
        python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker processes. This
            option could be beneficial if the Python operation is computationally heavy. Default: True.
        max_rowsize (int, optional): Maximum size of row in MB that is used for shared memory allocation to copy
            data between processes. This is only used if python_multiprocessing is set to True. Default: 6 MB.

    Raises:
        TypeError: If `data_dir` is not of type str.
        TypeError: If `save_dir` is not of type str.
        TypeError: If `num_parallel_workers` is not of type int.
        TypeError: If `shuffle` is not of type bool.
        TypeError: If `python_multiprocessing` is not of type bool.
        TypeError: If `perf_mode` is not of type bool.
        RuntimeError: If `data_dir` is not valid or does not exist.
        RuntimeError: If `num_shards` is specified but `shard_id` is None.
        RuntimeError: If `shard_id` is specified but `num_shards` is None.
        ValueError: If `num_parallel_workers` exceeds the maximum number of threads.

    Examples:
        >>> from mindspore.dataset import InMemoryGraphDataset, Graph
@@ -1381,19 +1406,28 @@ class ArgoverseDataset(InMemoryGraphDataset):
    Args:
        data_dir (str): directory for loading dataset, here contains origin format data and will be loaded in
            `process` method.
-        column_names (Union[str, list[str]], optional): single column name or list of column names of the dataset,
-            num of column name should be equal to num of item in return data when implement method like `__getitem__` ,
-            recommend to specify it with
+        column_names (Union[str, list[str]], optional): single column name or list of column names of the dataset.
+            Default: "graph". The number of column names should be equal to the number of items returned by methods
+            like `__getitem__`; it is recommended to specify it with
            `column_names=["edge_index", "x", "y", "cluster", "valid_len", "time_step_len"]` like the following example.
        num_parallel_workers (int, optional): Number of subprocesses used to fetch the dataset in parallel. Default: 1.
-        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
-            Default: None, expected order behavior shown in the table below.
+        shuffle (bool, optional): Whether or not to perform shuffle on the dataset. This parameter can only be
+            specified when the implemented dataset has a random access attribute ( `__getitem__` ). Default: None.
        python_multiprocessing (bool, optional): Parallelize Python operations with multiple worker processes. This
            option could be beneficial if the Python operation is computationally heavy. Default: True.
        perf_mode (bool, optional): mode for obtaining higher performance when iterating over the created dataset
            (the `__getitem__` method is called in this process). Default: True, will save all the data in graph
            (like edge index, node feature and graph feature) into graph feature.

    Raises:
        TypeError: If `data_dir` is not of type str.
        TypeError: If `num_parallel_workers` is not of type int.
        TypeError: If `shuffle` is not of type bool.
        TypeError: If `python_multiprocessing` is not of type bool.
        TypeError: If `perf_mode` is not of type bool.
        RuntimeError: If `data_dir` is not valid or does not exist.
        ValueError: If `num_parallel_workers` exceeds the maximum number of threads.

    Examples:
        >>> from mindspore.dataset import ArgoverseDataset
        >>>
@@ -1403,6 +1437,37 @@ class ArgoverseDataset(InMemoryGraphDataset):
        ...                                  "time_step_len"])
        >>> for item in graph_dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
        ...     pass

    About Argoverse Dataset:

    Argoverse is the first dataset containing high-precision maps, which contains 290KM of high-precision map data
    with geometric shape and semantic information.

    You can unzip the dataset files into the following structure and read them by MindSpore's API:

    .. code-block::

        .
        └── argoverse_dataset_dir
            ├── train
            │    ├──...
            ├── val
            │    └──...
            ├── test
            │    └──...

    Citation:

    .. code-block::

        @inproceedings{Argoverse,
            author = {Ming-Fang Chang and John W Lambert and Patsorn Sangkloy and Jagjeet Singh
                      and Slawomir Bak and Andrew Hartnett and De Wang and Peter Carr
                      and Simon Lucey and Deva Ramanan and James Hays},
            title = {Argoverse: 3D Tracking and Forecasting with Rich Maps},
            booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
            year = {2019}
        }
    """

    def __init__(self, data_dir, column_names="graph", num_parallel_workers=1, shuffle=None,