add approver for md and some comment

This commit is contained in:
jonyguo 2022-06-30 19:32:58 +08:00
parent 5d5b8f7a3d
commit 06bfa2b96c
9 changed files with 32 additions and 18 deletions

View File

@@ -259,7 +259,7 @@
.. py:method:: save(file_name, num_files=1, file_type='mindrecord')
Save the data being processed in the dataset pipeline in a common dataset format. Supported dataset format: 'mindrecord'.
Save the data being processed in the dataset pipeline in a common dataset format. Supported dataset format: 'mindrecord'. The saved 'mindrecord' file(s) can then be read with the 'MindDataset' class.
Implicit type casting occurs when saving data in the 'mindrecord' format. The transform table shows how the type casting is performed.

View File

@@ -32,7 +32,8 @@
- **drop_remainder** (bool, optional) - Whether to drop the last batch when it contains fewer entries than `batch_size`, rather than passing it on to the next operation. Default: False, do not drop.
- **num_parallel_workers** (int, optional) - Number of parallel processes/threads for the `batch` operation (whether multiprocess or multithread mode is used is decided by the `python_multiprocessing` parameter).
  Default: None, use the number of threads configured in mindspore.dataset.config.
- **per_batch_map** (callable, optional) - A callable that takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters,
- **per_batch_map** (Callable[[List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo], (List[numpy.ndarray],
  ..., List[numpy.ndarray])], optional) - A callable that takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters
  and returns (list[numpy.ndarray], list[numpy.ndarray], ...) as the new data columns. Each list[numpy.ndarray] in the input represents a batch of numpy.ndarray for a given data column.
  The number of list[numpy.ndarray] should match the number of column names passed in `input_columns`; in the returned (list[numpy.ndarray], list[numpy.ndarray], ...),
  the number of list[numpy.ndarray] should match the input. If the number of output columns differs from the number of input columns, `output_columns` must be specified. The last input parameter of the callable is always BatchInfo.

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -550,8 +550,9 @@ class Dataset:
be dropped and not propagated to the child node.
num_parallel_workers (int, optional): Number of workers (threads) to process the dataset in parallel
(default=None).
per_batch_map (callable, optional): Per batch map callable (default=None). A callable which takes
(list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters. Each
per_batch_map (Callable[[List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo], (List[numpy.ndarray],
..., List[numpy.ndarray])], optional): Per batch map callable (default=None). A callable
which takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters. Each
list[numpy.ndarray] represents a batch of numpy.ndarray on a given column. The number of lists should
match the number of entries in input_columns. The last parameter of the callable should always be
a BatchInfo object. per_batch_map should return (list[numpy.ndarray], list[numpy.ndarray], ...). The
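The per_batch_map contract described above can be sketched without MindSpore installed. `FakeBatchInfo` and `scale_batch` below are hypothetical stand-ins for illustration, not MindSpore APIs:

```python
import numpy as np

class FakeBatchInfo:
    """Illustrative stand-in for mindspore.dataset.BatchInfo."""
    def __init__(self, batch_num):
        self._batch_num = batch_num

    def get_batch_num(self):
        return self._batch_num

def scale_batch(col_image, col_label, batch_info):
    # One list[numpy.ndarray] per input column; BatchInfo is always last.
    # Returns one output list per column, matching the input count.
    factor = batch_info.get_batch_num() + 1
    return [img * factor for img in col_image], list(col_label)

images = [np.ones((2, 2)) for _ in range(4)]
labels = [np.array(i) for i in range(4)]
out_images, out_labels = scale_batch(images, labels, FakeBatchInfo(batch_num=1))
```

With MindSpore, the equivalent callable would be passed via `dataset.batch(batch_size, per_batch_map=..., input_columns=[...])`.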
@@ -695,12 +696,12 @@ class Dataset:
"""
Map `func` to each row in dataset and flatten the result.
The specified `func` is a function that must take one 'Ndarray' as input
and return a 'Dataset'.
The specified `func` is a function that must take one `numpy.ndarray` as input
and return a `Dataset`.
Args:
func (function): A function that must take one 'Ndarray' as an argument and
return a 'Dataset'.
func (function): A function that must take one `numpy.ndarray` as an argument and
return a `Dataset`.
Returns:
Dataset, dataset applied by the function.
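The flat_map semantics above (map `func` over each row, then flatten the resulting datasets) can be sketched in plain Python; `flat_map_rows` is a hypothetical helper, and a list of arrays stands in for the `Dataset` that `func` would return:

```python
import numpy as np

def flat_map_rows(rows, func):
    # Map func over each row and flatten: func returns an iterable
    # of arrays standing in for the Dataset it would produce.
    out = []
    for row in rows:
        out.extend(func(row))
    return out

# func splits each input array into one single-element array per value
split = lambda arr: [np.array(x) for x in arr]
result = flat_map_rows([np.array([1, 2]), np.array([3, 4])], split)
```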
@@ -1244,8 +1245,8 @@ class Dataset:
Apply a function in this dataset.
Args:
apply_func (function): A function that must take one 'Dataset' as an argument and
return a preprocessed 'Dataset'.
apply_func (function): A function that must take one `Dataset` as an argument and
return a preprocessed `Dataset`.
Returns:
Dataset, dataset applied by the function.
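The apply pattern, a function that takes one dataset and returns a preprocessed one, can be sketched generically; `normalize_then_batch` is a hypothetical `apply_func`, with a list of arrays standing in for a `Dataset`:

```python
import numpy as np

def normalize_then_batch(rows):
    # Hypothetical apply_func: takes a dataset (here, a list of arrays)
    # and returns a preprocessed dataset.
    normalized = [r / r.max() for r in rows]
    # group into batches of 2
    return [np.stack(normalized[i:i + 2]) for i in range(0, len(normalized), 2)]

data = [np.array([1.0, 2.0]), np.array([3.0, 4.0]),
        np.array([5.0, 10.0]), np.array([2.0, 8.0])]
batches = normalize_then_batch(data)
```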
@@ -1319,16 +1320,16 @@ class Dataset:
def save(self, file_name, num_files=1, file_type='mindrecord'):
"""
Save the dynamic data processed by the dataset pipeline in common dataset format.
Supported dataset formats: 'mindrecord' only
Supported dataset formats: `mindrecord` only. The saved file(s) can then be read with the `MindDataset` API.
Implicit type casting exists when saving data as 'mindrecord'. The transform table shows how to do type casting.
Implicit type casting exists when saving data as `mindrecord`. The transform table shows how to do type casting.
.. list-table:: Implicit Type Casting when Saving as 'mindrecord'
.. list-table:: Implicit Type Casting when Saving as `mindrecord`
:widths: 25 25 50
:header-rows: 1
* - Type in 'dataset'
- Type in 'mindrecord'
* - Type in `dataset`
- Type in `mindrecord`
- Details
* - bool
- None
@@ -1400,7 +1401,7 @@ class Dataset:
@check_tuple_iterator
def create_tuple_iterator(self, columns=None, num_epochs=-1, output_numpy=False, do_copy=True):
"""
Create an iterator over the dataset. The datatype retrieved back will be a list of ndarrays.
Create an iterator over the dataset. The datatype retrieved back will be a list of `numpy.ndarray`.
To specify which columns to list and the order needed, use columns_list. If columns_list
is not provided, the order of the columns will remain unchanged.
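The behaviour described above, each row returned as a list of `numpy.ndarray` in the requested column order, can be sketched as follows; `tuple_iterator` and the column data are hypothetical stand-ins, not the MindSpore implementation:

```python
import numpy as np

def tuple_iterator(columns_data, columns_list=None):
    # columns_data: dict mapping column name -> list of per-row arrays.
    # If columns_list is not provided, the column order is unchanged.
    names = columns_list or list(columns_data)
    for row in zip(*(columns_data[n] for n in names)):
        yield list(row)  # each row: list of numpy.ndarray, in column order

data = {"image": [np.zeros(2), np.ones(2)],
        "label": [np.array(0), np.array(1)]}
rows = list(tuple_iterator(data, columns_list=["label", "image"]))
```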
@@ -3859,9 +3860,9 @@ class Schema:
Args:
columns (Union[dict, list[dict], tuple[dict]]): Dataset attribute information, decoded from schema file.
- list[dict], 'name' and 'type' must be in keys, 'shape' optional.
- list[dict], `name` and `type` must be in keys, `shape` optional.
- dict, columns.keys() as name, columns.values() is dict, and 'type' inside, 'shape' optional.
- dict, columns.keys() as name, columns.values() is dict, and `type` inside, `shape` optional.
Raises:
RuntimeError: If failed to parse columns.

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -59,6 +59,8 @@ approvers:
- HulkTang
- hwcaifubi
- zichun_ye
- h.farahat
- dessyang
reviewers:
- nicholas_yhr
- liubuyu