add approver for md and some comment

This commit is contained in:
jonyguo 2022-06-30 19:32:58 +08:00
parent 5d5b8f7a3d
commit 06bfa2b96c
9 changed files with 32 additions and 18 deletions

View File

@@ -259,7 +259,7 @@
.. py:method:: save(file_name, num_files=1, file_type='mindrecord')
Save the data being processed in the dataset pipeline in a common dataset format. Supported dataset format: 'mindrecord'.
Save the data being processed in the dataset pipeline in a common dataset format. Supported dataset format: 'mindrecord'. The saved 'mindrecord' file(s) can then be read with the 'MindDataset' class.
Implicit type casting occurs when saving data in the 'mindrecord' format. The transform table shows how the type casting is performed.

View File

@@ -32,7 +32,8 @@
- **drop_remainder** (bool, optional) - Whether to drop the last batch when it contains fewer entries than `batch_size`, rather than passing it on to the next operation. Default: False, do not drop.
- **num_parallel_workers** (int, optional) - Number of parallel processes/threads for the `batch` operation (whether multiprocess or multithread mode is used is decided by the `python_multiprocessing` parameter).
  Default: None, use the number of threads configured in mindspore.dataset.config.
- **per_batch_map** (callable, optional) - A callable that takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters,
- **per_batch_map** (Callable[[List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo], (List[numpy.ndarray],
  ..., List[numpy.ndarray])], optional) - A callable that takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters
  and returns (list[numpy.ndarray], list[numpy.ndarray], ...) as the new data columns. Each list[numpy.ndarray] in the input represents a batch of numpy.ndarray for a given data column.
  The number of list[numpy.ndarray] should match the number of column names passed in `input_columns`; in the returned (list[numpy.ndarray], list[numpy.ndarray], ...),
  the number of list[numpy.ndarray] should match the input. If the number of output columns differs from the number of input columns, `output_columns` must be specified. The last input parameter of the callable is always BatchInfo.

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -550,8 +550,9 @@ class Dataset:
be dropped and not propagated to the child node.
num_parallel_workers (int, optional): Number of workers (threads) to process the dataset in parallel
(default=None).
per_batch_map (callable, optional): Per batch map callable (default=None). A callable which takes
(list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters. Each
per_batch_map (Callable[[List[numpy.ndarray], ..., List[numpy.ndarray], BatchInfo], (List[numpy.ndarray],
..., List[numpy.ndarray])], optional): Per batch map callable (default=None). A callable
which takes (list[numpy.ndarray], list[numpy.ndarray], ..., BatchInfo) as input parameters. Each
list[numpy.ndarray] represents a batch of numpy.ndarray on a given column. The number of lists should
match the number of entries in input_columns. The last parameter of the callable should always be
a BatchInfo object. per_batch_map should return (list[numpy.ndarray], list[numpy.ndarray], ...). The
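The per_batch_map contract described above can be sketched without MindSpore installed. `FakeBatchInfo` and `scale_batch` below are hypothetical stand-ins for illustration, not MindSpore APIs:

```python
import numpy as np

class FakeBatchInfo:
    """Illustrative stand-in for mindspore.dataset.BatchInfo."""
    def __init__(self, batch_num):
        self._batch_num = batch_num

    def get_batch_num(self):
        return self._batch_num

def scale_batch(col_image, col_label, batch_info):
    # One list[numpy.ndarray] per input column; BatchInfo is always last.
    # Returns one output list per column, matching the input count.
    factor = batch_info.get_batch_num() + 1
    return [img * factor for img in col_image], list(col_label)

images = [np.ones((2, 2)) for _ in range(4)]
labels = [np.array(i) for i in range(4)]
out_images, out_labels = scale_batch(images, labels, FakeBatchInfo(batch_num=1))
```

With MindSpore, the equivalent callable would be passed via `dataset.batch(batch_size, per_batch_map=..., input_columns=[...])`.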
@@ -695,12 +696,12 @@ class Dataset:
"""
Map `func` to each row in dataset and flatten the result.
The specified `func` is a function that must take one 'Ndarray' as input
and return a 'Dataset'.
The specified `func` is a function that must take one `numpy.ndarray` as input
and return a `Dataset`.
Args:
func (function): A function that must take one 'Ndarray' as an argument and
return a 'Dataset'.
func (function): A function that must take one `numpy.ndarray` as an argument and
return a `Dataset`.
Returns:
Dataset, dataset applied by the function.
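The flat_map semantics above (map `func` over each row, then flatten the resulting datasets) can be sketched in plain Python; `flat_map_rows` is a hypothetical helper, and a list of arrays stands in for the `Dataset` that `func` would return:

```python
import numpy as np

def flat_map_rows(rows, func):
    # Map func over each row and flatten: func returns an iterable
    # of arrays standing in for the Dataset it would produce.
    out = []
    for row in rows:
        out.extend(func(row))
    return out

# func splits each input array into one single-element array per value
split = lambda arr: [np.array(x) for x in arr]
result = flat_map_rows([np.array([1, 2]), np.array([3, 4])], split)
```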
@@ -1244,8 +1245,8 @@ class Dataset:
Apply a function in this dataset.
Args:
apply_func (function): A function that must take one 'Dataset' as an argument and
return a preprocessed 'Dataset'.
apply_func (function): A function that must take one `Dataset` as an argument and
return a preprocessed `Dataset`.
Returns:
Dataset, dataset applied by the function.
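The apply pattern, a function that takes one dataset and returns a preprocessed one, can be sketched generically; `normalize_then_batch` is a hypothetical `apply_func`, with a list of arrays standing in for a `Dataset`:

```python
import numpy as np

def normalize_then_batch(rows):
    # Hypothetical apply_func: takes a dataset (here, a list of arrays)
    # and returns a preprocessed dataset.
    normalized = [r / r.max() for r in rows]
    # group into batches of 2
    return [np.stack(normalized[i:i + 2]) for i in range(0, len(normalized), 2)]

data = [np.array([1.0, 2.0]), np.array([3.0, 4.0]),
        np.array([5.0, 10.0]), np.array([2.0, 8.0])]
batches = normalize_then_batch(data)
```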
@@ -1319,16 +1320,16 @@ class Dataset:
def save(self, file_name, num_files=1, file_type='mindrecord'):
"""
Save the dynamic data processed by the dataset pipeline in common dataset format.
Supported dataset formats: 'mindrecord' only
Supported dataset formats: `mindrecord` only. The saved file(s) can then be read with the `MindDataset` API.
Implicit type casting exists when saving data as 'mindrecord'. The transform table shows how to do type casting.
Implicit type casting exists when saving data as `mindrecord`. The transform table shows how to do type casting.
.. list-table:: Implicit Type Casting when Saving as 'mindrecord'
.. list-table:: Implicit Type Casting when Saving as `mindrecord`
:widths: 25 25 50
:header-rows: 1
* - Type in 'dataset'
- Type in 'mindrecord'
* - Type in `dataset`
- Type in `mindrecord`
- Details
* - bool
- None
@@ -1400,7 +1401,7 @@ class Dataset:
@check_tuple_iterator
def create_tuple_iterator(self, columns=None, num_epochs=-1, output_numpy=False, do_copy=True):
"""
Create an iterator over the dataset. The datatype retrieved back will be a list of ndarrays.
Create an iterator over the dataset. The datatype retrieved back will be a list of `numpy.ndarray`.
To specify which columns to list and the order needed, use columns_list. If columns_list
is not provided, the order of the columns will remain unchanged.
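The behaviour described above, each row returned as a list of `numpy.ndarray` in the requested column order, can be sketched as follows; `tuple_iterator` and the column data are hypothetical stand-ins, not the MindSpore implementation:

```python
import numpy as np

def tuple_iterator(columns_data, columns_list=None):
    # columns_data: dict mapping column name -> list of per-row arrays.
    # If columns_list is not provided, the column order is unchanged.
    names = columns_list or list(columns_data)
    for row in zip(*(columns_data[n] for n in names)):
        yield list(row)  # each row: list of numpy.ndarray, in column order

data = {"image": [np.zeros(2), np.ones(2)],
        "label": [np.array(0), np.array(1)]}
rows = list(tuple_iterator(data, columns_list=["label", "image"]))
```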
@@ -3859,9 +3860,9 @@ class Schema:
Args:
columns (Union[dict, list[dict], tuple[dict]]): Dataset attribute information, decoded from schema file.
- list[dict], 'name' and 'type' must be in keys, 'shape' optional.
- list[dict], `name` and `type` must be in keys, `shape` optional.
- dict, columns.keys() as name, columns.values() is dict, and 'type' inside, 'shape' optional.
- dict, columns.keys() as name, columns.values() is dict, and `type` inside, `shape` optional.
Raises:
RuntimeError: If failed to parse columns.

View File

@@ -6,6 +6,8 @@ approvers:
- tom__chen
- jonyguo
- tiancixiao
- h.farahat
- dessyang
reviewers:
- luoyang42
- ms_yan

View File

@@ -59,6 +59,8 @@ approvers:
- HulkTang
- hwcaifubi
- zichun_ye
- h.farahat
- dessyang
reviewers:
- nicholas_yhr
- liubuyu