forked from mindspore-Ecosystem/mindspore
!27715 dataset: fix api description error
Merge pull request !27715 from ms_yan/code_docs_api_format
This commit is contained in:
commit 00da76efd9
@@ -1,13 +1,13 @@
 mindspore.dataset.MindDataset
 ==============================

-.. py:class:: mindspore.dataset.MindDataset(dataset_file, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)
+.. py:class:: mindspore.dataset.MindDataset(dataset_files, columns_list=None, num_parallel_workers=None, shuffle=None, num_shards=None, shard_id=None, sampler=None, padded_sample=None, num_padded=None, num_samples=None, cache=None)

 Reads and parses MindRecord data files as a source dataset. The column names and column types of the generated dataset depend on the column names and types saved in the MindRecord files.

 **Parameters:**

-    - **dataset_file** (Union[str, list[str]]) - Path of the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_file` is a string, it stands for a group of MindRecord files with the same prefix name; other MindRecord files with the same prefix name under the same path will be found and loaded automatically. If `dataset_file` is a list, it lists the MindRecord data files to be read.
+    - **dataset_files** (Union[str, list[str]]) - Path of the MindRecord files; a single file path string or a list of file path strings is accepted. If `dataset_files` is a string, it stands for a group of MindRecord files with the same prefix name; other MindRecord files with the same prefix name under the same path will be found and loaded automatically. If `dataset_files` is a list, it lists the MindRecord data files to be read.
     - **columns_list** (list[str], optional) - List of data columns to be read from the MindRecord files (default=None, read all columns).
     - **num_parallel_workers** (int, optional) - Number of worker threads used to read the data (default=None, use the number of threads configured in mindspore.dataset.config).
     - **shuffle** (Union[bool, Shuffle level], optional) - Shuffle mode for the data in each epoch (default=mindspore.dataset.Shuffle.GLOBAL). If False, no shuffling is performed; if True, it is equivalent to setting `shuffle` to mindspore.dataset.Shuffle.GLOBAL. A Shuffle enum value can also be passed in to set the shuffle level:
@@ -63,7 +63,7 @@
 **Examples:**

 >>> mind_dataset_dir = ["/path/to/mind_dataset_file"]  # this list can contain one or more MindRecord files
->>> dataset = ds.MindDataset(dataset_file=mind_dataset_dir)
+>>> dataset = ds.MindDataset(dataset_files=mind_dataset_dir)

 .. include:: mindspore.dataset.Dataset.add_sampler.rst
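Outside the diff, a minimal usage sketch of the renamed parameter, assuming mindspore is installed; the path is a placeholder, and the printed column names depend on how the MindRecord files were written:

>>> import mindspore.dataset as ds
>>> mind_dataset_dir = ["/path/to/mind_dataset_file"]  # placeholder: one or more MindRecord files
>>> dataset = ds.MindDataset(dataset_files=mind_dataset_dir, shuffle=False)
>>> for item in dataset.create_dict_iterator(output_numpy=True):
...     print(item.keys())  # the column names saved in the MindRecord files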
@@ -46,8 +46,8 @@ class Vocab(cde.Vocab):

        This would collect all unique words in a dataset and return a vocab within
        the frequency range specified by user in freq_range. User would be warned if no words fall into the frequency.
-       Words in vocab are ordered from highest frequency to lowest frequency. Words with the same frequency would be
-       ordered lexicographically.
+       Words in vocab are ordered from the highest frequency to the lowest frequency. Words with the same frequency
+       would be ordered lexicographically.

        Args:
            dataset(Dataset): dataset to build vocab from.
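For context, a hedged sketch of Vocab.from_dataset on a whitespace-tokenized text column; the corpus path is a placeholder, and the "text" column name plus the tokenizer step are assumptions about typical usage:

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>> corpus = ds.TextFileDataset("/path/to/corpus.txt", shuffle=False)  # placeholder path
>>> corpus = corpus.map(operations=text.WhitespaceTokenizer(), input_columns=["text"])
>>> # words are ordered by descending frequency, with ties broken lexicographically
>>> vocab = text.Vocab.from_dataset(corpus, columns=["text"], special_tokens=["<pad>", "<unk>"])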
@@ -86,7 +86,7 @@ class Vocab(cde.Vocab):

        Args:
            word_list(list): A list of string where each element is a word of type string.
-           special_tokens(list, optional): A list of strings, each one is a special token. for example
+           special_tokens(list, optional): A list of strings, each one is a special token. For example
                special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
            special_first(bool, optional): Whether special_tokens is prepended or appended to vocab. If special_tokens
                is specified and special_first is set to True, special_tokens will be prepended (default=True).
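A small sketch of the from_list behavior described above; with special_first=True the special tokens are prepended, so "<pad>" receives id 0:

>>> import mindspore.dataset.text as text
>>> vocab = text.Vocab.from_list(["home", "world", "behind"],
...                              special_tokens=["<pad>", "<unk>"], special_first=True)
>>> lookup = text.Lookup(vocab, unknown_token="<unk>")  # maps out-of-vocab words to the id of "<unk>"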
@@ -112,7 +112,7 @@ class Vocab(cde.Vocab):
            delimiter (str, optional): A delimiter to break up each line in file, the first element is taken to be
                the word (default="").
            vocab_size (int, optional): Number of words to read from file_path (default=None, all words are taken).
-           special_tokens (list, optional): A list of strings, each one is a special token. for example
+           special_tokens (list, optional): A list of strings, each one is a special token. For example
                special_tokens=["<pad>","<unk>"] (default=None, no special tokens will be added).
            special_first (bool, optional): Whether special_tokens will be prepended/appended to vocab,
                If special_tokens is specified and special_first is set to True,
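Likewise for from_file, a hedged sketch; the vocab file path and its "word,count" line format are assumptions for illustration:

>>> import mindspore.dataset.text as text
>>> # with delimiter=",", only the text before the first comma on each line is taken as the word
>>> vocab = text.Vocab.from_file("/path/to/vocab.txt", delimiter=",",
...                              special_tokens=["<pad>", "<unk>"], special_first=True)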
@@ -262,7 +262,7 @@ def to_str(array, encoding='utf8'):

    Args:
        array (numpy.ndarray): Array of `bytes` type representing strings.
-       encoding (str): Indicating the charset for decoding.
+       encoding (str): Indicating the charset for decoding (default='utf8').

    Returns:
        numpy.ndarray, NumPy array of `str`.
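The documented default is easy to check; a minimal sketch using a NumPy bytes array as input:

>>> import numpy as np
>>> from mindspore.dataset.text import to_str
>>> data = np.array([b"hello", b"world"])
>>> to_str(data)  # decodes with the default charset 'utf8'
array(['hello', 'world'], dtype='<U5')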
@@ -286,7 +286,7 @@ def to_bytes(array, encoding='utf8'):

    Args:
        array (numpy.ndarray): Array of `str` type representing strings.
-       encoding (str): Indicating the charset for encoding.
+       encoding (str): Indicating the charset for encoding (default='utf8').

    Returns:
        numpy.ndarray, NumPy array of `bytes`.
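And the mirror-image sketch for to_bytes:

>>> import numpy as np
>>> from mindspore.dataset.text import to_bytes
>>> data = np.array(["hello", "world"])
>>> to_bytes(data)  # encodes with the default charset 'utf8'
array([b'hello', b'world'], dtype='|S5')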
@@ -286,7 +286,7 @@ class PadEnd(TensorOperation):
    Args:
        pad_shape (list(int)): List of integers representing the shape needed. Dimensions that set to `None` will
            not be padded (i.e., original dim will be used). Shorter dimensions will truncate the values.
-       pad_value (Union[str, bytes, int, float, bool]), optional): Value used to pad. Default to 0 or empty
+       pad_value (Union[str, bytes, int, float, bool], optional): Value used to pad. Default to 0 or empty
            string in case of tensors of strings.

    Examples:
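Finally, a hedged PadEnd sketch; the c_transforms import path is an assumption based on where this class lived in the 1.x source tree at the time of the commit:

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.transforms.c_transforms as c_transforms
>>> # pad each 1-D sample in column "col" out to length 4 with pad_value 0
>>> pad_op = c_transforms.PadEnd(pad_shape=[4], pad_value=0)
>>> data = ds.NumpySlicesDataset({"col": [[1, 2], [3, 4]]}, shuffle=False)
>>> data = data.map(operations=pad_op, input_columns=["col"])  # rows become [1, 2, 0, 0] and [3, 4, 0, 0]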