Fix Chinese API docs

liyong 2022-03-26 16:37:40 +08:00
parent 4954258517
commit 0df7e0edf8
18 changed files with 115 additions and 74 deletions

View File

@ -1,15 +1,14 @@
mindspore.dataset.OBSMindDataset
================================
.. py:class:: mindspore.dataset.OBSMindDataset(dataset_files, server, ak, sk, sync_obs_path, columns_list=None,
shuffle=Shuffle.GLOBAL, num_shards=None, shard_id=None, shard_equal_rows=True)
.. py:class:: mindspore.dataset.OBSMindDataset(dataset_files, server, ak, sk, sync_obs_path, columns_list=None, shuffle=Shuffle.GLOBAL, num_shards=None, shard_id=None, shard_equal_rows=True)
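Reads and parses MindRecord-format datasets stored on OBS. The column names and column types of the generated dataset depend on the column names and types saved in the MindRecord files.
**Parameters:**
- **dataset_files** (list[str]) - List of paths to the MindRecord-format dataset files on OBS; each file path is prefixed with s3://.
- **server** (str) - Service address used to connect to OBS. May include the protocol type, domain name, and port number. Example: <https://your-endpoint:9000>.
- **server** (str) - Service address used to connect to OBS. May include the protocol type, domain name, and port number. Example:<https://your-endpoint:9000>.
- **ak** (str) - The AK of the access key.
- **sk** (str) - The SK of the access key.
- **sync_obs_path** (str) - OBS path used for synchronization; the user must create it in advance, and the directory path is prefixed with s3://.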
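The `s3://` prefix requirement on `dataset_files` and `sync_obs_path` can be checked before the dataset object is built. The following is a minimal sketch under stated assumptions, not part of MindSpore: `check_obs_paths` is a hypothetical helper, and the bucket and file names are placeholders.

```python
from typing import List

S3_PREFIX = "s3://"


def check_obs_paths(dataset_files: List[str], sync_obs_path: str) -> None:
    """Hypothetical pre-check (not a MindSpore API): every path must start with s3://."""
    candidates = list(dataset_files) + [sync_obs_path]
    bad = [path for path in candidates if not path.startswith(S3_PREFIX)]
    if bad:
        raise ValueError("paths missing the s3:// prefix: {}".format(bad))


# Placeholder paths; a real bucket and real MindRecord file names would go here.
check_obs_paths(["s3://my-bucket/train.mindrecord0"], "s3://my-bucket/sync")
```

Running the check up front surfaces a malformed path as a plain `ValueError` before any connection to the OBS server is attempted.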

View File

@ -8,12 +8,12 @@
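**Parameters:**
- **source** (str) - Directory path of the CIFAR-100 dataset files to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **source** (str) - Path of the directory containing the CIFAR-100 dataset files to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
**Raises:**
- **ValueError** - `source` or `destination` is invalid.
- **ValueError** - The parameter `source` or `destination` is invalid.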
.. py:method:: run(fields=None)
@ -27,12 +27,12 @@
**Returns:**
MSRStatus, whether the CIFAR-100 dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.
.. py:method:: transform(fields=None)
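A wrapper function around the :func:`mindspore.mindrecord.Cifar100ToMR.run` function that ensures a normal exit when an exception occurs.
A wrapper around :func:`mindspore.mindrecord.Cifar100ToMR.run` that ensures a normal exit when an exception occurs.
**Parameters:**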
@ -41,4 +41,4 @@
**Returns:**
MSRStatus, whether the CIFAR-100 dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.

View File

@ -8,8 +8,8 @@
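**Parameters:**
- **source** (str) - Directory path of the CIFAR-10 dataset files to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **source** (str) - Path of the directory containing the CIFAR-10 dataset files to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
**Raises:**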
@ -27,12 +27,12 @@
**Returns:**
MSRStatus, whether the CIFAR-10 dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.
.. py:method:: transform(fields=None)
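A wrapper function around the :func:`mindspore.mindrecord.Cifar10ToMR.run` function that ensures a normal exit when an exception occurs.
A wrapper around :func:`mindspore.mindrecord.Cifar10ToMR.run` that ensures a normal exit when an exception occurs.
**Parameters:**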
@ -41,5 +41,5 @@
**Returns:**
MSRStatus, whether the CIFAR-10 dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.

View File

@ -9,14 +9,14 @@
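**Parameters:**
- **source** (str) - Path of the CSV file to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
- **columns_list** (list[str], optional) - List of the data columns to read from the CSV. Default: None, which reads all columns.
- **partition_number** (int, optional) - Number of MindRecord files to generate. Default: 1.
**Raises:**
- **ValueError** - `source`, `destination`, or `partition_number` is invalid.
- **RuntimeError** - `columns_list` is invalid.
- **ValueError** - The parameter `source`, `destination`, or `partition_number` is invalid.
- **RuntimeError** - The parameter `columns_list` is invalid.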
.. py:method:: run()
@ -25,9 +25,13 @@
**Returns:**
MSRStatus, whether the CSV dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.
.. py:method:: transform()
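A wrapper function around the :func:`mindspore.mindrecord.CsvToMR.run` function that ensures a normal exit when an exception occurs.
A wrapper around :func:`mindspore.mindrecord.CsvToMR.run` that ensures a normal exit when an exception occurs.
**Returns:**
MSRStatus, SUCCESS or FAILED.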
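The reworded `destination` entries in these hunks state two preconditions: the directory must already exist, and it must not contain a file with the same name. A minimal sketch of such a pre-check follows. `check_destination` is a hypothetical helper, not a MindSpore API, and it only tests the exact destination name, not any companion files a converter may write.

```python
import os
import tempfile


def check_destination(destination: str) -> None:
    """Hypothetical pre-check (not a MindSpore API) for a converter's destination."""
    parent = os.path.dirname(os.path.abspath(destination))
    if not os.path.isdir(parent):
        raise ValueError("directory {} must be created first".format(parent))
    if os.path.exists(destination):
        raise ValueError("{} already exists".format(destination))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.mindrecord")
    check_destination(target)      # passes: directory exists, file does not
    open(target, "w").close()      # simulate a leftover file from an earlier run
    try:
        check_destination(target)
    except ValueError as err:
        print("rejected:", err)
```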

View File

@ -19,12 +19,12 @@
.. py:method:: add_index(index_fields)
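Specifies fields in the schema as indexes to speed up reading of MindRecord files. The schema can be added through `add_schema` through.
Specifies fields in the schema as indexes to speed up reading of MindRecord files. The schema can be added through `add_schema`.
.. note::
- Index fields should be of a primitive type, for example `int`, `float`, or `str`.
- If this function is not called, all primitive-type fields in the schema are set as indexes by default.
See the example for `mindspore.mindrecord.FileWriter`.
See the example for :class:`mindspore.mindrecord.FileWriter`.
**Parameters:**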
@ -47,7 +47,7 @@
Adds a schema that describes user-defined data.
.. note::
See the example for `mindspore.mindrecord.FileWriter`.
See the example for :class:`mindspore.mindrecord.FileWriter`.
**Parameters:**
@ -70,7 +70,7 @@
Flushes the in-memory data to disk and generates the corresponding database file.
.. note::
See the example for `mindspore.mindrecord.FileWriter`.
See the example for :class:`mindspore.mindrecord.FileWriter`.
**Returns:**
@ -126,7 +126,7 @@
**Parameters:**
- **header_size** (int) - Header size; the valid range is 16*1024 (16KB) to 128*1024*1024 (128MB).
- **header_size** (int) - Header size; the valid range is 16*1024 (16KB) to 128*1024*1024 (128MB).
**Returns:**
@ -144,7 +144,7 @@
**Parameters:**
- **page_size** (int) - Page size; the valid range is 32*1024 (32KB) to 256*1024*1024 (256MB).
- **page_size** (int) - Page size; the valid range is 32*1024 (32KB) to 256*1024*1024 (256MB).
**Returns:**
@ -161,7 +161,7 @@
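Validates the user-defined data against the schema, then converts it into a series of consecutive MindRecord-format dataset files.
.. note::
See the example for `mindspore.mindrecord.FileWriter`.
See the example for :class:`mindspore.mindrecord.FileWriter`.
**Parameters:**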
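The `header_size` and `page_size` ranges documented in the hunks above are easier to compare when written out as byte counts. The sketch below does only that arithmetic. The constant and function names are illustrative, and treating both bounds as inclusive is an assumption, since the diff does not say.

```python
# Byte ranges taken from the header_size and page_size entries above.
HEADER_SIZE_RANGE = (16 * 1024, 128 * 1024 * 1024)   # 16KB to 128MB
PAGE_SIZE_RANGE = (32 * 1024, 256 * 1024 * 1024)     # 32KB to 256MB


def in_range(value: int, bounds: tuple) -> bool:
    """Return True if value lies within bounds (inclusive bounds are an assumption)."""
    low, high = bounds
    return low <= value <= high


print(in_range(1 << 24, HEADER_SIZE_RANGE))  # 16MB header -> True
print(in_range(1 << 14, PAGE_SIZE_RANGE))    # 16KB page   -> False
```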

View File

@ -18,12 +18,12 @@
n02096294 3
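- **image_dir** (str) - Directory path of the ImageNet dataset; the directory contains subdirectories such as n02119789, n02100735, n02110185, and n02096294.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
- **partition_number** (int, optional) - Number of MindRecord files to generate. Default: 1.
**Raises:**
- **ValueError** - `map_file`, `image_dir`, or `destination` is invalid.
- **ValueError** - The parameter `map_file`, `image_dir`, or `destination` is invalid.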
.. py:method:: run()
@ -31,9 +31,13 @@
**Returns:**
MSRStatus, whether the ImageNet dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.
.. py:method:: transform()
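A wrapper function around the :func:`mindspore.mindrecord.ImageNetToMR.run` function that ensures a normal exit when an exception occurs.
A wrapper around :func:`mindspore.mindrecord.ImageNetToMR.run` that ensures a normal exit when an exception occurs.
**Returns:**
MSRStatus, SUCCESS or FAILED.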

View File

@ -5,7 +5,7 @@
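**Parameters:**
- **file_name** (str) - A MindRecord-format dataset file or a list of such files.
- **file_name** (Union[str, list[str]]) - A MindRecord-format dataset file or a list of such files.
- **num_consumer** (int, optional) - Number of concurrent workers that load data. Default: 4. It should not be smaller than 1 or larger than the number of processor cores.
**Raises:**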

View File

@ -6,12 +6,12 @@
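**Parameters:**
- **source** (str) - Path of the directory containing the dataset files t10k-images-idx3-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, and train-labels-idx1-ubyte.gz.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
- **partition_number** (int, optional) - Number of MindRecord files to generate. Default: 1.
**Raises:**
- **ValueError** - `source`, `destination`, or `partition_number` is invalid.
- **ValueError** - The parameter `source`, `destination`, or `partition_number` is invalid.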
.. py:method:: run()
@ -26,3 +26,7 @@
.. py:method:: transform()
A wrapper function around the :func:`mindspore.mindrecord.MnistToMR.run` function that ensures a normal exit when an exception occurs.
**Returns:**
MSRStatus, SUCCESS or FAILED.

View File

@ -9,9 +9,10 @@
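**Parameters:**
- **source** (str) - Path of the TFRecord file to be converted.
- **destination** (str) - Path of the MindRecord file generated by the conversion.
- **feature_dict** (dict) - Dictionary of TFRecord feature types; the `VarLenFeature` type is not supported.
- **bytes_fields** (list, optional) - The bytes fields in `feature_dict`, which can be image fields of bytes type.
- **destination** (str) - Path of the MindRecord file generated by the conversion; the directory must be created in advance and must not contain a file with the same name.
- **feature_dict** (dict[str, `FixedLenFeature <https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature>`_ ]) - Dictionary of TFRecord feature types;
the `VarLenFeature <https://www.tensorflow.org/api_docs/python/tf/io/VarLenFeature>`_ type is not supported.
- **bytes_fields** (list[str], optional) - The bytes fields in `feature_dict`, which can be image fields of bytes type.
**Raises:**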
@ -25,7 +26,7 @@
**Returns:**
MSRStatus, whether the TFRecord-format dataset was successfully converted to a MindRecord-format dataset.
MSRStatus, SUCCESS or FAILED.
.. py:method:: tfrecord_iterator()
@ -48,4 +49,8 @@
.. py:method:: transform()
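A wrapper function around the :func:`mindspore.mindrecord.TFRecordToMR.run` function that ensures a normal exit when an exception occurs.
A wrapper around :func:`mindspore.mindrecord.TFRecordToMR.run` that ensures a normal exit when an exception occurs.
**Returns:**
MSRStatus, SUCCESS or FAILED.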

View File

@ -170,7 +170,7 @@ def _download_work(shard_id, current_idx, local_path, cache, q):
used_disk = get_used_disk_per()
while used_disk > float(config.DISK_THRESHOLD):
logger.info("[{} FUNCTION] Used disk space is {}%, and the disk threshold is {}%.".format(
sys._getframe().f_code.co_name, used_disk*100, config.DISK_THRESHOLD*100)) # pylint: disable=W0212
sys._getframe().f_code.co_name, used_disk*100, float(config.DISK_THRESHOLD)*100)) # pylint: disable=W0212
retry_cnt = 0
has_deleted = _delete_candidate_datasets(
current_idx.value, idx, cache, q, local_path)
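The one-line change in this hunk wraps `config.DISK_THRESHOLD` in `float()` inside the log call, matching the conversion already used in the `while` condition. A plausible reason, assuming the threshold is stored as a string (the diff does not show its definition), is that multiplying a string by 100 repeats it instead of scaling it. The value `"0.75"` below is a made-up placeholder.

```python
# Assumption: the threshold is held as a string, e.g. read from a config file.
DISK_THRESHOLD = "0.75"

repeated = DISK_THRESHOLD * 100        # str * int repeats the string 100 times
scaled = float(DISK_THRESHOLD) * 100   # numeric percentage

print(len(repeated))  # 400
print(scaled)         # 75.0
```

If the threshold were already a float, the extra `float()` call would be a harmless no-op, so the change is safe under either assumption.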

View File

@ -29,7 +29,7 @@ class MindPage:
Class to read MindRecord files in pagination.
Args:
file_name (str): One of MindRecord files or a file list.
file_name (Union[str, list[str]]): One of MindRecord files or a file list.
num_consumer(int, optional): The number of reader workers which load data. Default: 4.
It should not be smaller than 1 or larger than the number of processor cores.
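The widened `file_name` type, `Union[str, list[str]]`, means callers may pass either one path or several. A minimal sketch of how such an argument can be normalized to a list is given below. `normalize_file_names` is a hypothetical helper for illustration and is not taken from the MindPage source.

```python
from typing import List, Union


def normalize_file_names(file_name: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return the argument as a list of path strings."""
    if isinstance(file_name, str):
        return [file_name]
    if isinstance(file_name, list) and all(isinstance(f, str) for f in file_name):
        return list(file_name)
    raise TypeError("file_name must be a str or a list of str")


print(normalize_file_names("a.mindrecord"))
print(normalize_file_names(["a.mindrecord0", "a.mindrecord1"]))
```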

View File

@ -43,8 +43,9 @@ class Cifar100ToMR:
www.mindspore.cn/docs/programming_guide/en/master/dataset_conversion.html#converting-the-cifar-10-dataset>`_.
Args:
source (str): the cifar100 directory to be transformed.
destination (str): the MindRecord file path to transform into.
source (str): The cifar100 directory to be transformed.
destination (str): MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
Raises:
ValueError: If source or destination is invalid.
@ -80,7 +81,7 @@ class Cifar100ToMR:
fields (list[str]): A list of index field, e.g.["fine_label", "coarse_label"]. Default: None.
Returns:
MSRStatus, whether cifar100 is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
if fields and not isinstance(fields, list):
raise ValueError("The parameter fields should be None or list")
@ -119,7 +120,7 @@ class Cifar100ToMR:
fields (list[str]): A list of index field, e.g.["fine_label", "coarse_label"]. Default: None.
Returns:
MSRStatus, whether cifar100 is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run, kwargs={'fields': fields})
@ -171,7 +172,7 @@ def _generate_mindrecord(file_name, raw_data, fields, schema_desc):
schema_desc (str): String of schema description.
Returns:
MSRStatus, whether successfully written into MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
schema = {"id": {"type": "int64"}, "fine_label": {"type": "int64"},
"coarse_label": {"type": "int64"}, "data": {"type": "bytes"}}

View File

@ -43,8 +43,9 @@ class Cifar10ToMR:
www.mindspore.cn/docs/programming_guide/en/master/dataset_conversion.html#converting-the-cifar-10-dataset>`_.
Args:
source (str): the cifar10 directory to be transformed.
destination (str): the MindRecord file path to transform into.
source (str): The cifar10 directory to be transformed.
destination (str): MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
Raises:
ValueError: If source or destination is invalid.
@ -80,8 +81,9 @@ class Cifar10ToMR:
fields (list[str], optional): A list of index fields. Default: None.
Returns:
MSRStatus, whether cifar10 is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
if fields and not isinstance(fields, list):
raise ValueError("The parameter fields should be None or list")
@ -115,7 +117,7 @@ class Cifar10ToMR:
fields (list[str], optional): A list of index fields. Default: None.
Returns:
MSRStatus, whether cifar10 is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run, kwargs={'fields': fields})
@ -138,6 +140,7 @@ def _construct_raw_data(images, labels):
Returns:
list[dict], data dictionary constructed from cifar10.
"""
if not cv2:
raise ModuleNotFoundError("opencv-python module not found, please use pip install it.")
@ -164,8 +167,9 @@ def _generate_mindrecord(file_name, raw_data, fields, schema_desc):
schema_desc (str): String of schema description.
Returns:
MSRStatus, whether successfully written into MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
schema = {"id": {"type": "int64"}, "label": {"type": "int64"},
"data": {"type": "bytes"}}

View File

@ -39,10 +39,11 @@ class CsvToMR:
www.mindspore.cn/docs/programming_guide/en/master/dataset_conversion.html#converting-csv-dataset>`_.
Args:
source (str): the file path of csv.
destination (str): the MindRecord file path to transform into.
source (str): The file path of csv.
destination (str): The MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
columns_list(list[str], optional): A list of columns to be read. Default: None.
partition_number (int, optional): partition size, Default: 1.
partition_number (int, optional): The partition size, Default: 1.
Raises:
ValueError: If `source`, `destination`, `partition_number` is invalid.
@ -130,7 +131,7 @@ class CsvToMR:
Execute transformation from csv to MindRecord.
Returns:
MSRStatus, whether csv is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
if not os.path.exists(self.source):
raise IOError("Csv file {} do not exist.".format(self.source))
@ -178,7 +179,10 @@ class CsvToMR:
def transform(self):
"""
Encapsulate the run function to exit normally
Encapsulate the run function to exit normally.
Returns:
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run)

View File

@ -35,7 +35,7 @@ class ImageNetToMR:
www.mindspore.cn/docs/programming_guide/en/master/dataset_conversion.html#converting-the-imagenet-dataset>`_.
Args:
map_file (str): the map file that indicates label. The map file content should be like this:
map_file (str): The map file that indicates label. The map file content should be like this:
.. code-block::
@ -44,9 +44,10 @@ class ImageNetToMR:
n02110185 2
n02096294 3
image_dir (str): image directory contains n02119789, n02100735, n02110185 and n02096294 directory.
destination (str): the MindRecord file path to transform into.
partition_number (int, optional): partition size. Default: 1.
image_dir (str): Image directory contains n02119789, n02100735, n02110185 and n02096294 directory.
destination (str): MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
partition_number (int, optional): The partition size. Default: 1.
Raises:
ValueError: If `map_file`, `image_dir` or `destination` is invalid.
@ -129,8 +130,9 @@ class ImageNetToMR:
Execute transformation from imagenet to MindRecord.
Returns:
MSRStatus, whether imagenet is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
t0_total = time.time()
imagenet_schema_json = {"label": {"type": "int32"},
@ -179,7 +181,10 @@ class ImageNetToMR:
def transform(self):
"""
Encapsulate the run function to exit normally
Encapsulate the run function to exit normally.
Returns:
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run)

View File

@ -38,11 +38,12 @@ class MnistToMR:
A class to transform from Mnist to MindRecord.
Args:
source (str): directory that contains t10k-images-idx3-ubyte.gz,
source (str): Directory that contains t10k-images-idx3-ubyte.gz,
train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz
and train-labels-idx1-ubyte.gz.
destination (str): the MindRecord file directory to transform into.
partition_number (int, optional): partition size. Default: 1.
destination (str): MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
partition_number (int, optional): The partition size. Default: 1.
Raises:
ValueError: If `source`, `destination`, `partition_number` is invalid.
@ -225,8 +226,9 @@ class MnistToMR:
Execute transformation from Mnist to MindRecord.
Returns:
MSRStatus, whether successfully written into MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
if not cv2:
raise ModuleNotFoundError("opencv-python module not found, please use pip install it.")
@ -239,7 +241,10 @@ class MnistToMR:
def transform(self):
"""
Encapsulate the run function to exit normally
Encapsulate the run function to exit normally.
Returns:
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run)

View File

@ -72,11 +72,14 @@ class TFRecordToMR:
www.mindspore.cn/docs/programming_guide/en/master/dataset_conversion.html#converting-tfrecord-dataset>`_.
Args:
source (str): the TFRecord file to be transformed.
destination (str): the MindRecord file path to transform into.
feature_dict (dict): a dictionary that states the feature type, and
`VarLenFeature` is not supported.
bytes_fields (list, optional): the bytes fields which are in `feature_dict` and can be images bytes.
source (str): TFRecord file to be transformed.
destination (str): MindRecord file path to transform into, ensure that no file with the same name
exists in the directory.
feature_dict (dict[str, `FixedLenFeature
<https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature>`_ ]): Dictionary that states
the feature type, and `VarLenFeature <https://www.tensorflow.org/api_docs/python/tf/io/VarLenFeature>`_
is not supported.
bytes_fields (list[str], optional): The bytes fields which are in `feature_dict` and can be images bytes.
Default: None.
Raises:
@ -282,7 +285,7 @@ class TFRecordToMR:
Execute transformation from TFRecord to MindRecord.
Returns:
MSRStatus, whether TFRecord is successfully transformed to MindRecord.
MSRStatus, SUCCESS or FAILED.
"""
writer = FileWriter(self.destination)
logger.info("Transformed MindRecord schema is: {}, TFRecord feature dict is: {}"
@ -313,7 +316,10 @@ class TFRecordToMR:
def transform(self):
"""
Encapsulate the run function to exit normally
Encapsulate the run function to exit normally.
Returns:
MSRStatus, SUCCESS or FAILED.
"""
t = ExceptionThread(target=self.run)
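Each `transform` method in these hunks hands `self.run` to an `ExceptionThread`, which is how the wrapper can "exit normally" when the conversion fails. The diff does not show that class, so the following is only a sketch of the general pattern: the class body, attribute names, and the free-standing `transform` function are assumptions, not MindSpore's implementation.

```python
import threading


class ExceptionThread(threading.Thread):
    """Sketch of a thread that records its target's result or exception.

    Illustration only; not MindSpore's actual ExceptionThread.
    """

    def __init__(self, target, kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_kwargs = kwargs or {}
        self.res = None
        self.exception = None

    def run(self):
        try:
            self.res = self._fn(**self._fn_kwargs)
        except Exception as exc:  # keep the exception for the calling thread
            self.exception = exc


def transform(run_fn, **kwargs):
    """Run run_fn in a worker thread and re-raise any exception in the caller."""
    t = ExceptionThread(target=run_fn, kwargs=kwargs)
    t.daemon = True
    t.start()
    t.join()
    if t.exception is not None:
        raise t.exception
    return t.res


print(transform(lambda fields=None: "SUCCESS", fields=["label"]))
```

Because the exception is stored and re-raised after `join()`, a failure inside the worker surfaces in the calling thread instead of being lost when the worker exits.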