fix chinese doc of minddata

2022-11-02 22:20:14 +08:00 · 2022-11-02 22:20:14 +08:00 · 20946982cb
parent e7157bd7e6
commit 20946982cb
17 changed files with 62 additions and 49 deletions
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Biquad.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Biquad.rst
@@ -8,9 +8,9 @@ mindspore.dataset.audio.Biquad
    具体的数学公式与参数详见 `数字双二阶滤波器 <https://zh.m.wikipedia.org/wiki/%E6%95%B0%E5%AD%97%E6%BB%A4%E6%B3%A2%E5%99%A8>`_ 。

    参数：
-        - **b0** (float) - 电流输入的分子系数，x[n]。
+        - **b0** (float) - 当前输入的分子系数x[n]。
        - **b1** (float) - 一个时间间隔前输入的分子系数x[n-1]。
        - **b2** (float) - 两个时间间隔前输入的分子系数x[n-2]。
-        - **a0** (float) - 电流输出y[n]的分母系数，该值不能为零，通常为1。
-        - **a1** (float) - 电流输出y[n-1]的分母系数。
-        - **a2** (float) - 电流输出y[n-2]的分母系数。
+        - **a0** (float) - 当前输出y[n]的分母系数，该值不能为零，通常为1。
+        - **a1** (float) - 一个时间间隔前输出y[n-1]的分母系数。
+        - **a2** (float) - 两个时间间隔前输出y[n-2]的分母系数。
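The Biquad coefficients above define the second-order difference equation a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]. The pure-Python sketch below runs that recurrence and assumes a zero initial state. `biquad_filter` is a hypothetical helper, not the MindSpore API, and it does not reproduce any output clipping the library may apply.

```python
def biquad_filter(x, b0, b1, b2, a0, a1, a2):
    """Direct-form I biquad with zero initial state.

    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    if a0 == 0:
        raise ValueError("a0 must not be zero")
    y = []
    x1 = x2 = 0.0  # x[n-1], x[n-2]
    y1 = y2 = 0.0  # y[n-1], y[n-2]
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        # shift the delay lines by one sample
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

For example, `biquad_filter([1, 0, 0, 0], 1, 0, 0, 1, -0.5, 0)` gives the decaying impulse response `[1.0, 0.5, 0.25, 0.125]`. The feedback term a1 feeds back half of the previous output at each step.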
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.DCShift.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.DCShift.rst
@@ -3,7 +3,7 @@ mindspore.dataset.audio.DCShift

 .. py:class:: mindspore.dataset.audio.DCShift(shift, limiter_gain=None)

-    对输入音频波形施加直流移位，可以从音频中删除直流偏移（DC Offset）。
+    对输入音频波形施加直流移位。可以从音频中删除直流偏移（DC Offset）。

    参数：
        - **shift** (float) - 音频的移位量，值必须在[-2.0, 2.0]范围内。
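A DC shift adds one constant to every sample. Shifting by the negative of the mean therefore cancels a DC offset. The sketch below covers only the `limiter_gain=None` case. Clipping the result to [-1.0, 1.0] is an assumption borrowed from torchaudio's behaviour, and `dc_shift` and `remove_dc_offset` are hypothetical helpers.

```python
def dc_shift(waveform, shift):
    """Add a constant offset to every sample, then clip to [-1.0, 1.0].

    Assumption: models only limiter_gain=None; no limiter is implemented.
    """
    if not -2.0 <= shift <= 2.0:
        raise ValueError("shift must be in the range [-2.0, 2.0]")
    return [min(1.0, max(-1.0, sample + shift)) for sample in waveform]


def remove_dc_offset(waveform):
    """Estimate the DC offset as the sample mean and shift it away."""
    offset = sum(waveform) / len(waveform)
    return dc_shift(waveform, -offset)
```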
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.DetectPitchFrequency.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.DetectPitchFrequency.rst
@@ -7,7 +7,7 @@ mindspore.dataset.audio.DetectPitchFrequency
    基于归一化互相关函数和中位平滑来实现。

    参数：
-        - **sample_rate** (int) - 波形的采样频率，例如44100(Hz)，值不能为零。
+        - **sample_rate** (int) - 波形的采样频率，如44100（单位：Hz），值不能为0。
        - **frame_time** (float, 可选) - 帧的持续时间，值必须大于零。默认值：0.01。
        - **win_length** (int, 可选) - 中位平滑的窗口长度（以帧数为单位），该值必须大于零。默认值：30。
        - **freq_low** (int, 可选) - 可检测的最低频率（Hz），该值必须大于零。默认值：85。
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.EqualizerBiquad.rst
@@ -8,7 +8,7 @@ mindspore.dataset.audio.EqualizerBiquad
    接口实现方式类似于 `SoX库 <http://sox.sourceforge.net/sox.html>`_ 。

    参数：
-        - **sample_rate** (int) - 采样频率（单位：Hz），值不能为零。
+        - **sample_rate** (int) - 波形的采样频率，如44100（单位：Hz），值不能为0。
        - **center_freq** (float) - 中心频率（单位：Hz）。
        - **gain** (float) - 期望提升（或衰减）的音频增益（单位：dB）。
        - **Q** (float, 可选) - `品质因子 <https://zh.wikipedia.org/wiki/%E5%93%81%E8%B3%AA%E5%9B%A0%E5%AD%90>`_ ，能够反映带宽与采样频率和中心频率的关系，取值范围为(0, 1]。默认值：0.707。
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Fade.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Fade.rst
@@ -8,8 +8,13 @@ mindspore.dataset.audio.Fade
    参数：
        - **fade_in_len** (int, 可选) - 淡入长度（时间帧），必须是非负。默认值：0。
        - **fade_out_len** (int, 可选) - 淡出长度（时间帧），必须是非负。默认值：0。
-        - **fade_shape** (FadeShape, 可选) - 淡入淡出形状，可以是FadeShape.QUARTER_SINE、FadeShape.HALF_SINE、
-          FadeShape.LINEAR、FadeShape.LOGARITHMIC或FadeShape.EXPONENTIAL中的一个。默认值：FadeShape.LINEAR。
+        - **fade_shape** (FadeShape, 可选) - 淡入淡出形状，可以选择FadeShape提供的模式。默认值：FadeShape.LINEAR。
+
+          - FadeShape.QUARTER_SINE，表示淡入淡出形状为四分之一正弦模式。
+          - FadeShape.HALF_SINE，表示淡入淡出形状为半正弦模式。
+          - FadeShape.LINEAR，表示淡入淡出形状为线性模式。
+          - FadeShape.LOGARITHMIC，表示淡入淡出形状为对数模式。
+          - FadeShape.EXPONENTIAL，表示淡入淡出形状为指数模式。

    异常：
        - **RuntimeError** - 如果 `fade_in_len` 超过音频波形长度。
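The five FadeShape modes differ only in the gain curve applied over the fade window. The sketch below assumes the curves follow torchaudio's `Fade` formulas, with the fade-out as the mirror image of the fade-in. String names stand in for the FadeShape enum. `fade_in_curve` and `apply_fade` are hypothetical helpers, and the sketch raises ValueError where MindSpore documents RuntimeError.

```python
import math


def _ramp(n):
    """n evenly spaced points from 0.0 to 1.0 inclusive (like linspace(0, 1, n))."""
    if n <= 0:
        return []
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def fade_in_curve(n, shape="linear"):
    """Gain curve rising from about 0 to 1 over n frames (assumed torchaudio formulas)."""
    t = _ramp(n)
    if shape == "linear":
        return t
    if shape == "exponential":
        return [2 ** ((x - 1) * 5) for x in t]
    if shape == "logarithmic":
        return [math.log10(0.1 + x) + 1 for x in t]
    if shape == "quarter_sine":
        return [math.sin(x * math.pi / 2) for x in t]
    if shape == "half_sine":
        return [math.sin(x * math.pi - math.pi / 2) / 2 + 0.5 for x in t]
    raise ValueError(f"unknown fade shape: {shape}")


def apply_fade(waveform, fade_in_len=0, fade_out_len=0, shape="linear"):
    n = len(waveform)
    if fade_in_len > n or fade_out_len > n:
        raise ValueError("fade length exceeds waveform length")
    gain = [1.0] * n
    for i, g in enumerate(fade_in_curve(fade_in_len, shape)):
        gain[i] *= g
    # fade-out: the fade-in curve reversed, applied to the tail of the waveform
    for i, g in enumerate(reversed(fade_in_curve(fade_out_len, shape))):
        gain[n - fade_out_len + i] *= g
    return [s * g for s, g in zip(waveform, gain)]
```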
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Gain.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Gain.rst
@@ -3,7 +3,7 @@ mindspore.dataset.audio.Gain

 .. py:class:: mindspore.dataset.audio.Gain(gain_db=1.0)

-    放大或衰减音频波形。
+    放大或衰减整个音频波形。

    参数：
        - **gain_db** (float) - 增益调整，单位为分贝（dB）。默认值：1.0。
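`gain_db` is a level in decibels, so the whole waveform is scaled by one amplitude ratio. The sketch below assumes the usual amplitude convention, ratio = 10^(gain_db / 20), which is what torchaudio's `gain` uses. Under that convention +20 dB multiplies samples by 10 and about -6 dB roughly halves them. `apply_gain` is a hypothetical helper.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a decibel gain to an amplitude multiplier (20*log10 convention)."""
    return 10 ** (gain_db / 20)


def apply_gain(waveform, gain_db=1.0):
    """Scale every sample by the same ratio, amplifying (>0 dB) or attenuating (<0 dB)."""
    ratio = db_to_amplitude_ratio(gain_db)
    return [s * ratio for s in waveform]
```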
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.GriffinLim.rst
@@ -3,12 +3,10 @@ mindspore.dataset.audio.GriffinLim

 .. py:class:: mindspore.dataset.audio.GriffinLim(n_fft=400, n_iter=32, win_length=None, hop_length=None, window_type=WindowType.HANN, power=2, momentum=0.99, length=None, rand_init=True)

-    使用GriffinLim算法对音频波形进行近似幅度谱图反演。
+    使用Griffin-Lim算法从线性幅度频谱图中计算信号波形。

-    .. math::
-        x(n)=\frac{\sum_{m=-\infty}^{\infty} w(m S-n) y_{w}(m S, n)}{\sum_{m=-\infty}^{\infty} w^{2}(m S-n)}
-
-    其中w表示窗口函数，y表示每个帧的重建信号，x表示整个信号。
+    有关Griffin-Lim算法的更多描述，详见论文 `A fast Griffin-Lim algorithm <https://doi.org/10.1109/WASPAA.2013.6701851>`_
+    与 `Signal estimation from modified short-time Fourier transform <https://doi.org/10.1109/ICASSP.1983.1172092>`_ 。

    参数：
        - **n_fft** (int, 可选) - FFT的长度。默认值：400。
@@ -22,4 +20,7 @@ mindspore.dataset.audio.GriffinLim
        - **momentum** (float, 可选) - 快速Griffin-Lim的动量。默认值：0.99。
        - **length** (int, 可选) - 预期输出波形的长度。默认值：None，将设置为stft矩阵的最后一个维度的值。
        - **rand_init** (bool, 可选) - 随机相位初始化或全零相位初始化标志。默认值：True。
-    
+    
+    异常：
+        - **RuntimeError** - 当 `n_fft` 指定的FFT长度不小于 `length` 指定的输出波形长度。
+        - **RuntimeError** - 当 `win_length` 指定的窗口长度不小于 `n_fft` 指定的FFT长度。
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.HighpassBiquad.rst
@@ -8,6 +8,9 @@ mindspore.dataset.audio.HighpassBiquad
    接口实现方式类似于 `SoX库 <http://sox.sourceforge.net/sox.html>`_ 。

    参数：
-        - **sample_rate** (int) - 采样频率（单位：Hz），不能为零。
+        - **sample_rate** (int) - 波形的采样频率，如44100（单位：Hz），不能为零。
        - **cutoff_freq** (float) - 中心频率（单位：Hz）。
-        - **Q** (float, 可选) - `品质因子 <https://zh.wikipedia.org/wiki/%E5%93%81%E8%B3%AA%E5%9B%A0%E5%AD%90>`_ ，能够反映带宽与采样频率和中心频率的关系，取值范围为(0, 1]。默认值：0.707。
+        - **Q** (float, 可选) - `品质因子 <https://zh.wikipedia.org/wiki/%E5%93%81%E8%B3%AA%E5%9B%A0%E5%AD%90>`_ ，取值范围为(0, 1]。默认值：0.707。
+
+    异常：
+        - **RuntimeError** - 当输入音频的shape不为(..., time)。
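A SoX-style high-pass biquad reduces to five coefficients derived from `sample_rate`, `cutoff_freq` and `Q`. The sketch below assumes the RBJ Audio EQ Cookbook high-pass formulas, which torchaudio's `highpass_biquad` also uses. It evaluates the magnitude response to show the filter's behaviour: zero gain at DC, unity gain at Nyquist, and gain equal to Q at the cutoff. Both function names are hypothetical.

```python
import cmath
import math


def highpass_biquad_coeffs(sample_rate, cutoff_freq, Q=0.707):
    """Return (b0, b1, b2, a0, a1, a2) of a second-order high-pass filter (RBJ cookbook)."""
    if sample_rate == 0:
        raise ValueError("sample_rate must not be zero")
    if not 0 < Q <= 1:
        raise ValueError("Q must be in (0, 1]")
    w0 = 2 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2 * Q)
    cos_w0 = math.cos(w0)
    b0 = (1 + cos_w0) / 2
    b1 = -(1 + cos_w0)
    b2 = (1 + cos_w0) / 2
    a0 = 1 + alpha
    a1 = -2 * cos_w0
    a2 = 1 - alpha
    return b0, b1, b2, a0, a1, a2


def magnitude_response(coeffs, freq, sample_rate):
    """|H(e^{jw})| of the biquad evaluated at freq Hz."""
    b0, b1, b2, a0, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq / sample_rate)  # z^-1 on the unit circle
    num = b0 + b1 * z1 + b2 * z1 ** 2
    den = a0 + a1 * z1 + a2 * z1 ** 2
    return abs(num / den)
```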
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Interpolation.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Interpolation.rst
@@ -7,5 +7,5 @@ mindspore.dataset.audio.Interpolation

    可选的枚举值包括：Interpolation.LINEAR和Interpolation.QUADRATIC。

-    - **Interpolation.LINEAR** - 插值类型为线性。
-    - **Interpolation.QUADRATIC** - 插值类型为二次型。
+    - **Interpolation.LINEAR** - 插值模式为线性。
+    - **Interpolation.QUADRATIC** - 插值模式为二次型。
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.InverseMelScale.rst
@@ -3,7 +3,7 @@ mindspore.dataset.audio.InverseMelScale

 .. py:class:: mindspore.dataset.audio.InverseMelScale(n_stft, n_mels=128, sample_rate=16000, f_min=0.0, f_max=None, max_iter=100000, tolerance_loss=1e-5, tolerance_change=1e-8, sgdargs=None, norm=NormType.NONE, mel_type=MelType.HTK)

-    使用转换矩阵求解STFT，形成梅尔频率的STFT。
+    使用转换矩阵从梅尔频率STFT求解普通频率的STFT。

    参数：
        - **n_stft** (int) - STFT中的频段数。
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.Magphase.rst
@@ -3,10 +3,10 @@ mindspore.dataset.audio.Magphase

 .. py:class:: mindspore.dataset.audio.Magphase(power=1.0)

-    将具有（..., 2）形状的复值光谱图分离，输出幅度和相位。
+    将shape为(..., 2)的复值频谱图分离，输出幅度和相位。

    参数：
-        - **power** (float) - 范数的功率，必须是非负的。默认值：1.0。
+        - **power** (float) - 范数的幂，必须是非负的。默认值：1.0。
    
    异常：
-        - **RuntimeError** - 当输入音频的shape不为<..., 2>。
+        - **RuntimeError** - 当输入音频的shape不为(..., 2)。
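In the (..., 2) layout the last axis holds the real and imaginary parts of each spectrogram bin. The sketch below flattens the leading axes into a plain list of pairs. It assumes the magnitude is the complex norm raised to `power`, as in torchaudio's `magphase`, and that the phase is atan2(imag, real). `magphase` here is a hypothetical pure-Python stand-in, not the MindSpore class.

```python
import math


def magphase(spec, power=1.0):
    """Split a list of (real, imag) pairs into (magnitudes, phases).

    Assumption: magnitude = sqrt(re^2 + im^2) ** power, phase = atan2(im, re).
    """
    if power < 0:
        raise ValueError("power must be non-negative")
    mags, phases = [], []
    for pair in spec:
        if len(pair) != 2:
            raise ValueError("last dimension must have size 2 (real, imag)")
        re, im = pair
        mags.append((re * re + im * im) ** (0.5 * power))
        phases.append(math.atan2(im, re))
    return mags, phases
```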
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MaskAlongAxis.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MaskAlongAxis.rst
@@ -8,7 +8,7 @@ mindspore.dataset.audio.MaskAlongAxis
    参数：
        - **mask_start** (int) - 掩码的起始位置，必须是非负的。
        - **mask_width** (int) - 掩码的宽度，必须是大于0。
-        - **mask_value** (float) - 掩码值。
+        - **mask_value** (float) - 填充到掩码区间的值。
        - **axis** (int) - 要应用掩码的轴（1表示频率，2表示时间）。

    异常：
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MaskAlongAxisIID.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MaskAlongAxisIID.rst
@@ -3,12 +3,12 @@ mindspore.dataset.audio.MaskAlongAxisIID

 .. py:class:: mindspore.dataset.audio.MaskAlongAxisIID(mask_param, mask_value, axis)

-    对音频波形应用掩码。掩码的起始和长度由 `[mask_start, mask_start + mask_width)` 决定，其中 `mask_width` 从 `uniform[0, mask_param]` 中采样， `mask_start` 从 `uniform[0, max_length - mask_width]` 中采样，
+    对音频波形沿 `axis` 轴应用掩码。掩码的起始和长度由 `[mask_start, mask_start + mask_width)` 决定，其中 `mask_width` 从 `uniform[0, mask_param]` 中采样， `mask_start` 从 `uniform[0, max_length - mask_width]` 中采样，
    `max_length` 是光谱图中特定轴的列数。

    参数：
        - **mask_param** (int) - 要屏蔽的列数，将从[0, mask_param]统一采样，必须是非负数。
-        - **mask_value** (float) - 掩码值。
+        - **mask_value** (float) - 填充到掩码区间的值。
        - **axis** (int) - 要应用掩码的轴（1表示频率，2表示时间）。

    异常：
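The MaskAlongAxisIID description samples `mask_width` from uniform[0, mask_param] and then `mask_start` from uniform[0, max_length - mask_width]. The sketch below applies that two-step sampling to a single 2-D spectrogram indexed as spec[freq][time], with the channel axis omitted. Flooring the sampled interval to integer indices is an assumption (torchaudio does the same), and `mask_along_axis_iid` is a hypothetical helper.

```python
import random


def mask_along_axis_iid(spec, mask_param, mask_value, axis, rng=random):
    """Mask a random band of spec[freq][time] and return (masked_copy, start, end).

    axis=1 masks frequency rows and axis=2 masks time columns, matching the
    documented (..., freq, time) convention. The masked interval is [start, end).
    """
    if mask_param < 0:
        raise ValueError("mask_param must be non-negative")
    if axis not in (1, 2):
        raise ValueError("axis must be 1 (frequency) or 2 (time)")
    n_freq, n_time = len(spec), len(spec[0])
    max_length = n_freq if axis == 1 else n_time
    width = rng.uniform(0, mask_param)                      # mask_width ~ U[0, mask_param]
    start = rng.uniform(0, max(0.0, max_length - width))    # mask_start ~ U[0, max_length - width]
    lo = int(start)                                         # assumption: floor to indices
    hi = min(int(start + width), max_length)
    out = [row[:] for row in spec]
    for f in range(n_freq):
        for t in range(n_time):
            idx = f if axis == 1 else t
            if lo <= idx < hi:
                out[f][t] = mask_value
    return out, lo, hi
```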
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.MelScale.rst
@@ -1,15 +1,16 @@
 mindspore.dataset.audio.MelScale
 ================================

-.. py:class:: mindspore.dataset.audio.MelScale(n_mels=128, sample_rate=16000, f_min=0, f_max=None, n_stft=201, norm=NormType.NONE, mel_type=MelType.HTK)
+.. py:class:: mindspore.dataset.audio.MelScale(n_mels=128, sample_rate=16000, f_min=0.0, f_max=None, n_stft=201, norm=NormType.NONE, mel_type=MelType.HTK)

-    将正常STFT转换为梅尔尺度的STFT。
+    将普通STFT转换为梅尔尺度的STFT。

    参数：
        - **n_mels** (int, 可选) - 梅尔滤波器的数量。默认值：128。
-        - **sample_rate** (int, 可选) - 音频信号采样速率。默认值：16000。
+        - **sample_rate** (int, 可选) - 音频信号采样速率（单位：Hz）。默认值：16000。
        - **f_min** (float, 可选) - 最小频率。默认值：0.0。
        - **f_max** (float, 可选) - 最大频率。默认值：None，将设置为 `sample_rate//2` 。
        - **n_stft** (int, 可选) - STFT中的频段数。默认值：201。
        - **norm** (NormType, 可选) - 标准化方法，可以是NormType.SLANEY或NormType.NONE。默认值：NormType.NONE。
+          若采用NormType.SLANEY，则三角梅尔权重将被除以梅尔频带的宽度。
        - **mel_type** (MelType, 可选) - 要使用的Mel比例，可以是MelType.SLAN或MelType.HTK。默认值：MelType.HTK。
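MelScale multiplies an STFT by a bank of triangular filters whose corners are evenly spaced on the mel axis. The sketch below builds such a filterbank of shape (n_stft, n_mels). It assumes the HTK mel formula mel = 2595 * log10(1 + f / 700) and torchaudio's construction. With the optional Slaney normalization, each triangle is divided by the width of its mel band, scaled as 2 / (right - left). All function names are hypothetical, and the Slaney mel *scale* (MelType.SLANEY) is not modelled.

```python
import math


def hz_to_mel_htk(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def linspace(start, stop, num):
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def mel_filterbank(n_mels=128, sample_rate=16000, f_min=0.0, f_max=None,
                   n_stft=201, slaney_norm=False):
    """Triangular HTK mel filterbank, shape (n_stft, n_mels)."""
    if f_max is None:
        f_max = sample_rate // 2
    all_freqs = linspace(0.0, sample_rate // 2, n_stft)  # STFT bin centre frequencies
    m_pts = linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    f_pts = [mel_to_hz_htk(m) for m in m_pts]
    fb = [[0.0] * n_mels for _ in range(n_stft)]
    for j in range(n_mels):
        left, center, right = f_pts[j], f_pts[j + 1], f_pts[j + 2]
        # Slaney-style normalization: divide the triangle by the width of its mel band
        scale = 2.0 / (right - left) if slaney_norm else 1.0
        for i, f in enumerate(all_freqs):
            down = (f - left) / (center - left)   # rising edge of the triangle
            up = (right - f) / (right - center)   # falling edge of the triangle
            fb[i][j] = max(0.0, min(down, up)) * scale
    return fb
```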
--- a/docs/api/api_python/dataset_audio/mindspore.dataset.audio.create_dct.rst
+++ b/docs/api/api_python/dataset_audio/mindspore.dataset.audio.create_dct.rst
@@ -11,4 +11,4 @@ mindspore.dataset.audio.create_dct
        - **norm** (NormMode, 可选) - 标准化模式，可以是NormMode.NONE或NormMode.ORTHO。默认值：NormMode.NONE。

    返回：
-        numpy.ndarray，DCT转换矩阵。
+        numpy.ndarray，shape为 ( `n_mels`, `n_mfcc` ) 的DCT转换矩阵。
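The returned matrix has shape (`n_mels`, `n_mfcc`). Each column is one DCT-II basis vector sampled at the `n_mels` mel bins. The sketch below assumes torchaudio's `create_dct` convention. With no normalization every entry is doubled. With the orthonormal mode the first basis vector is scaled by 1/sqrt(2) and all entries by sqrt(2 / n_mels), so the columns become orthonormal. `create_dct_sketch` is a hypothetical name, and strings stand in for the NormMode enum.

```python
import math


def create_dct_sketch(n_mfcc, n_mels, norm="none"):
    """DCT-II matrix of shape (n_mels, n_mfcc); norm is "none" or "ortho"."""
    if norm not in ("none", "ortho"):
        raise ValueError("norm must be 'none' or 'ortho'")
    dct = [[0.0] * n_mfcc for _ in range(n_mels)]
    for n in range(n_mels):          # mel bin index (rows)
        for k in range(n_mfcc):      # cepstral coefficient index (columns)
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm == "none":
                value *= 2.0
            else:
                if k == 0:
                    value *= 1.0 / math.sqrt(2.0)
                value *= math.sqrt(2.0 / n_mels)
            dct[n][k] = value
    return dct
```

Multiplying a row vector of log-mel energies by this matrix yields `n_mfcc` cepstral coefficients, which is how an MFCC pipeline would typically use it.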
--- a/mindspore/python/mindspore/dataset/audio/transforms.py
+++ b/mindspore/python/mindspore/dataset/audio/transforms.py
@@ -619,7 +619,7 @@ class DBToAmplitude(AudioTensorOperation):

 class DCShift(AudioTensorOperation):
    """
-    Apply a DC shift to the audio.
+    Apply a DC shift to the audio. This can be useful to remove DC offset from audio.

    Args:
        shift (float): The amount to shift the audio, the value must be in the range [-2.0, 2.0].
@@ -807,9 +807,8 @@ class Fade(AudioTensorOperation):
    Args:
        fade_in_len (int, optional): Length of fade-in (time frames), which must be non-negative. Default: 0.
        fade_out_len (int, optional): Length of fade-out (time frames), which must be non-negative. Default: 0.
-        fade_shape (FadeShape, optional): Shape of fade. Default: FadeShape.LINEAR. Can be one of
-            FadeShape.QUARTER_SINE, FadeShape.HALF_SINE, FadeShape.LINEAR, FadeShape.LOGARITHMIC or
-            FadeShape.EXPONENTIAL.
+        fade_shape (FadeShape, optional): Shape of fade. Five types can be chosen, as defined in FadeShape.
+            Default: FadeShape.LINEAR.

            -FadeShape.QUARTER_SINE, means it tend to 0 in an quarter sin function.

@@ -1042,13 +1041,10 @@ class Gain(AudioTensorOperation):

 class GriffinLim(AudioTensorOperation):
    r"""
-    Approximate magnitude spectrogram inversion using the GriffinLim algorithm.
+    Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.

-    .. math::
-        x(n)=\frac{\sum_{m=-\infty}^{\infty} w(m S-n) y_{w}(m S, n)}{\sum_{m=-\infty}^{\infty} w^{2}(m S-n)}
-
-    where w represents the window function, y represents the reconstructed signal of each frame and x represents the
-    whole signal.
+    For more details about Griffin-Lim, please refer to `A fast Griffin-Lim algorithm <https://doi.org/10.1109/WASPAA.2013.6701851>`_
+    and `Signal estimation from modified short-time Fourier transform <https://doi.org/10.1109/ICASSP.1983.1172092>`_.

    Args:
        n_fft (int, optional): Size of FFT. Default: 400.
@@ -1065,6 +1061,10 @@ class GriffinLim(AudioTensorOperation):
        rand_init (bool, optional): Flag for random phase initialization or all-zero phase initialization.
            Default: True.

+    Raises:
+        RuntimeError: If `n_fft` is not less than `length`.
+        RuntimeError: If `win_length` is not less than `n_fft`.
+
    Examples:
        >>> import numpy as np
        >>>
@@ -1075,7 +1075,7 @@ class GriffinLim(AudioTensorOperation):
    """

    @check_griffin_lim
-    def __init__(self, n_fft=400, n_iter=32, win_length=None, hop_length=None, window_type=WindowType.HANN, power=2,
+    def __init__(self, n_fft=400, n_iter=32, win_length=None, hop_length=None, window_type=WindowType.HANN, power=2.0,
                 momentum=0.99, length=None, rand_init=True):
        super().__init__()
        self.n_fft = n_fft
@@ -1105,6 +1105,9 @@ class HighpassBiquad(AudioTensorOperation):
        cutoff_freq (float): Filter cutoff frequency (in Hz).
        Q (float, optional): Quality factor, https://en.wikipedia.org/wiki/Q_factor, range: (0, 1]. Default: 0.707.

+    Raises:
+        RuntimeError: If the shape of input audio waveform does not match (..., time).
+
    Examples:
        >>> import numpy as np
        >>>
@@ -1127,14 +1130,14 @@ class HighpassBiquad(AudioTensorOperation):

 class InverseMelScale(AudioTensorOperation):
    """
-    Solve for a normal STFT form a mel frequency STFT, using a conversion matrix.
+    Solve for a normal STFT from a mel frequency STFT, using a conversion matrix.

    Args:
        n_stft (int): Number of bins in STFT.
        n_mels (int, optional): Number of mel filterbanks. Default: 128.
        sample_rate (int, optional): Sample rate of audio signal. Default: 16000.
        f_min (float, optional): Minimum frequency. Default: 0.0.
-        f_max (float, optional): Maximum frequency. Default: None, will be set to sample_rate // 2.
+        f_max (float, optional): Maximum frequency. Default: None, will be set to `sample_rate // 2`.
        max_iter (int, optional): Maximum number of optimization iterations. Default: 100000.
        tolerance_loss (float, optional): Value of loss to stop optimization at. Default: 1e-5.
        tolerance_change (float, optional): Difference in losses to stop optimization at. Default: 1e-8.
@@ -1281,7 +1284,7 @@ class Magphase(AudioTensorOperation):
        power (float): Power of the norm, which must be non-negative. Default: 1.0.

    Raises:
-        RuntimeError: If the shape of input audio waveform does not match <..., 2>.
+        RuntimeError: If the shape of input audio waveform does not match (..., 2).

    Examples:
        >>> import numpy as np
@@ -1309,7 +1312,7 @@ class MaskAlongAxis(AudioTensorOperation):
        mask_start (int): Starting position of the mask, which must be non negative.
        mask_width (int): The width of the mask, which must be larger than 0.
        mask_value (float): Value to assign to the masked columns.
-        axis (int): Axis to apply masking on (1 for frequency and 2 for time).
+        axis (int): Axis to apply mask on (1 for frequency and 2 for time).

    Raises:
        ValueError: If `mask_start` is invalid (< 0).
@@ -1347,7 +1350,7 @@ class MaskAlongAxisIID(AudioTensorOperation):
        mask_param (int): Number of columns to be masked, will be uniformly sampled from
            [0, mask_param], must be non negative.
        mask_value (float): Value to assign to the masked columns.
-        axis (int): Axis to apply masking on (1 for frequency and 2 for time).
+        axis (int): Axis to apply mask on (1 for frequency and 2 for time).

    Raises:
        TypeError: If `mask_param` is not of type int.
@@ -1392,7 +1395,7 @@ class MelScale(AudioTensorOperation):
        n_mels (int, optional): Number of mel filterbanks. Default: 128.
        sample_rate (int, optional): Sample rate of audio signal. Default: 16000.
        f_min (float, optional): Minimum frequency. Default: 0.
-        f_max (float, optional): Maximum frequency. Default: None, will be set to sample_rate // 2.
+        f_max (float, optional): Maximum frequency. Default: None, will be set to `sample_rate // 2`.
        n_stft (int, optional): Number of bins in STFT. Default: 201.
        norm (NormType, optional): Type of norm, value should be NormType.SLANEY or NormType::NONE.
            If norm is NormType.SLANEY, divide the triangular mel weight by the width of the mel band.
--- a/mindspore/python/mindspore/dataset/engine/datasets_audio.py
+++ b/mindspore/python/mindspore/dataset/engine/datasets_audio.py
@@ -944,7 +944,7 @@ class YesNoDataset(MappableDataset, AudioBaseDataset):
        shuffle (bool, optional): Whether or not to perform shuffle on the dataset.
            Default: None, expected order behavior shown in the table.
        sampler (Sampler, optional): Object used to choose samples from the
-            dataset. Default: None, expected order behavior shown in the table.
+            dataset. Default: None, expected order behavior shown in the table below.
        num_shards (int, optional): Number of shards that the dataset will be divided into. Default: None.
            When this argument is specified, `num_samples` reflects the maximum sample number of per shard.
        shard_id (int, optional): The shard ID within `num_shards`. Default: None. This argument can only