forked from mindspore-Ecosystem/mindspore
!6759 Add music auto tagging network in modelzoo
Merge pull request !6759 from jiangzhenguang/add_music_auto_tagging
This commit is contained in:
commit 2b6a88e09e

@ -0,0 +1,203 @@
# Contents

- [Music Auto Tagging Description](#music-auto-tagging-description)
- [Model Architecture](#model-architecture)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Model Description](#model-description)
- [Performance](#performance)
- [Evaluation Performance](#evaluation-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Music Auto Tagging Description](#contents)

This repository provides a script and recipe to train the Music Auto Tagging model to achieve state-of-the-art accuracy.

[Paper](https://arxiv.org/abs/1606.00298): Keunwoo Choi, George Fazekas, and Mark Sandler, "Automatic tagging using deep convolutional neural networks," in International Society for Music Information Retrieval Conference (ISMIR), 2016.

# [Model Architecture](#contents)

Music Auto Tagging (FCN-4) is a convolutional neural network architecture; the name FCN-4 comes from its four convolutional layers. The network is built from convolutional layers, max-pooling layers, activation layers, and a fully connected layer.
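For reference, `train.py` and `eval.py` instantiate the network as follows (the channel widths, kernel sizes, and pooling windows below are taken directly from those scripts):

```python
from src.musictagger import MusicTaggerCNN

# Four conv blocks whose channel widths grow from 128 to 2048, each
# followed by batch norm, ReLU, and max pooling, ending in a 50-way
# sigmoid dense layer over the pooled features.
network = MusicTaggerCNN(in_classes=[1, 128, 384, 768, 2048],
                         kernel_size=[3, 3, 3, 3, 3],
                         padding=[0] * 5,
                         maxpool=[(2, 4), (4, 5), (3, 8), (4, 8)],
                         has_bias=True)
```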
# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, while maintaining the network accuracy achieved with single-precision training. Mixed precision training speeds up computation, reduces memory usage, and enables training of larger models or batch sizes on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for 'reduce precision'.
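As a minimal sketch, mixed precision is toggled here through the MindSpore context flag that `train.py` sets from `mixed_precision` in `src/config.py`:

```python
from mindspore import context
from src.config import music_cfg as cfg

# train.py enables automatic mixed precision with this flag;
# cfg.mixed_precision defaults to False in src/config.py.
context.set_context(enable_auto_mixed_precision=cfg.mixed_precision)
```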
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

### 1. Download and preprocess the dataset

1. Download a music tagging dataset (for instance, the MagnaTagATune dataset or the Million Song Dataset).
2. Extract the dataset.
3. The information file for each clip should contain the label and path. Please refer to `annotations_final.csv` in the MagnaTagATune dataset.
4. The provided pre-processing script uses the MagnaTagATune dataset as an example. Please modify the code according to your own needs.
### 2. Set up parameters (`src/config.py`)

### 3. Train

After preparing your dataset, first convert the audio clips into a MindRecord dataset with the following command:

```shell
python pre_process_data.py --device_id 0
```
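Equivalently, the wrapper script under `scripts/` runs the same conversion (a sketch, assuming it is invoked from the `scripts/` directory, as its relative paths imply):

```shell
cd scripts
bash run_process_data.sh
```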
Then you can start training the model with the following command:

```shell
SLOG_PRINT_TO_STDOUT=1 python train.py --device_id 0
```

### 4. Test

Then you can evaluate the trained model:

```shell
SLOG_PRINT_TO_STDOUT=1 python eval.py --device_id 0
```
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```
├── model_zoo
    ├── README.md                      // descriptions about all the models
    ├── music_auto_tagging
        ├── README.md                  // descriptions about Music Auto Tagging
        ├── scripts
        │   ├── run_train.sh           // shell script for training on Ascend
        │   ├── run_eval.sh            // shell script for evaluation on Ascend
        │   ├── run_process_data.sh    // shell script for converting audio clips to MindRecord
        ├── src
        │   ├── dataset.py             // creating dataset
        │   ├── pre_process_data.py    // pre-processing the dataset
        │   ├── musictagger.py         // FCN-4 architecture
        │   ├── config.py              // parameter configuration
        │   ├── loss.py                // loss function
        │   ├── tag.txt                // tag name for each label index
        ├── train.py                   // training script
        ├── eval.py                    // evaluation script
        ├── export.py                  // export model in AIR format
```
## [Script Parameters](#contents)

Parameters for both training and evaluation can be set in `src/config.py`.

- config for Music Auto Tagging

```python
# data_cfg: pre-processing parameters
'num_classes': 50,                                 # number of tagging classes
'num_consumer': 4,                                 # file number for mindrecord
'get_npy': 1,                                      # mode for converting audio clips to npy, default 1 in this case
'get_mindrecord': 1,                               # mode for converting npy files into mindrecord files, default 1 in this case
'audio_path': "/dev/data/Music_Tagger_Data/fea/",  # path to audio clips
'npy_path': "/dev/data/Music_Tagger_Data/fea/",    # path to numpy files
'info_path': "/dev/data/Music_Tagger_Data/fea/",   # path to info_name, which provides the label of each audio clip
'info_name': 'annotations_final.csv',              # info_name
'device_target': 'Ascend',                         # device running the program
'device_id': 0,                                    # device ID used to train or evaluate the dataset; ignore it when you use run_train.sh for distributed training
'mr_path': '/dev/data/Music_Tagger_Data/fea/',     # path to mindrecord
'mr_name': ['train', 'val'],                       # mindrecord names

# music_cfg: training and evaluation parameters
'pre_trained': False,                              # whether training is based on a pre-trained model
'lr': 0.0005,                                      # learning rate
'batch_size': 32,                                  # training batch size
'epoch_size': 10,                                  # total training epochs
'loss_scale': 1024.0,                              # loss scale
'num_consumer': 4,                                 # file number for mindrecord
'mixed_precision': False,                          # whether to use mixed-precision calculation
'train_filename': 'train.mindrecord0',             # file name of the train mindrecord data
'val_filename': 'val.mindrecord0',                 # file name of the evaluation mindrecord data
'data_dir': '/dev/data/Music_Tagger_Data/fea/',    # directory of mindrecord data
'device_target': 'Ascend',                         # device running the program
'device_id': 0,                                    # device ID used to train or evaluate the dataset; ignore it when you use run_train.sh for distributed training
'keep_checkpoint_max': 10,                         # only keep the last keep_checkpoint_max checkpoints
'save_step': 2000,                                 # steps between checkpoint saves
'checkpoint_path': '/dev/data/Music_Tagger_Data/model/',  # the absolute path to save the checkpoint file
'prefix': 'MusicTagger',                           # prefix of checkpoint files
'model_name': 'MusicTagger_3-50_543.ckpt',         # checkpoint name
```
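Since both configuration dictionaries are `easydict.EasyDict` instances, individual values can also be overridden in Python before training; a minimal sketch (the path below is hypothetical):

```python
from src.config import music_cfg as cfg

# Attribute assignment works because cfg is an easydict.EasyDict.
cfg.batch_size = 16                          # smaller batch for a quick run
cfg.data_dir = '/path/to/your/mindrecord/'   # hypothetical location
```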
## [Training Process](#contents)

### Training

- running on Ascend

```shell
python train.py > train.log 2>&1 &
```

The python command above will run in the background; you can view the results through the file `train.log`.

After training, you'll get some checkpoint files under the script folder by default. The loss values will be logged as follows:

```
# grep "loss is " train.log
epoch: 1 step: 100, loss is 0.23264095
epoch: 1 step: 200, loss is 0.2013525
...
```

The model checkpoint will be saved in the configured directory.
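To fine-tune from one of these checkpoints, set `pre_trained: True` in `src/config.py`; `train.py` then restores the weights before training starts, along the lines of:

```python
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.config import music_cfg as cfg

# Mirrors the pre_trained branch in train.py: load the checkpoint named
# by cfg.model_name into the freshly built network.
param_dict = load_checkpoint(cfg.checkpoint_path + '/' + cfg.model_name)
load_param_into_net(network, param_dict)
```

Here `network` is assumed to be a `MusicTaggerCNN` built as shown under [Model Architecture](#model-architecture).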
## [Evaluation Process](#contents)

### Evaluation

- running on Ascend
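By analogy with the training section, evaluation can be launched in the background (a sketch; `eval.py` loads the checkpoint named by `model_name` in `src/config.py` and prints the validation AUC):

```shell
python eval.py > eval.log 2>&1 &
```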
# [Model Description](#contents)

## [Performance](#contents)

### Evaluation Performance

| Parameters                 | Ascend                                                      |
| -------------------------- | ----------------------------------------------------------- |
| Model Version              | FCN-4                                                       |
| Resource                   | Ascend 910; CPU 2.60 GHz, 56 cores; memory 314 GB           |
| Uploaded Date              | 09/11/2020 (month/day/year)                                 |
| MindSpore Version          | r0.7.0                                                      |
| Training Parameters        | epoch=10, steps=534, batch_size=32, lr=0.0005               |
| Optimizer                  | Adam                                                        |
| Loss Function              | Binary cross entropy                                        |
| outputs                    | probability                                                 |
| AUC                        | 0.909                                                       |
| Speed                      | 1pc: 160 samples/sec                                        |
| Total time                 | 1pc: 20 mins                                                |
| Checkpoint for Fine tuning | 198.73M (.ckpt file)                                        |
| Scripts                    | [music_auto_tagging script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/audio/music_auto_tagging) |

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

@ -0,0 +1,137 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
##############evaluate trained models#################
python eval.py
'''

import argparse
import numpy as np
import mindspore.common.dtype as mstype
from mindspore import context
from mindspore import Tensor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.musictagger import MusicTaggerCNN
from src.config import music_cfg as cfg
from src.dataset import create_dataset


def calculate_auc(labels_list, preds_list):
    """
    The AUC calculation function

    Input:
            labels_list: list of true labels
            preds_list: list of predicted probabilities
    Outputs
            Float, mean AUC over all classes
    """
    auc = []
    n_bins = labels_list.shape[0] // 2
    if labels_list.ndim == 1:
        labels_list = labels_list.reshape(-1, 1)
        preds_list = preds_list.reshape(-1, 1)
    for i in range(labels_list.shape[1]):
        labels = labels_list[:, i]
        preds = preds_list[:, i]
        positive_len = labels.sum()
        negative_len = labels.shape[0] - positive_len
        total_case = positive_len * negative_len
        positive_histogram = np.zeros((n_bins))
        negative_histogram = np.zeros((n_bins))
        bin_width = 1.0 / n_bins

        # Histogram-based AUC: bucket the predictions, then count the
        # positive/negative pairs in which the positive scores higher
        # (ties within a bin count as half a pair).
        for j, _ in enumerate(labels):
            # clamp so a prediction of exactly 1.0 stays in the last bin
            nth_bin = min(int(preds[j] // bin_width), n_bins - 1)
            if labels[j]:
                positive_histogram[nth_bin] = positive_histogram[nth_bin] + 1
            else:
                negative_histogram[nth_bin] = negative_histogram[nth_bin] + 1

        accumulated_negative = 0
        satisfied_pair = 0
        for k in range(n_bins):
            satisfied_pair += (
                positive_histogram[k] * accumulated_negative +
                positive_histogram[k] * negative_histogram[k] * 0.5)
            accumulated_negative += negative_histogram[k]
        auc.append(satisfied_pair / total_case)

    return np.mean(auc)


def val(net, data_dir, filename, num_consumer=4, batch=32):
    """
    Validation function, estimate the performance of trained model

    Input:
            net: the trained neural network
            data_dir: path to the validation dataset
            filename: name of the validation dataset
            num_consumer: split number of validation dataset
            batch: validation batch size
    Outputs
            Float, AUC
    """
    data_train = create_dataset(data_dir, filename, batch, ['feature', 'label'],
                                num_consumer)
    data_train = data_train.create_tuple_iterator()
    res_pred = []
    res_true = []
    for data, label in data_train:
        x = net(Tensor(data, dtype=mstype.float32))
        res_pred.append(x.asnumpy())
        res_true.append(label.asnumpy())
    res_pred = np.concatenate(res_pred, axis=0)
    res_true = np.concatenate(res_true, axis=0)
    auc = calculate_auc(res_true, res_pred)
    return auc


def validation(net, model_path, data_dir, filename, num_consumer, batch):
    param_dict = load_checkpoint(model_path)
    load_param_into_net(net, param_dict)

    auc = val(net, data_dir, filename, num_consumer, batch)
    return auc


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Evaluate model')
    parser.add_argument('--device_id',
                        type=int,
                        help='device ID',
                        default=None)
    args = parser.parse_args()

    if args.device_id is not None:
        context.set_context(device_target=cfg.device_target,
                            mode=context.GRAPH_MODE,
                            device_id=args.device_id)
    else:
        context.set_context(device_target=cfg.device_target,
                            mode=context.GRAPH_MODE,
                            device_id=cfg.device_id)

    network = MusicTaggerCNN(in_classes=[1, 128, 384, 768, 2048],
                             kernel_size=[3, 3, 3, 3, 3],
                             padding=[0] * 5,
                             maxpool=[(2, 4), (4, 5), (3, 8), (4, 8)],
                             has_bias=True)
    network.set_train(False)
    auc_val = validation(network, cfg.checkpoint_path + "/" + cfg.model_name, cfg.data_dir,
                         cfg.val_filename, cfg.num_consumer, cfg.batch_size)

    print("=" * 10 + "Validation Performance" + "=" * 10)
    print("AUC: {:.5f}".format(auc_val))

@ -0,0 +1,40 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
##############export trained model#################
python export.py
'''

import numpy as np
from mindspore.train.serialization import export
from mindspore import Tensor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.musictagger import MusicTaggerCNN
from src.config import music_cfg as cfg

if __name__ == "__main__":
    network = MusicTaggerCNN(in_classes=[1, 128, 384, 768, 2048],
                             kernel_size=[3, 3, 3, 3, 3],
                             padding=[0] * 5,
                             maxpool=[(2, 4), (4, 5), (3, 8), (4, 8)],
                             has_bias=True)
    param_dict = load_checkpoint(cfg.checkpoint_path + "/" + cfg.model_name)
    load_param_into_net(network, param_dict)
    # dummy input matching the expected melgram shape (N, C, 96, 1366)
    input_data = np.random.uniform(0.0, 1.0, size=[1, 1, 96, 1366]).astype(np.float32)
    export(network,
           Tensor(input_data),
           filename="{}/{}.air".format(cfg.checkpoint_path,
                                       cfg.model_name[:-5]),
           file_format="AIR")

@ -0,0 +1,18 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

export SLOG_PRINT_TO_STDOUT=1
python ../eval.py --device_id 0

@ -0,0 +1,18 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

export SLOG_PRINT_TO_STDOUT=1
python ../src/pre_process_data.py --device_id 0

@ -0,0 +1,18 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

export SLOG_PRINT_TO_STDOUT=1
python ../train.py --device_id 0

@ -0,0 +1,23 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
__init__.py
"""

from . import musictagger
from . import loss
from . import dataset
from . import config
from . import pre_process_data

@ -0,0 +1,53 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in train.py, eval.py
"""
from easydict import EasyDict as edict

data_cfg = edict({
    'num_classes': 50,
    'num_consumer': 4,
    'get_npy': 1,
    'get_mindrecord': 1,
    'audio_path': "/dev/data/Music_Tagger_Data/fea/",
    'npy_path': "/dev/data/Music_Tagger_Data/fea/",
    'info_path': "/dev/data/Music_Tagger_Data/fea/",
    'info_name': 'annotations_final.csv',
    'device_target': 'Ascend',
    'device_id': 0,
    'mr_path': '/dev/data/Music_Tagger_Data/fea/',
    'mr_name': ['train', 'val'],
})

music_cfg = edict({
    'pre_trained': False,
    'lr': 0.0005,
    'batch_size': 32,
    'epoch_size': 10,
    'loss_scale': 1024.0,
    'num_consumer': 4,
    'mixed_precision': False,
    'train_filename': 'train.mindrecord0',
    'val_filename': 'val.mindrecord0',
    'data_dir': '/dev/data/Music_Tagger_Data/fea/',
    'device_target': 'Ascend',
    'device_id': 0,
    'keep_checkpoint_max': 10,
    'save_step': 2000,
    'checkpoint_path': '/dev/data/Music_Tagger_Data/model',
    'prefix': 'MusicTagger',
    'model_name': 'MusicTagger_3-50_543.ckpt',
})

@ -0,0 +1,30 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''python dataset.py'''

import os
import mindspore.dataset as ds


def create_dataset(base_path, filename, batch_size, columns_list,
                   num_consumer):
    """Create dataset"""

    path = os.path.join(base_path, filename)
    dtrain = ds.MindDataset(path, columns_list, num_consumer)
    # global shuffle over the full dataset, then fixed-size batches
    dtrain = dtrain.shuffle(buffer_size=dtrain.get_dataset_size())
    dtrain = dtrain.batch(batch_size, drop_remainder=True)

    return dtrain

@ -0,0 +1,41 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
define loss
"""
from mindspore import nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P


class BCELoss(nn.Cell):
    """
    BCELoss
    """
    def __init__(self, record=None):
        super(BCELoss, self).__init__(record)
        self.sm_scalar = P.ScalarSummary()
        self.cast = P.Cast()
        self.record = record
        self.weight = None
        self.bce = P.BinaryCrossEntropy()

    def construct(self, input_data, target):
        target = self.cast(target, mstype.float32)
        loss = self.bce(input_data, target, self.weight)
        if self.record:
            # log the loss to MindSpore summary when recording is enabled
            self.sm_scalar("loss", loss)
        return loss

@ -0,0 +1,83 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''model'''

from mindspore import nn
from mindspore.ops import operations as P


class MusicTaggerCNN(nn.Cell):
    """
    Music Tagger CNN
    """
    def __init__(self, in_classes, kernel_size, padding, maxpool, has_bias):
        super(MusicTaggerCNN, self).__init__()
        self.in_classes = in_classes
        self.kernel_size = kernel_size
        self.maxpool = maxpool
        self.padding = padding
        self.has_bias = has_bias
        # build model
        self.conv1 = nn.Conv2d(self.in_classes[0], self.in_classes[1],
                               self.kernel_size[0])
        self.conv2 = nn.Conv2d(self.in_classes[1], self.in_classes[2],
                               self.kernel_size[1])
        self.conv3 = nn.Conv2d(self.in_classes[2], self.in_classes[3],
                               self.kernel_size[2])
        self.conv4 = nn.Conv2d(self.in_classes[3], self.in_classes[4],
                               self.kernel_size[3])

        self.bn1 = nn.BatchNorm2d(self.in_classes[1])
        self.bn2 = nn.BatchNorm2d(self.in_classes[2])
        self.bn3 = nn.BatchNorm2d(self.in_classes[3])
        self.bn4 = nn.BatchNorm2d(self.in_classes[4])

        self.pool1 = nn.MaxPool2d(maxpool[0], maxpool[0])
        self.pool2 = nn.MaxPool2d(maxpool[1], maxpool[1])
        self.pool3 = nn.MaxPool2d(maxpool[2], maxpool[2])
        self.pool4 = nn.MaxPool2d(maxpool[3], maxpool[3])
        self.poolreduce = P.ReduceMax(keep_dims=False)
        self.Act = nn.ReLU()
        self.flatten = nn.Flatten()
        # note: the dense layer already applies the sigmoid activation
        self.dense = nn.Dense(2048, 50, activation='sigmoid')
        self.sigmoid = nn.Sigmoid()

    def construct(self, input_data):
        """
        Forward pass: four conv-BN-ReLU-pool blocks, global max pooling,
        then a 50-way sigmoid dense layer.
        """
        x = self.conv1(input_data)
        x = self.bn1(x)
        x = self.Act(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.Act(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = self.Act(x)
        x = self.pool3(x)

        x = self.conv4(x)
        x = self.bn4(x)
        x = self.Act(x)
        # global max pooling over the remaining time-frequency dims
        x = self.poolreduce(x, (2, 3))
        x = self.flatten(x)
        x = self.dense(x)

        return x

@ -0,0 +1,226 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''python pre_process_data.py'''

import os
import argparse
import pandas as pd
import numpy as np
import librosa
from mindspore.mindrecord import FileWriter
from mindspore import context
from src.config import data_cfg as cfg


def compute_melgram(audio_path, save_path='', filename='', save_npy=True):
    """
    extract melgram feature from the audio and save as numpy array

    Args:
        audio_path (str): path to the audio clip.
        save_path (str): path to save the numpy array.
        filename (str): filename of the audio clip.
        save_npy (bool): whether to save the feature as a .npy file.

    Returns:
        numpy array.
    """
    SR = 12000
    N_FFT = 512
    N_MELS = 96
    HOP_LEN = 256
    DURA = 29.12  # to make it 1366 frames

    src, _ = librosa.load(audio_path, sr=SR)  # whole signal
    n_sample = src.shape[0]
    n_sample_fit = int(DURA * SR)

    if n_sample < n_sample_fit:  # if too short, zero-pad to the target length
        src = np.hstack((src, np.zeros((int(DURA * SR) - n_sample,))))
    elif n_sample > n_sample_fit:  # if too long, keep the centered excerpt
        src = src[(n_sample - n_sample_fit) // 2:(n_sample + n_sample_fit) //
                  2]
    logam = librosa.core.amplitude_to_db
    melgram = librosa.feature.melspectrogram
    ret = logam(
        melgram(y=src, sr=SR, hop_length=HOP_LEN, n_fft=N_FFT, n_mels=N_MELS))
    ret = ret[np.newaxis, np.newaxis, :]
    if save_npy:
        save_path = save_path + filename[:-4] + '.npy'
        np.save(save_path, ret)
    return ret


def get_data(features_data, labels_data):
    data_list = []
    for i, (label, feature) in enumerate(zip(labels_data, features_data)):
        data_json = {"id": i, "feature": feature, "label": label}
        data_list.append(data_json)
    return data_list


def convert(s):
    if s.isdigit():
        return int(s)
    return s


def GetLabel(info_path, info_name):
    """
    separate dataset into training set and validation set

    Args:
        info_path (str): path to the information file.
        info_name (str): name of the information file.
    """
    T = []
    with open(info_path + '/' + info_name, 'rb') as info:
        data = info.readline()
        while data:
            T.append([
                convert(i[1:-1])
                for i in data.strip().decode('utf-8').split("\t")
            ])
            data = info.readline()

    annotation = pd.DataFrame(T[1:], columns=T[0])
    count = []
    for i in annotation.columns[1:-2]:
        count.append([annotation[i].sum() / len(annotation), i])
    # keep the 50 most frequent tags
    count = sorted(count)
    full_label = []
    for i in count[-50:]:
        full_label.append(i[1])
    out = []
    for i in T[1:]:
        index = [k for k, x in enumerate(i) if x == 1]
        label = [T[0][k] for k in index]
        L = [str(0) for k in range(50)]
        L.append(i[-1])
        for j in label:
            if j in full_label:
                ind = full_label.index(j)
                L[ind] = '1'
        out.append(L)
    out = np.array(out)

    Train = []
    Val = []

    # random split: roughly 80% training, 20% validation
    for i in out:
        if np.random.rand() > 0.2:
            Train.append(i)
        else:
            Val.append(i)
    np.savetxt("{}/music_tagging_train_tmp.csv".format(info_path),
               np.array(Train),
               fmt='%s',
               delimiter=',')
    np.savetxt("{}/music_tagging_val_tmp.csv".format(info_path),
               np.array(Val),
               fmt='%s',
               delimiter=',')


def generator_md(info_name, file_path, num_classes):
    """
    generate numpy arrays from features of all audio clips

    Args:
        info_name (str): path to the information file.
        file_path (str): path to the npy files.

    Returns:
        2 numpy arrays.
    """
    df = pd.read_csv(info_name, header=None)
    df.columns = [str(i) for i in range(num_classes)] + ["mp3_path"]
    data = []
    label = []
    for i in range(len(df)):
        try:
            data.append(
                np.load(file_path + df.mp3_path.values[i][:-4] +
                        '.npy').reshape(1, 96, 1366))
            label.append(np.array(df[df.columns[:-1]][i:i + 1])[0])
        except FileNotFoundError:
            print("Exception occurred in generator_md.")
    return np.array(data), np.array(label, dtype=np.int32)


def convert_to_mindrecord(info_name, file_path, store_path, mr_name,
                          num_classes):
    """ convert dataset to mindrecord """
    num_shard = 4
    data, label = generator_md(info_name, file_path, num_classes)
    schema_json = {
        "id": {
            "type": "int32"
        },
        "feature": {
            "type": "float32",
            "shape": [1, 96, 1366]
        },
        "label": {
            "type": "int32",
            "shape": [num_classes]
        }
    }

    writer = FileWriter(
        os.path.join(store_path, '{}.mindrecord'.format(mr_name)), num_shard)
    datax = get_data(data, label)
    writer.add_schema(schema_json, "music_tagger_schema")
    writer.add_index(["id"])
    writer.write_raw_data(datax)
    writer.commit()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='get feature')
    parser.add_argument('--device_id',
                        type=int,
                        help='device ID',
                        default=None)
    args = parser.parse_args()

    if cfg.get_npy:
        GetLabel(cfg.info_path, cfg.info_name)
        dirname = os.listdir(cfg.audio_path)
        for d in dirname:
            file_name = os.listdir("{}/{}".format(cfg.audio_path, d))
            if not os.path.isdir("{}/{}".format(cfg.npy_path, d)):
                os.mkdir("{}/{}".format(cfg.npy_path, d))
            for f in file_name:
                compute_melgram("{}/{}/{}".format(cfg.audio_path, d, f),
                                "{}/{}/".format(cfg.npy_path, d), f)

    if cfg.get_mindrecord:
        if args.device_id is not None:
            context.set_context(device_target='Ascend',
                                mode=context.GRAPH_MODE,
                                device_id=args.device_id)
        else:
            context.set_context(device_target='Ascend',
                                mode=context.GRAPH_MODE,
                                device_id=cfg.device_id)
        for cmn in cfg.mr_name:  # fixed typo: was cfg.mr_nam
            if cmn in ['train', 'val']:
                convert_to_mindrecord('music_tagging_{}_tmp.csv'.format(cmn),
                                      cfg.npy_path, cfg.mr_path, cmn,
                                      cfg.num_classes)

@ -0,0 +1,50 @@
choral
female voice
metal
country
weird
no voice
cello
harp
beats
female vocal
male voice
dance
new age
voice
choir
classic
man
solo
sitar
soft
no vocal
pop
male vocal
woman
flute
quiet
loud
harpsichord
no vocals
vocals
singing
male
opera
indian
female
synth
vocal
violin
beat
ambient
piano
fast
rock
electronic
drums
strings
techno
slow
classical
guitar

@ -0,0 +1,109 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
##############train models#################
python train.py
'''
import argparse
from mindspore import context, nn
from mindspore.train import Model
from mindspore.common import set_seed
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from src.dataset import create_dataset
from src.musictagger import MusicTaggerCNN
from src.loss import BCELoss
from src.config import music_cfg as cfg


def train(model, dataset_direct, filename, columns_list, num_consumer=4,
          batch=16, epoch=50, save_checkpoint_steps=2172, keep_checkpoint_max=50,
          prefix="model", directory='./'):
    """
    train network
    """
    config_ck = CheckpointConfig(save_checkpoint_steps=save_checkpoint_steps,
                                 keep_checkpoint_max=keep_checkpoint_max)
    ckpoint_cb = ModelCheckpoint(prefix=prefix,
                                 directory=directory,
                                 config=config_ck)
    data_train = create_dataset(dataset_direct, filename, batch, columns_list,
                                num_consumer)

    model.train(epoch,
                data_train,
                callbacks=[
                    ckpoint_cb,
                    LossMonitor(per_print_times=181),
                    TimeMonitor()
                ],
                dataset_sink_mode=True)


if __name__ == "__main__":
    set_seed(1)
    parser = argparse.ArgumentParser(description='Train model')
    parser.add_argument('--device_id',
                        type=int,
                        help='device ID',
                        default=None)

    args = parser.parse_args()

    if args.device_id is not None:
        context.set_context(device_target='Ascend',
                            mode=context.GRAPH_MODE,
                            device_id=args.device_id)
    else:
        context.set_context(device_target='Ascend',
                            mode=context.GRAPH_MODE,
                            device_id=cfg.device_id)

    context.set_context(enable_auto_mixed_precision=cfg.mixed_precision)
    network = MusicTaggerCNN(in_classes=[1, 128, 384, 768, 2048],
                             kernel_size=[3, 3, 3, 3, 3],
                             padding=[0] * 5,
                             maxpool=[(2, 4), (4, 5), (3, 8), (4, 8)],
                             has_bias=True)

    if cfg.pre_trained:
        # resume from the checkpoint named in src/config.py
        param_dict = load_checkpoint(cfg.checkpoint_path + '/' +
                                     cfg.model_name)
        load_param_into_net(network, param_dict)

    net_loss = BCELoss()

    network.set_train(True)
    net_opt = nn.Adam(params=network.trainable_params(),
                      learning_rate=cfg.lr,
                      loss_scale=cfg.loss_scale)

    # fixed loss scaling keeps FP16 gradients from underflowing
    loss_scale_manager = FixedLossScaleManager(loss_scale=cfg.loss_scale,
                                               drop_overflow_update=False)
    net_model = Model(network, net_loss, net_opt, loss_scale_manager=loss_scale_manager)

    train(model=net_model,
          dataset_direct=cfg.data_dir,
          filename=cfg.train_filename,
          columns_list=['feature', 'label'],
          num_consumer=cfg.num_consumer,
          batch=cfg.batch_size,
          epoch=cfg.epoch_size,
          save_checkpoint_steps=cfg.save_step,
          keep_checkpoint_max=cfg.keep_checkpoint_max,
          prefix=cfg.prefix,
          directory=cfg.checkpoint_path + "_{}".format(cfg.device_id))
    print("train success")