From 27848587a3269c2934968e19c03efa7b4ade078a Mon Sep 17 00:00:00 2001 From: lvmingfu Date: Tue, 12 Jan 2021 17:48:37 +0800 Subject: [PATCH] modify code formats for master --- RELEASE.md | 6 +- .../ccsrc/minddata/dataset/util/README.md | 174 +++++++++++-- .../lite/examples/train_lenet/README_CN.md | 13 +- .../examples/transfer_learning/README_CN.md | 10 +- model_zoo/official/cv/centerface/README.md | 16 +- model_zoo/official/cv/cnnctc/README.md | 130 +++++----- model_zoo/official/cv/cnnctc/README_CN.md | 18 +- model_zoo/official/cv/dpn/README.md | 6 +- model_zoo/official/cv/maskrcnn/README_CN.md | 4 +- .../official/cv/mobilenetv2/README_CN.md | 4 +- .../cv/mobilenetv2_quant/README_CN.md | 33 ++- .../official/cv/mobilenetv3/README_CN.md | 56 ++-- model_zoo/official/cv/psenet/README_CN.md | 2 +- model_zoo/official/cv/resnet/README_CN.md | 2 +- .../official/cv/retinaface_resnet50/README.md | 135 +++++----- .../cv/retinaface_resnet50/README_CN.md | 4 +- model_zoo/official/cv/simple_pose/README.md | 6 +- model_zoo/official/cv/yolov4/README.MD | 91 ++++--- model_zoo/official/nlp/gnmt_v2/README.md | 2 +- model_zoo/official/nlp/prophetnet/README.md | 8 +- model_zoo/official/nlp/tinybert/README.md | 245 +++++++++++------- model_zoo/official/recommend/ncf/README.md | 158 ++++++----- model_zoo/research/audio/fcn-4/README.md | 6 +- model_zoo/research/cv/FaceAttribute/README.md | 4 +- model_zoo/research/cv/FaceDetection/README.md | 4 +- .../cv/FaceQualityAssessment/README.md | 4 +- .../research/cv/FaceRecognition/README.md | 6 +- .../cv/FaceRecognitionForTracking/README.md | 4 +- model_zoo/research/cv/centernet/README.md | 6 +- model_zoo/research/nlp/dscnn/README.md | 183 +++++++------ model_zoo/research/nlp/textrcnn/readme.md | 4 +- 31 files changed, 771 insertions(+), 573 deletions(-) diff --git a/RELEASE.md b/RELEASE.md index c0ba740f147..dc0ac108801 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -120,7 +120,7 @@ The following optimizers add the target interface: Adam, FTRL, LazyAdam, Proxim -###### `export` Modify the input parameters and export's file name ([!7385](https://gitee.com/mind_spore/dashboard/projects/mindspore/mindspore/pulls/7385?tab=diffs), [!9057](https://gitee.com/mindspore/mindspore/pulls/9057/files)) +###### `export` Modify the input parameters and export's file name ([!7385](https://gitee.com/mindspore/mindspore/pulls/7385), [!9057](https://gitee.com/mindspore/mindspore/pulls/9057/files)) Export the MindSpore prediction model to a file in the specified format. @@ -227,7 +227,7 @@ However, from a user's perspective, tensor.size and tensor.ndim (methods -> prop -###### `EmbeddingLookup` add a config in the interface: sparse ([!8202](https://gitee.com/mind_spore/dashboard/projects/mindspore/mindspore/pulls/8202?tab=diffs)) +###### `EmbeddingLookup` add a config in the interface: sparse ([!8202](https://gitee.com/mindspore/mindspore/pulls/8202)) sparse (bool): Using sparse mode. When 'target' is set to 'CPU', 'sparse' has to be true. Default: True. @@ -878,7 +878,7 @@ Contributions of any kind are welcome! 
- Fix bug of list cannot be used as input in pynative mode([!1765](https://gitee.com/mindspore/mindspore/pulls/1765)) - Fix bug of kernel select ([!2103](https://gitee.com/mindspore/mindspore/pulls/2103)) - Fix bug of pattern matching for batchnorm fusion in the case of auto mix precision.([!1851](https://gitee.com/mindspore/mindspore/pulls/1851)) - - Fix bug of generate hccl's kernel info.([!2393](https://gitee.com/mindspore/mindspore/mindspore/pulls/2393)) + - Fix bug of generate hccl's kernel info.([!2393](https://gitee.com/mindspore/mindspore/pulls/2393)) - GPU platform - Fix bug of summary feature invalid([!2173](https://gitee.com/mindspore/mindspore/pulls/2173)) - Data processing diff --git a/mindspore/ccsrc/minddata/dataset/util/README.md b/mindspore/ccsrc/minddata/dataset/util/README.md index 7cad3c0d7d0..6c62965d305 100644 --- a/mindspore/ccsrc/minddata/dataset/util/README.md +++ b/mindspore/ccsrc/minddata/dataset/util/README.md @@ -1,22 +1,31 @@ This folder contains miscellaneous utilities used by the dataset code. We will describe a couple important classes in this file. + ## Thread Management + This picture summarizes a few important classes that we will cover in the next few sections. ![Thread management](https://images.gitee.com/uploads/images/2020/0601/220111_9b07c8fa_7342120.jpeg "task_manager.JPG") ## Task + A Task object corresponds to an instance of std::future returning from std::async. In general, a user will not create a Task object directly. Most work will go through TaskManager's TaskGroup interface which we will cover later in this document. Here are some important members and functions of Task class. + ```cpp std::function fnc_obj_; ``` + It is the entry function when the thead is spawned. The function does not take any input and will return a Status object. The returned Status object will be saved in this member + ```cpp Status rc_; ``` + To retrieve the executed result from the entry function, call the following function + ```cpp Status Task::GetTaskErrorIfAny(); ``` + Here is roughly the pseudo code of a lifetime of a Task. Some extra works needed to spawn the thread are omitted for the purpose of simplicity. As mentioned previously, a user never spawn a thread directly using a Task class without using any helper. ```cpp @@ -27,12 +36,14 @@ Here is roughly the pseudo code of a lifetime of a Task. Some extra works needed 5 RETURN_IF_NOT_OK(tk.Join();) 6 RETURN_IF_NOT_OK(tk.GetTaskErrorIfAny()); ``` -In the above example line 1 to 3 we use Task constructor to prepare a thread that we are going to create and what it will be running. We also assign a name to this thread. The name is for eye catcher purpose. The second parameter is the real job for this thread to run. + +In the above example line 1 to 3 we use Task constructor to prepare a thread that we are going to create and what it will be running. We also assign a name to this thread. The name is for eye catcher purpose. The second parameter is the real job for this thread to run.
Line 4 we spawn the thread. In the above example, the thread will execute the lambda function, which does nothing but return an OK Status object.
Line 5 we wait for the thread to complete.
Line 6 We retrieve the result from running the thread which should be the OK Status object. Another purpose of Task object is to wrap around the entry function and capture any possible exceptions thrown by running the entry function but not being caught within the entry function. + ```cpp try { rc_ = fnc_obj_(); @@ -42,23 +53,30 @@ Another purpose of Task object is to wrap around the entry function and capture rc_ = Status(StatusCode::kUnexpectedError, __LINE__, __FILE__, e.what()); } ``` -Note that + +Note that + ```cpp Status Task::Run(); ``` -is not returning the Status of running the entry function func_obj_. It merely indicates if the spawn is successful or not. This function returns immediately. + +is not returning the Status of running the entry function func_obj_. It merely indicates if the spawn is successful or not. This function returns immediately. Another thing to point out that Task::Run() is not designed to re-run the thread repeatedly, say after it has returned. Result will be unexpected if a Task object is re-run. For the function + ```cpp Status Task::Join(WaitFlag wf = WaitFlag::kBlocking); ``` + where + ```cpp enum class WaitFlag : int { kBlocking, kNonBlocking }; ``` -is also not returning the Status of running the entry function func_obj_ like the function Run(). It can return some other unexpected error while waiting for the thread to return. + +is also not returning the Status of running the entry function func_obj_ like the function Run(). It can return some other unexpected error while waiting for the thread to return. This function blocks (kBlocking) by default until the spawned thread returns. @@ -71,37 +89,49 @@ while (thrd_.wait_for(std::chrono::seconds(1)) != std::future_status::ready) { // Do something if the thread is blocked on a conditional variable } ``` + The main use of this form of Join() is after we have interrupted the thread. A design alternative is to use + ```cpp std::future ``` -to spawn the thread asynchronously and we can get the result using std::future::get(). But get() can only be called once and it is then more convenient to save the returned result in the rc_ member for unlimited number of retrieval. As we shall see later, the value of rc_ will be propagated to high level classes like TaskGroup, master thread. + +to spawn the thread asynchronously and we can get the result using std::future::get(). But get() can only be called once and it is then more convenient to save the returned result in the rc_member for unlimited number of retrieval. As we shall see later, the value of rc_ will be propagated to high level classes like TaskGroup, master thread. Currently it is how the thread is defined in Task class + ```cpp std::future thrd_; ``` + and spawned by this line of code. + ```cpp thrd_ = std::async(std::launch::async, std::ref(*this)); ``` + Every thread can access its own Task object using the FindMe() function. + ```cpp Task * TaskManager::FindMe(); ``` There are other attributes of Task such as interrupt which we will cover later in this document. - + ## TaskGroup + The first helper in managing Task objects is TaskGroup. Technically speaking a TaskGroup is a collection of related Tasks. As of this writing, every Task must belong to a TaskGroup. We spawn a thread using the following function + ```cpp Status TaskGroup::CreateAsyncTask(const std::string &my_name, const std::function &f, Task **pTask = nullptr); ``` + The created Task object is added to the TaskGroup object. 
In many cases, user do not need to get a reference to the newly created Task object. But the CreateAsyncTask can return one if requested. There is no other way to add a Task object to a TaskGroup other than by calling TaskGroup::CreateAsyncTask. As a result, no Task object can belong to multiple TaskGroup's by design. Every Task object has a back pointer to the TaskGroup it belongs to : + ```cpp TaskGroup *Task::MyTaskGroup(); ``` @@ -110,48 +140,64 @@ Task objects in the same TaskGroup will form a linked list with newly created Ta Globally we support multiple TaskGroups's running concurrently. TaskManager (discussed in the next section) will chain all Task objects from all TaskGroup's in a single LRU linked list. -###### HandShaking +### HandShaking + As of this writing, the following handshaking logic is required. Suppose a thread T1 create another thread, say T2 by calling TaskGroup::CreateAsyncTask. T1 will block on a WaitPost area until T2 post back signalling T1 can resume. + ```cpp // Entry logic of T2 auto *myTask = TaskManager::FindMe(); myTask->Post(); ``` + If T2 is going to spawn more threads, say T3 and T4, it is *highly recommended* that T2 wait for T3 and T4 to post before it posts back to T1. -The purpose of the handshake is to provide a way for T2 to synchronize with T1 if necessary. +The purpose of the handshake is to provide a way for T2 to synchronize with T1 if necessary. TaskGroup provides similar functions as Task but at a group level. + ```cpp void TaskGroup::interrupt_all() noexcept; ``` + This interrupt all the threads currently running in the TaskGroup. The function returns immediately. We will cover more details on the mechanism of interrupt later in this document. + ```cpp Status TaskGroup::join_all(Task::WaitFlag wf = Task::WaitFlag::kBlocking); ``` + This performs Task::Join() on all the threads in the group. This is a blocking call by default. + ```cpp Status TaskGroup::GetTaskErrorIfAny(); ``` + A TaskGroup does not save records for all the Task::rc_ for all the threads in this group. Only the first error is saved. For example, if thread T1 reports error rc1 and later on T2 reports error rc2, only rc1 is saved in the TaskGroup and rc2 is ignored. TaskGroup::GetTaskErrorIfAny() will return rc1 in this case. + ```cpp int size() const noexcept; ``` + This returns the size of the TaskGroup. ## TaskManager + TaskManager is a singleton, meaning there is only one such class object. It is created by another Services singleton object which we will cover it in the later section. + ```cpp TaskManager &TaskManager::GetInstance() ``` + provides the method to access the singleton. TaskManager manages all the TaskGroups and all the Tasks objects ever created. + ```cpp List lru_; List free_lst_; std::set grp_list_; ``` + As mentioned previously, all the Tasks in the same TaskGroup are linked in a linked list local to this TaskGroup. At the TaskManager level, all Task objects from all the TaskGroups are linked in the lru_ list. When a thread finished its job and returned, its corresponding Task object is saved for reuse in the free_lst_. When a new thread is created, TaskManager will first look into the free_lst_ before allocating memory for the new Task object. @@ -159,23 +205,29 @@ When a thread finished its job and returned, its corresponding Task object is sa ```cpp std::shared_ptr master_; ``` + The master thread itself also has a corresponding **fake** Task object in the TaskManager singleton object. 
But this fake Task is not in any of the List -###### Passing error to the master thread +### Passing error to the master thread + ```cpp void TaskManager::InterruptGroup(Task &); void TaskManager::InterruptMaster(const Status &); Status Status::GetMasterThreadRc(); ``` + When a thread encounters some unexpected error, it performs the following actions before returning + * It saves the error rc in the TaskGroup it belongs (assuming it is the first error reported in the TaskGroup). * It interrupts every other threads in the TaskGroup by calling TaskManager::InterruptGroup. * It interrupts the master thread and copy the error rc to the TaskManager::master_::rc_ by calling TaskManager::InterruptMaster(rc). However, because there can be many TaskGroups running in parallel or back to back, if the TaskManager::master_::rc_ is already set to some error from earlier TaskGroup run but not yet retrieved, the old error code will **not** be overwritten by the new error code. Master thread can query the result using TaskGroup::GetTaskErrorIfAny or TaskManager::GetMasterThreadRc. The first form is the *preferred* method. For the second form, TaskManager::master_::rc_ will be reset to OK() once retrieved such that future call of TaskManager::InterruptMaster() will populate the error to the master thread again. -###### WatchDog +### WatchDog + TaskManager will spawn an additional thread with "Watchdog" as name catcher. It executes the following function once startup + ```cpp Status TaskManager::WatchDog() { TaskManager::FindMe()->Post(); @@ -190,45 +242,57 @@ Status TaskManager::WatchDog() { return Status::OK(); } ``` + Its main purpose is to handle Control-C and stop all the threads from running by interrupting all of them. We will cover more on the function call ServiceStop() when we reach the section about Service class. WatchDog has its own TaskGroup to follow the protocol but it is not in the set of all the TaskGroup. ## Interrupt + C++ std::thread and std::async do not provide a way to stop a thread. So we implement interrupt mechanism to stop a thread from running and exit. -The initial design can be considered as a polling method. A bit or a flag may be set in some global shared area. The running thread will periodically check this bit/flag. If it is set, interrupt has been sent and the thread will quit. This method has a requirement that even if the thread is waiting on a std::conditional_variable, it can't do an unconditional wait() call. That is, it must do a wait_for() with a time out. Once returned from the wait_for() call, the thread must check if it is woken up due to time out or due to the condition is satisfied. +The initial design can be considered as a polling method. A bit or a flag may be set in some global shared area. The running thread will periodically check this bit/flag. If it is set, interrupt has been sent and the thread will quit. This method has a requirement that even if the thread is waiting on a std::conditional_variable, it can't do an unconditional wait() call. That is, it must do a wait_for() with a time out. Once returned from the wait_for() call, the thread must check if it is woken up due to time out or due to the condition is satisfied. -The cons of this approach is the performance cost and we design a pushing method approach. +The cons of this approach is the performance cost and we design a pushing method approach. To begin with we define an abstract class that describe objects that are interruptible. ```cpp class IntrpResource { ... 
}; ``` + It has two states: + ```cpp enum class State : int { kRunning, kInterrupted }; ``` + either it is in the state of running or being interrupted. There are two virtual functions that any class inherit can override + ```cpp virtual Status Interrupt(); virtual void ResetIntrpState(); ``` + Interrupt() in the base class change the state of the object to kInterrupted. ResetIntrpState() is doing the opposite to reset the state. Any class that inherits the base class can implement its own Interrupt(), for example, we will later on see how a CondVar class (a wrapper for std::condition_variable) deals with interrupt on its own. All related IntrpResource can register to a + ```cpp class IntrpService {...} ``` + It provides the public method + ```cpp void InterruptAll() noexcept; ``` + which goes through all registered IntrpResource objects and call the corresponding Interrupt(). A IntrpResource is always associated with a TaskGroup: + ```cpp class TaskGroup { ... @@ -240,45 +304,62 @@ class TaskGroup { As of this writing, both push and poll methods are used. There are still a few places (e.g. a busy while loop) where a thread must periodically check for interrupt. ## CondVar -A CondVar class is a wrapper of std::condition_variable + +A CondVar class is a wrapper of std::condition_variable + ```cpp std::condition_variable cv_; ``` + and is interruptible : + ```cpp class CondVar : public IntrpResource { ... } ``` + It overrides the Interrupt() method with its own + ```cpp void CondVar::Interrupt() { IntrpResource::Interrupt(); cv_.notify_all(); } ``` + It provides a Wait() method and is equivalent to std::condition_variable::wait. + ```cpp Status Wait(std::unique_lock *lck, const std::function &pred); ``` + The main difference is Wait() is interruptible. Thread returning from Wait must check Status return code if it is being interrupted. Note that once a CondVar is interrupted, its state remains interrupted until it is reset. + ## WaitPost + A WaitPost is an implementation of Event. In brief, it consists of a boolean state and provides methods to synchronize running threads. + * Wait(). If the boolean state is false, the calling threads will block until the boolean state becomes true or an interrupt has occurred. * Set(). Change the boolean state to true. All blocking threads will be released. * Clear(). Reset the boolean state back to false. -WaitPost is implemented on top of CondVar and hence is interruptible, that is, caller of +WaitPost is implemented on top of CondVar and hence is interruptible, that is, caller of + ```cpp Status Wait(); ``` + must check the return Status for interrupt. The initial boolean state is false when a WaitPost object is created. Note that once a Set() call is invoked, the boolean state remains true until it is reset. + ## List + A List is the implementation of doubly linked list. It is not thread safe and so user must provide methods to serialize the access to the list. The main feature of List is it allows an element to be inserted into multiple Lists. Take the Task class as an example. It can be in its TaskGroup list and at the same time linked in the global TaskManager task list. When a Task is done, it will be in the free list. + ```cpp class Task { ... @@ -299,7 +380,9 @@ class TaskManager { ... }; ``` -where Node is defined as + +where Node is defined as + ```cpp template struct Node { @@ -314,10 +397,13 @@ struct Node { } }; ``` -The constructor List class will take Node<> as input so it will follow this Node element to form a doubly linked chain. 
For example, List lru_ takes Task::node in its constructor while TaskGroup::grp_list_ takes Task::group in its constructor. This way we allow a Task to appear in two distinct linked lists. + +The constructor List class will take Node<> as input so it will follow this Node element to form a doubly linked chain. For example, List lru_takes Task::node in its constructor while TaskGroup::grp_list_ takes Task::group in its constructor. This way we allow a Task to appear in two distinct linked lists. ## Queue + A Queue is a thread safe solution to producer-consumer problem. Every queue is of finite capacity and its size must be provided to the constructor of the Queue. Few methods are provided + * Add(). It appends an element to queue and will be blocked if the queue is full or an interrupt has occurred. * EmplaceBack(). Same as an Add() but construct the element in place. * PopFront(). Remove the first element from the queue and will be blocked if the queue is empty or an interrupt has occurred. @@ -325,16 +411,21 @@ A Queue is a thread safe solution to producer-consumer problem. Every queue is o Queue is implemented on top of CondVar class and hence is interruptible. So callers of the above functions must check for Status return code for interrupt. ## Locking + C++11 does not provide any shared lock support. So we implement some simple locking classes for our own benefits. -###### SpinLock + +### SpinLock + It is a simple exclusive lock based on CAS (compared and swap). The caller repeatedly trying (and hence the name spinning) to acquire the lock until successful. It is best used when the critical section is very short. SpinLock is not interruptible. There is helper class LockGuard to ensure the lock is released if it is acquired. -###### RWLock +### RWLock + It is a simple Read Write Lock where the implementation favors writers. Reader will acquire the lock in S (share) mode while writer will acquire the lock in X (exclusive) mode. X mode is not compatible with S and X. S is compatible with S but not X. In addition, we also provide additional functions + * Upgrade(). Upgrade a S lock to X lock. * Downgrade(). Downgrade a X lock to S lock. @@ -343,15 +434,19 @@ RWLock is not interruptible. Like LockGuard helper class, there are helper classes SharedLock and UniqueLock to release the lock when the lock goes out of scope. ## Treap + A Treap is the combination of BST (Binary Search Tree) and a heap. Each key is given a priority. The priority for any non-leaf node is greater than or equal to the priority of its children. Treap supports the following basic operations + * To search for a given key value. Standard binary search algorithm is applied, ignoring the priorities. * To insert a new key X into the treap. Heap properties of the tree is maintained by tree rotation. * To delete a key from a treap. Heap properties of the tree is maintained by tree rotation. ## MemoryPool + A MemoryPool is an abstract class to allow memory blocks to be dynamically allocated from a designated memory region. Any class that implements MemoryPool must provide the following implementations. + ```cpp // Allocate a block of size n virtual Status Allocate(size_t, void **) = 0; @@ -362,59 +457,83 @@ A MemoryPool is an abstract class to allow memory blocks to be dynamically alloc // Free a pointer virtual void Deallocate(void *) = 0; ``` + There are several implementations of MemoryPool -###### Arena -Arena is a fixed size memory region which is allocated up front. 
Each Allocate() will sub-allocate a block from this region. + +### Arena + +Arena is a fixed size memory region which is allocated up front. Each Allocate() will sub-allocate a block from this region. Internally free blocks are organized into a Treap where the address of the block is the key and its block size is the priority. So the top of the tree is the biggest free block that can be found. Memory allocation is always fast and at a constant cost. Contiguous free blocks are merged into one single free block. Similar algorithm is used to enlarge a block to avoid memory copy. The main advantage of Arena is we do not need to free individual memory block and simply free the whole region instead. -###### CircularPool +### CircularPool + It is still an experimental class. It consists of one single Arena or multiple Arenas. To allocate memory we circle through the Arenas before new Arena is added. It has an assumption that memory is not kept for too long and will be released at some point in the future, and memory allocation strategy is based on this assumption. ## B+ tree + We also provide B+ tree support. Compared to std::map, we provide the following additional features + * Thread safe * Concurrent insert/update/search support. As of this writing, no delete support has been implemented yet. + ## Service + Many of the internal class inherit from a Service abstract class. A Service class simply speaking it provides service. A Service class consists of four states + ```cpp enum class STATE : int { kStartInProg = 1, kRunning, kStopInProg, kStopped }; ``` + Any class that inherits from Service class must implement the following two methods. + ```cpp virtual Status DoServiceStart() = 0; virtual Status DoServiceStop() = 0; ``` -###### Service::ServiceStart() + +### Service::ServiceStart() + This function brings up the service and moves the state to kRunning. This function is thread safe. If another thread is bringing up the same service at the same time, only one of them will drive the service up. ServiceStart() will call DoServiceStart() provided by the child class when the state reaches kStartInProg. An example will be TaskManager which inherits from Service. Its implementation of DoServiceStart will be to spawn off the WatchDog thread. -###### Service::ServiceStop() + +### Service::ServiceStop() + This function shut down the service and moves the state to kStopped. This function is thread safe. If another thread is bringing down the same service at the same time, only one of them will drive the service down. ServiceStop() will call DoServiceStop() provided by the child class when the states reaches kStopInProg. As an example, Both TaskManager and TaskGroup during service shutdown will generates interrupts to all the threads. -###### State checking + +### State checking + Other important use of Service is to synchronize operations. For example, TaskGroup::CreateAsyncTask will return interrupt error if the current state of TaskGroup is not kRunning. This way we can assure no new thread is allowed to create and added to a TaskGroup while the TaskGroup is going out of scope. Without this state check, we can have Task running without its TaskGroup, and may run into situation the Task is blocked on a CondVar and not returning. 
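To wrap up the Service section, here is a minimal sketch of what a class inheriting from Service could look like. Only the two pure virtual functions, their signatures and Status::OK() come from the interface described above; the class name and the comments describing typical work are illustrative assumptions, not code from the repository.

```cpp
// Illustrative sketch only: a hypothetical Service subclass.
class MyWorkerService : public Service {
 public:
  // Invoked by ServiceStart() once the state reaches kStartInProg,
  // e.g. spawn worker threads or allocate resources here.
  Status DoServiceStart() override { return Status::OK(); }

  // Invoked by ServiceStop() once the state reaches kStopInProg,
  // e.g. interrupt and join the worker threads here.
  Status DoServiceStop() override { return Status::OK(); }
};
```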
+ ## Services -Services is a singleton and is the first and only one singleton created as a result of calling + +Services is a singleton and is the first and only one singleton created as a result of calling + ```cpp mindspore::dataset::GlobalInit(); ``` -The first thing Services singleton do is to create a small 16M circular memory pool. This pool is used by many important classes to ensure basic operation will not fail due to out of memory. The most important example is TaskManager. Each Task memory is allocated from this memory pool. + +The first thing Services singleton do is to create a small 16M circular memory pool. This pool is used by many important classes to ensure basic operation will not fail due to out of memory. The most important example is TaskManager. Each Task memory is allocated from this memory pool. The next thing Services do is to spawn another singletons in some specific orders. One of the problems of multiple singletons is we have very limited control on the order of creation and destruction of singletons. Sometimes we need to control which singleton to allocate first and which one to deallocate last. One good example is logger. Logger is usually the last one to shutdown. Services singleton has a requirement on the list of singletons it bring up. They must inherit the Service class. Services singleton will bring each one up by calling the corresponding ServiceStart() function. The destructor of Services singleton will call ServiceStop() to bring down these singletons. TaskManager is a good example. It is invoked by Services singleton. -Services singleton also provide other useful services like +Services singleton also provide other useful services like + * return the current hostname * return the current username * generate a random string ## Path + Path class provides many operating system specific functions to shield the user to write functions for different platforms. As of this writing, the following functions are provided. + ```cpp bool Exists(); bool IsDirectory(); @@ -423,4 +542,5 @@ Path class provides many operating system specific functions to shield the user std::string Extension() const; std::string ParentPath(); ``` + Simple "/" operators are also provided to allow folders and/or files to be concatenated and work on all platforms including Windows. 
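As a small usage sketch of the functions listed above (the directory and file names are invented, and the Path constructor and the exact overloads of the "/" operator are assumptions for illustration rather than taken from the header):

```cpp
// Illustrative only: compose a path with "/" and query it.
Path dir("datasets");                       // assumes a std::string constructor
Path file = dir / "train" / "images.bin";   // "/" concatenation works on Windows as well
if (!file.Exists() || file.IsDirectory()) {
  // handle the missing file here
}
std::string parent = file.ParentPath();     // e.g. "datasets/train"
std::string ext = file.Extension();         // e.g. ".bin"
```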
diff --git a/mindspore/lite/examples/train_lenet/README_CN.md b/mindspore/lite/examples/train_lenet/README_CN.md index 0e7c4302064..8ce70760f13 100644 --- a/mindspore/lite/examples/train_lenet/README_CN.md +++ b/mindspore/lite/examples/train_lenet/README_CN.md @@ -2,11 +2,15 @@ +- [目录](#目录) - [概述](#概述) - [数据集](#数据集) - [环境要求](#环境要求) - [快速入门](#快速入门) - [脚本详述](#脚本详述) + - [模型准备](#模型准备) + - [模型训练](#模型训练) +- [工程目录](#工程目录) @@ -14,7 +18,7 @@ 本文主要讲解如何在端侧进行LeNet模型训练。首先在服务器或个人笔记本上进行模型转换;然后在安卓设备上训练模型。LeNet由2层卷积和3层全连接层组成,模型结构简单,因此可以在设备上快速训练。 -# Dataset +# 数据集 本例使用[MNIST手写字数据集](http://yann.lecun.com/exdb/mnist/) @@ -40,8 +44,9 @@ mnist/ # 环境要求 - 服务器或个人笔记本 - - [MindSpore Framework](https://www.mindspore.cn/install/en): 建议使用Docker安装 - - [MindSpore ToD Framework](https://www.mindspore.cn/tutorial/tod/en/use/prparation.html) + - [MindSpore Framework](https://www.mindspore.cn/install): 建议使用Docker安装 + - [MindSpore ToD Download](https://www.mindspore.cn/tutorial/lite/zh-CN/master/use/downloads.html) + - [MindSpore ToD Build](https://www.mindspore.cn/tutorial/lite/zh-CN/master/use/build.html) - [Android NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) - [Android SDK](https://developer.android.com/studio?hl=zh-cn#cmdline-tools) - Android移动设备 @@ -116,4 +121,4 @@ train_lenet/ │   ├── model │   │   └── lenet_tod.ms # model to train │   └── train.sh # on-device script that load the initial model and train it -``` \ No newline at end of file +``` diff --git a/mindspore/lite/examples/transfer_learning/README_CN.md b/mindspore/lite/examples/transfer_learning/README_CN.md index 9a33fab807f..bd59a9b99e2 100644 --- a/mindspore/lite/examples/transfer_learning/README_CN.md +++ b/mindspore/lite/examples/transfer_learning/README_CN.md @@ -2,10 +2,14 @@ +- [目录](#目录) - [概述](#概述) -- [数据集](#环境要求) +- [数据集](#数据集) - [环境要求](#环境要求) - [快速入门](#快速入门) +- [脚本详述](#脚本详述) + - [模型准备](#模型准备) + - [模型训练](#模型训练) - [工程目录](#工程目录) @@ -22,6 +26,7 @@ - 数据格式:jpeg > 注意 +> > - 当前发布版本中,数据通过dataset.cc中自定义的`DataSet`类加载。我们使用[ImageMagick convert tool](https://imagemagick.org/)进行数据预处理,包括图像裁剪、转换为BMP格式。 > - 本例将使用10分类而不是365类。 > - 训练、验证和测试数据集的比例分别是3:1:1。 @@ -42,7 +47,8 @@ places - 服务端 - [MindSpore Framework](https://www.mindspore.cn/install/en) - 建议使用安装docker环境 - - [MindSpore ToD Framework](https://www.mindspore.cn/tutorial/tod/en/use/prparation.html) + - [MindSpore ToD Download](https://www.mindspore.cn/tutorial/lite/zh-CN/master/use/downloads.html) + - [MindSpore ToD Build](https://www.mindspore.cn/tutorial/lite/zh-CN/master/use/build.html) - [Android NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) - [Android SDK](https://developer.android.com/studio?hl=zh-cn#cmdline-tools) - [ImageMagick convert tool](https://imagemagick.org/) diff --git a/model_zoo/official/cv/centerface/README.md b/model_zoo/official/cv/centerface/README.md index 71db6c0c22c..0ad5cd6028e 100644 --- a/model_zoo/official/cv/centerface/README.md +++ b/model_zoo/official/cv/centerface/README.md @@ -1,6 +1,9 @@ # Contents -- [CenterFace Description](#CenterFace-description) + + +- [Contents](#contents) +- [CenterFace Description](#centerface-description) - [Model Architecture](#model-architecture) - [Dataset](#dataset) - [Environment Requirements](#environment-requirements) @@ -11,7 +14,7 @@ - [Training Process](#training-process) - [Training](#training) - [Testing Process](#testing-process) - - [Evaluation](#testing) + - [Testing](#testing) - [Evaluation Process](#evaluation-process) - [Evaluation](#evaluation) - [Convert 
Process](#convert-process) @@ -20,8 +23,11 @@ - [Performance](#performance) - [Evaluation Performance](#evaluation-performance) - [Inference Performance](#inference-performance) +- [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) + + # [CenterFace Description](#contents) CenterFace is a practical anchor-free face detection and alignment method for edge devices, we support training and evaluation on Ascend910. @@ -80,8 +86,8 @@ other datasets need to use the same format as WiderFace. - Framework - [MindSpore](https://cmc-szv.clouddragon.huawei.com/cmcversion/index/search?searchKey=Do-MindSpore%20V100R001C00B622) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) @@ -226,7 +232,7 @@ sh eval_all.sh the command is: python train.py [train parameters] Major parameters train.py as follows: -```python +```text --lr: learning rate --per_batch_size: batch size on each device --is_distributed: multi-device or not diff --git a/model_zoo/official/cv/cnnctc/README.md b/model_zoo/official/cv/cnnctc/README.md index 75f2137ce19..dc5ffa4410e 100644 --- a/model_zoo/official/cv/cnnctc/README.md +++ b/model_zoo/official/cv/cnnctc/README.md @@ -22,12 +22,12 @@ - [How to use](#how-to-use) - [Inference](#inference) - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model) - - [Transfer Learning](#transfer-learning) + - [Transfer Learning](#transfer-learning) - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) - # [CNNCTC Description](#contents) + This paper proposes three major contributions to addresses scene text recognition (STR). First, we examine the inconsistencies of training and evaluation datasets, and the performance gap results from inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. @@ -38,10 +38,9 @@ comparisons to understand the performance gain of the existing modules. [Paper](https://arxiv.org/abs/1904.01906): J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What is wrong with scene text recognition model comparisons? dataset and model analysis,” ArXiv, vol. abs/1904.01906, 2019. # [Model Architecture](#contents) + This is an example of training CNN+CTC model for text recognition on MJSynth and SynthText dataset with MindSpore. - - # [Dataset](#contents) Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below. @@ -49,14 +48,18 @@ Note that you can run the scripts based on the dataset mentioned in original pap The [MJSynth](https://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText](https://github.com/ankush-me/SynthText) dataset are used for model training. The [The IIIT 5K-word dataset](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) dataset is used for evaluation. 
- step 1: + All the datasets have been preprocessed and stored in .lmdb format and can be downloaded [**HERE**](https://drive.google.com/drive/folders/192UfE9agQUMNq6AgU3_E05_FcPZK4hyt). - step 2: + Uncompress the downloaded file, rename the MJSynth dataset as MJ, the SynthText dataset as ST and the IIIT dataset as IIIT. - step 3: + Move above mentioned three datasets into `cnnctc_data` folder, and the structure should be as below: -``` + +```text |--- CNNCTC/ |--- cnnctc_data/ |--- ST/ @@ -68,13 +71,15 @@ Move above mentioned three datasets into `cnnctc_data` folder, and the structure |--- IIIT/ data.mdb lock.mdb - + ...... ``` - step 4: + Preprocess the dataset by running: -``` + +```bash python src/preprocess_dataset.py ``` @@ -84,31 +89,27 @@ This takes around 75 minutes. ## Mixed Precision -The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. +The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. - - # [Environment Requirements](#contents) - Hardware(Ascend) - - - Prepare hardware environment with Ascend processor. If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. + + - Prepare hardware environment with Ascend processor. If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
- Framework - - - [MindSpore](https://www.mindspore.cn/install/en) + + - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) - - - + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) - Install dependencies: -``` + +```bash pip install lmdb pip install Pillow pip install tqdm @@ -116,25 +117,30 @@ pip install six ``` - Standalone Training: -``` + +```bash bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT ``` - Distributed Training: -``` + +```bash bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT ``` - Evaluation: -``` + +```bash bash scripts/run_eval_ascend.sh $TRAINED_CKPT ``` # [Script Description](#contents) ## [Script and Sample Code](#contents) + The entire code structure is as following: -``` + +```text |--- CNNCTC/ |---README.md // descriptions about cnnctc |---train.py // train scripts @@ -154,39 +160,41 @@ The entire code structure is as following: ``` - ## [Script Parameters](#contents) + Parameters for both training and evaluation can be set in `config.py`. Arguments: - * `--CHARACTER`: Character labels. - * `--NUM_CLASS`: The number of classes including all character labels and the label for CTCLoss. - * `--HIDDEN_SIZE`: Model hidden size. - * `--FINAL_FEATURE_WIDTH`: The number of features. - * `--IMG_H`: The height of input image. - * `--IMG_W`: The width of input image. - * `--TRAIN_DATASET_PATH`: The path to training dataset. - * `--TRAIN_DATASET_INDEX_PATH`: The path to training dataset index file which determines the order . - * `--TRAIN_BATCH_SIZE`: Training batch size. The batch size and index file must ensure input data is in fixed shape. - * `--TRAIN_DATASET_SIZE`: Training dataset size. - * `--TEST_DATASET_PATH`: The path to test dataset. - * `--TEST_BATCH_SIZE`: Test batch size. - * `--TRAIN_EPOCHS`:Total training epochs. - * `--CKPT_PATH`:The path to model checkpoint file, can be used to resume training and evaluation. - * `--SAVE_PATH`:The path to save model checkpoint file. - * `--LR`:Learning rate for standalone training. - * `--LR_PARA`:Learning rate for distributed training. - * `--MOMENTUM`:Momentum. - * `--LOSS_SCALE`:Loss scale to prevent gradient underflow. - * `--SAVE_CKPT_PER_N_STEP`:Save model checkpoint file per N steps. - * `--KEEP_CKPT_MAX_NUM`:The maximum number of saved model checkpoint file. + +- `--CHARACTER`: Character labels. +- `--NUM_CLASS`: The number of classes including all character labels and the label for CTCLoss. +- `--HIDDEN_SIZE`: Model hidden size. +- `--FINAL_FEATURE_WIDTH`: The number of features. +- `--IMG_H`: The height of input image. +- `--IMG_W`: The width of input image. +- `--TRAIN_DATASET_PATH`: The path to training dataset. +- `--TRAIN_DATASET_INDEX_PATH`: The path to training dataset index file which determines the order . +- `--TRAIN_BATCH_SIZE`: Training batch size. The batch size and index file must ensure input data is in fixed shape. +- `--TRAIN_DATASET_SIZE`: Training dataset size. +- `--TEST_DATASET_PATH`: The path to test dataset. +- `--TEST_BATCH_SIZE`: Test batch size. +- `--TRAIN_EPOCHS`:Total training epochs. +- `--CKPT_PATH`:The path to model checkpoint file, can be used to resume training and evaluation. 
+- `--SAVE_PATH`:The path to save model checkpoint file. +- `--LR`:Learning rate for standalone training. +- `--LR_PARA`:Learning rate for distributed training. +- `--MOMENTUM`:Momentum. +- `--LOSS_SCALE`:Loss scale to prevent gradient underflow. +- `--SAVE_CKPT_PER_N_STEP`:Save model checkpoint file per N steps. +- `--KEEP_CKPT_MAX_NUM`:The maximum number of saved model checkpoint file. ## [Training Process](#contents) ### Training - Standalone Training: -``` + +```bash bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT ``` @@ -195,22 +203,22 @@ Results and checkpoints are written to `./train` folder. Log can be found in `./ `$PRETRAINED_CKPT` is the path to model checkpoint and it is **optional**. If none is given the model will be trained from scratch. - Distributed Training: -``` + +```bash bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT ``` Results and checkpoints are written to `./train_parallel_{i}` folder for device `i` respectively. Log can be found in `./train_parallel_{i}/log_{i}.log` and loss values are recorded in `./train_parallel_{i}/loss.log`. -`$RANK_TABLE_FILE` is needed when you are running a distribute task on ascend. +`$RANK_TABLE_FILE` is needed when you are running a distribute task on ascend. `$PATH_TO_CHECKPOINT` is the path to model checkpoint and it is **optional**. If none is given the model will be trained from scratch. ### Training Result Training result will be stored in the example path, whose folder name begins with "train" or "train_parallel". You can find checkpoint file together with result like the followings in loss.log. - -``` +```text # distribute training result(8p) epoch: 1 step: 1 , loss is 76.25, average time per step is 0.235177839748392712 epoch: 1 step: 2 , loss is 73.46875, average time per step is 0.25798572540283203 @@ -234,18 +242,20 @@ epoch: 1 step: 8698 , loss is 9.708542263610315, average time per step is 0.2184 ## [Evaluation Process](#contents) ### Evaluation + - Evaluation: -``` + +```bash bash scripts/run_eval_ascend.sh $TRAINED_CKPT ``` The model will be evaluated on the IIIT dataset, sample results and overall accuracy will be printed. - # [Model Description](#contents) + ## [Performance](#contents) -### Training Performance +### Training Performance | Parameters | CNNCTC | | -------------------------- | ----------------------------------------------------------- | @@ -260,8 +270,7 @@ The model will be evaluated on the IIIT dataset, sample results and overall accu | Speed | 1pc: 250 ms/step; 8pcs: 260 ms/step | | Total time | 1pc: 15 hours; 8pcs: 1.92 hours | | Parameters (M) | 177 | -| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/cnnctc | - +| Scripts | | ### Evaluation Performance @@ -278,13 +287,14 @@ The model will be evaluated on the IIIT dataset, sample results and overall accu | Model for inference | 675M (.ckpt file) | ## [How to use](#contents) + ### Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/network_migration.html). Following the steps below, this is a simple example: +If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). 
Following the steps below, this is a simple example: - Running on Ascend - ``` + ```python # Set context context.set_context(mode=context.GRAPH_HOME, device_target=cfg.device_target) context.set_context(device_id=cfg.device_id) @@ -315,7 +325,7 @@ If you need to use the trained model to perform inference on multiple hardware p - running on Ascend - ``` + ```python # Load dataset dataset = create_dataset(cfg.data_path, 1) batch_num = dataset.get_dataset_size() @@ -349,6 +359,6 @@ If you need to use the trained model to perform inference on multiple hardware p print("train success") ``` - # [ModelZoo Homepage](#contents) + Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/official/cv/cnnctc/README_CN.md b/model_zoo/official/cv/cnnctc/README_CN.md index c65e9b6d08c..83194372e34 100644 --- a/model_zoo/official/cv/cnnctc/README_CN.md +++ b/model_zoo/official/cv/cnnctc/README_CN.md @@ -11,18 +11,18 @@ - [环境要求](#环境要求) - [快速入门](#快速入门) - [脚本说明](#脚本说明) - - [脚本及样例代码](#脚本及样例代码) - - [脚本参数](#脚本参数) - - [训练过程](#训练过程) + - [脚本及样例代码](#脚本及样例代码) + - [脚本参数](#脚本参数) + - [训练过程](#训练过程) - [训练](#训练) - [训练结果](#训练结果) - - [评估过程](#评估过程) + - [评估过程](#评估过程) - [评估](#评估) - [模型描述](#模型描述) - - [性能](#性能) + - [性能](#性能) - [训练性能](#训练性能) - [评估性能](#评估性能) - - [用法](#用法) + - [用法](#用法) - [推理](#推理) - [在预训练模型上继续训练](#在预训练模型上继续训练) - [ModelZoo主页](#modelzoo主页) @@ -101,12 +101,12 @@ python src/preprocess_dataset.py - 框架 - - [MindSpore](https://www.mindspore.cn/install) + - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源: - - [MindSpore教程](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) + - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) # 快速入门 diff --git a/model_zoo/official/cv/dpn/README.md b/model_zoo/official/cv/dpn/README.md index d8ccd71ca24..90f4997d6c7 100644 --- a/model_zoo/official/cv/dpn/README.md +++ b/model_zoo/official/cv/dpn/README.md @@ -67,7 +67,7 @@ All the models in this repository are trained and validated on ImageNet-1K. The ## [Mixed Precision](#contents) -The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. +The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. 
For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. # [Environment Requirements](#contents) @@ -81,8 +81,8 @@ To run the python scripts in the repository, you need to prepare the environment - Easydict - MXNet 1.6.0 if running the script `param_convert.py` - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) diff --git a/model_zoo/official/cv/maskrcnn/README_CN.md b/model_zoo/official/cv/maskrcnn/README_CN.md index a903d3ff29d..655221801b4 100644 --- a/model_zoo/official/cv/maskrcnn/README_CN.md +++ b/model_zoo/official/cv/maskrcnn/README_CN.md @@ -50,7 +50,7 @@ MaskRCNN是一个两级目标检测网络,作为FasterRCNN的扩展模型, - 注释:241M;包括实例、字幕、人物关键点等 - 数据格式:图像及JSON文件 - - 注:数据在[dataset.py](http://dataset.py/)中处理。 + - 注:数据在`dataset.py`中处理。 # 环境要求 @@ -583,7 +583,7 @@ Accumulating evaluation results... # 随机情况说明 -[dataset.py](http://dataset.py/)中设置了“create_dataset”函数内的种子,同时还使用[train.py](http://train.py/)中的随机种子进行权重初始化。 +`dataset.py`中设置了“create_dataset”函数内的种子,同时还使用`train.py`中的随机种子进行权重初始化。 # ModelZoo主页 diff --git a/model_zoo/official/cv/mobilenetv2/README_CN.md b/model_zoo/official/cv/mobilenetv2/README_CN.md index a1b39c382b9..12d55fd6c3d 100644 --- a/model_zoo/official/cv/mobilenetv2/README_CN.md +++ b/model_zoo/official/cv/mobilenetv2/README_CN.md @@ -58,7 +58,7 @@ MobileNetV2总体网络架构如下: - 硬件(Ascend/GPU/CPU) - 使用Ascend、GPU或CPU处理器来搭建硬件环境。如需试用Ascend处理器,请发送[申请表](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx)至ascend@huawei.com,审核通过即可获得资源。 - 框架 - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源: - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) @@ -222,7 +222,7 @@ python export.py --platform [PLATFORM] --ckpt_file [CKPT_PATH] --file_format [EX # 随机情况说明 - + 在train.py中,设置了numpy.random、minspore.common.Initializer、minspore.ops.composite.random_ops和minspore.nn.probability.distribution所使用的种子。 # ModelZoo主页 diff --git a/model_zoo/official/cv/mobilenetv2_quant/README_CN.md b/model_zoo/official/cv/mobilenetv2_quant/README_CN.md index 8704674e522..6a45135fbd3 100644 --- a/model_zoo/official/cv/mobilenetv2_quant/README_CN.md +++ b/model_zoo/official/cv/mobilenetv2_quant/README_CN.md @@ -1,4 +1,5 @@ # 目录 + - [目录](#目录) @@ -30,7 +31,6 @@ # MobileNetV2描述 - MobileNetV2结合硬件感知神经网络架构搜索(NAS)和NetAdapt算法,已经可以移植到手机CPU上运行,后续随新架构进一步优化改进。(2019年11月20日) [论文](https://arxiv.org/pdf/1905.02244):Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al."Searching for MobileNetV2."In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324.2019. 
@@ -47,12 +47,13 @@ MobileNetV2总体网络架构如下: 使用的数据集:[imagenet](http://www.image-net.org/) --数据集大小:125G,共1000个类、1.2万张彩色图像 - - 训练集: 120G,共1.2万张图像 - - 测试集:5G,共5万张图像 -- 数据格式:RGB - - 注:数据在src/dataset.py中处理。 +- 数据集大小:125G,共1000个类、1.2万张彩色图像 + - 训练集: 120G,共1.2万张图像 + - 测试集:5G,共5万张图像 + +- 数据格式:RGB + - 注:数据在src/dataset.py中处理。 # 特性 @@ -64,13 +65,12 @@ MobileNetV2总体网络架构如下: # 环境要求 - 硬件:昇腾处理器(Ascend) - - 使用昇腾处理器来搭建硬件环境。如需试用昇腾处理器,请发送[申请表](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx)至ascend@huawei.com,审核通过即可获得资源。 + - 使用昇腾处理器来搭建硬件环境。如需试用昇腾处理器,请发送[申请表](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx)至ascend@huawei.com,审核通过即可获得资源。 - 框架 - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源 - - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) - + - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) # 脚本说明 @@ -94,7 +94,6 @@ MobileNetV2总体网络架构如下: ├── export.py # 导出检查点文件到air/onnx中 ``` - ## 脚本参数 在config.py中可以同时配置训练参数和评估参数。 @@ -123,13 +122,11 @@ MobileNetV2总体网络架构如下: ### 用法 - 使用python或shell脚本开始训练。shell脚本的使用方法如下: - bash run_train.sh [Ascend] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH]\(可选) - bash run_train.sh [GPU] [DEVICE_ID_LIST] [DATASET_PATH] [PRETRAINED_CKPT_PATH]\(可选) - ### 启动 ``` bash @@ -143,7 +140,7 @@ MobileNetV2总体网络架构如下: 训练结果保存在示例路径中。`Ascend`处理器训练的检查点默认保存在`./train/device$i/checkpoint`,训练日志重定向到`./train/device$i/train.log`。`GPU`处理器训练的检查点默认保存在`./train/checkpointckpt_$i`中,训练日志重定向到`./train/train.log`中。 `train.log`内容如下: -``` +```text epoch:[ 0/200], step:[ 624/ 625], loss:[5.258/5.258], time:[140412.236], lr:[0.100] epoch time:140522.500, per step time:224.836, avg loss:5.258 epoch:[ 1/200], step:[ 624/ 625], loss:[3.917/3.917], time:[138221.250], lr:[0.200] @@ -160,7 +157,7 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917 ### 启动 -``` +```bash # 推理示例 shell: Ascend: sh run_infer_quant.sh Ascend ~/imagenet/val/ ~/train/mobilenet-60_1601.ckpt @@ -172,7 +169,7 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917 推理结果保存在示例路径,可以在`./val/infer.log`中找到如下结果: -``` +```text result:{'acc':0.71976314102564111} ``` @@ -218,7 +215,7 @@ result:{'acc':0.71976314102564111} # 随机情况说明 -[dataset.py](http://dataset.py/)中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 +`dataset.py`中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 # ModelZoo主页 diff --git a/model_zoo/official/cv/mobilenetv3/README_CN.md b/model_zoo/official/cv/mobilenetv3/README_CN.md index 47d916906c8..89755c6644e 100644 --- a/model_zoo/official/cv/mobilenetv3/README_CN.md +++ b/model_zoo/official/cv/mobilenetv3/README_CN.md @@ -1,4 +1,5 @@ # 目录 + - [目录](#目录) @@ -27,7 +28,6 @@ # MobileNetV3描述 - MobileNetV3结合硬件感知神经网络架构搜索(NAS)和NetAdapt算法,已经可以移植到手机CPU上运行,后续随新架构进一步优化改进。(2019年11月20日) [论文](https://arxiv.org/pdf/1905.02244):Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al."Searching for mobilenetv3."In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324.2019. 
@@ -43,38 +43,36 @@ MobileNetV3总体网络架构如下: 使用的数据集:[imagenet](http://www.image-net.org/) - 数据集大小:125G,共1000个类、1.2万张彩色图像 - - 训练集:120G,共1.2万张图像 - - 测试集:5G,共5万张图像 + - 训练集:120G,共1.2万张图像 + - 测试集:5G,共5万张图像 - 数据格式:RGB - - 注:数据在src/dataset.py中处理。 - + - 注:数据在src/dataset.py中处理。 # 环境要求 - 硬件:GPU - - 准备GPU处理器搭建硬件环境。 + - 准备GPU处理器搭建硬件环境。 - 框架 - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源: - - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) - + - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) # 脚本说明 ## 脚本和样例代码 ```python -├── MobileNetV3 - ├── Readme.md # MobileNetV3相关描述 - ├── scripts - │ ├──run_train.sh # 用于训练的shell脚本 - │ ├──run_eval.sh # 用于评估的shell脚本 - ├── src - │ ├──config.py # 参数配置 +├── MobileNetV3 + ├── Readme.md # MobileNetV3相关描述 + ├── scripts + │ ├──run_train.sh # 用于训练的shell脚本 + │ ├──run_eval.sh # 用于评估的shell脚本 + ├── src + │ ├──config.py # 参数配置 │ ├──dataset.py # 创建数据集 │ ├──launch.py # 启动python脚本 - │ ├──lr_generator.py # 配置学习率 + │ ├──lr_generator.py # 配置学习率 │ ├──mobilenetV3.py # MobileNetV3架构 ├── train.py # 训练脚本 ├── eval.py # 评估脚本 @@ -91,7 +89,7 @@ MobileNetV3总体网络架构如下: ### 启动 -``` +```bash # 训练示例 python: GPU: python train.py --dataset_path ~/imagenet/train/ --device_targe GPU @@ -101,9 +99,9 @@ MobileNetV3总体网络架构如下: ### 结果 -训练结果保存在示例路径中。检查点默认保存在`./checkpoint`中,训练日志重定向到`./train/train.log`,如下所示: +训练结果保存在示例路径中。检查点默认保存在`./checkpoint`中,训练日志重定向到`./train/train.log`,如下所示: -``` +```text epoch:[ 0/200], step:[ 624/ 625], loss:[5.258/5.258], time:[140412.236], lr:[0.100] epoch time:140522.500, per step time:224.836, avg loss:5.258 epoch:[ 1/200], step:[ 624/ 625], loss:[3.917/3.917], time:[138221.250], lr:[0.200] @@ -120,7 +118,7 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917 ### 启动 -``` +```bash # 推理示例 python: GPU: python eval.py --dataset_path ~/imagenet/val/ --checkpoint_path mobilenet_199.ckpt --device_targe GPU @@ -129,13 +127,13 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917 GPU: sh run_infer.sh GPU ~/imagenet/val/ ~/train/mobilenet-200_625.ckpt ``` -> 训练过程中可以生成检查点。 +> 训练过程中可以生成检查点。 ### 结果 -推理结果保存示例路径中,可以在`val.log`中找到如下结果: +推理结果保存示例路径中,可以在`val.log`中找到如下结果: -``` +```text result:{'acc':0.71976314102564111} ckpt=/path/to/checkpoint/mobilenet-200_625.ckpt ``` @@ -143,7 +141,7 @@ result:{'acc':0.71976314102564111} ckpt=/path/to/checkpoint/mobilenet-200_625.ck 修改`src/config.py`文件中的`export_mode`和`export_file`, 运行`export.py`。 -``` +```bash python export.py --device_target [PLATFORM] --checkpoint_path [CKPT_PATH] ``` @@ -173,8 +171,8 @@ python export.py --device_target [PLATFORM] --checkpoint_path [CKPT_PATH] # 随机情况说明 -[dataset.py](http://dataset.py/)中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 +`dataset.py`中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 # ModelZoo主页 - -请浏览官网[主页](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)。 + +请浏览官网[主页](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)。 diff --git a/model_zoo/official/cv/psenet/README_CN.md b/model_zoo/official/cv/psenet/README_CN.md index b94633f53f4..20e4dce55b1 100644 --- a/model_zoo/official/cv/psenet/README_CN.md +++ b/model_zoo/official/cv/psenet/README_CN.md @@ -52,7 +52,7 @@ - 框架 - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源: - - 
[MindSpore教程](https://www.mindspore.cn/tutory/training/en/master/index.html) + - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) - 安装Mindspore - 安装[pyblind11](https://github.com/pybind/pybind11) diff --git a/model_zoo/official/cv/resnet/README_CN.md b/model_zoo/official/cv/resnet/README_CN.md index cc62f0801b1..2b58b4fe78e 100755 --- a/model_zoo/official/cv/resnet/README_CN.md +++ b/model_zoo/official/cv/resnet/README_CN.md @@ -491,7 +491,7 @@ result:{'top_5_accuracy':0.9342589628681178, 'top_1_accuracy':0.768065781049936} # 随机情况说明 -[dataset.py](http://dataset.py/)中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 +`dataset.py`中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 # ModelZoo主页 diff --git a/model_zoo/official/cv/retinaface_resnet50/README.md b/model_zoo/official/cv/retinaface_resnet50/README.md index cfe9d62ec0a..08f8c1b1503 100644 --- a/model_zoo/official/cv/retinaface_resnet50/README.md +++ b/model_zoo/official/cv/retinaface_resnet50/README.md @@ -5,7 +5,7 @@ - [Pretrain Model](#pretrain-model) - [Dataset](#dataset) - [Environment Requirements](#environment-requirements) -- [Quick Start](#quick-start) +- [Quick Start](#quick-start) - [Script Description](#script-description) - [Script and Sample Code](#script-and-sample-code) - [Script Parameters](#script-parameters) @@ -22,10 +22,9 @@ - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) - # [RetinaFace Description](#contents) -Retinaface is a face detection model, which was proposed in 2019 and achieved the best results on the wideface dataset at that time. Retinaface, the full name of the paper is retinaface: single stage dense face localization in the wild. Compared with s3fd and mtcnn, it has a significant improvement, and has a higher recall rate for small faces. It is not good for multi-scale face detection. In order to solve these problems, retinaface feature pyramid structure is used for feature fusion between different scales, and SSH module is added. +Retinaface is a face detection model, which was proposed in 2019 and achieved the best results on the wideface dataset at that time. Retinaface, the full name of the paper is retinaface: single stage dense face localization in the wild. Compared with s3fd and mtcnn, it has a significant improvement, and has a higher recall rate for small faces. It is not good for multi-scale face detection. In order to solve these problems, retinaface feature pyramid structure is used for feature fusion between different scales, and SSH module is added. [Paper](https://arxiv.org/abs/1905.00641v2): Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou. "RetinaFace: Single-stage Dense Face Localisation in the Wild". 2019. @@ -33,6 +32,7 @@ Retinaface is a face detection model, which was proposed in 2019 and achieved th Retinaface needs a resnet50 backbone to extract image features for detection. You could get resnet50 train script from our modelzoo and modify the pad structure of resnet50 according to resnet in ./src/network.py, Final train it on imagenet2012 to get resnet50 pretrain model. Steps: + 1. Get resnet50 train script from our modelzoo. 2. Modify the resnet50 architecture according to resnet in ```./src/network.py```.(You can also leave the structure of a unchanged, but the accuracy will be 2-3 percentage points lower.) 3. Train resnet50 on imagenet2012. 
@@ -41,47 +41,44 @@ Steps: Specifically, the retinaface network is based on retinanet. The feature pyramid structure of retinanet is used in the network, and SSH structure is added. Besides the traditional detection branch, the prediction branch of key points and self-monitoring branch are added in the network. The paper indicates that the two branches can improve the performance of the model. Here we do not implement the self-monitoring branch. - # [Dataset](#contents) -Dataset used: [WIDERFACE]() +Dataset used: [WIDERFACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) -Dataset acquisition: -1. Get the dataset and annotations from [here](). -2. Get the eval ground truth label from [here](). +Dataset acquisition: + +1. Get the dataset and annotations from [here](https://github.com/peteryuX/retinaface-tf2). +2. Get the eval ground truth label from [here](https://github.com/peteryuX/retinaface-tf2/tree/master/widerface_evaluate/ground_truth). - Dataset size:3.42G,32,203 colorful images - - Train:1.36G,12,800 images - - Val:345.95M,3,226 images - - Test:1.72G,16,177 images - + - Train:1.36G,12,800 images + - Val:345.95M,3,226 images + - Test:1.72G,16,177 images # [Environment Requirements](#contents) - Hardware(GPU) - - Prepare hardware environment with GPU processor. + - Prepare hardware environment with GPU processor. - Framework - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) - - + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) -After installing MindSpore via the official website and download the dataset, you can start training and evaluation as follows: +After installing MindSpore via the official website and download the dataset, you can start training and evaluation as follows: - running on GPU - + ```python # run training example export CUDA_VISIBLE_DEVICES=0 - python train.py > train.log 2>&1 & - + python train.py > train.log 2>&1 & + # run distributed training example bash scripts/run_distribute_gpu_train.sh 4 0,1,2,3 - + # run evaluation example export CUDA_VISIBLE_DEVICES=0 python eval.py > eval.log 2>&1 & @@ -89,34 +86,32 @@ After installing MindSpore via the official website and download the dataset, yo bash run_standalone_gpu_eval.sh 0 ``` - - # [Script Description](#contents) ## [Script and Sample Code](#contents) -``` +```text ├── model_zoo ├── README.md // descriptions about all the models - ├── retinaface + ├── retinaface ├── README.md // descriptions about googlenet - ├── scripts + ├── scripts │ ├──run_distribute_gpu_train.sh // shell script for distributed on GPU │ ├──run_standalone_gpu_eval.sh // shell script for evaluation on GPU - ├── src + ├── src │ ├──dataset.py // creating dataset │ ├──network.py // retinaface architecture - │ ├──config.py // parameter configuration - │ ├──augmentation.py // data augment method - │ ├──loss.py // loss function + │ ├──config.py // parameter configuration + │ ├──augmentation.py // data augment method + │ ├──loss.py // loss function │ ├──utils.py // data preprocessing │ ├──lr_schedule.py // learning rate schedule - ├── data + ├── data │ ├──widerface // dataset data │ 
├──resnet50_pretrain.ckpt // resnet50 imagenet pretrain model │ ├──ground_truth // eval label - ├── train.py // training script - ├── eval.py // evaluation script + ├── train.py // training script + ├── eval.py // evaluation script ``` ## [Script Parameters](#contents) @@ -163,39 +158,36 @@ Parameters for both training and evaluation can be set in config.py 'val_nms_threshold': 0.4, # Threshold for val NMS 'val_iou_threshold': 0.5, # Threshold for val IOU 'val_save_result': False, # Whether save the resultss - 'val_predict_save_folder': './widerface_result', # Result save path + 'val_predict_save_folder': './widerface_result', # Result save path 'val_gt_dir': './data/ground_truth/', # Path of val set ground_truth ``` - ## [Training Process](#contents) -### Training +### Training - running on GPU - ``` + ```bash export CUDA_VISIBLE_DEVICES=0 - python train.py > train.log 2>&1 & + python train.py > train.log 2>&1 & ``` The python command above will run in the background, you can view the results through the file `train.log`. - - After training, you'll get some checkpoint files under the folder `./checkpoint/` by default. + After training, you'll get some checkpoint files under the folder `./checkpoint/` by default. ### Distributed Training - running on GPU - ``` + ```bash bash scripts/run_distribute_gpu_train.sh 4 0,1,2,3 ``` - - The above shell script will run distribute training in the background. You can view the results through the file `train/train.log`. - - After training, you'll get some checkpoint files under the folder `./checkpoint/ckpt_0/` by default. + The above shell script will run distribute training in the background. You can view the results through the file `train/train.log`. + + After training, you'll get some checkpoint files under the folder `./checkpoint/ckpt_0/` by default. ## [Evaluation Process](#contents) @@ -204,15 +196,15 @@ Parameters for both training and evaluation can be set in config.py - evaluation on WIDERFACE dataset when running on GPU Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path in src/config.py, e.g., "username/retinaface/checkpoint/ckpt_0/RetinaFace-100_402.ckpt". - - ``` + + ```bash export CUDA_VISIBLE_DEVICES=0 python eval.py > eval.log 2>&1 & ``` - + The above python command will run in the background. You can view the results through the file "eval.log". The result of the test dataset will be as follows: - - ``` + + ```text # grep "Val AP" eval.log Easy Val AP : 0.9422 Medium Val AP : 0.9325 @@ -221,28 +213,26 @@ Parameters for both training and evaluation can be set in config.py OR, - ``` + ```bash bash run_standalone_gpu_eval.sh 0 ``` - + The above python command will run in the background. You can view the results through the file "eval/eval.log". 
The result of the test dataset will be as follows: - - ``` + + ```text # grep "Val AP" eval.log Easy Val AP : 0.9422 Medium Val AP : 0.9325 Hard Val AP : 0.8900 ``` - - - # [Model Description](#contents) + ## [Performance](#contents) -### Evaluation Performance +### Evaluation Performance -| Parameters | GPU | +| Parameters | GPU | | -------------------------- | -------------------------------------------------------------| | Model Version | RetinaFace + Resnet50 | | Resource | NV SMX2 V100-16G | @@ -260,17 +250,16 @@ Parameters for both training and evaluation can be set in config.py | Checkpoint for Fine tuning | 336.3M (.ckpt file) | | Scripts | [retinaface script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/retinaface) | - - ## [How to use](#contents) -### Continue Training on the Pretrained Model + +### Continue Training on the Pretrained Model - running on GPU - ``` + ```python # Load dataset ds_train = create_dataset(training_dataset, cfg, batch_size, multiprocessing=True, num_worker=cfg['num_workers']) - + # Define model multibox_loss = MultiBoxLoss(num_classes, cfg['num_anchor'], negative_ratio, cfg['batch_size']) lr = adjust_learning_rate(initial_lr, gamma, stepvalues, steps_per_epoch, max_epoch, warmup_epoch=cfg['warmup_epoch']) @@ -278,24 +267,24 @@ Parameters for both training and evaluation can be set in config.py weight_decay=weight_decay, loss_scale=1) backbone = resnet50(1001) net = RetinaFace(phase='train', backbone=backbone) - + # Continue training if resume_net is not None pretrain_model_path = cfg['resume_net'] param_dict_retinaface = load_checkpoint(pretrain_model_path) load_param_into_net(net, param_dict_retinaface) - + net = RetinaFaceWithLossCell(net, multibox_loss, cfg) net = TrainingWrapper(net, opt) - + model = Model(net) - - # Set callbacks + + # Set callbacks config_ck = CheckpointConfig(save_checkpoint_steps=cfg['save_checkpoint_steps'], keep_checkpoint_max=cfg['keep_checkpoint_max']) ckpoint_cb = ModelCheckpoint(prefix="RetinaFace", directory=cfg['ckpt_path'], config=config_ck) time_cb = TimeMonitor(data_size=ds_train.get_dataset_size()) callback_list = [LossMonitor(), time_cb, ckpoint_cb] - + # Start training model.train(max_epoch, ds_train, callbacks=callback_list, dataset_sink_mode=False) @@ -305,6 +294,6 @@ Parameters for both training and evaluation can be set in config.py In train.py, we set the seed with setup_seed function. +# [ModelZoo Homepage](#contents) -# [ModelZoo Homepage](#contents) Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). 
diff --git a/model_zoo/official/cv/retinaface_resnet50/README_CN.md b/model_zoo/official/cv/retinaface_resnet50/README_CN.md index 3d91be423cf..a76bec94437 100644 --- a/model_zoo/official/cv/retinaface_resnet50/README_CN.md +++ b/model_zoo/official/cv/retinaface_resnet50/README_CN.md @@ -67,8 +67,8 @@ RetinaFace使用ResNet50骨干提取图像特征进行检测。从ModelZoo获取 - 框架 - [MindSpore](https://www.mindspore.cn/install) - 如需查看详情,请参见如下资源: - - [MindSpore教程](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore Python API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore教程](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) # 快速入门 diff --git a/model_zoo/official/cv/simple_pose/README.md b/model_zoo/official/cv/simple_pose/README.md index 76de9c3a196..1a3e2bb8a16 100644 --- a/model_zoo/official/cv/simple_pose/README.md +++ b/model_zoo/official/cv/simple_pose/README.md @@ -53,7 +53,7 @@ Dataset used: COCO2017 ## [Mixed Precision](#contents) -The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. +The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. 
# [Environment Requirements](#contents) @@ -68,8 +68,8 @@ To run the python scripts in the repository, you need to prepare the environment - opencv-python 4.3.0.36 - pycocotools 2.0 - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) diff --git a/model_zoo/official/cv/yolov4/README.MD b/model_zoo/official/cv/yolov4/README.MD index af5effb23f1..29110f63235 100644 --- a/model_zoo/official/cv/yolov4/README.MD +++ b/model_zoo/official/cv/yolov4/README.MD @@ -20,10 +20,10 @@ - [Inference Performance](#inference-performance) - [ModelZoo Homepage](#modelzoo-homepage) - # [YOLOv4 Description](#contents) + YOLOv4 is a state-of-the-art detector which is faster (FPS) and more accurate (MS COCO AP50...95 and AP50) than all available alternative detectors. -YOLOv4 has verified a large number of features, and selected for use such of them for improving the accuracy of both the classifier and the detector. +YOLOv4 has verified a large number of features, and selected for use such of them for improving the accuracy of both the classifier and the detector. These features can be used as best-practice for future studies and developments. [Paper](https://arxiv.org/pdf/2004.10934.pdf): @@ -39,7 +39,8 @@ Dataset support: [MS COCO] or datasetd with the same format as MS COCO Annotation support: [MS COCO] or annotation as the same format as MS COCO - The directory structure is as follows, the name of directory and file is user define: - ``` + + ```text ├── dataset ├── YOLOv4 ├── annotations @@ -55,23 +56,25 @@ Annotation support: [MS COCO] or annotation as the same format as MS COCO └─picturen.jpg ``` + we suggest user to use MS COCO dataset to experience our model, other datasets need to use the same format as MS COCO. # [Environment Requirements](#contents) - Hardware(Ascend) - - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. + - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
- Framework - - [MindSpore](https://www.mindspore.cn/) + - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) After installing MindSpore via the official website, you can start training and evaluation as follows: -``` + +```text # The cspdarknet53_backbone.ckpt in the follow script is got from cspdarknet53 training like paper. # The parameter of training_shape define image shape for network, default is [416, 416], @@ -88,7 +91,7 @@ After installing MindSpore via the official website, you can start training and # It means use 11 kinds of shape as input shape, or it can be set some kind of shape. ``` -``` +```bash #run training example(1p) by python command python train.py \ --data_dir=./dataset/xxx \ @@ -102,17 +105,17 @@ python train.py \ --lr_scheduler=cosine_annealing > log.txt 2>&1 & ``` -``` +```bash # standalone training example(1p) by shell script sh run_standalone_train.sh dataset/xxx cspdarknet53_backbone.ckpt ``` -``` +```bash # For Ascend device, distributed training example(8p) by shell script sh run_distribute_train.sh dataset/xxx cspdarknet53_backbone.ckpt rank_table_8p.json ``` -``` +```bash # run evaluation by python command python eval.py \ --data_dir=./dataset/xxx \ @@ -120,7 +123,7 @@ python eval.py \ --testing_shape=416 > log.txt 2>&1 & ``` -``` +```bash # run evaluation by shell script sh run_eval.sh dataset/xxx checkpoint/xxx.ckpt ``` @@ -128,11 +131,12 @@ sh run_eval.sh dataset/xxx checkpoint/xxx.ckpt # [Script Description](#contents) ## [Script and Sample Code](#contents) -``` -└─yolov4 + +```text +└─yolov4 ├─README.md ├─mindspore_hub_conf.py # config for mindspore hub - ├─scripts + ├─scripts ├─run_standalone_train.sh # launch standalone training(1p) in ascend ├─run_distribute_train.sh # launch distributed training(8p) in ascend └─run_eval.sh # launch evaluating in ascend @@ -151,15 +155,17 @@ sh run_eval.sh dataset/xxx checkpoint/xxx.ckpt ├─util.py # util function ├─yolo.py # yolov4 network ├─yolo_dataset.py # create dataset for YOLOV4 - + ├─eval.py # evaluate val results ├─test.py# # evaluate test results └─train.py # train net ``` ## [Script Parameters](#contents) + Major parameters train.py as follows: -``` + +```text optional arguments: -h, --help show this help message and exit --device_target device where the code will be implemented: "Ascend", default is "Ascend" @@ -219,16 +225,21 @@ optional arguments: ``` ## [Training Process](#contents) -YOLOv4 can be trained from the scratch or with the backbone named cspdarknet53. + +YOLOv4 can be trained from the scratch or with the backbone named cspdarknet53. Cspdarknet53 is a classifier which can be trained on some dataset like ImageNet(ILSVRC2012). -It is easy for users to train Cspdarknet53. Just replace the backbone of Classifier Resnet50 with cspdarknet53. +It is easy for users to train Cspdarknet53. Just replace the backbone of Classifier Resnet50 with cspdarknet53. Resnet50 is easy to get in mindspore model zoo. 
+ ### Training + For Ascend device, standalone training example(1p) by shell script -``` + +```bash sh run_standalone_train.sh dataset/coco2017 cspdarknet53_backbone.ckpt ``` -``` + +```text python train.py \ --data_dir=/dataset/xxx \ --pretrained_backbone=cspdarknet53_backbone.ckpt \ @@ -240,10 +251,12 @@ python train.py \ --training_shape=416 \ --lr_scheduler=cosine_annealing > log.txt 2>&1 & ``` + The python command above will run in the background, you can view the results through the file log.txt. After training, you'll get some checkpoint files under the outputs folder by default. The loss value will be achieved as follows: -``` + +```text # grep "loss:" train/log.txt 2020-10-16 15:00:37,483:INFO:epoch[0], iter[0], loss:8248.610352, 0.03 imgs/sec, lr:2.0466639227834094e-07 @@ -259,13 +272,16 @@ After training, you'll get some checkpoint files under the outputs folder by def ``` ### Distributed Training + For Ascend device, distributed training example(8p) by shell script -``` + +```bash sh run_distribute_train.sh dataset/coco2017 cspdarknet53_backbone.ckpt rank_table_8p.json ``` The above shell script will run distribute training in the background. You can view the results through the file train_parallel[X]/log.txt. The loss value will be achieved as follows: -``` + +```text # distribute training result(8p, shape=416) ... 2020-10-16 14:58:25,142:INFO:epoch[0], iter[1000], loss:242.509259, 388.73 imgs/sec, lr:0.00032783843926154077 @@ -286,7 +302,7 @@ The above shell script will run distribute training in the background. You can v ``` -``` +```text # distribute training result(8p, dynamic shape) ... 2020-10-16 20:40:17,148:INFO:epoch[0], iter[800], loss:283.765033, 248.93 imgs/sec, lr:0.00026233625249005854 @@ -305,12 +321,11 @@ The above shell script will run distribute training in the background. You can v ... ``` - ## [Evaluation Process](#contents) ### Valid -``` +```bash python eval.py \ --data_dir=./dataset/coco2017 \ --pretrained=yolov4.ckpt \ @@ -320,7 +335,8 @@ sh run_eval.sh dataset/coco2017 checkpoint/yolov4.ckpt ``` The above python command will run in the background. You can view the results through the file "log.txt". The mAP of the test dataset will be as follows: -``` + +```text # log.txt =============coco eval reulst========= Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.442 @@ -336,8 +352,10 @@ The above python command will run in the background. You can view the results th Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.638 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.717 ``` + ### Test-dev -``` + +```bash python test.py \ --data_dir=./dataset/coco2017 \ --pretrained=yolov4.ckpt \ @@ -345,11 +363,13 @@ python test.py \ OR sh run_test.sh dataset/coco2017 checkpoint/yolov4.ckpt ``` + The predict_xxx.json will be found in test/outputs/%Y-%m-%d_time_%H_%M_%S/. Rename the file predict_xxx.json to detections_test-dev2017_yolov4_results.json and compress it to detections_test-dev2017_yolov4_results.zip -Submit file detections_test-dev2017_yolov4_results.zip to the MS COCO evaluation server for the test-dev2019 (bbox) https://competitions.codalab.org/competitions/20794#participate +Submit file detections_test-dev2017_yolov4_results.zip to the MS COCO evaluation server for the test-dev2019 (bbox) You will get such results in the end of file View scoring output log. 
-``` + +```text overall performance Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.447 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.642 @@ -364,9 +384,11 @@ overall performance Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.627 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.711 ``` + ## [Convert Process](#contents) ### Convert + If you want to infer the network on Ascend 310, you should convert the model to AIR: ```python @@ -378,6 +400,7 @@ python src/export.py --pretrained=[PRETRAINED_BACKBONE] --batch_size=[BATCH_SIZE ## [Performance](#contents) ### Evaluation Performance + YOLOv4 on 118K images(The annotation and data format must be the same as coco2017) | Parameters | YOLOv4 | @@ -394,9 +417,10 @@ YOLOv4 on 118K images(The annotation and data format must be the same as coco201 | Speed | 1p 53FPS 8p 390FPS(shape=416) 220FPS(dynamic shape) | | Total time | 48h(dynamic shape) | | Checkpoint for Fine tuning | about 500M (.ckpt file) | -| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/ | +| Scripts | | ### Inference Performance + YOLOv4 on 20K images(The annotation and data format must be the same as coco test2017 ) | Parameters | YOLOv4 | @@ -416,4 +440,5 @@ In dataset.py, we set the seed inside ```create_dataset``` function. In var_init.py, we set seed for weight initilization # [ModelZoo Homepage](#contents) + Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/official/nlp/gnmt_v2/README.md b/model_zoo/official/nlp/gnmt_v2/README.md index 64d2745073b..618ab203ba6 100644 --- a/model_zoo/official/nlp/gnmt_v2/README.md +++ b/model_zoo/official/nlp/gnmt_v2/README.md @@ -52,7 +52,7 @@ Note that you can run the scripts based on the dataset mentioned in original pap - Install [MindSpore](https://www.mindspore.cn/install/en). - For more information, please check the resources below: - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) - - [MindSpore API](https://www.mindspore.cn/doc/api_python/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) ## Software diff --git a/model_zoo/official/nlp/prophetnet/README.md b/model_zoo/official/nlp/prophetnet/README.md index 3361f459bc4..141b008584f 100644 --- a/model_zoo/official/nlp/prophetnet/README.md +++ b/model_zoo/official/nlp/prophetnet/README.md @@ -550,8 +550,8 @@ The comparisons between MASS and other baseline methods in terms of PPL on Corne - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) ## Requirements @@ -562,7 +562,7 @@ subword-nmt rouge ``` - + # Get started @@ -624,7 +624,7 @@ Get the log and output files under the path `./train_mass_*/`, and the model fil ## Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/network_migration.html). 
+If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). For inference, config the options in `config.json` firstly: - Assign the `test_dataset` under `dataset_config` node to the dataset path. diff --git a/model_zoo/official/nlp/tinybert/README.md b/model_zoo/official/nlp/tinybert/README.md index 2ad53774c27..840d3dc0033 100644 --- a/model_zoo/official/nlp/tinybert/README.md +++ b/model_zoo/official/nlp/tinybert/README.md @@ -1,4 +1,5 @@ # Contents + - [Contents](#contents) - [TinyBERT Description](#tinybert-description) - [Model Architecture](#model-architecture) @@ -6,58 +7,64 @@ - [Environment Requirements](#environment-requirements) - [Quick Start](#quick-start) - [Script Description](#script-description) - - [Script and Sample Code](#script-and-sample-code) - - [Script Parameters](#script-parameters) - - [General Distill](#general-distill) - - [Task Distill](#task-distill) - - [Options and Parameters](#options-and-parameters) - - [Options:](#options) - - [Parameters:](#parameters) - - [Training Process](#training-process) - - [Training](#training) - - [running on Ascend](#running-on-ascend) - - [running on GPU](#running-on-gpu) - - [Distributed Training](#distributed-training) - - [running on Ascend](#running-on-ascend-1) - - [running on GPU](#running-on-gpu-1) - - [Evaluation Process](#evaluation-process) - - [Evaluation](#evaluation) - - [evaluation on SST-2 dataset](#evaluation-on-sst-2-dataset) - - [evaluation on MNLI dataset](#evaluation-on-mnli-dataset) - - [evaluation on QNLI dataset](#evaluation-on-qnli-dataset) - - [Model Description](#model-description) - - [Performance](#performance) - - [training Performance](#training-performance) - - [Inference Performance](#inference-performance) + - [Script and Sample Code](#script-and-sample-code) + - [Script Parameters](#script-parameters) + - [General Distill](#general-distill) + - [Task Distill](#task-distill) + - [Options and Parameters](#options-and-parameters) + - [Options:](#options) + - [Parameters:](#parameters) + - [Training Process](#training-process) + - [Training](#training) + - [running on Ascend](#running-on-ascend) + - [running on GPU](#running-on-gpu) + - [Distributed Training](#distributed-training) + - [running on Ascend](#running-on-ascend-1) + - [running on GPU](#running-on-gpu-1) + - [Evaluation Process](#evaluation-process) + - [Evaluation](#evaluation) + - [evaluation on SST-2 dataset](#evaluation-on-sst-2-dataset) + - [evaluation on MNLI dataset](#evaluation-on-mnli-dataset) + - [evaluation on QNLI dataset](#evaluation-on-qnli-dataset) + - [Model Description](#model-description) + - [Performance](#performance) + - [training Performance](#training-performance) + - [Inference Performance](#inference-performance) - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) # [TinyBERT Description](#contents) + [TinyBERT](https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT) is 7.5x smalller and 9.4x faster on inference than [BERT-base](https://github.com/google-research/bert) (the base version of BERT model) and achieves competitive performances in the tasks of natural language understanding. It performs a novel transformer distillation at both the pre-training and task-specific learning stages. 
-[Paper](https://arxiv.org/abs/1909.10351): Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351). arXiv preprint arXiv:1909.10351. +[Paper](https://arxiv.org/abs/1909.10351): Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351). arXiv preprint arXiv:1909.10351. # [Model Architecture](#contents) + The backbone structure of TinyBERT is transformer, the transformer contains four encoder modules, one encoder contains one selfattention module and one selfattention module contains one attention module. # [Dataset](#contents) + - Download the zhwiki or enwiki dataset for general distillation. Extract and clean text in the dataset with [WikiExtractor](https://github.com/attardi/wikiextractor). Convert the dataset to TFRecord format, please refer to create_pretraining_data.py which in [BERT](https://github.com/google-research/bert) repository. - Download glue dataset for task distillation. Convert dataset files from json format to tfrecord format, please refer to run_classifier.py which in [BERT](https://github.com/google-research/bert) repository. # [Environment Requirements](#contents) + - Hardware(Ascend/GPU) - - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. + - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. - Framework - - [MindSpore](https://gitee.com/mindspore/mindspore) + - [MindSpore](https://gitee.com/mindspore/mindspore) - For more information, please check the resources below: - - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) - - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) + - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) + After installing MindSpore via the official website, you can start general distill, task distill and evaluation as follows: -```bash + +```text # run standalone general distill example -bash scripts/run_standalone_gd.sh +bash scripts/run_standalone_gd.sh Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`. 
@@ -70,7 +77,7 @@ Before running the shell script, please set the `load_teacher_ckpt_path`, `data_ bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt # run task distill and evaluation example -bash scripts/run_standalone_td.sh +bash scripts/run_standalone_td.sh Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first. If running on GPU, please set the `device_target=GPU`. @@ -80,39 +87,41 @@ For distributed training on Ascend, a hccl configuration file with JSON format n Please follow the instructions in the link below: https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. -For dataset, if you want to set the format and parameters, a schema configuration file with JSON format needs to be created, please refer to [tfrecord](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/dataset_loading.html#tfrecord) format. -``` +For dataset, if you want to set the format and parameters, a schema configuration file with JSON format needs to be created, please refer to [tfrecord](https://www.mindspore.cn/doc/programming_guide/en/master/dataset_loading.html#tfrecord) format. + +```text For general task, schema file contains ["input_ids", "input_mask", "segment_ids"]. -For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. +For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. `numRows` is the only option which could be set by user, the others value must be set according to the dataset. For example, the dataset is cn-wiki-128, the schema file for general distill phase as following: { - "datasetType": "TF", - "numRows": 7680, - "columns": { - "input_ids": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "input_mask": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "segment_ids": { - "type": "int64", - "rank": 1, - "shape": [256] - } - } + "datasetType": "TF", + "numRows": 7680, + "columns": { + "input_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "input_mask": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "segment_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + } + } } ``` # [Script Description](#contents) + ## [Script and Sample Code](#contents) ```shell @@ -134,19 +143,21 @@ For example, the dataset is cn-wiki-128, the schema file for general distill pha ├─tinybert_model.py # backbone code of network ├─utils.py # util function ├─__init__.py - ├─run_general_distill.py # train net for general distillation - ├─run_task_distill.py # train and eval net for task distillation + ├─run_general_distill.py # train net for general distillation + ├─run_task_distill.py # train and eval net for task distillation ``` ## [Script Parameters](#contents) + ### General Distill -``` -usage: run_general_distill.py [--distribute DISTRIBUTE] [--epoch_size N] [----device_num N] [--device_id N] + +```text +usage: run_general_distill.py [--distribute DISTRIBUTE] [--epoch_size N] [----device_num N] [--device_id N] [--device_target DEVICE_TARGET] [--do_shuffle DO_SHUFFLE] - [--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N] + [--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N] [--save_ckpt_path SAVE_CKPT_PATH] [--load_teacher_ckpt_path LOAD_TEACHER_CKPT_PATH] - [--save_checkpoint_step N] [--max_ckpt_num N] + 
[--save_checkpoint_step N] [--max_ckpt_num N] [--data_dir DATA_DIR] [--schema_dir SCHEMA_DIR] [--dataset_type DATASET_TYPE] [train_steps N] options: @@ -155,7 +166,7 @@ options: --epoch_size epoch size: N, default is 1 --device_id device id: N, default is 0 --device_num number of used devices: N, default is 1 - --save_ckpt_path path to save checkpoint files: PATH, default is "" + --save_ckpt_path path to save checkpoint files: PATH, default is "" --max_ckpt_num max number for saving checkpoint files: N, default is 1 --do_shuffle enable shuffle: "true" | "false", default is "true" --enable_data_sink enable data sink: "true" | "false", default is "true" @@ -166,14 +177,15 @@ options: --schema_dir path to schema.json file, PATH, default is "" --dataset_type the dataset type which can be tfrecord/mindrecord, default is tfrecord ``` - + ### Task Distill -``` -usage: run_general_task.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL] - [--td_phase1_epoch_size N] [--td_phase2_epoch_size N] + +```text +usage: run_general_task.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL] + [--td_phase1_epoch_size N] [--td_phase2_epoch_size N] [--device_id N] [--do_shuffle DO_SHUFFLE] - [--enable_data_sink ENABLE_DATA_SINK] [--save_ckpt_step N] - [--max_ckpt_num N] [--data_sink_steps N] + [--enable_data_sink ENABLE_DATA_SINK] [--save_ckpt_step N] + [--max_ckpt_num N] [--data_sink_steps N] [--load_teacher_ckpt_path LOAD_TEACHER_CKPT_PATH] [--load_gd_ckpt_path LOAD_GD_CKPT_PATH] [--load_td1_ckpt_path LOAD_TD1_CKPT_PATH] @@ -188,8 +200,8 @@ options: --td_phase1_epoch_size epoch size for td phase1: N, default is 10 --td_phase2_epoch_size epoch size for td phase2: N, default is 3 --device_id device id: N, default is 0 - --do_shuffle enable shuffle: "true" | "false", default is "true" - --enable_data_sink enable data sink: "true" | "false", default is "true" + --do_shuffle enable shuffle: "true" | "false", default is "true" + --enable_data_sink enable data sink: "true" | "false", default is "true" --save_ckpt_step steps for saving checkpoint files: N, default is 1000 --max_ckpt_num max number for saving checkpoint files: N, default is 1 --data_sink_steps set data sink steps: N, default is 1 @@ -204,14 +216,17 @@ options: ``` ## Options and Parameters + `gd_config.py` and `td_config.py` contain parameters of BERT model and options for optimizer and lossscale. -### Options: -``` + +### Options + +```text batch_size batch size of input dataset: N, default is 16 Parameters for lossscale: loss_scale_value initial value of loss scale: N, default is 2^8 scale_factor factor used to update loss scale: N, default is 2 - scale_window steps for once updatation of loss scale: N, default is 50 + scale_window steps for once updatation of loss scale: N, default is 50 Parameters for optimizer: learning_rate value of learning rate: Q @@ -221,8 +236,9 @@ Parameters for optimizer: eps term added to the denominator to improve numerical stability: Q ``` -### Parameters: -``` +### Parameters + +```text Parameters for bert network: seq_length length of input sequence: N, default is 128 vocab_size size of each embedding vector: N, must be consistant with the dataset you use. 
Default is 30522 @@ -242,15 +258,22 @@ Parameters for bert network: dtype data type of input: mstype.float16 | mstype.float32, default is mstype.float32 compute_type compute type in BertTransformer: mstype.float16 | mstype.float32, default is mstype.float16 ``` + ## [Training Process](#contents) + ### Training + #### running on Ascend + Before running the command below, please check `load_teacher_ckpt_path`, `data_dir` and `schma_dir` has been set. Please set the path to be the absolute full path, e.g:"/username/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_standalone_gd.sh ``` + The command above will run in the background, you can view the results the file log.txt. After training, you will get some checkpoint files under the script folder by default. The loss value will be achieved as follows: -``` + +```text # grep "epoch" log.txt epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 28.2093), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536)) epoch: 2, step: 200, outpus are (Tensor(shape=[1], dtype=Float32, 30.1724), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536)) @@ -260,25 +283,34 @@ epoch: 2, step: 200, outpus are (Tensor(shape=[1], dtype=Float32, 30.1724), Tens > **Attention** This will bind the processor cores according to the `device_num` and total processor numbers. If you don't expect to run pretraining with binding processor cores, remove the operations about `taskset` in `scripts/run_distributed_gd_ascend.sh` #### running on GPU + Before running the command below, please check `load_teacher_ckpt_path`, `data_dir` `schma_dir` and `device_target=GPU` has been set. Please set the path to be the absolute full path, e.g:"/username/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_standalone_gd.sh ``` + The command above will run in the background, you can view the results the file log.txt. After training, you will get some checkpoint files under the script folder by default. The loss value will be achieved as follows: -``` + +```text # grep "epoch" log.txt epoch: 1, step: 100, outpus are 28.2093 ... ``` ### Distributed Training + #### running on Ascend + Before running the command below, please check `load_teacher_ckpt_path`, `data_dir` and `schma_dir` has been set. Please set the path to be the absolute full path, e.g:"/username/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json ``` + The command above will run in the background, you can view the results the file log.txt. After training, you will get some checkpoint files under the LOG* folder by default. The loss value will be achieved as follows: -``` + +```text # grep "epoch" LOG*/log.txt epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 28.1478), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536)) ... @@ -287,25 +319,35 @@ epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 30.5901), Tens ``` #### running on GPU + Please input the path to be the absolute full path, e.g:"/username/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt ``` + The command above will run in the background, you can view the results the file log.txt. After training, you will get some checkpoint files under the LOG* folder by default. The loss value will be achieved as follows: -``` + +```text # grep "epoch" LOG*/log.txt epoch: 1, step: 1, outpus are 63.4098 ... 
``` ## [Evaluation Process](#contents) + ### Evaluation + If you want to after running and continue to eval, please set `do_train=true` and `do_eval=true`, If you want to run eval alone, please set `do_train=false` and `do_eval=true`. If running on GPU, please set `device_target=GPU`. + #### evaluation on SST-2 dataset -``` + +```bash bash scripts/run_standalone_td.sh ``` -The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: + +The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: + ```bash # grep "The best acc" log.txt The best acc is 0.872685 @@ -315,13 +357,18 @@ The best acc is 0.899305 The best acc is 0.902777 ... ``` + #### evaluation on MNLI dataset + Before running the command below, please check the load pretrain checkpoint path has been set. Please set the checkpoint path to be the absolute full path, e.g:"/username/pretrain/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_standalone_td.sh ``` -The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: -``` + +The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: + +```text # grep "The best acc" log.txt The best acc is 0.803206 The best acc is 0.803308 @@ -330,13 +377,18 @@ The best acc is 0.810355 The best acc is 0.813929 ... ``` + #### evaluation on QNLI dataset + Before running the command below, please check the load pretrain checkpoint path has been set. Please set the checkpoint path to be the absolute full path, e.g:"/username/pretrain/checkpoint_100_300.ckpt". -``` + +```bash bash scripts/run_standalone_td.sh ``` -The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: -``` + +The command above will run in the background, you can view the results the file log.txt. The accuracy of the test dataset will be as follows: + +```text # grep "The best acc" log.txt The best acc is 0.870772 The best acc is 0.871691 @@ -345,10 +397,13 @@ The best acc is 0.875183 The best acc is 0.891176 ... ``` - + ## [Model Description](#contents) + ## [Performance](#contents) + ### training Performance + | Parameters | Ascend | GPU | | -------------------------- | ---------------------------------------------------------- | ------------------------- | | Model Version | TinyBERT | TinyBERT | @@ -364,13 +419,13 @@ The best acc is 0.891176 | Speed | 35.4ms/step | 98.654ms/step | | Total time | 17.3h(3poch, 8p) | 48h(3poch, 8p) | | Params (M) | 15M | 15M | -| Checkpoint for task distill| 74M(.ckpt file) | 74M(.ckpt file) | +| Checkpoint for task distill| 74M(.ckpt file) | 74M(.ckpt file) | | Scripts | [TinyBERT](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/nlp/tinybert) | | #### Inference Performance | Parameters | Ascend | GPU | -| -------------------------- | ----------------------------- | ------------------------- | +| -------------------------- | ----------------------------- | ------------------------- | | Model Version | | | | Resource | Ascend 910 | NV SMX2 V100-32G | | uploaded Date | 08/20/2020 | 08/24/2020 | @@ -384,12 +439,12 @@ The best acc is 0.891176 # [Description of Random Situation](#contents) -In run_standaloned_td.sh, we set do_shuffle to shuffle the dataset. 
+In run_standaloned_td.sh, we set do_shuffle to shuffle the dataset. In gd_config.py and td_config.py, we set the hidden_dropout_prob and attention_pros_dropout_prob to dropout some network node. In run_general_distill.py, we set the random seed to make sure distribute training has the same init weight. # [ModelZoo Homepage](#contents) - -Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). + +Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/official/recommend/ncf/README.md b/model_zoo/official/recommend/ncf/README.md index 8e55d4add5f..7fc7d790112 100644 --- a/model_zoo/official/recommend/ncf/README.md +++ b/model_zoo/official/recommend/ncf/README.md @@ -6,7 +6,7 @@ - [Features](#features) - [Mixed Precision](#mixed-precision) - [Environment Requirements](#environment-requirements) -- [Quick Start](#quick-start) +- [Quick Start](#quick-start) - [Script Description](#script-description) - [Script and Sample Code](#script-and-sample-code) - [Script Parameters](#script-parameters) @@ -20,46 +20,48 @@ - [Evaluation Performance](#evaluation-performance) - [Inference Performance](#evaluation-performance) - [How to use](#how-to-use) - - [Inference](#inference) + - [Inference](#inference) - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model) - - [Transfer Learning](#transfer-learning) + - [Transfer Learning](#transfer-learning) - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) - # [NCF Description](#contents) NCF is a general framework for collaborative filtering of recommendations in which a neural network architecture is used to model user-item interactions. Unlike traditional models, NCF does not resort to Matrix Factorization (MF) with an inner product on latent features of users and items. It replaces the inner product with a multi-layer perceptron that can learn an arbitrary function from data. [Paper](https://arxiv.org/abs/1708.05031): He X, Liao L, Zhang H, et al. Neural collaborative filtering[C]//Proceedings of the 26th international conference on world wide web. 2017: 173-182. - # [Model Architecture](#contents) Two instantiations of NCF are Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP). GMF applies a linear kernel to model the latent feature interactions, and and MLP uses a nonlinear kernel to learn the interaction function from data. NeuMF is a fused model of GMF and MLP to better model the complex user-item interactions, and unifies the strengths of linearity of MF and non-linearity of MLP for modeling the user-item latent structures. NeuMF allows GMF and MLP to learn separate embeddings, and combines the two models by concatenating their last hidden layer. [neumf_model.py](neumf_model.py) defines the architecture details. - - # [Dataset](#contents) The [MovieLens datasets](http://files.grouplens.org/datasets/movielens/) are used for model training and evaluation. Specifically, we use two datasets: **ml-1m** (short for MovieLens 1 million) and **ml-20m** (short for MovieLens 20 million). -### ml-1m +## ml-1m + ml-1m dataset contains 1,000,209 anonymous ratings of approximately 3,706 movies made by 6,040 users who joined MovieLens in 2000. All ratings are contained in the file "ratings.dat" without header row, and are in the following format: -``` + +```cpp UserID::MovieID::Rating::Timestamp ``` - - UserIDs range between 1 and 6040. 
- - MovieIDs range between 1 and 3952. - - Ratings are made on a 5-star scale (whole-star ratings only). -### ml-20m +- UserIDs range between 1 and 6040. +- MovieIDs range between 1 and 3952. +- Ratings are made on a 5-star scale (whole-star ratings only). + +## ml-20m + ml-20m dataset contains 20,000,263 ratings of 26,744 movies by 138493 users. All ratings are contained in the file "ratings.csv". Each line of this file after the header row represents one rating of one movie by one user, and has the following format: -``` + +```text userId,movieId,rating,timestamp ``` - - The lines within this file are ordered first by userId, then, within user, by movieId. - - Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars). + +- The lines within this file are ordered first by userId, then, within user, by movieId. +- Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars). In both datasets, the timestamp is represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970. Each user has at least 20 ratings. @@ -67,26 +69,22 @@ In both datasets, the timestamp is represented in seconds since midnight Coordin ## Mixed Precision -The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. +The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. - - # [Environment Requirements](#contents) - Hardware(Ascend/GPU) - - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. + - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
- Framework - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) - - + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) -After installing MindSpore via the official website, you can start training and evaluation as follows: +After installing MindSpore via the official website, you can start training and evaluation as follows: ```python #run data process @@ -102,34 +100,31 @@ sh scripts/run_train.sh rank_table.json sh run_eval.sh ``` - - # [Script Description](#contents) ## [Script and Sample Code](#contents) - -``` -├── ModelZoo_NCF_ME +```text +├── ModelZoo_NCF_ME ├── README.md // descriptions about NCF - ├── scripts - │ ├──run_train.sh // shell script for train - │ ├──run_distribute_train.sh // shell script for distribute train - │ ├──run_eval.sh // shell script for evaluation - │ ├──run_download_dataset.sh // shell script for dataget and process - │ ├──run_transfer_ckpt_to_air.sh // shell script for transfer model style - ├── src + ├── scripts + │ ├──run_train.sh // shell script for train + │ ├──run_distribute_train.sh // shell script for distribute train + │ ├──run_eval.sh // shell script for evaluation + │ ├──run_download_dataset.sh // shell script for dataget and process + │ ├──run_transfer_ckpt_to_air.sh // shell script for transfer model style + ├── src │ ├──dataset.py // creating dataset │ ├──ncf.py // ncf architecture - │ ├──config.py // parameter configuration - │ ├──movielens.py // data download file - │ ├──callbacks.py // model loss and eval callback file - │ ├──constants.py // the constants of model - │ ├──export.py // export checkpoint files into geir/onnx + │ ├──config.py // parameter configuration + │ ├──movielens.py // data download file + │ ├──callbacks.py // model loss and eval callback file + │ ├──constants.py // the constants of model + │ ├──export.py // export checkpoint files into geir/onnx │ ├──metrics.py // the file for auc compute │ ├──stat_utils.py // the file for data process functions - ├── train.py // training script - ├── eval.py // evaluation script + ├── train.py // training script + ├── eval.py // evaluation script ``` ## [Script Parameters](#contents) @@ -149,15 +144,15 @@ Parameters for both training and evaluation can be set in config.py. * `--num_factors`:The Embedding size of MF model. * `--output_path`:The location of the output file. * `--eval_file_name` : Eval output file. - * `--loss_file_name` : Loss output file. + * `--loss_file_name` : Loss output file. ``` ## [Training Process](#contents) -### Training +### Training ```python - bash scripts/run_train.sh + bash scripts/run_train.sh ``` The python command above will run in the background, you can view the results through the file `train.log`. After training, you'll get some checkpoint files under the script folder by default. The loss value will be achieved as follows: @@ -171,7 +166,7 @@ Parameters for both training and evaluation can be set in config.py. ... ``` - The model checkpoint will be saved in the current directory. + The model checkpoint will be saved in the current directory. 
## [Evaluation Process](#contents) @@ -182,7 +177,7 @@ Parameters for both training and evaluation can be set in config.py. Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "checkpoint/ncf-125_390.ckpt". ```python - sh scripts/run_eval.sh + sh scripts/run_eval.sh ``` The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows: @@ -192,12 +187,11 @@ Parameters for both training and evaluation can be set in config.py. HR:0.6846,NDCG:0.410 ``` - - # [Model Description](#contents) + ## [Performance](#contents) -### Evaluation Performance +### Evaluation Performance | Parameters | Ascend | | -------------------------- | ------------------------------------------------------------ | @@ -213,90 +207,86 @@ Parameters for both training and evaluation can be set in config.py. | Speed | 1pc: 0.575 ms/step | | Total time | 1pc: 5 mins | - ### Inference Performance -| Parameters | Ascend | -| ------------------- | --------------------------- | -| Model Version | NCF | -| Resource | Ascend 910 | -| Uploaded Date | 10/23/2020 (month/day/year) | +| Parameters | Ascend | +| ------------------- | --------------------------- | +| Model Version | NCF | +| Resource | Ascend 910 | +| Uploaded Date | 10/23/2020 (month/day/year) | | MindSpore Version | 1.0.0 | -| Dataset | ml-1m | -| batch_size | 256 | -| outputs | probability | -| Accuracy | HR:0.6846,NDCG:0.410 | +| Dataset | ml-1m | +| batch_size | 256 | +| outputs | probability | +| Accuracy | HR:0.6846,NDCG:0.410 | ## [How to use](#contents) + ### Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/network_migration.html). Following the steps below, this is a simple example: +If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). 
Following the steps below, this is a simple example: -https://www.mindspore.cn/tutorial/zh-CN/master/use/multi_platform_inference.html + - - ``` + ```python # Load unseen dataset for inference dataset = dataset.create_dataset(cfg.data_path, 1, False) - - # Define model + + # Define model net = GoogleNet(num_classes=cfg.num_classes) opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, cfg.momentum, weight_decay=cfg.weight_decay) loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'}) - + # Load pre-trained model param_dict = load_checkpoint(cfg.checkpoint_path) load_param_into_net(net, param_dict) net.set_train(False) - + # Make predictions on the unseen dataset acc = model.eval(dataset) print("accuracy: ", acc) ``` +### Continue Training on the Pretrained Model -### Continue Training on the Pretrained Model - - ``` + ```python # Load dataset dataset = create_dataset(cfg.data_path, cfg.epoch_size) batch_num = dataset.get_dataset_size() - + # Define model net = GoogleNet(num_classes=cfg.num_classes) # Continue training if set pre_trained to be True if cfg.pre_trained: param_dict = load_checkpoint(cfg.checkpoint_path) load_param_into_net(net, param_dict) - lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size, + lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size, steps_per_epoch=batch_num) - opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), + opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), Tensor(lr), cfg.momentum, weight_decay=cfg.weight_decay) loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'}, amp_level="O2", keep_batchnorm_fp32=False, loss_scale_manager=None) - - # Set callbacks - config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5, + + # Set callbacks + config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5, keep_checkpoint_max=cfg.keep_checkpoint_max) time_cb = TimeMonitor(data_size=batch_num) - ckpoint_cb = ModelCheckpoint(prefix="train_googlenet_cifar10", directory="./", + ckpoint_cb = ModelCheckpoint(prefix="train_googlenet_cifar10", directory="./", config=config_ck) loss_cb = LossMonitor() - + # Start training model.train(cfg.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, loss_cb]) print("train success") ``` - # [Description of Random Situation](#contents) In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py. - # [ModelZoo Homepage](#contents) Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/research/audio/fcn-4/README.md b/model_zoo/research/audio/fcn-4/README.md index 5791e7a97dd..dfd03595995 100644 --- a/model_zoo/research/audio/fcn-4/README.md +++ b/model_zoo/research/audio/fcn-4/README.md @@ -32,7 +32,7 @@ FCN-4 is a convolutional neural network architecture, its name FCN-4 comes from ### Mixed Precision -The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. 
Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. +The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’. ## [Environment Requirements](#contents) @@ -42,8 +42,8 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) ## [Quick Start](#contents) diff --git a/model_zoo/research/cv/FaceAttribute/README.md b/model_zoo/research/cv/FaceAttribute/README.md index 433c10514ae..adeae9f938a 100644 --- a/model_zoo/research/cv/FaceAttribute/README.md +++ b/model_zoo/research/cv/FaceAttribute/README.md @@ -90,8 +90,8 @@ We use about 91K face images as training dataset and 11K as evaluating dataset i - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Script Description](#contents) diff --git a/model_zoo/research/cv/FaceDetection/README.md b/model_zoo/research/cv/FaceDetection/README.md index ae8a3851665..a5f6f67db8b 100644 --- a/model_zoo/research/cv/FaceDetection/README.md +++ b/model_zoo/research/cv/FaceDetection/README.md @@ -74,8 +74,8 @@ We use about 13K images as training dataset and 3K as evaluating dataset in this - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Script Description](#contents) diff --git a/model_zoo/research/cv/FaceQualityAssessment/README.md b/model_zoo/research/cv/FaceQualityAssessment/README.md index 74598c373f3..d34337d8a43 100644 --- a/model_zoo/research/cv/FaceQualityAssessment/README.md +++ b/model_zoo/research/cv/FaceQualityAssessment/README.md @@ -72,8 +72,8 @@ We use about 122K 
face images as training dataset and 2K as evaluating dataset i - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Script Description](#contents) diff --git a/model_zoo/research/cv/FaceRecognition/README.md b/model_zoo/research/cv/FaceRecognition/README.md index c23a9f64d7a..964668fb7b9 100644 --- a/model_zoo/research/cv/FaceRecognition/README.md +++ b/model_zoo/research/cv/FaceRecognition/README.md @@ -60,8 +60,8 @@ The directory structure is as follows: - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Script Description](#contents) @@ -241,4 +241,4 @@ sh run_export.sh 16 0 ./0-1_1.ckpt # [ModelZoo Homepage](#contents) -Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). \ No newline at end of file +Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/research/cv/FaceRecognitionForTracking/README.md b/model_zoo/research/cv/FaceRecognitionForTracking/README.md index c9de351e4e0..2fe4d6c2eee 100644 --- a/model_zoo/research/cv/FaceRecognitionForTracking/README.md +++ b/model_zoo/research/cv/FaceRecognitionForTracking/README.md @@ -60,8 +60,8 @@ The directory structure is as follows: - Framework - [MindSpore](https://www.mindspore.cn/install/en) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Script Description](#contents) diff --git a/model_zoo/research/cv/centernet/README.md b/model_zoo/research/cv/centernet/README.md index 2c76af2e881..ef31c84f0a9 100644 --- a/model_zoo/research/cv/centernet/README.md +++ b/model_zoo/research/cv/centernet/README.md @@ -37,7 +37,7 @@ In the current model, we use CenterNet to estimate multi-person pose. The DLA(De Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below. 
-Dataset used: [COCO2017]() +Dataset used: [COCO2017](https://cocodataset.org/) - Dataset size:26G - Train:19G,118000 images @@ -81,8 +81,8 @@ Dataset used: [COCO2017]() - Framework - [MindSpore](https://cmc-szv.clouddragon.huawei.com/cmcversion/index/search?searchKey=Do-MindSpore%20V100R001C00B622) - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) - Download the dataset COCO2017. - We use COCO2017 as training dataset in this example by default, and you can also use your own datasets. diff --git a/model_zoo/research/nlp/dscnn/README.md b/model_zoo/research/nlp/dscnn/README.md index 93e0ac68b27..b312c3e0fcf 100644 --- a/model_zoo/research/nlp/dscnn/README.md +++ b/model_zoo/research/nlp/dscnn/README.md @@ -4,7 +4,7 @@ - [Model Architecture](#model-architecture) - [Dataset](#dataset) - [Environment Requirements](#environment-requirements) -- [Quick Start](#quick-start) +- [Quick Start](#quick-start) - [Script Description](#script-description) - [Script and Sample Code](#script-and-sample-code) - [Script Parameters](#script-parameters) @@ -17,20 +17,18 @@ - [Evaluation Performance](#evaluation-performance) - [Inference Performance](#evaluation-performance) - [How to use](#how-to-use) - - [Inference](#inference) + - [Inference](#inference) - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model) - - [Transfer Learning](#transfer-learning) + - [Transfer Learning](#transfer-learning) - [Description of Random Situation](#description-of-random-situation) - [ModelZoo Homepage](#modelzoo-homepage) - # [DS-CNN Description](#contents) -DS-CNN, depthwise separable convolutional neural network, was first used in Keyword Spotting in 2017. KWS application has highly constrained power budget and typically runs on tiny microcontrollers with limited memory and compute capability. depthwise separable convolutions are more efficient both in number of parameters and operations, which makes deeper and wider architecture possible even in the resource-constrained microcontroller devices. +DS-CNN, depthwise separable convolutional neural network, was first used in Keyword Spotting in 2017. KWS application has highly constrained power budget and typically runs on tiny microcontrollers with limited memory and compute capability. depthwise separable convolutions are more efficient both in number of parameters and operations, which makes deeper and wider architecture possible even in the resource-constrained microcontroller devices. [Paper](https://arxiv.org/abs/1711.07128): Zhang, Yundong, Naveen Suda, Liangzhen Lai, and Vikas Chandra. "Hello edge: Keyword spotting on microcontrollers." arXiv preprint arXiv:1711.07128 (2017). 
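The efficiency claim above is easy to check with a back-of-the-envelope parameter count. The sketch below is framework-free and the channel/kernel sizes are illustrative only (they are not taken from the DS-CNN configuration); it simply contrasts a standard convolution with the depthwise-plus-pointwise factorization that depthwise separable layers use.

```python
# Parameter counts for a standard k x k convolution versus a depthwise
# separable convolution (biases omitted). Sizes below are illustrative.

def standard_conv_params(c_in, c_out, k):
    # one k x k filter across all input channels for every output channel
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise stage: one k x k filter per input channel
    # pointwise stage: a 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 64, 3
print(standard_conv_params(c_in, c_out, k))        # 36864
print(depthwise_separable_params(c_in, c_out, k))  # 4672, roughly 8x fewer parameters
```

The number of multiply-accumulate operations shrinks by a similar ratio, which is what makes the deeper and wider DS-CNN variants feasible on microcontroller-class hardware.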
-
 # [Model Architecture](#contents)
 
 The overall network architecture of DS-CNN is shown below:
@@ -38,49 +36,47 @@ The overall network architecture of DS-CNN is shown below:
 
 # [Dataset](#contents)
 
-
-Dataset used: [Speech commands dataset version 1]()
+Dataset used: [Speech commands dataset version 1](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html)
 
 - Dataset size:2.02GiB, 65,000 one-second long utterances of 30 short words, by thousands of different people
-  - Train: 80%
-  - Val: 10%
-  - Test: 10%
+    - Train: 80%
+    - Val: 10%
+    - Test: 10%
 - Data format:WAVE format file, with the sample data encoded as linear 16-bit single-channel PCM values, at a 16 KHz rate
-  - Note:Data will be processed in download_process_data.py
+    - Note:Data will be processed in download_process_data.py
 
-Dataset used: [Speech commands dataset version 2]()
+Dataset used: [Speech commands dataset version 2](https://arxiv.org/abs/1804.03209)
 
 - Dataset size: 8.17 GiB. 105,829 one-second (or less) long utterances of 35 words by 2,618 speakers
-  - Train: 80%
-  - Val: 10%
-  - Test: 10%
+    - Train: 80%
+    - Val: 10%
+    - Test: 10%
 - Data format:WAVE format file, with the sample data encoded as linear 16-bit single-channel PCM values, at a 16 KHz rate
-  - Note:Data will be processed in download_process_data.py
+    - Note:Data will be processed in download_process_data.py
 
-
 # [Environment Requirements](#contents)
 
 - Hardware(Ascend/GPU)
   - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
- Framework - - [MindSpore](https://www.mindspore.cn/install/en) + - [MindSpore](https://www.mindspore.cn/install/en) - Third party open source package(if have) - - numpy - - soundfile - - python_speech_features + - numpy + - soundfile + - python_speech_features - For more information, please check the resources below: - - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) - - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) - - + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) # [Quick Start](#contents) After installing MindSpore via the official website, you can start training and evaluation as follows: First set the config for data, train, eval in src/config.py + - download and process dataset - ``` + + ```bash python src/download_process_data.py ``` @@ -88,8 +84,8 @@ First set the config for data, train, eval in src/config.py ```python # run training example - python train.py - + python train.py + # run evaluation example # if you want to eval a specific model, you should specify model_dir to the ckpt path: python eval.py --model_dir your_ckpt_path @@ -102,14 +98,14 @@ First set the config for data, train, eval in src/config.py ## [Script and Sample Code](#contents) -``` -├── dscnn +```text +├── dscnn ├── README.md // descriptions about ds-cnn - ├── scripts - │ ├──run_download_process_data.sh // shell script for download dataset and prepare feature and label + ├── scripts + │ ├──run_download_process_data.sh // shell script for download dataset and prepare feature and label │ ├──run_train_ascend.sh // shell script for train on ascend - │ ├──run_eval_ascend.sh // shell script for evaluation on ascend - ├── src + │ ├──run_eval_ascend.sh // shell script for evaluation on ascend + ├── src │ ├──callback.py // callbacks │ ├──config.py // parameter configuration of data, train and eval │ ├──dataset.py // creating dataset @@ -118,10 +114,10 @@ First set the config for data, train, eval in src/config.py │ ├──log.py // logging class │ ├──loss.py // loss function │ ├──lr_scheduler.py // lr_scheduler - │ ├──models.py // load ckpt - │ ├──utils.py // some function for prepare data - ├── train.py // training script - ├── eval.py // evaluation script + │ ├──models.py // load ckpt + │ ├──utils.py // some function for prepare data + ├── train.py // training script + ├── eval.py // evaluation script ├── export.py // export checkpoint files into air/geir ├── requirements.txt // Third party open source package ``` @@ -130,21 +126,21 @@ First set the config for data, train, eval in src/config.py Parameters for both training and evaluation can be set in config.py. -- config for dataset for Speech commands dataset version 1 +- config for dataset for Speech commands dataset version 1 ```python - 'data_url': 'http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz' + 'data_url': 'http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz' # Location of speech training data archive on the web 'data_dir': 'data' # Where to download the dataset - 'feat_dir': 'feat' # Where to save the feature and label of audios + 'feat_dir': 'feat' # Where to save the feature and label of audios 'background_volume': 0.1 # How loud the background noise should be, between 0 and 1. 'background_frequency': 0.8 # How many of the training samples have background noise mixed in. 
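    # NOTE (editorial assumption, not taken from this code base): the two background_*
    # knobs above are typically consumed by the data pipeline as "for roughly
    # `background_frequency` of the training clips, mix in a background-noise slice
    # scaled by a volume drawn from [0, background_volume]".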
- 'silence_percentage': 10.0 # How much of the training data should be silence. - 'unknown_percentage': 10.0 # How much of the training data should be unknown words - 'time_shift_ms': 100.0 # Range to randomly shift the training audio by in time + 'silence_percentage': 10.0 # How much of the training data should be silence. + 'unknown_percentage': 10.0 # How much of the training data should be unknown words + 'time_shift_ms': 100.0 # Range to randomly shift the training audio by in time 'testing_percentage': 10 # What percentage of wavs to use as a test set 'validation_percentage': 10 # What percentage of wavs to use as a validation set - 'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go' + 'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go' # Words to use (others will be added to an unknown label) 'sample_rate': 16000 # Expected sample rate of the wavs 'device_id': 1000 # device ID used to train or evaluate the dataset. @@ -153,25 +149,25 @@ Parameters for both training and evaluation can be set in config.py. 'window_stride_ms': 20.0 # How long each spectrogram timeslice is 'dct_coefficient_count': 20 # How many bins to use for the MFCC fingerprint ``` - -- config for DS-CNN and train parameters of Speech commands dataset version 1 + +- config for DS-CNN and train parameters of Speech commands dataset version 1 ```python - 'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1] + 'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1] # Model dimensions - different for various models - 'drop': 0.9 # dropout - 'pretrained': '' # model_path, local pretrained model to load + 'drop': 0.9 # dropout + 'pretrained': '' # model_path, local pretrained model to load 'use_graph_mode': 1 # use graph mode or feed mode 'val_interval': 1 # validate interval 'per_batch_size': 100 # batch size for per gpu - 'lr_scheduler': 'multistep' # lr-scheduler, option type: multistep, cosine_annealing + 'lr_scheduler': 'multistep' # lr-scheduler, option type: multistep, cosine_annealing 'lr': 0.1 # learning rate of the training - 'lr_epochs': '20,40,60,80' # epoch of lr changing + 'lr_epochs': '20,40,60,80' # epoch of lr changing 'lr_gamma': 0.1 # decrease lr by a factor of exponential lr_scheduler 'eta_min': 0 # eta_min in cosine_annealing scheduler 'T_max': 80 # T-max in cosine_annealing scheduler 'max_epoch': 80 # max epoch num to train the model - 'warmup_epochs': 0 # warmup epoch + 'warmup_epochs': 0 # warmup epoch 'weight_decay': 0.001 # weight decay 'momentum': 0.98 # weight decay 'log_interval': 100 # logging interval @@ -179,12 +175,12 @@ Parameters for both training and evaluation can be set in config.py. 'ckpt_interval': 100 # save ckpt_interval ``` -- config for DS-CNN and evaluation parameters of Speech commands dataset version 1 +- config for DS-CNN and evaluation parameters of Speech commands dataset version 1 ```python 'feat_dir': 'feat' # Where to save the feature of audios - 'model_dir': '' # which folder the models are saved in or specific path of one model - 'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go' + 'model_dir': '' # which folder the models are saved in or specific path of one model + 'wanted_words': 'yes,no,up,down,left,right,on,off,stop,go' # Words to use (others will be added to an unknown label) 'sample_rate': 16000 # Expected sample rate of the wavs 'device_id': 1000 # device ID used to train or evaluate the dataset. 
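    # NOTE (editorial, assuming the standard MFCC framing): with one-second clips at
    # sample_rate=16000, the settings window_size_ms=40.0 and window_stride_ms=20.0
    # give 1 + (1000 - 40) // 20 = 49 frames, so dct_coefficient_count=20 turns each
    # utterance into a 49 x 20 feature map; these values must match the ones used
    # when the training features were generated.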
@@ -192,33 +188,35 @@ Parameters for both training and evaluation can be set in config.py. 'window_size_ms': 40.0 # How long each spectrogram timeslice is 'window_stride_ms': 20.0 # How long each spectrogram timeslice is 'dct_coefficient_count': 20 # How many bins to use for the MFCC fingerprint - 'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1] + 'model_size_info': [6, 276, 10, 4, 2, 1, 276, 3, 3, 2, 2, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1, 276, 3, 3, 1, 1] # Model dimensions - different for various models 'pre_batch_size': 100 # batch size for eval 'drop': 0.9 # dropout in train 'log_path': 'eval_outputs' # path to save eval log ``` - ## [Training Process](#contents) -### Training +### Training - running on Ascend for shell script: + ```python # sh srcipts/run_train_ascend.sh [device_id] sh srcipts/run_train_ascend.sh 0 ``` + for python script: + ```python # python train.py --device_id [device_id] python train.py --device_id 0 ``` you can see the args and loss, acc info on your screen, you also can view the results in folder train_outputs - + ```python epoch[1], iter[443], loss:0.73811543, mean_wps:12102.26 wavs/sec Eval: top1_cor:737, top5_cor:1699, tot:3000, acc@1=24.57%, acc@5=56.63% @@ -229,9 +227,7 @@ Parameters for both training and evaluation can be set in config.py. Best epoch:41 acc:93.73% ``` - The checkpoints and log will be saved in the train_outputs. - - + The checkpoints and log will be saved in the train_outputs. ## [Evaluation Process](#contents) @@ -242,17 +238,20 @@ Parameters for both training and evaluation can be set in config.py. Before running the command below, please check the checkpoint path used for evaluation. Please set model_dir in config.py or pass model_dir in your command line. for shell scripts: - ```python + + ```bash # sh scripts/run_eval_ascend.sh device_id model_dir sh scripts/run_eval_ascend.sh 0 train_outputs/*/*.ckpt - or + or sh scripts/run_eval_ascend.sh 0 train_outputs/*/ ``` + for python scripts: - ```python + + ```bash # python eval.py --device_id device_id --model_dir model_dir python eval.py --device_id 0 --model_dir train_outputs/*/*.ckpt - or + or python eval.py --device_id 0 --model_dir train_outputs/* ``` @@ -264,51 +263,49 @@ Parameters for both training and evaluation can be set in config.py. 
``` # [Model Description](#contents) + ## [Performance](#contents) -### Train Performance +### Train Performance -| Parameters | Ascend | +| Parameters | Ascend | | -------------------------- | ------------------------------------------------------------ | -| Model Version | DS-CNN | +| Model Version | DS-CNN | | Resource | Ascend 910 ;CPU 2.60GHz,56cores;Memory,314G | | uploaded Date | 27/09/2020 (month/day/year) | | MindSpore Version | 1.0.0 | -| Dataset | Speech commands dataset version 1 | -| Training Parameters | epoch=80, batch_size = 100, lr=0.1 | -| Optimizer | Momentum | +| Dataset | Speech commands dataset version 1 | +| Training Parameters | epoch=80, batch_size = 100, lr=0.1 | +| Optimizer | Momentum | | Loss Function | Softmax Cross Entropy | -| outputs | probability | +| outputs | probability | | Loss | 0.0019 | -| Speed | 2s/epoch | +| Speed | 2s/epoch | | Total time | 4 mins | -| Parameters (K) | 500K | +| Parameters (K) | 500K | | Checkpoint for Fine tuning | 3.3M (.ckpt file) | | Script | [Link]() | [Link]() | - ### Inference Performance | Parameters | Ascend | -| ------------------- | --------------------------- | -| Model Version | DS-CNN | +| ------------------- | --------------------------- | +| Model Version | DS-CNN | | Resource | Ascend 910 | | Uploaded Date | 09/27/2020 | -| MindSpore Version | 1.0.0 | -| Dataset |Speech commands dataset version 1 | -| Training Parameters | src/config.py | -| outputs | probability | -| Accuracy | 93.96% | -| Total time | 3min | -| Params (K) | 500K | -|Checkpoint for Fine tuning (M) | 3.3M | - - +| MindSpore Version | 1.0.0 | +| Dataset |Speech commands dataset version 1 | +| Training Parameters | src/config.py | +| outputs | probability | +| Accuracy | 93.96% | +| Total time | 3min | +| Params (K) | 500K | +|Checkpoint for Fine tuning (M) | 3.3M | # [Description of Random Situation](#contents) -In download_process_data.py, we set the seed for split train, val, test set. +In download_process_data.py, we set the seed for split train, val, test set. +# [ModelZoo Homepage](#contents) -# [ModelZoo Homepage](#contents) Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). diff --git a/model_zoo/research/nlp/textrcnn/readme.md b/model_zoo/research/nlp/textrcnn/readme.md index e7922bbc044..9ea5da2afee 100644 --- a/model_zoo/research/nlp/textrcnn/readme.md +++ b/model_zoo/research/nlp/textrcnn/readme.md @@ -27,7 +27,7 @@ Specifically, the TextRCNN is mainly composed of three parts: a recurrent struct ## [Dataset](#contents) -Dataset used: [Sentence polarity dataset v1.0]() +Dataset used: [Sentence polarity dataset v1.0](http://www.cs.cornell.edu/people/pabo/movie-review-data/) - Dataset size:10662 movie comments in 2 classes, 9596 comments for train set, 1066 comments for test set. - Data format:text files. The processed data is in ```./data/``` @@ -36,7 +36,7 @@ Dataset used: [Sentence polarity dataset v1.0](