Contents

- DQN Description
- Model Architecture
- Dataset
- Requirements
- Script Description
- Performance
- Description of Random Situation
- ModelZoo Homepage

DQN Description

DQN was the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Paper: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
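
At its core, DQN fits a Q-network to the Bellman target r + gamma * max_a' Q(s', a'). As a rough, hypothetical illustration in plain NumPy (not the project's code):

    import numpy as np

    def td_target(reward, next_q_values, done, gamma=0.8):
        # Bellman target used by DQN: r + gamma * max_a' Q(s', a');
        # the bootstrap term is dropped for terminal transitions (done = 1).
        return reward + gamma * (1.0 - done) * np.max(next_q_values, axis=-1)

    # One transition: reward 1.0, next-state Q-values [0.3, 0.7], not terminal.
    print(td_target(1.0, np.array([0.3, 0.7]), done=0.0))  # 1.0 + 0.8 * 0.7 = 1.56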

Model Architecture

The overall network architecture of DQN is described in the paper cited above; the network used in this example is defined in src/dqn.py.
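
As a rough illustration only (the hidden size of 128 is an assumption; the real definition lives in src/dqn.py), a CartPole-sized Q-network in MindSpore could look like:

    import mindspore.nn as nn

    class DQNNet(nn.Cell):
        """Toy Q-network: 4 state features in, one Q-value per action out."""
        def __init__(self, state_dim=4, hidden_dim=128, action_dim=2):
            super().__init__()
            self.fc1 = nn.Dense(state_dim, hidden_dim)
            self.relu = nn.ReLU()
            self.fc2 = nn.Dense(hidden_dim, action_dim)

        def construct(self, x):
            # Map a batch of states to Q-values for each action.
            return self.fc2(self.relu(self.fc1(x)))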

Dataset

DQN learns online from interaction with the OpenAI Gym CartPole environment, so no offline dataset needs to be downloaded.

Requirements

pip install gym
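
A quick sanity check that gym and the CartPole environment used by this example are available:

    import gym

    env = gym.make('CartPole-v1')
    print(env.observation_space.shape)  # (4,)  -> matches state_space_dim
    print(env.action_space.n)           # 2     -> matches action_space_dim
    state = env.reset()                 # older gym returns the observation; newer gym returns (obs, info)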

Script Description

Scripts and Sample Code

├── dqn
    ├── README.md                           # description of DQN
    ├── scripts
        ├── run_standalone_eval_ascend.sh   # shell script for evaluation on Ascend
        ├── run_standalone_eval_gpu.sh      # shell script for evaluation on GPU
        ├── run_standalone_train_ascend.sh  # shell script for training on Ascend
        ├── run_standalone_train_gpu.sh     # shell script for training on GPU
    ├── src
        ├── agent.py                        # model agent
        ├── config.py                       # parameter configuration
        ├── dqn.py                          # DQN network architecture
    ├── train.py                            # training script
    ├── eval.py                             # evaluation script

Script Parameters

    'gamma': 0.8             # discount factor applied to the next state value
    'epsi_high': 0.9         # the highest exploration rate
    'epsi_low': 0.05         # the lowest exploration rate
    'decay': 200             # decay constant of the exploration rate, in steps
    'lr': 0.001              # learning rate
    'capacity': 100000       # capacity of the replay buffer
    'batch_size': 512        # training batch size
    'state_space_dim': 4     # dimension of the environment state space
    'action_space_dim': 2    # dimension of the action space
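
For illustration, 'epsi_high', 'epsi_low' and 'decay' are typically combined into an exponentially annealed exploration rate; the sketch below assumes that common schedule (the project's exact formula is in src/agent.py):

    import math

    def epsilon(step, epsi_high=0.9, epsi_low=0.05, decay=200):
        # Anneal the exploration rate from epsi_high towards epsi_low;
        # 'decay' acts as the time constant of the schedule, in steps.
        return epsi_low + (epsi_high - epsi_low) * math.exp(-step / decay)

    print(epsilon(0))     # 0.90 at the start of training
    print(epsilon(1000))  # ~0.056 after 1000 steps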

Training Process

# training example
  python:
      Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
      GPU:    python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &

  shell:
      Ascend: sh run_standalone_train_ascend.sh ckpt
      GPU:    sh run_standalone_train_gpu.sh ckpt
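
During training, transitions are stored in a replay buffer of size 'capacity' and learning updates draw mini-batches of 'batch_size'. A toy sketch of such a buffer (not the project's implementation, which lives in src/agent.py):

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=100000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=512):
            batch = random.sample(self.buffer, batch_size)
            # Returns five lists: states, actions, rewards, next_states, dones.
            return [list(column) for column in zip(*batch)]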

Evaluation Process

# evaluation example
  python:
      Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
      GPU:    python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt

  shell:
      Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
      GPU:    sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
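
Conceptually, evaluation loads the trained checkpoint and runs the greedy policy for one episode. A hedged sketch (the layer sizes and checkpoint path are assumptions; the real logic is in eval.py):

    import gym
    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore.train.serialization import load_checkpoint, load_param_into_net

    # Toy Q-network with assumed sizes; it must match the checkpoint to load cleanly.
    net = nn.SequentialCell([nn.Dense(4, 128), nn.ReLU(), nn.Dense(128, 2)])
    load_param_into_net(net, load_checkpoint('./ckpt/checkpoint_dqn.ckpt'))

    env = gym.make('CartPole-v1')
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        q_values = net(Tensor(np.asarray(state, np.float32)[None, :]))
        action = int(q_values.asnumpy().argmax())       # greedy action, no exploration at eval time
        state, reward, done, _ = env.step(action)       # classic gym step API (obs, reward, done, info)
        total_reward += reward
    print('episode reward:', total_reward)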

Performance

Inference Performance

Parameters: DQN
Resource: Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; OS Euler 2.8
Uploaded date: 03/10/2021 (month/day/year)
MindSpore version: 1.1.0
Training parameters: batch_size = 512, lr = 0.001
Optimizer: RMSProp
Loss function: MSELoss
Outputs: Q-value for each action
Params: 7.3k
Scripts: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn

Description of Random Situation

A random seed is used in train.py.
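
A hypothetical way of pinning the seeds for reproducibility (the actual seed handling is in train.py):

    import random
    import numpy as np
    from mindspore.common import set_seed

    random.seed(1)
    np.random.seed(1)
    set_seed(1)  # fixes MindSpore's global seed for weight initialization and random ops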

ModelZoo Homepage

Please check the official homepage.