Contents

- DQN Description
- Model Architecture
- Dataset
- Requirements
- Script Description
- Performance
- Description of Random Situation
- ModelZoo Homepage

DQN Description

DQN was the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Paper: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
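
At its core, DQN fits a Q-network to the Bellman target r + gamma * max_a' Q(s', a'). As a rough, hypothetical illustration in plain NumPy (not the project's code):

    import numpy as np

    def td_target(reward, next_q_values, done, gamma=0.8):
        # Bellman target used by DQN: r + gamma * max_a' Q(s', a');
        # the bootstrap term is dropped for terminal transitions (done = 1).
        return reward + gamma * (1.0 - done) * np.max(next_q_values, axis=-1)

    # One transition: reward 1.0, next-state Q-values [0.3, 0.7], not terminal.
    print(td_target(1.0, np.array([0.3, 0.7]), done=0.0))  # 1.0 + 0.8 * 0.7 = 1.56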

Model Architecture

The overall network architecture of DQN is described in the paper cited above; the network used in this example is defined in src/dqn.py.
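
As a rough illustration only (the hidden size of 128 is an assumption; the real definition lives in src/dqn.py), a CartPole-sized Q-network in MindSpore could look like:

    import mindspore.nn as nn

    class DQNNet(nn.Cell):
        """Toy Q-network: 4 state features in, one Q-value per action out."""
        def __init__(self, state_dim=4, hidden_dim=128, action_dim=2):
            super().__init__()
            self.fc1 = nn.Dense(state_dim, hidden_dim)
            self.relu = nn.ReLU()
            self.fc2 = nn.Dense(hidden_dim, action_dim)

        def construct(self, x):
            # Map a batch of states to Q-values for each action.
            return self.fc2(self.relu(self.fc1(x)))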

Dataset

DQN learns online from interaction with the OpenAI Gym CartPole environment, so no offline dataset needs to be downloaded.

Requirements

pip install gym
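
A quick sanity check that gym and the CartPole environment used by this example are available:

    import gym

    env = gym.make('CartPole-v1')
    print(env.observation_space.shape)  # (4,)  -> matches state_space_dim
    print(env.action_space.n)           # 2     -> matches action_space_dim
    state = env.reset()                 # older gym returns the observation; newer gym returns (obs, info)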

Script Description

Scripts and Sample Code

├── dqn
    ├── README.md                           # description of DQN
    ├── scripts
        ├── run_standalone_eval_ascend.sh   # shell script for evaluation on Ascend
        ├── run_standalone_eval_gpu.sh      # shell script for evaluation on GPU
        ├── run_standalone_train_ascend.sh  # shell script for training on Ascend
        ├── run_standalone_train_gpu.sh     # shell script for training on GPU
    ├── src
        ├── agent.py                        # model agent
        ├── config.py                       # parameter configuration
        ├── dqn.py                          # DQN network architecture
    ├── train.py                            # training script
    ├── eval.py                             # evaluation script

Script Parameters

    'gamma': 0.8             # discount factor applied to the next state value
    'epsi_high': 0.9         # the highest exploration rate
    'epsi_low': 0.05         # the lowest exploration rate
    'decay': 200             # decay constant of the exploration rate, in steps
    'lr': 0.001              # learning rate
    'capacity': 100000       # capacity of the replay buffer
    'batch_size': 512        # training batch size
    'state_space_dim': 4     # dimension of the environment state space
    'action_space_dim': 2    # dimension of the action space
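
For illustration, 'epsi_high', 'epsi_low' and 'decay' are typically combined into an exponentially annealed exploration rate; the sketch below assumes that common schedule (the project's exact formula is in src/agent.py):

    import math

    def epsilon(step, epsi_high=0.9, epsi_low=0.05, decay=200):
        # Anneal the exploration rate from epsi_high towards epsi_low;
        # 'decay' acts as the time constant of the schedule, in steps.
        return epsi_low + (epsi_high - epsi_low) * math.exp(-step / decay)

    print(epsilon(0))     # 0.90 at the start of training
    print(epsilon(1000))  # ~0.056 after 1000 steps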

Training Process

# training example
  python:
      Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
      GPU:    python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &

  shell:
      Ascend: sh run_standalone_train_ascend.sh ckpt
      GPU:    sh run_standalone_train_gpu.sh ckpt
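
During training, transitions are stored in a replay buffer of size 'capacity' and learning updates draw mini-batches of 'batch_size'. A toy sketch of such a buffer (not the project's implementation, which lives in src/agent.py):

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=100000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=512):
            batch = random.sample(self.buffer, batch_size)
            # Returns five lists: states, actions, rewards, next_states, dones.
            return [list(column) for column in zip(*batch)]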

Evaluation Process

# evaluation example
  python:
      Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
      GPU:    python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt

  shell:
      Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
      GPU:    sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
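
Conceptually, evaluation loads the trained checkpoint and runs the greedy policy for one episode. A hedged sketch (the layer sizes and checkpoint path are assumptions; the real logic is in eval.py):

    import gym
    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore.train.serialization import load_checkpoint, load_param_into_net

    # Toy Q-network with assumed sizes; it must match the checkpoint to load cleanly.
    net = nn.SequentialCell([nn.Dense(4, 128), nn.ReLU(), nn.Dense(128, 2)])
    load_param_into_net(net, load_checkpoint('./ckpt/checkpoint_dqn.ckpt'))

    env = gym.make('CartPole-v1')
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        q_values = net(Tensor(np.asarray(state, np.float32)[None, :]))
        action = int(q_values.asnumpy().argmax())       # greedy action, no exploration at eval time
        state, reward, done, _ = env.step(action)       # classic gym step API (obs, reward, done, info)
        total_reward += reward
    print('episode reward:', total_reward)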

Performance

Inference Performance

Parameters: DQN
Resource: Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; OS Euler 2.8
Uploaded date: 03/10/2021 (month/day/year)
MindSpore version: 1.1.0
Training parameters: batch_size = 512, lr = 0.001
Optimizer: RMSProp
Loss function: MSELoss
Outputs: Q-value for each action
Params: 7.3k
Scripts: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn

Description of Random Situation

A random seed is used in train.py.
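
A hypothetical way of pinning the seeds for reproducibility (the actual seed handling is in train.py):

    import random
    import numpy as np
    from mindspore.common import set_seed

    random.seed(1)
    np.random.seed(1)
    set_seed(1)  # fixes MindSpore's global seed for weight initialization and random ops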

ModelZoo Homepage

Please check the official homepage.