# Contents
- DQN Description
- Model Architecture
- Dataset
- Requirements
- Script Description
- Model Description
- Description of Random Situation
- ModelZoo Homepage
# DQN Description
DQN is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Paper: Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
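At its core, DQN regresses the network's Q-value toward the Bellman target. A schematic sketch of that target (illustrative only; the repository's actual training step lives in `src/agent.py`):

```python
# Schematic DQN temporal-difference target (illustrative, not this repo's API):
# y = r + gamma * max_a' Q(s', a'), with y = r for terminal states.
def td_target(reward, next_q_values, done, gamma=0.8):
    return reward + (0.0 if done else gamma * max(next_q_values))
```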
# Model Architecture
The overall network architecture of DQN is shown below.
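As a minimal sketch in MindSpore (the hidden layer size here is an assumption; see `src/dqn.py` for the actual definition):

```python
import mindspore.nn as nn

class DQN(nn.Cell):
    """A small MLP mapping an environment state to one Q-value per action."""
    def __init__(self, state_space_dim, hidden_dim, action_space_dim):
        super(DQN, self).__init__()
        self.fc1 = nn.Dense(state_space_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Dense(hidden_dim, action_space_dim)

    def construct(self, x):
        return self.fc2(self.relu(self.fc1(x)))
```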
# Dataset

DQN does not train on a static dataset: experience is generated online by interacting with a Gym environment. The default configuration (`state_space_dim = 4`, `action_space_dim = 2`) matches the classic CartPole control task.
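A quick sanity check of that assumption (CartPole is inferred from the dimensions above, not stated explicitly in the scripts):

```python
import gym

env = gym.make('CartPole-v0')
print(env.observation_space.shape[0])  # 4 -> state_space_dim
print(env.action_space.n)              # 2 -> action_space_dim
```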
# Requirements
- Hardware (Ascend/GPU/CPU)
    - Prepare a hardware environment with an Ascend or GPU processor.
- Framework
    - MindSpore
- For more information, please check the resources below:
    - MindSpore Tutorials
    - MindSpore Python API
- Third-party libraries

  ```bash
  pip install gym
  ```
# Script Description
## Scripts and Sample Code
```text
├── dqn
    ├── README.md                           # descriptions about DQN
    ├── scripts
    │   ├── run_standalone_eval_ascend.sh   # shell script for evaluation with Ascend
    │   ├── run_standalone_eval_gpu.sh      # shell script for evaluation with GPU
    │   ├── run_standalone_train_ascend.sh  # shell script for training with Ascend
    │   ├── run_standalone_train_gpu.sh     # shell script for training with GPU
    ├── src
    │   ├── agent.py                        # model agent
    │   ├── config.py                       # parameter configuration
    │   ├── dqn.py                          # DQN architecture
    ├── train.py                            # training script
    ├── eval.py                             # evaluation script
```
## Script Parameters
```python
'gamma': 0.8             # discount factor applied to the next state's value
'epsi_high': 0.9         # the highest exploration rate
'epsi_low': 0.05         # the lowest exploration rate
'decay': 200             # decay constant (in steps) of the exploration rate
'lr': 0.001              # learning rate
'capacity': 100000       # capacity of the replay buffer
'batch_size': 512        # training batch size
'state_space_dim': 4     # dimension of the environment's state space
'action_space_dim': 2    # number of discrete actions
```
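A sketch of how the exploration parameters typically combine (assuming exponential annealing, the common pattern for this kind of agent; the authoritative schedule is in `src/agent.py`):

```python
import math

def exploration_rate(steps_done, epsi_high=0.9, epsi_low=0.05, decay=200):
    # Anneal epsilon exponentially from epsi_high toward epsi_low;
    # `decay` sets roughly how many steps the transition takes.
    return epsi_low + (epsi_high - epsi_low) * math.exp(-steps_done / decay)
```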
## Training Process
```shell
# training example
python:
    Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
    GPU:    python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &

shell:
    Ascend: sh run_standalone_train_ascend.sh ckpt
    GPU:    sh run_standalone_train_gpu.sh ckpt
```
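Training runs in the background: progress is written to `log.txt`, and the checkpoint is saved under the directory passed via `--ckpt_path` (here, `ckpt`).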
## Evaluation Process
```shell
# evaluation example
python:
    Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
    GPU:    python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt

shell:
    Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
    GPU:    sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
```
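For reference, restoring the trained weights in MindSpore looks roughly like this (a sketch assuming the `DQN` class shown earlier and a 256-unit hidden layer; `eval.py` performs the equivalent steps):

```python
from mindspore import load_checkpoint, load_param_into_net

net = DQN(state_space_dim=4, hidden_dim=256, action_space_dim=2)
param_dict = load_checkpoint('./ckpt/checkpoint_dqn.ckpt')
load_param_into_net(net, param_dict)  # weights are now loaded for inference
```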
# Model Description

## Performance

### Inference Performance
| Parameters          | DQN                                                                          |
| ------------------- | ---------------------------------------------------------------------------- |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; OS Euler 2.8             |
| Uploaded Date       | 03/10/2021 (month/day/year)                                                  |
| MindSpore Version   | 1.1.0                                                                        |
| Training Parameters | batch_size = 512, lr = 0.001                                                 |
| Optimizer           | RMSProp                                                                      |
| Loss Function       | MSELoss                                                                      |
| Outputs             | Q-value of each action                                                       |
| Params              | 7.3K                                                                         |
| Scripts             | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn |
# Description of Random Situation
We set a random seed in train.py.
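Setting the seed looks like this (the exact seed value is an assumption; check `train.py` for the one actually used):

```python
from mindspore import set_seed

set_seed(1)  # fix MindSpore's global RNG seed for reproducibility
```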
# ModelZoo Homepage
Please check the official homepage.