update: star readme

出蛰 2022-11-14 14:19:10 +08:00
parent b29637b237
commit 0746daf1c0
1 changed file with 59 additions and 35 deletions

@@ -1,6 +1,19 @@
# 🌟 STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-red.svg">
</a>
<a href="https://github.com/huggingface/transformers/tree/main/examples/research_projects/tapex">
<img alt="🤗 transformers support" src="https://img.shields.io/badge/🤗 transformers-master-green" />
</a>
<a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg">
</a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg">
</a>
<br />
</p>
This is the official project containing the source code for the EMNLP 2022 paper "STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing".
You can use our checkpoints for evaluation directly or train from scratch by following our instructions.
@@ -10,8 +23,17 @@ You can use our checkpoints for evaluation directly or train from scratch by following our
The relevant models and data involved in the paper can be downloaded through [Baidu Netdisk](https://pan.baidu.com/s/1uA63h4zpwyDSqY5cprbeJQ?pwd=6666) or through Google Drive in the corresponding folders.
## Citation
```
@article{cai2022star,
title={STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing},
author={Cai, Zefeng and Li, Xiangyu and Hui, Binyuan and Yang, Min and Li, Bowen and Li, Binhua and Cao, Zheng and Li, Weijie and Huang, Fei and Si, Luo and others},
journal={arXiv preprint arXiv:2210.11888},
year={2022}
}
```
## 🪜 Pretrain
### Create conda environment
@@ -49,7 +71,7 @@ python save_model.py
Then you can get the trained model and its configuration (at least containing `model.bin` and `config.json`) under the `pretrained/sss` directory.
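As a quick sanity check that pre-training produced a loadable checkpoint, a sketch like the one below can be run. It assumes `model.bin` is a plain PyTorch state dict; adjust the path if you saved elsewhere.

```python
# Minimal sanity check for the saved pre-trained weights (a sketch,
# assuming pretrained/sss/model.bin is a plain PyTorch state dict).
import torch

state_dict = torch.load("pretrained/sss/model.bin", map_location="cpu")
print(f"{len(state_dict)} tensors saved")
# Peek at a few parameter names and shapes to confirm the checkpoint looks sane.
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```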
## 🚪 Fine-tuning and Evaluation
This section presents the results on the CoSQL and SParC datasets with STAR fine-tuned with LGESQL.
@@ -70,44 +92,46 @@ Create conda environment `lgesql`:
python -c "import stanza; stanza.download('en')" python -c "import stanza; stanza.download('en')"
python -c "import nltk; nltk.download('stopwords')" python -c "import nltk; nltk.download('stopwords')"
``` ```
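To verify that the resources downloaded above are usable inside the `lgesql` environment, a quick check like the following sketch works (the example sentence is arbitrary):

```python
# Quick check that the stanza English model and nltk stopwords downloaded
# above are available in the lgesql environment.
import stanza
from nltk.corpus import stopwords

nlp = stanza.Pipeline("en", processors="tokenize")
doc = nlp("Show me all flights departing from Denver.")
print([word.text for sent in doc.sentences for word in sent.words])
print(stopwords.words("english")[:5])
```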
### Using our checkpoints for evaluation
- Download our processed datasets [CoSQL](https://drive.google.com/file/d/1suuQnHVPxZZKRiUBvsUIlw7BnY21Q_6u/view?usp=sharing) or [SParC](https://drive.google.com/file/d/1DrGBq7WGdieanq90TjkiO5JgZMwcDGUu/view?usp=sharing) and unzip them into `cosql/data` and `sparc/data`, respectively. Make sure the datasets are correctly located as follows (a layout-check sketch appears after this list):
```
data
├── database
├── dev_electra.json
├── dev_electra.bin
├── dev_electra.lgesql.bin
├── dev_gold.txt
├── label.json
├── tables_electra.bin
├── tables.json
├── train_electra.bin
├── train_electra.json
└── train_electra.lgesql.bin
```
- Download our processed checkpoints [CoSQL](https://drive.google.com/file/d/1y4edJJ2xoA_JUGCoegEd8xLopAaUuvmp/view?usp=sharing) or [SParC](https://drive.google.com/file/d/1UDs956PgVlZT1hZ4pRm3Mox3Hs5u42sF/view?usp=sharing) and unzip them into `cosql/checkpoints` and `sparc/checkpoints`, respectively. Make sure the checkpoints are correctly located as:
```
checkpoints
├── model_IM.bin
└── params.json
```
- Execute the following command; the results are recorded in `result_XXX.txt` (it will take 10 to 30 minutes on one Tesla V100-PCIE-32GB GPU):
```
sh run/run_evaluation.sh
```
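Before launching the evaluation, it can save time to verify both directory layouts above. The following is a hypothetical pre-flight check (file names are taken directly from the two trees; run it from inside `cosql` or `sparc`):

```python
# Hypothetical pre-flight check for the data/ and checkpoints/ layouts
# shown above; run from inside the cosql or sparc directory.
import os

EXPECTED = {
    "data": [
        "database", "dev_electra.json", "dev_electra.bin",
        "dev_electra.lgesql.bin", "dev_gold.txt", "label.json",
        "tables_electra.bin", "tables.json", "train_electra.bin",
        "train_electra.json", "train_electra.lgesql.bin",
    ],
    "checkpoints": ["model_IM.bin", "params.json"],
}

missing = [
    os.path.join(folder, name)
    for folder, names in EXPECTED.items()
    for name in names
    if not os.path.exists(os.path.join(folder, name))
]
print("all files in place" if not missing else f"missing: {missing}")
```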
### Train from scratch
- You can train STAR yourself by following the Pretrain instructions above, or download our pre-trained [STAR](https://drive.google.com/file/d/1zfvNpofVzLixzzFyqLO0NP-WQSKKENIC/view?usp=sharing) and unzip it into the `pretrained_models/sss` directory. Make sure STAR is correctly located as follows (a loading sketch appears at the end of this section):
```
pretrained_models
└── sss
    ├── config.json
    ├── pytorch_model.bin
    └── vocab.txt
```
- You can preprocess the data with the `process_data&&label.py` file, following the methods in LGESQL, or directly download our processed data as described above.
- Training (it will take 4 days on one Tesla V100-PCIE-32GB GPU):
```
sh run/run_lgesql_plm.sh
```
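As referenced in the first bullet of this section, below is a minimal loading sketch for the downloaded STAR checkpoint. It assumes the files under `pretrained_models/sss` follow the standard Hugging Face layout implied by the tree above (the `*_electra.*` data files suggest an ELECTRA-style encoder, but treat that as an assumption); the actual fine-tuning entry point remains `run/run_lgesql_plm.sh`.

```python
# Minimal loading sketch for the pre-trained STAR checkpoint (assumes the
# standard Hugging Face layout shown above: config.json, pytorch_model.bin,
# vocab.txt). This only verifies the checkpoint loads; it does not fine-tune.
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_DIR = "pretrained_models/sss"
config = AutoConfig.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModel.from_pretrained(MODEL_DIR, config=config)
print(config.model_type, "parameters:", model.num_parameters())
```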