README.md

  1. First, create the conda environment text2sql:

     conda create -n text2sql python=3.6
     source activate text2sql
     pip install grakel
     python -c "import nltk; nltk.download('punkt')"
     cd snowball
     pip install -r requirements.txt
    
  2. Download raw data and unzip it into the raw_data directory. Make sure the datasets are correctly located as:

     data
     ├── database
     ├── tables.json
     └── text_to_sql_data.json
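Before running the preprocessing notebook, it can help to confirm the raw data is laid out as shown above. A minimal sketch, using a hypothetical helper (check_layout is not part of the repository):

```python
from pathlib import Path

# Entries the README expects under the raw data directory.
EXPECTED = ["database", "tables.json", "text_to_sql_data.json"]

def check_layout(root: str) -> list:
    """Return the expected entries that are missing under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]
```

Calling check_layout("data") should return an empty list once the datasets are unzipped into place.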
  3. Execute the commands in the file preprocess.ipynb to generate three data files, alldata.json, logic.json, and question_sql.json, in the preprocessed directory.
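To sanity-check the preprocessing output, the three generated files can be loaded back in. A small sketch; the loader function and the assumption that each file is a single JSON document are ours, not the repository's:

```python
import json
from pathlib import Path

# File names come from the README; their internal schema is not assumed here.
FILES = ["alldata.json", "logic.json", "question_sql.json"]

def load_preprocessed(preprocessed_dir: str) -> dict:
    """Parse each expected output file and return them keyed by file name."""
    out = {}
    for name in FILES:
        with open(Path(preprocessed_dir) / name, encoding="utf-8") as f:
            out[name] = json.load(f)
    return out
```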

  4. Following the paper Logic-Consistency Text Generation from Semantic Parses, train a snowball model from scratch, or download our pre-trained snowball checkpoint and unzip it into the saves/checkpoint-epoch-10.0 directory. Then run the following commands to generate the final_generation.json file:

     cd snowball
     python eval.py
    
  5. Run the commands in the file convert2id_electra.ipynb to generate the final pre-train data alltask_final.txt in the final_data directory.
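As a final check, the size of the generated pre-train file can be verified. A hypothetical sketch; the one-example-per-line format of alltask_final.txt is an assumption, not documented by the repository:

```python
def count_examples(path: str) -> int:
    """Count non-empty lines, assuming one pre-train example per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())
```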