README.md

  1. First, create the conda environment text2sql:

     conda create -n text2sql python=3.6
     source activate text2sql
     pip install grakel
     python -c "import nltk; nltk.download('punkt')"
     cd snowball
     pip install -r requirements.txt
    
  2. Download raw data and unzip it into the raw_data directory. Make sure the datasets are correctly located as:

     data
     ├── database
     ├── tables.json
     └── text_to_sql_data.json
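Before running the preprocessing notebook, it can help to confirm the raw data is laid out as shown above. A minimal sketch, using a hypothetical helper (check_layout is not part of the repository):

```python
from pathlib import Path

# Entries the README expects under the raw data directory.
EXPECTED = ["database", "tables.json", "text_to_sql_data.json"]

def check_layout(root: str) -> list:
    """Return the expected entries that are missing under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]
```

Calling check_layout("data") should return an empty list once the datasets are unzipped into place.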
  3. Execute the commands in the file preprocess.ipynb to generate three data files, alldata.json, logic.json, and question_sql.json, in the preprocessed directory.
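To sanity-check the preprocessing output, the three generated files can be loaded back in. A small sketch; the loader function and the assumption that each file is a single JSON document are ours, not the repository's:

```python
import json
from pathlib import Path

# File names come from the README; their internal schema is not assumed here.
FILES = ["alldata.json", "logic.json", "question_sql.json"]

def load_preprocessed(preprocessed_dir: str) -> dict:
    """Parse each expected output file and return them keyed by file name."""
    out = {}
    for name in FILES:
        with open(Path(preprocessed_dir) / name, encoding="utf-8") as f:
            out[name] = json.load(f)
    return out
```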

  4. Following the paper Logic-Consistency Text Generation from Semantic Parses, train a snowball model from scratch, or download our pre-trained snowball checkpoint and unzip it into the saves/checkpoint-epoch-10.0 directory. Then run the following commands to generate the final_generation.json file:

     cd snowball
     python eval.py
    
  5. Run the commands in the file convert2id_electra.ipynb to generate the final pre-train data alltask_final.txt in the final_data directory.
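As a final check, the size of the generated pre-train file can be verified. A hypothetical sketch; the one-example-per-line format of alltask_final.txt is an assumption, not documented by the repository:

```python
def count_examples(path: str) -> int:
    """Count non-empty lines, assuming one pre-train example per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())
```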