
SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers

This repository contains code for the COLING 2022 paper "SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers".

If you use SUN in your work, please cite it as follows:

@inproceedings{qin2022sun,
    title={SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers},
    author={Bowen Qin and Lihan Wang and Binyuan Hui and Bowen Li and Pengxiang Wei and Binhua Li and Fei Huang and Luo Si and Min Yang and Yongbin Li},
    booktitle={COLING},
    year={2022}
}

Codebase

Prepare Environment

The environment setup is the same as that of LGESQL. The environment-related commands are provided in setup.sh and can be run with:

sh setup.sh

Download Dataset

Download spider_sun.zip, unzip it, and rename the extracted directory to data. In this directory, train_spider.json contains all of the paired data, and train_rd.json contains the remaining data.
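Before preprocessing, it can help to confirm that the data directory is laid out as described. The following is a minimal sketch for a POSIX shell; the function name check_sun_data is hypothetical and not part of the repository, and it only checks for the two files named above (any other files the archive may contain are not checked).

```shell
# Hypothetical helper (not part of the SUN repository): verify that the
# unzipped dataset directory holds the two training files named in the README.
# Assumption: only train_spider.json and train_rd.json are checked here.
check_sun_data() {
    _sun_dir="${1:-data}"   # defaults to ./data, the directory used above
    _sun_status=0
    for _sun_file in train_spider.json train_rd.json; do
        if [ ! -f "$_sun_dir/$_sun_file" ]; then
            echo "missing: $_sun_dir/$_sun_file" >&2
            _sun_status=1
        fi
    done
    return "$_sun_status"   # 0 only if every expected file was found
}
```

For example, the following starts preprocessing only when both files are found:

check_sun_data data && ./run/run_preprocessing.sh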

Preprocess Dataset

Preprocess the train and dev datasets. This step performs input normalization, schema linking, graph construction, and output action generation:

./run/run_preprocessing.sh

Training

Train SUN with:

./run/run_lgesql_plm.sh msde electra-large-discriminator

Evaluation

For evaluation, refer to run/run_evaluation.sh and run/run_submission.sh (for evaluation from scratch).

Acknowledgements

This implementation is based on RAT-SQL (Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers) and LGESQL (Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations). Thanks to the authors for releasing their code.