Code for paper: Resolving Ambiguities in Text-to-SQL Systems
We propose the RAQS-SQL framework to Resolve Ambiguities in QuestionS for Text-to-SQL systems. To handle schema-level ambiguity, we use a model that aligns the query intent directly with the relevant database columns. To handle value ambiguity, we introduce techniques that leverage semantic similarities and hierarchical entity relationships between value entities stored in the database and those mentioned in the question.
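As a rough illustration of the value-ambiguity idea, the sketch below ranks stored column values against a value mentioned in a question. It is not the RAQS-SQL implementation: it substitutes plain string similarity (Python's difflib) for the semantic similarity described above, and the function name rank_candidate_values, the threshold, and the sample values are all hypothetical.

```python
from difflib import SequenceMatcher


def rank_candidate_values(mention, stored_values, threshold=0.5):
    """Rank stored values by string similarity to the question mention.

    Hypothetical helper: string similarity stands in for the semantic
    similarity used by the framework.
    """
    scored = []
    for value in stored_values:
        score = SequenceMatcher(None, mention.lower(), value.lower()).ratio()
        if score >= threshold:
            scored.append((value, round(score, 3)))
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up column values and question mention, for illustration only.
stored = ["New York City", "Newark", "Los Angeles"]
candidates = rank_candidate_values("new york", stored)
print(candidates)
```

A question that mentions "new york" could plausibly refer to more than one stored value, so a ranking like this gives a downstream step several candidates to disambiguate rather than a single hard match.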
conda create -n raqs-sql python=3.10.14
conda activate raqs-sql
pip install -r requirements.txt
The AmbiVal dataset can be downloaded from here: AmbiVal dataset. Store the dataset in misc/dataset/AmbiVal/ambival
python preprocessing/download.py --type ambival
cd misc/dataset/ambival/
tar -xvf database_test.tar
tar -xvf ext_info.tar
Spider and BIRD preprocessing
python preprocessing/download.py --type spiderbird
The Spider database can be downloaded from here: Spider dataset.
Store the database in misc/dataset/SpiderBIRD_dataset/
cd misc/dataset/SpiderBIRD_dataset
unzip spider_data.zip -d spider
The BIRD database can be downloaded from here: BIRD dataset.
Store the database in misc/dataset/SpiderBIRD_dataset/
cd misc/dataset/SpiderBIRD_dataset
wget https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip
unzip dev.zip -d bird
cd bird/dev_20240627/
unzip dev_databases.zip
Download relational classification and schema linking models
python preprocessing/download.py --type model
AmbiVal
rm -rf misc/dataset/AmbiVal/ambival_embedding
python preprocessing/embedding.py --dataset ambival
Spider
rm -rf misc/dataset/SpiderBIRD_dataset/spider_embedding
python preprocessing/embedding.py --dataset spider
BIRD
rm -rf misc/dataset/SpiderBIRD_dataset/bird_embedding
python preprocessing/embedding.py --dataset bird
Export API keys
export OPENAI_API_KEY=your_openai_api_key
export TOGETHER_API_KEY=your_together_api_key
export NVIDIA_API_KEY=your_nvidia_api_key
Run pipeline for AmbiVal dataset
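Before launching a run, it may be worth confirming that the exported keys are visible to Python. The snippet below is a minimal sketch and not part of the repository: the helper missing_api_keys is hypothetical, and it assumes all three keys are needed, whereas a given config may only use one provider.

```python
import os

# Key names are taken from the export commands above. Which ones a given
# run actually needs depends on the chosen config (assumption: all three).
REQUIRED_KEYS = ("OPENAI_API_KEY", "TOGETHER_API_KEY", "NVIDIA_API_KEY")


def missing_api_keys(environ=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```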
python run.py --config_file=configs/api_qwen25_ambival.yaml --output_path outputs/ambival_qwen25.jsonl
Spider
python run.py --config_file=configs/api_qwen25_spider_dev.yaml --output_path outputs/spider_qwen25.jsonl
BIRD
python run.py --config_file=configs/api_qwen25_bird_dev.yaml --output_path outputs/bird_qwen25.jsonl
python eval/evaluate.py --dataset ambival --output_path outputs/ambival_qwen25.jsonl
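To take a quick look at a run's output before or after evaluation, a short script like the following may help. It is a sketch under one assumption: that the .jsonl files written by run.py follow the usual JSON Lines convention of one JSON value per line. The helper load_jsonl is hypothetical, and no field names are assumed since they depend on the pipeline.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Path taken from the AmbiVal run command above.
output_file = Path("outputs/ambival_qwen25.jsonl")
if output_file.exists():
    records = load_jsonl(output_file)
    print(f"{len(records)} records in {output_file}")
    if records and isinstance(records[0], dict):
        # Field names depend on the pipeline, so just list what is there.
        print("Fields in first record:", sorted(records[0]))
else:
    print(f"{output_file} not found; run the pipeline first.")
```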