pminhtam/RAQS-SQL

RAQS-SQL: Resolving Ambiguities in Text-to-SQL Systems

Code for the paper: Resolving Ambiguities in Text-to-SQL Systems

We propose the RAQS-SQL framework to Resolve Ambiguities in QuestionS for Text-to-SQL systems. To handle schema-level ambiguity, we use a model that aligns the query intent directly with the relevant database columns. To handle value ambiguity, we introduce techniques that leverage semantic similarity and hierarchical entity relationships between the value entities stored in the database and those mentioned in the question.
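As a rough illustration of the value-ambiguity idea described above, the sketch below ranks candidate database values against a phrase from the question by cosine similarity. It is not the RAQS-SQL implementation: character-trigram counts stand in for a learned embedding model, and the function names (`trigram_vector`, `cosine`, `rank_values`) and the sample values are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Bag of character trigrams; a stand-in for a real embedding model."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_values(mention: str, db_values: list[str]) -> list[tuple[str, float]]:
    """Sort database values by similarity to the question mention, best first."""
    query = trigram_vector(mention)
    scored = [(value, cosine(query, trigram_vector(value))) for value in db_values]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical column values and question phrase, for illustration only.
    values = ["Uruguay", "United Kingdom", "United States"]
    for value, score in rank_values("united states of america", values):
        print(f"{value}: {score:.2f}")
```

A surface-level similarity like this only catches spelling variants. Handling synonyms or hierarchical relationships (for example, a city belonging to a country) would need semantic embeddings and the entity hierarchy the paper refers to.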

Setup environment

conda create -n raqs-sql python=3.10.14
conda activate raqs-sql
pip install -r requirements.txt

Prepare dataset

Download the dataset

The AmbiVal dataset can be downloaded from the AmbiVal dataset page. Store the dataset in misc/dataset/AmbiVal/ambival

python preprocessing/download.py --type ambival
cd misc/dataset/AmbiVal/ambival/
tar -xvf database_test.tar 
tar -xvf ext_info.tar 

Preprocess Spider and BIRD

python preprocessing/download.py --type spiderbird

The Spider database can be downloaded from the Spider dataset page. Store the database in misc/dataset/SpiderBIRD_dataset/

cd misc/dataset/SpiderBIRD_dataset
unzip spider_data.zip -d spider

The BIRD database can be downloaded from the BIRD dataset page. Store the database in misc/dataset/SpiderBIRD_dataset/

cd misc/dataset/SpiderBIRD_dataset
wget https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip
unzip dev.zip -d bird
cd bird/dev_20240627/
unzip dev_databases.zip 
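Once the archives are unpacked, a quick check of the directory layout can catch a misplaced download before the later steps fail. The sketch below only tests for the directories named in the steps above; the helper name `missing_dirs` is hypothetical, and the script is assumed to run from the repository root.

```python
from pathlib import Path

# Directories named in the dataset steps above, relative to the repository root.
EXPECTED_DIRS = [
    "misc/dataset/AmbiVal/ambival",
    "misc/dataset/SpiderBIRD_dataset/spider",
    "misc/dataset/SpiderBIRD_dataset/bird/dev_20240627",
]


def missing_dirs(root: str = ".") -> list[str]:
    """Return the expected dataset directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing dataset directories:")
        for directory in missing:
            print(f"  {directory}")
    else:
        print("All expected dataset directories are present.")
```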

Download pre-trained models

Download relational classification and schema linking models

python preprocessing/download.py --type model

Prepare embeddings

AmbiVal

rm -rf misc/dataset/AmbiVal/ambival_embedding
python preprocessing/embedding.py --dataset ambival

Spider

rm -rf misc/dataset/SpiderBIRD_dataset/spider_embedding
python preprocessing/embedding.py --dataset spider

BIRD

rm -rf misc/dataset/SpiderBIRD_dataset/bird_embedding
python preprocessing/embedding.py --dataset bird
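The three embedding steps follow the same pattern: delete the old embedding directory, then regenerate it. A minimal sketch that wraps this in one loop is shown below. The directory paths and the `--dataset` values come from the commands above. The helper names (`embedding_command`, `rebuild_embeddings`) are hypothetical, and the script is assumed to run from the repository root inside the `raqs-sql` environment.

```python
import shutil
import subprocess
import sys

# Embedding directories and dataset names taken from the commands above.
EMBEDDING_DIRS = {
    "ambival": "misc/dataset/AmbiVal/ambival_embedding",
    "spider": "misc/dataset/SpiderBIRD_dataset/spider_embedding",
    "bird": "misc/dataset/SpiderBIRD_dataset/bird_embedding",
}


def embedding_command(dataset: str) -> list[str]:
    """Build the embedding command for one dataset."""
    if dataset not in EMBEDDING_DIRS:
        raise ValueError(f"unknown dataset: {dataset}")
    return [sys.executable, "preprocessing/embedding.py", "--dataset", dataset]


def rebuild_embeddings(dataset: str) -> None:
    """Delete any stale embedding directory, then regenerate it."""
    command = embedding_command(dataset)  # validates the dataset name first
    shutil.rmtree(EMBEDDING_DIRS[dataset], ignore_errors=True)
    subprocess.run(command, check=True)  # raises if the script fails


if __name__ == "__main__":
    for name in EMBEDDING_DIRS:
        rebuild_embeddings(name)
```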

Run

Export API keys

export OPENAI_API_KEY=your_openai_api_key
export TOGETHER_API_KEY=your_together_api_key
export NVIDIA_API_KEY=your_nvidia_api_key
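A small pre-flight check can report missing keys before a long run starts. The sketch below only looks for the three variables exported above. Whether a given config needs all of them is not stated here, so treat an unset variable as a hint, not an error. The helper name `missing_api_keys` is hypothetical.

```python
import os

# Variable names taken from the export commands above.
API_KEY_VARS = ("OPENAI_API_KEY", "TOGETHER_API_KEY", "NVIDIA_API_KEY")


def missing_api_keys(env=None) -> list[str]:
    """Return the API key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Unset API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```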

Run the pipeline on the AmbiVal dataset

python run.py --config_file=configs/api_qwen25_ambival.yaml --output_path outputs/ambival_qwen25.jsonl

Spider

python run.py --config_file=configs/api_qwen25_spider_dev.yaml --output_path outputs/spider_qwen25.jsonl

BIRD

python run.py --config_file=configs/api_qwen25_bird_dev.yaml --output_path outputs/bird_qwen25.jsonl
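To run all three benchmarks in sequence, the commands above can be driven from a short script. The sketch below pairs each config file with its output path, exactly as listed above, and skips runs whose output file already exists. The skip behaviour and the helper names (`run_command`, `pending_runs`) are assumptions added for illustration, not part of `run.py`.

```python
import subprocess
import sys
from pathlib import Path

# Config file -> output path, copied from the run commands above.
RUNS = {
    "configs/api_qwen25_ambival.yaml": "outputs/ambival_qwen25.jsonl",
    "configs/api_qwen25_spider_dev.yaml": "outputs/spider_qwen25.jsonl",
    "configs/api_qwen25_bird_dev.yaml": "outputs/bird_qwen25.jsonl",
}


def run_command(config: str, output: str) -> list[str]:
    """Build the run.py command for one config/output pair."""
    return [sys.executable, "run.py", f"--config_file={config}", "--output_path", output]


def pending_runs(root: str = ".") -> list[tuple[str, str]]:
    """Return the (config, output) pairs whose output file does not exist yet."""
    return [(c, o) for c, o in RUNS.items() if not (Path(root) / o).exists()]


if __name__ == "__main__":
    for config, output in pending_runs():
        # Creating the output directory is harmless if run.py also does so.
        Path(output).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(run_command(config, output), check=True)
```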

Eval

python eval/evaluate.py --dataset ambival --output_path outputs/ambival_qwen25.jsonl
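If evaluation fails or reports unexpected numbers, it can help to confirm that the output file is well-formed JSON Lines. The sketch below counts the records and reports the first line that does not parse. It makes no assumption about the fields inside each record, since the output schema is not described here. The helper name `count_records` is hypothetical.

```python
import json
import sys


def count_records(path: str) -> int:
    """Count non-empty lines in a JSON Lines file, failing on invalid JSON."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "outputs/ambival_qwen25.jsonl"
    print(f"{target}: {count_records(target)} records")
```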
