This project demonstrates a Text-to-SQL model comparison application built with Streamlit, enhanced with Unsloth for optimized model inference and RAGAS (Retrieval-Augmented Generation Assessment) for evaluation. The app lets users compare how different models generate SQL queries from natural language inputs and provides explanations for the generated queries.
- Python 3.8 or higher
- Python packages: `streamlit`, `pandas`, `torch`, `transformers`, `unsloth`
- A GPU is recommended for faster inference
- Clone the repository and install the dependencies: `git clone https://github.com/your_username/text-to-sql-comparison.git`, then `cd text-to-sql-comparison` and `pip install -r requirements.txt`.
- Place your pre-trained and fine-tuned models in the `models/` directory.
- Ensure that your validation data (`validation_data.csv`) is placed in the appropriate directory.
- Launch the app with `streamlit run explanation.py` and open it in your browser at http://localhost:8501.
- Select a model (e.g., Base Mistral, Fine-Tuned Llama3).
- Choose a domain (e.g., Artificial Intelligence, Aerospace).
- Select an SQL complexity level (e.g., Basic SQL, Aggregation).
- Pick a sample query from the filtered list.
- Click "Run Prediction" to generate the SQL query and its explanation.
- Unsloth optimizes model inference using 4-bit quantization and efficient GPU utilization, reducing runtime latency (see the loading sketch after this list).
- The `predict_sql()` function supports two modes (see the sketch after this list):
  - Prediction: generates an SQL query from a natural language prompt.
  - Explanation: explains how the generated SQL query answers the original question.
- The evaluation pipeline (`Pipeline.ipynb`) uses RAGAS metrics to assess model performance (a rough stand-in for two of them also follows this list):
  - Semantic Equivalence: measures how semantically similar the predicted query is to the ground truth.
  - Exact Match Accuracy: checks whether the predicted query matches the ground truth exactly.
  - Syntax Correctness: validates that the predicted query is syntactically correct.
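As a rough illustration of the Unsloth optimization above, the snippet below loads a checkpoint in 4-bit and switches it to inference mode. The local path, sequence length, and other settings are placeholders, not this project's actual configuration.

```python
# Sketch: load a checkpoint in 4-bit with Unsloth and enable fast inference.
# "models/fine_tuned_mistral" is a hypothetical path used for illustration only.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="models/fine_tuned_mistral",  # placeholder checkpoint path
    max_seq_length=2048,                      # placeholder context length
    load_in_4bit=True,                        # 4-bit quantization to reduce GPU memory
)
FastLanguageModel.for_inference(model)        # switch to Unsloth's optimized inference path
```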
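The actual `predict_sql()` lives in `explanation.py`; the sketch below only illustrates the two modes described above, reusing `model` and `tokenizer` from the previous snippet. The prompt wording and generation settings are assumptions.

```python
# Sketch of a two-mode predict_sql(): "prediction" turns a question into SQL,
# "explanation" explains a generated query. Prompts and settings are illustrative.
def predict_sql(question, mode="prediction", sql=None):
    if mode == "prediction":
        prompt = f"Translate the question into an SQL query.\nQuestion: {question}\nSQL:"
    else:  # "explanation"
        prompt = (
            f"Question: {question}\nSQL: {sql}\n"
            "Explain how this SQL query answers the question:"
        )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()
```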
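The real scores come from RAGAS inside `Pipeline.ipynb`. As a rough stand-in, the snippet below shows how exact match and syntax correctness could be approximated locally with `sqlglot` (an extra dependency assumed for this illustration); semantic equivalence is left to the LLM-based RAGAS metric.

```python
# Rough stand-in for two of the metrics; the project itself uses RAGAS.
# sqlglot is an assumed extra dependency used only for this illustration.
import sqlglot
from sqlglot.errors import ParseError

def exact_match(predicted, ground_truth):
    # Case- and whitespace-insensitive comparison, ignoring a trailing semicolon.
    normalize = lambda q: " ".join(q.lower().split()).rstrip(";")
    return normalize(predicted) == normalize(ground_truth)

def syntax_correct(predicted):
    # Count a query as syntactically correct if sqlglot can parse it.
    try:
        sqlglot.parse_one(predicted)
        return True
    except ParseError:
        return False
```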
- Select "Fine-Tuned Mistral" as the model.
- Choose "Artificial Intelligence" as the domain.
- Select "Basic SQL" as the complexity level.
- Pick a sample query: "What is the average explainability score of creative AI applications in 'Europe' and 'North America'?"
- The app will display (an illustrative snippet follows these steps):
  - Input Prompt
  - Ground Truth SQL Query
  - Predicted SQL Query
  - Explanation of the Predicted SQL Query
- Run RAGAS evaluation using `Pipeline.ipynb` to analyze model performance.
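Outside the Streamlit UI, the same sample could be run programmatically through the hypothetical `predict_sql()` sketch shown earlier; the query in the comment is only a plausible shape over a made-up schema, not the dataset's ground truth.

```python
# Illustrative use of the predict_sql() sketch on the sample question.
# The SQL in the comment uses made-up table and column names.
question = (
    "What is the average explainability score of creative AI applications "
    "in 'Europe' and 'North America'?"
)
predicted_sql = predict_sql(question, mode="prediction")
# A prediction of roughly this shape would be expected (hypothetical schema):
#   SELECT AVG(explainability_score)
#   FROM creative_ai_applications
#   WHERE region IN ('Europe', 'North America');
explanation = predict_sql(question, mode="explanation", sql=predicted_sql)
print(predicted_sql)
print(explanation)
```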
- The evaluation notebook (`Pipeline.ipynb`) provides detailed RAGAS metrics for each model:
  - Semantic Equivalence Score: measures how semantically close predictions are to the ground truth.
  - Exact Match Accuracy: validates whether predictions exactly match the ground-truth queries.
  - Syntax Correctness Score: checks whether predictions are syntactically valid.
- Results are visualized through plots comparing base and fine-tuned models across these metrics (a plotting sketch follows).
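The exact plotting code lives in `Pipeline.ipynb`; a minimal sketch of the kind of grouped bar chart it could produce is shown below. The score arrays are placeholders to be filled with RAGAS output, not measured results.

```python
# Sketch: grouped bar chart comparing base vs. fine-tuned models on the three metrics.
# All scores below are placeholders, not real evaluation results.
import matplotlib.pyplot as plt
import numpy as np

metrics = ["Semantic Equivalence", "Exact Match", "Syntax Correctness"]
base_scores = [0.0, 0.0, 0.0]        # fill in with RAGAS output for the base model
fine_tuned_scores = [0.0, 0.0, 0.0]  # fill in with RAGAS output for the fine-tuned model

x = np.arange(len(metrics))
width = 0.35
fig, ax = plt.subplots()
ax.bar(x - width / 2, base_scores, width, label="Base")
ax.bar(x + width / 2, fine_tuned_scores, width, label="Fine-Tuned")
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.set_ylabel("Score")
ax.set_title("Base vs. fine-tuned model comparison")
ax.legend()
plt.tight_layout()
plt.show()
```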
- Add more domains and query complexities to enhance testing diversity.
- Integrate additional evaluation metrics like BLEU or ROUGE scores.
- Incorporate schema-aware generation techniques for improved accuracy.
This project is licensed under the MIT License.
Special thanks to:
- Streamlit for enabling rapid UI development.
- Hugging Face Transformers for providing pre-trained models.
- Unsloth for efficient inference optimization.
- RAGAS for robust evaluation metrics tailored to text-to-SQL tasks.
Feel free to reach out with any questions or suggestions! 😊