This project is an AI-powered exam preparation assistant built using Rasa + Python, capable of:
✔️ Fetching SPPU engineering question papers (2015/2019 pattern)
✔️ Scraping subject-wise PDFs directly from sppuquestionpapers.com
✔️ Extracting questions using regex + text cleaning
✔️ Clustering & analyzing question trends (semantic + TF-IDF fallback)
✔️ Generating summaries of frequently asked questions
✔️ Creating a cluster analysis chart
✔️ Exporting results as JSON and PDF reports
✔️ Interacting through a friendly chat interface
- Scrapes department → semester → subject → pattern tables.
- Extracts all matching PDF links directly from the subject page.
- Uses concurrency to download PDFs faster.
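As a rough sketch, the multi-threaded download step could look like the following (the `download_pdf` helper, the output directory layout, and the use of `urllib` are illustrative assumptions, not the project's actual `downloader.py` API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.request import urlopen

def download_pdf(url: str, out_dir: Path) -> Path:
    """Fetch one PDF and save it under out_dir, returning the local path."""
    dest = out_dir / url.rsplit("/", 1)[-1]
    with urlopen(url) as resp:
        dest.write_bytes(resp.read())
    return dest

def download_all(urls, out_dir="papers", max_workers=8):
    """Download many PDFs concurrently with a thread pool."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_pdf, u, out) for u in urls]
        for f in as_completed(futures):
            paths.append(f.result())
    return paths
```

A thread pool suits this workload because the downloads are I/O-bound, so the GIL is not a bottleneck.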
- Extracts questions using robust patterns, e.g. `Q1) a) question [6]`, `Q2) b) ...`
- Cleans multi-line, messy OCR text.
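An illustrative pattern in the spirit of the format above (the actual regex in `parser.py` may differ):

```python
import re

# Matches "Q1) a) question text [6]" fragments: question number, an
# optional sub-part letter, the question body, and a trailing marks bracket.
QUESTION_RE = re.compile(
    r"Q\s*(\d+)\)\s*(?:([a-z])\)\s*)?(.*?)\s*\[(\d+)\]",
    re.IGNORECASE | re.DOTALL,
)

def extract_questions(text: str):
    """Return (number, part, body, marks) tuples from raw paper text."""
    cleaned = re.sub(r"\s+", " ", text)  # collapse messy OCR line breaks
    return [
        (int(q), part or "", body.strip(), int(marks))
        for q, part, body, marks in QUESTION_RE.findall(cleaned)
    ]
```

For example, `extract_questions("Q1) a) Explain ACID properties. [6] Q2) b) Draw an ER diagram. [4]")` yields one tuple per sub-question.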
Uses:
- SentenceTransformer (`all-MiniLM-L6-v2`) if available
- TF-IDF + agglomerative clustering as a fallback if the model is unavailable

Groups similar questions under the same cluster.
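The real pipeline uses sentence embeddings (or TF-IDF vectors) for similarity; as a dependency-free stand-in, the grouping idea can be sketched with simple token-overlap (Jaccard) similarity. This is an illustration of the clustering concept, not the project's actual clustering code:

```python
def jaccard(a: set, b: set) -> float:
    """Token-set overlap, a cheap stand-in for embedding similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_questions(questions, threshold=0.5):
    """Greedily group questions whose token overlap exceeds threshold.

    Returns a list of clusters; each cluster's first member can serve
    as its representative question.
    """
    clusters = []  # each entry: (token set of representative, [questions])
    for q in questions:
        tokens = set(q.lower().split())
        for rep_tokens, members in clusters:
            if jaccard(tokens, rep_tokens) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((tokens, [q]))
    return [members for _, members in clusters]
```

Near-duplicate phrasings ("Write short note on indexing" vs. "Write short note on B-trees") land in one cluster, while unrelated questions start new ones.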
Shows:
- Representative question
- Cluster frequency
- Topic trends
Generates:
- Top cluster bar chart
- Optionally sent inline or as a file (adaptive to channel)
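The data behind the bar chart is just per-cluster frequency counts; a minimal sketch of that aggregation (the chart itself would be drawn with a plotting library such as matplotlib, omitted here):

```python
from collections import Counter

def top_clusters(labels, n=5):
    """Count how many questions fall under each cluster label and keep
    the n largest; this is the data the bar chart visualises."""
    return Counter(labels).most_common(n)
```

For example, `top_clusters(["schema", "schema", "notes", "schema", "notes", "misc"], n=2)` returns `[("schema", 3), ("notes", 2)]`.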
Exports:
- `paper_analysis.json`
- `paper_analysis.pdf`
Including:
- extracted questions
- cluster info
- topics
- frequent questions
- difficulty estimate
- question type counts
- embedded chart
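The JSON half of the export can be as simple as the sketch below (the field names are assumed to mirror the list above, and the PDF report would need an extra library such as reportlab, so it is omitted):

```python
import json
from pathlib import Path

def export_json(analysis: dict, path="paper_analysis.json") -> Path:
    """Serialise the analysis dict (questions, clusters, topics,
    difficulty, type counts) to a JSON report file."""
    out = Path(path)
    out.write_text(json.dumps(analysis, indent=2, ensure_ascii=False))
    return out
```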
Guides users step-by-step:
- Department
- Semester
- Subject
- Pattern (2015/2019)
- Runs analysis and sends the summary + chart + reports
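In Rasa this step-by-step guidance is a form over four slots. Stripped of the `rasa_sdk` machinery, the turn-taking logic reduces to the sketch below (prompt wording and slot names are assumptions, not the project's actual domain definitions):

```python
REQUIRED_SLOTS = ["department", "semester", "subject", "pattern"]

def next_prompt(slots: dict):
    """Return the next question to ask, or None once every slot is
    filled -- mirroring how a Rasa form requests one empty slot per turn."""
    prompts = {
        "department": "Which department?",
        "semester": "Which semester?",
        "subject": "Which subject?",
        "pattern": "Which pattern (2015/2019)?",
    }
    for slot in REQUIRED_SLOTS:
        if not slots.get(slot):
            return prompts[slot]
    return None  # form complete: hand off to the analysis action
```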
Rasa Chatbot
│
├── Form (department, semester, subject, pattern)
│
├── Action: ActionFetchAndAnalyze
│ ├── sppu_scraper.py → table-based scraper
│ ├── downloader.py → multi-threaded downloads
│ ├── parser.py → question extraction
│ ├── clustering.py → semantic & fallback clusters
│ ├── reporting.py → json + pdf export
│ └── image utils → plot & inline send
│
└── Output:
- Pretty summary (markdown)
- Cluster plot
- JSON report
- PDF report
```bash
git clone https://github.com/your-username/sppu-exam-chatbot
cd sppu-exam-chatbot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
rasa train
rasa run actions
rasa shell
```
├── actions/
│ ├── actions.py
│ ├── sppu_scraper.py
│ └── __init__.py
│
├── data/
│ ├── nlu.yml
│ ├── stories.yml
│ └── rules.yml
│
├── domain.yml
├── config.yml
├── README.md
└── requirements.txt
rasa shell
Example conversation:
User: Can you analyze question papers from the web?
Bot: Which department?
User: Computer Engineering
Bot: Which semester?
User: Sem 5
Bot: Which subject?
User: Database Management Systems
Bot: Which pattern?
User: 2019 pattern
Bot: Found 16 papers. Downloading...
Bot: Analysis Complete! (summary + chart + PDF report)
**Top recurring question types:**
1. (3 times) Consider following schema
2. (3 times) Write short note on...
3. (2 times) Compare DBMS and File Systems...
...
- `paper_analysis.json`
- `paper_analysis.pdf`
- `cluster_analysis_plot.png`
Pull requests are welcome! Ideas, improvements, or scraper fixes are appreciated.
MIT License. Free for personal & academic use.
Hitesh Khare