This project is an AI-powered exam preparation assistant built using Rasa + Python, capable of:
✔️ Fetching SPPU engineering question papers (2015/2019 pattern)
✔️ Scraping subject-wise PDFs directly from sppuquestionpapers.com
✔️ Extracting questions using regex + text cleaning
✔️ Clustering & analyzing question trends (semantic + TF-IDF fallback)
✔️ Generating summaries of frequently asked questions
✔️ Creating a cluster analysis chart
✔️ Exporting results as JSON and PDF reports
✔️ Interacting through a friendly chat interface
- Scrapes department → semester → subject → pattern tables.
- Extracts all matching PDF links directly from the subject page.
- Uses concurrency to download PDFs faster.
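As a rough sketch, the multi-threaded download step could look like the following (the `download_pdf` helper, the output directory layout, and the use of `urllib` are illustrative assumptions, not the project's actual `downloader.py` API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.request import urlopen

def download_pdf(url: str, out_dir: Path) -> Path:
    """Fetch one PDF and save it under out_dir, returning the local path."""
    dest = out_dir / url.rsplit("/", 1)[-1]
    with urlopen(url) as resp:
        dest.write_bytes(resp.read())
    return dest

def download_all(urls, out_dir="papers", max_workers=8):
    """Download many PDFs concurrently with a thread pool."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_pdf, u, out) for u in urls]
        for f in as_completed(futures):
            paths.append(f.result())
    return paths
```

A thread pool suits this workload because the downloads are I/O-bound, so the GIL is not a bottleneck.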
- Extracts questions using robust patterns, e.g. `Q1) a) question [6]`, `Q2) b) ...`
- Cleans multi-line, messy OCR text.
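An illustrative pattern in the spirit of the format above (the actual regex in `parser.py` may differ):

```python
import re

# Matches "Q1) a) question text [6]" fragments: question number, an
# optional sub-part letter, the question body, and a trailing marks bracket.
QUESTION_RE = re.compile(
    r"Q\s*(\d+)\)\s*(?:([a-z])\)\s*)?(.*?)\s*\[(\d+)\]",
    re.IGNORECASE | re.DOTALL,
)

def extract_questions(text: str):
    """Return (number, part, body, marks) tuples from raw paper text."""
    cleaned = re.sub(r"\s+", " ", text)  # collapse messy OCR line breaks
    return [
        (int(q), part or "", body.strip(), int(marks))
        for q, part, body, marks in QUESTION_RE.findall(cleaned)
    ]
```

For example, `extract_questions("Q1) a) Explain ACID properties. [6] Q2) b) Draw an ER diagram. [4]")` yields one tuple per sub-question.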
Uses:
- SentenceTransformer (`all-MiniLM-L6-v2`) if available
- TF-IDF + agglomerative clustering as a fallback if the model is unavailable

Groups similar questions under the same cluster.
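The real pipeline uses sentence embeddings (or TF-IDF vectors) for similarity; as a dependency-free stand-in, the grouping idea can be sketched with simple token-overlap (Jaccard) similarity. This is an illustration of the clustering concept, not the project's actual clustering code:

```python
def jaccard(a: set, b: set) -> float:
    """Token-set overlap, a cheap stand-in for embedding similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_questions(questions, threshold=0.5):
    """Greedily group questions whose token overlap exceeds threshold.

    Returns a list of clusters; each cluster's first member can serve
    as its representative question.
    """
    clusters = []  # each entry: (token set of representative, [questions])
    for q in questions:
        tokens = set(q.lower().split())
        for rep_tokens, members in clusters:
            if jaccard(tokens, rep_tokens) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((tokens, [q]))
    return [members for _, members in clusters]
```

Near-duplicate phrasings ("Write short note on indexing" vs. "Write short note on B-trees") land in one cluster, while unrelated questions start new ones.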
Shows:
- Representative question
- Cluster frequency
- Topic trends
Generates:
- Top cluster bar chart
- Optionally sent inline or as a file (adaptive to channel)
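The data behind the bar chart is just per-cluster frequency counts; a minimal sketch of that aggregation (the chart itself would be drawn with a plotting library such as matplotlib, omitted here):

```python
from collections import Counter

def top_clusters(labels, n=5):
    """Count how many questions fall under each cluster label and keep
    the n largest; this is the data the bar chart visualises."""
    return Counter(labels).most_common(n)
```

For example, `top_clusters(["schema", "schema", "notes", "schema", "notes", "misc"], n=2)` returns `[("schema", 3), ("notes", 2)]`.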
Exports:
- `paper_analysis.json`
- `paper_analysis.pdf`
Including:
- extracted questions
- cluster info
- topics
- frequent questions
- difficulty estimate
- question type counts
- embedded chart
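The JSON half of the export can be as simple as the sketch below (the field names are assumed to mirror the list above, and the PDF report would need an extra library such as reportlab, so it is omitted):

```python
import json
from pathlib import Path

def export_json(analysis: dict, path="paper_analysis.json") -> Path:
    """Serialise the analysis dict (questions, clusters, topics,
    difficulty, type counts) to a JSON report file."""
    out = Path(path)
    out.write_text(json.dumps(analysis, indent=2, ensure_ascii=False))
    return out
```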
Guides users step-by-step:
- Department
- Semester
- Subject
- Pattern (2015/2019)
- Runs analysis and sends the summary + chart + reports
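In Rasa this step-by-step guidance is a form over four slots. Stripped of the `rasa_sdk` machinery, the turn-taking logic reduces to the sketch below (prompt wording and slot names are assumptions, not the project's actual domain definitions):

```python
REQUIRED_SLOTS = ["department", "semester", "subject", "pattern"]

def next_prompt(slots: dict):
    """Return the next question to ask, or None once every slot is
    filled -- mirroring how a Rasa form requests one empty slot per turn."""
    prompts = {
        "department": "Which department?",
        "semester": "Which semester?",
        "subject": "Which subject?",
        "pattern": "Which pattern (2015/2019)?",
    }
    for slot in REQUIRED_SLOTS:
        if not slots.get(slot):
            return prompts[slot]
    return None  # form complete: hand off to the analysis action
```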
Rasa Chatbot
│
├── Form (department, semester, subject, pattern)
│
├── Action: ActionFetchAndAnalyze
│ ├── sppu_scraper.py → table-based scraper
│ ├── downloader.py → multi-threaded downloads
│ ├── parser.py → question extraction
│ ├── clustering.py → semantic & fallback clusters
│ ├── reporting.py → json + pdf export
│ └── image utils → plot & inline send
│
└── Output:
- Pretty summary (markdown)
- Cluster plot
- JSON report
- PDF report
```bash
git clone https://github.com/your-username/sppu-exam-chatbot
cd sppu-exam-chatbot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
rasa train
rasa run actions
rasa shell
```
├── actions/
│ ├── actions.py
│ ├── sppu_scraper.py
│ └── __init__.py
│
├── data/
│ ├── nlu.yml
│ ├── stories.yml
│ └── rules.yml
│
├── domain.yml
├── config.yml
├── README.md
└── requirements.txt
rasa shell
Example conversation:
User: Can you analyze question papers from the web?
Bot: Which department?
User: Computer Engineering
Bot: Which semester?
User: Sem 5
Bot: Which subject?
User: Database Management Systems
Bot: Which pattern?
User: 2019 pattern
Bot: Found 16 papers. Downloading...
Bot: Analysis Complete! (summary + chart + PDF report)
**Top recurring question types:**
1. (3 times) Consider following schema
2. (3 times) Write short note on...
3. (2 times) Compare DBMS and File Systems...
...
- `paper_analysis.json`
- `paper_analysis.pdf`
- `cluster_analysis_plot.png`
Pull requests are welcome! Ideas, improvements, or scraper fixes are appreciated.
MIT License. Free for personal & academic use.
Hitesh Khare