This repository contains code, data, and write-ups for the Information Retrieval course assignments and resources. It is organized around three main assignments, together with their datasets and results, plus supporting materials (books, papers, slides).
- Course name: Information Retrieval
- Author / Owner: Tuhin Mondal (22CS10087)
Top-level layout (folders and representative files):
A1-Boolean AND Retrieval Using Inverted Index/
├── bool.py
├── indexer.py
├── parser.py
├── README.md
├── requirements.txt
├── run.sh
├── Dataset/
├── Output/
├── queries.txt
└── results.txt
A2-Scoring and Evaluation/
├── ranker.py
├── evaluator.py
├── README.md
├── requirements.txt
├── run.sh
├── Dataset/
│   ├── TIME_Documents.txt
│   ├── TIME_Queries.txt
│   └── TIME_Relevance.txt
├── Evals/
└── Ranks/
A3-Wordnet based Summarization/
├── summarizer.py
├── evaluator.py
├── README.txt
├── BBC_News_1K/
├── Generated Summaries/
├── News Articles/
├── Summaries/
└── Evals/
Books/
Research Papers/
Slides/
- Each assignment maintains its own `requirements.txt`. Common dependencies for the course include NLTK, NumPy, and scikit-learn (check each `requirements.txt` for exact pins).
- `Dataset/` directories inside assignments contain the raw documents and query/relevance files used for experiments.
- `Output/`, `Evals/`, and `Ranks/` folders contain generated outputs, ranked lists, and evaluation metrics. Do not overwrite these if you want to preserve results.
- The repository is structured to support learning and evaluation of Information Retrieval concepts:
- Boolean retrieval via inverted indexes
- Scoring and ranking with evaluation against ground truth relevance
- Summarization methods using lexical resources (WordNet)
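
To give a flavor of the first topic, here is a minimal sketch of Boolean AND retrieval over an inverted index. It is an illustration only and is not the code in `bool.py` or `indexer.py`: the function names (`build_inverted_index`, `intersect`, `boolean_and`), the whitespace tokenization, and the toy corpus are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: plain whitespace tokenization, no stemming or stop-word removal.
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping only IDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, query):
    """Return the doc IDs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the shortest postings lists first to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Toy corpus (hypothetical; not taken from the assignment datasets).
docs = {
    1: "information retrieval with inverted index",
    2: "boolean retrieval model",
    3: "inverted index for boolean retrieval",
}
index = build_inverted_index(docs)
print(boolean_and(index, "boolean retrieval"))         # [2, 3]
print(boolean_and(index, "inverted index retrieval"))  # [1, 3]
```

A term that never appears in the corpus yields an empty postings list, so any AND query containing it returns no documents.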