Overview

This project builds a book recommendation system using data collected from multiple sources. The goal is to demonstrate an end-to-end data analytics workflow, including web scraping, API integration, data cleaning, exploratory analysis, and deployment.
The final system allows users to explore books and receive recommendations through an interactive web application.
A comprehensive book recommendation app that combines literal and semantic search with TF-IDF and SBERT recommendation engines.
🔗 Streamlit App
Photo credit: https://www.pexels.com/de-de/foto/gestapelte-bucher-1333742/
🎥 Project Presentation
- Search A (Literal): Direct keyword matching across all book fields
- Search B (Semantic): SBERT-based semantic understanding
- Combined System: Literal results first, then semantic results
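The combined ordering described above (literal hits first, then semantic-only hits) can be sketched as follows. `semantic_rank` is a hypothetical stand-in for the SBERT ranking step, and the function name is illustrative:

```python
def combined_search(query, titles, semantic_rank):
    """Return indices of matching books: exact keyword hits first,
    followed by semantic matches not already found literally."""
    q = query.lower()
    # Literal pass: direct substring match on the book text
    literal = [i for i, text in enumerate(titles) if q in text.lower()]
    # Semantic pass: SBERT-ranked indices, minus the literal hits
    semantic = [i for i in semantic_rank(query) if i not in literal]
    return literal + semantic
```

Deduplicating against the literal results keeps each book from appearing twice while preserving the "literal first" ordering.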
- Recommendation A (TF-IDF): Weighted feature similarity (Author 3x, Title 2x, Subjects 2x, Language 1x)
- Recommendation B (SBERT): Semantic similarity understanding
- Combined System: TF-IDF results first, then SBERT recommendations
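One way the weighted TF-IDF similarity might work is to repeat each field in the document according to its weight before vectorizing, so heavier fields contribute more terms. This is a minimal sketch using the weights listed above; the column names and function names are assumptions, not the project's actual API:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Field weights from the README: Author 3x, Title 2x, Subjects 2x, Language 1x
WEIGHTS = {"author": 3, "title": 2, "subjects": 2, "language": 1}

def weighted_text(row):
    # Repeat each field's text by its weight so TF-IDF counts it more heavily
    return " ".join(" ".join([str(row[f])] * w) for f, w in WEIGHTS.items())

def tfidf_recommend(books: pd.DataFrame, book_idx: int, top_n: int = 5):
    corpus = books.apply(weighted_text, axis=1)
    matrix = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    # Cosine similarity between the selected book and every other book
    sims = cosine_similarity(matrix[book_idx], matrix).ravel()
    order = sims.argsort()[::-1]
    return [i for i in order if i != book_idx][:top_n]
```

Repeating tokens is a simple way to encode field weights; an alternative is vectorizing each field separately and taking a weighted sum of the per-field similarities.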
- Main Page: Search interface with filters
- Search Results: Combined search results with book selection
- Book Details + Recommendations: Selected book with personalized recommendations
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the app:

  ```bash
  streamlit run app.py
  ```

The app expects the book dataset at one of these locations:

- `../../data/clean/books_merged_clean.csv`
- `../data/clean/books_merged_clean.csv`
- `data/clean/books_merged_clean.csv`
- `books_merged_clean.csv`
On first run, the app will generate SBERT embeddings and save them as book_embeddings.npy for faster subsequent startups.
- Search: Type any query (title, author, topic, keyword)
- Filter: Use language and year filters in sidebar
- Explore: Click on books to see detailed recommendations
- Navigate: Use sidebar buttons to switch between views
- Frontend: Streamlit web interface
- Search: Combined literal + semantic search
- Recommendations: TF-IDF + SBERT hybrid approach
- Caching: Efficient model and data loading
- Error Handling: Robust error messages and fallbacks
| Dataset | Source | Purpose |
|---|---|---|
| openlibrary | https://openlibrary.org/subjects/awards | Core data on books and their awards |
- Data collection (scraping + API)
- Data cleaning & deduplication
- Exploratory analysis
- Content-based recommendation logic
- Deployment with Streamlit
- Add user-rating or popularity data
- Implement similarity using text descriptions (NLP)
- Improve genre standardisation
- Expand dataset beyond 1000 books
- Deploy using a cloud hosting platform