The Multi-Agent-Recommender-System-MARS project is an intelligent chatbot designed to provide movie-related information and personalized recommendations. It leverages a multi-agent architecture where specialized AI agents collaborate and delegate tasks to fulfill user queries. This system integrates large language models (LLMs) with a custom vector database for semantic search and utilizes dynamically generated content for enhanced functionality.
- Movie Plot Retrieval: Get concise plot summaries for movies.
- Movie Rating Retrieval: Obtain IMDb ratings and vote counts for movies.
- Personalized Movie Recommendations: Receive movie suggestions based on a liked movie's themes and your viewing preferences.
- User Preference Tracking: Store and update user's liked and disliked movies for better personalization.
- Multi-Agent Collaboration: Intelligent routing of user queries to specialized agents.
The system is built around a multi-agent architecture, orchestrated by a central Manager Agent. This manager acts as the brain, routing user requests to the most appropriate specialized "sub-agent" for processing. Each agent is powered by a large language model (LLM) and may have access to specific tools.
User Query
|
v
[ Root Agent (movie_chatbot_manager) ]
| Delegates based on query type
+-------------------------------------------------+
| | | |
v v v v
[ Profile Agent ] [ Movie Info Agent ] [ Recommender Agent ] [ Critic Agent ]
(Manages user (Retrieves factual (Generates movie (Analyzes movies
likes/dislikes) movie data/plots) recommendations) for thematic summaries)
|
+-------------------+
|
v
[ Movie Tools ]
(e.g., search_movies, get_movie_rating, recommend_movies)
|
v
[ Storage ]
(e.g., IMDb data, Vector DB/FAISS Index)
The project is structured with distinct components for agents, tools, and data storage:
-
manager_agent/agent.py(Root Agent):- Role: The central orchestrator. It receives user queries and, based on predefined routing logic, delegates the task to the most suitable sub-agent.
- Instruction: Guides the agent to identify user intent (recommendation, factual question, preference update) and route to
recommender_agent,movie_info_agent, orprofile_agentrespectively. - Model:
gemini-1.5-pro.
-
sub_agents/profile_agent/agent.py(Profile Agent):- Role: Manages user preferences (liked/disliked movies) in the session state.
- Tools:
update_user_preferencestool.
-
sub_agents/movie_info_agent/agent.py(Movie Info Agent):- Role: Answers factual questions about movies, including plots, actors, and genres.
- Instruction: Explicitly guided to use the
search_moviestool and to specifically extract and provide the content of the 'plot' field when asked for a movie's plot. Also tasked with listing actors from theStar Castfield. - Tools:
search_moviestool. - Model:
gemini-1.5-flash.
-
sub_agents/critic_agent/agent.py(Critic Agent):- Role: Provides thematic analysis or summaries of a movie, which are then used as
base_queryfor recommendations. - Model:
gemini-1.5-pro.
- Role: Provides thematic analysis or summaries of a movie, which are then used as
-
sub_agents/recommender_agent/agent.py(Recommender Agent):- Role: Generates movie recommendations based on user preferences and movie themes.
- Instruction: Designed to first use the
critic_agentto get a thematic summary of a liked movie, and then use that summary as a query for therecommend_moviestool. Crucially, it is explicitly instructed to only recommend ACTUAL, EXISTING MOVIES from the tool's results and ABSOLUTELY NOT to invent or alter movie titles or plots. It must extract 'title' and 'plot' fields exactly as provided by the tool. - Tools:
critic_agent(as an AgentTool) andrecommend_moviestool. - Model:
gemini-1.5-pro.
This module defines the callable functions that agents use to interact with data or external services.
get_movie_rating(title: str): Fetches IMDb rating and votes for a given movie.search_movies(query: str): Performs keyword-based search across movie titles, genres, cast, and directors. This tool is responsible for retrieving movie details, including the generated plot.recommend_movies(...): The core recommendation tool. It takes abase_query(thematic summary from critic),liked_movies, anddisliked_movies, and performs a semantic search using the vector store.update_user_preferences(...): Saves user's liked/disliked movies to the session state.
This package manages data access and persistence.
imdb_cleaned.csv(data/): The primary movie dataset. This file was initially sourced but later enriched with aGenerated_Plotcolumn containing LLM-generated plot summaries.storage/movie_data_access.py:- Role: Handles loading
imdb_cleaned.csvinto a Pandas DataFrame. - Functions:
search_movies_by_keywords(which now correctly returns theGenerated_Plotfor factual queries) andget_rating_by_title.
- Role: Handles loading
embeddings/imdb_embeddings.csv: Stores the numerical vector representations (embeddings) of the movies.vectorstore/imdb_faiss.index: The FAISS index, built from the embeddings, enabling efficient nearest-neighbor searches for recommendations.storage/vector_db.py:- Role: Manages the creation, loading, and searching of movie embeddings and the FAISS index.
- Embedding Generation: Crucially, this module's
_load_or_create_embeddingsmethod was modified to use both theGenerated_PlotandGenreof a movie to create its embedding, providing a richer semantic representation. - Embedding Model: Uses
SentenceTransformer("all-MiniLM-L12-v2")(or similar) to generate embeddings.
- User Input: "What is the plot of 'Blade Runner 2049'?"
Manager Agent: Identifies the query as a factual movie question and delegates tomovie_info_agent.Movie Info Agent: Receives the query. Its instruction guides it to use thesearch_moviestool.search_moviestool: Callsmovie_data_access.py'ssearch_movies_by_keywordsfunction with "Blade Runner 2049".movie_data_access.py: Searchesimdb_cleaned.csvfor the movie. Retrieves the row, including theGenerated_PlotandStar Cast.search_moviestool: Returns the movie data (including 'plot') to theMovie Info Agent.Movie Info Agent: Based on its updated instruction, it specifically extracts the 'plot' field from the tool's results and presents it to the user.
- User Input: "Can you recommend a movie like 'Parasite'?" (or "Since I liked 'Electric Mind'...")
Manager Agent: Identifies the query as a recommendation request and delegates torecommender_agent.Recommender Agent:- Identifies the
liked_movie/base_query(e.g., "Parasite" or "Electric Mind"). - Calls the
critic_agentwith the identified movie.
- Identifies the
Critic Agent: Analyzes the movie and returns a JSON string containing a thematicsummary(e.g., "a film about class disparity and social satire").Recommender Agent: Parses thesummaryfrom the critic's response. Uses thissummaryas thebase_queryfor therecommend_moviestool.recommend_moviestool:- Embeds the
base_query(thematic summary). - Uses
movie_retriever_instance.search(fromvector_db.py) to find the most semantically similar movies in the FAISS index. This search is now based on the embeddings created fromGenerated_Plot+Genre. - Retrieves the full movie data (including
Title,Generated_Plot,Genre,IMDb Rating, etc.) for the top similar movies fromimdb_cleaned.csv. - (Optional future step): Can apply filtering (e.g., exclude the input movie) or re-ranking (e.g., by IMDb Rating).
- Returns a list of recommended movie dictionaries to the
Recommender Agent.
- Embeds the
Recommender Agent: Based on its strict instructions, it iterates through the list of actual movies returned by therecommend_moviestool, extracting theirtitleandplot(which isGenerated_Plot) exactly as provided. It then presents these to the user with a brief explanation of why they were chosen.
-
Clone the Repository:
git clone [https://github.com/Vedant-1012/Multi-Agent-Recommender-System-MARS-.git](https://github.com/Vedant-1012/Multi-Agent-Recommender-System-MARS-.git) cd Multi-Agent-Recommender-System-MARS -
Create and Activate Virtual Environment:
python -m venv .venv source .venv/bin/activate -
Install Dependencies:
pip install -r requirements.txt # Ensure google-generativeai, pandas, python-dotenv, faiss-cpu, sentence-transformers, uvicorn, fastapi are listed. # If not, install them manually: pip install google-generativeai pandas python-dotenv faiss-cpu sentence-transformers uvicorn fastapi
-
Obtain API Key:
- Get a Google Cloud API Key (for Gemini models).
- Create a
.envfile in your project's root directory. - Add your API key to the
.envfile:GEMINI_API_KEY="YOUR_ACTUAL_GEMINI_API_KEY"
-
Prepare Data (Generate Plots):
- The
imdb_cleaned.csvfile should be in thedata/directory. - Run the plot generation script (e.g.,
generate_plots.pyfrom our discussions) to populate theGenerated_Plotcolumn in yourimdb_cleaned.csv. This script will overwrite the file.python generate_plots.py
- The
-
Generate Embeddings and FAISS Index:
- Crucially, delete the old embedding and FAISS index files to force regeneration with the updated
vector_db.pylogic:(Userm embeddings/imdb_embeddings.csv rm vectorstore/imdb_faiss.index
delinstead ofrmon Windows if needed). - The embeddings and FAISS index will be created automatically when you start the application for the first time after deletion.
- Crucially, delete the old embedding and FAISS index files to force regeneration with the updated
-
Run the Application:
uvicorn main:app --reload
The application will start, and you will see logs indicating the embedding generation process.
Once the Uvicorn server is running (typically on http://127.0.0.1:8000), you can interact with your chatbot through the provided interface (e.g., a web UI if you have one, or by sending POST requests to the /chat endpoint).
Example Queries:
- "What is the plot of 'Parasite'?"
- "Who are the actors in 'The Matrix'?"
- "Can you recommend movies like 'Electric Mind'?"
- "I loved 'Inception'." (This will update your profile and could influence future recommendations).
This project involved several key debugging challenges:
-
Initial Plot Retrieval Failure:
- Problem: The chatbot couldn't provide movie plots initially.
- Cause: The original
imdb_cleaned.csvlacked a dedicated 'plot' column. - Solution: Implemented a script using an LLM to generate
Generated_Plotsummaries for all movies based on existing metadata (Title, Genre, Director, Star Cast) and added this as a new column to the dataset.
-
Recommendation Hallucination:
- Problem: The recommender agent was inventing movie titles and plots instead of recommending actual movies.
- Cause: The LLM-powered recommender agent's instruction was too generic, allowing it to generate creative responses rather than strictly using tool output.
- Solution: Drastically refined the
recommender_agent's instruction to explicitly command it to only use actual movie titles and plots provided by itsrecommend_moviestool and to "ABSOLUTELY DO NOT INVENT, ALTER, OR HALLUCINATE."
-
Embedding Generation Issues:
- Problem: The embedding generation process would get stuck at 0% during application startup.
- Cause: This was primarily due to resource demands or potential compatibility issues with the initially chosen larger embedding model (
all-mpnet-base-v2) on the specific system, or an internal hang during batch processing. - Solution: Switched to a different, more stable, and less resource-intensive SentenceTransformer model (
all-MiniLM-L12-v2). Ensured old embedding and FAISS index files were deleted to force regeneration with the new content and model.
-
Persistent
faissModule Error (during development/testing):- Problem:
ModuleNotFoundError: No module named 'faiss'when trying to run Python code or scripts. - Cause: The
faiss-cpulibrary was not correctly installed or accessible within the active Python virtual environment for the execution context. - Solution: Ensured
faiss-cpuwas installed viapip install faiss-cpuin the activated virtual environment.
- Problem:
-
LLM Plot Accuracy for Recommendations (Remaining Challenge):
- Problem: For some recommended movies, the chatbot's provided plot still deviated from the actual movie's plot, even when the
Generated_Plotin the CSV was meant to be accurate. - Cause: The LLM within the
recommender_agentmight still, at times, re-synthesize or rephrase theGenerated_Plotfrom the tool's output instead of presenting it verbatim, especially for obscure films where its internal knowledge might conflict or the generated plot is very generic. - Partial Solution (and Future Work): While instructions are strict, this is an inherent LLM behavior. The best current mitigation is improved
Generated_Plotquality (official synopses) and post-retrieval filtering.
- Problem: For some recommended movies, the chatbot's provided plot still deviated from the actual movie's plot, even when the
- Integrate Official Plot Summaries: Implement fetching and incorporating actual, verified plot summaries from APIs like OMDb or TMDb directly into
imdb_cleaned.csv. This is the most significant step for improving semantic accuracy. - Implement Post-Retrieval Ranking & Filtering: Add logic in
movie_tools.pyto:- Filter out the input movie from recommendations.
- Filter by minimum
IMDb RatingorMetaScoreto ensure quality recommendations. - Re-rank semantically similar results by popularity or rating.
- Hybrid Recommendation System: Explore integrating collaborative filtering techniques (like BPR) if user interaction data becomes available.
- Advanced Embedding Models: Experiment with even larger or task-specific embedding models if further semantic improvements are needed and resources allow.
- Streamline Data Pipeline: Automate the data enrichment (plot generation/fetching) and embedding generation as part of a more robust CI/CD pipeline.
This project is licensed under the MIT License.