This project is an Exploratory Data Analysis (EDA) of Netflix's catalog of movies and TV shows. The goal is to uncover patterns and trends in the catalog that can inform content strategy and user engagement. The analysis uses Python's data analysis libraries (Pandas, NumPy, Matplotlib, and Seaborn) in a Jupyter Notebook.
The dataset records the following fields for each Netflix title:
- Title: Name of the movie or TV show
- Type: Categorization as a Movie or TV Show
- Director: Director(s) of the title
- Cast: Leading actors and actresses
- Country: Country of production
- Date Added: Date the title was added to Netflix
- Release Year: Original release year
- Rating: Content rating (e.g., PG, TV-MA)
- Duration: Length of the movie or number of seasons
- Genres: Categories or genres the title falls under
- Description: Brief synopsis of the title
Source: Netflix Movies and TV Shows Dataset on Kaggle
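As a rough sketch of how the fields listed above might look once loaded into Pandas, the snippet below reads a tiny in-memory sample and counts missing values per column. The column names (snake_case, with `listed_in` standing in for Genres), the `load_titles` helper, and the two sample rows are all assumptions made for illustration; check them against the header of the actual Kaggle CSV before relying on them.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the Kaggle CSV.
# Column names are assumed, not confirmed by this README.
SAMPLE_CSV = """title,type,director,cast,country,date_added,release_year,rating,duration,listed_in,description
Example Movie,Movie,Director A,"Actor A, Actor B",United States,"September 25, 2021",2020,PG-13,90 min,"Dramas, Comedies",Placeholder synopsis.
Example Show,TV Show,,Actor C,India,"January 1, 2020",2019,TV-MA,2 Seasons,TV Dramas,Placeholder synopsis.
"""


def load_titles(source):
    """Read the catalog from a file path or file-like object (hypothetical helper)."""
    return pd.read_csv(source)


df = load_titles(io.StringIO(SAMPLE_CSV))

# Inspect inferred data types and count missing values in each column.
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With the real dataset, `load_titles` would be given the path to the downloaded CSV file instead of the in-memory sample.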
- Data Cleaning: Handle missing values and correct data types
- Content Analysis: Examine the distribution of content types, genres, and ratings
- Temporal Trends: Analyze how content additions have evolved over the years
- Geographical Insights: Identify countries contributing the most content
- Top Contributors: Highlight prolific directors and actors
- Duration Patterns: Explore the distribution of movie durations and TV show seasons
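The objectives above could be approached roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names (`type`, `director`, `country`, `date_added`, `duration`, `listed_in`), the `"Unknown"` placeholder, the date format, and the four sample rows are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data comes from the Kaggle CSV.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "director": ["Director A", None, "Director A", "Director B"],
    "country": ["United States, India", "India", None, "United States"],
    "date_added": ["September 25, 2021", "January 1, 2020", "March 3, 2021", None],
    "duration": ["90 min", "2 Seasons", "120 min", "100 min"],
    "listed_in": ["Dramas, Comedies", "TV Dramas", "Dramas", "Comedies"],
})

# Data cleaning: fill missing text fields and parse dates (assumed format).
df["director"] = df["director"].fillna("Unknown")
df["country"] = df["country"].fillna("Unknown")
df["date_added"] = pd.to_datetime(
    df["date_added"], format="%B %d, %Y", errors="coerce"
)

# Content analysis: counts by type and by individual genre.
type_counts = df["type"].value_counts()
genre_counts = df["listed_in"].str.split(", ").explode().value_counts()

# Temporal trends: number of titles added per year (rows without a date are skipped).
added_per_year = (
    df["date_added"].dt.year.dropna().astype(int).value_counts().sort_index()
)

# Geographical insights: split multi-country entries before counting.
country_counts = df["country"].str.split(", ").explode().value_counts()

# Top contributors: most frequent directors, ignoring the placeholder.
director_counts = df.loc[df["director"] != "Unknown", "director"].value_counts()

# Duration patterns: minutes for movies, season counts for TV shows.
is_movie = df["type"] == "Movie"
movie_minutes = df.loc[is_movie, "duration"].str.extract(r"(\d+)")[0].astype(int)
show_seasons = df.loc[~is_movie, "duration"].str.extract(r"(\d+)")[0].astype(int)

print(type_counts, genre_counts, added_per_year, sep="\n\n")
```

Splitting comma-separated fields with `str.split` and `explode` counts each country or genre once per title. A similar approach should work for the cast column when ranking actors.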
- Python 3.7+
- Pandas: Data manipulation and analysis
- NumPy: Numerical operations
- Matplotlib: Data visualization
- Seaborn: Statistical data visualization
- Jupyter Notebook: Interactive coding environment