Netflix Recommendation System: Analysis and Machine Learning Implementation

Business Intelligence, Data Science, Portfolio / By Hijir

Introduction

Streaming services like Netflix continue to expand and diversify their catalogues. Machine learning-based recommendation systems improve the user experience by surfacing content tailored to individual preferences.

This project develops a content-based recommendation system that uses both supervised and unsupervised learning algorithms to identify patterns in the data and suggest relevant movies and TV shows to users.

Methodology

1. Data Preprocessing

Before building the model, data preprocessing was performed to ensure data quality:

Handling missing values in columns such as director, cast, country, date_added, rating, and duration.
Removing irrelevant attributes like duration and date_added.
Encoding categorical variables using Label Encoding and Bag of Words to transform text into numerical representations.
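
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration rather than the project's actual code: the toy rows, the "Unknown" placeholder, and the helper names (fill_missing, drop_columns, label_encode, bag_of_words) are all assumptions, and in practice pandas and scikit-learn would usually do this work.

```python
from collections import Counter

# Hypothetical toy rows; the column names follow the ones listed above.
rows = [
    {"title": "Film A", "type": "Movie", "director": "Jane Doe",
     "country": "United States", "rating": "PG-13",
     "duration": "90 min", "date_added": "2020-01-01"},
    {"title": "Show B", "type": "TV Show", "director": None,
     "country": None, "rating": "TV-MA",
     "duration": "2 Seasons", "date_added": None},
]

def fill_missing(rows, columns, placeholder="Unknown"):
    """Replace None or empty values in the given columns with a placeholder."""
    return [
        {k: (placeholder if k in columns and not v else v) for k, v in row.items()}
        for row in rows
    ]

def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    codes = {value: i for i, value in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens in a string."""
    return Counter(text.lower().split())

# Assumed order of operations: fill gaps, drop unused columns, then encode.
clean = fill_missing(rows, {"director", "country", "rating"})
clean = drop_columns(clean, {"duration", "date_added"})
type_codes, type_mapping = label_encode([r["type"] for r in clean])
director_bow = bag_of_words(clean[0]["director"])
```

Label encoding suits low-cardinality columns such as type or rating, while bag of words suits free-text or multi-valued columns such as cast.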

2. Exploratory Data Analysis (EDA)

Initial analysis of the Netflix dataset revealed several interesting patterns:

By type, 69.6% of Netflix titles are movies and 30.4% are TV shows.
The United States is the top producing country, with 2,819 titles.
Highly correlated genre pairs include Action with TV Action & Adventure, and Romantic Movies with Romantic TV Shows.
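
Figures like these could be computed along the following lines. The sketch uses a hypothetical four-row catalogue, and the listed_in column name for comma-separated genres is an assumption; the numbers it produces are for the toy data only, not the dataset analyzed here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy catalogue; "listed_in" (comma-separated genres) is an assumed column name.
titles = [
    {"type": "Movie", "country": "United States", "listed_in": "Action, Comedies"},
    {"type": "Movie", "country": "India", "listed_in": "Dramas"},
    {"type": "TV Show", "country": "United States", "listed_in": "Action, Dramas"},
    {"type": "Movie", "country": "United States", "listed_in": "Comedies"},
]

def share_by(rows, column):
    """Percentage of rows for each distinct value of a column."""
    counts = Counter(row[column] for row in rows)
    return {value: 100 * n / len(rows) for value, n in counts.items()}

def genre_indicator(rows, genre):
    """One 0/1 flag per row: 1 if the title lists the genre, else 0."""
    return [
        int(genre in [g.strip() for g in row["listed_in"].split(",")])
        for row in rows
    ]

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; report 0.0 in that case.
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0

type_share = share_by(titles, "type")
top_country, top_count = Counter(r["country"] for r in titles).most_common(1)[0]
action_vs_comedy = pearson(
    genre_indicator(titles, "Action"), genre_indicator(titles, "Comedies")
)
```

Correlating 0/1 genre flags is one common way to find genres that tend to appear together on the same title.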

3. Implementation of the Recommendation System

This project employs two primary approaches:

Content-Based Filtering: Uses cosine similarity and bag-of-words to calculate movie similarity based on attributes such as directors, actors, genres, and country of origin.
Clustering & Graph Representation:
- K-Means Clustering groups movies with similar descriptions using CountVectorizer.
- A NetworkX Graph is created where nodes represent movies, actors, directors, and genres, and relationships between entities are analyzed using cosine similarity.
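
The clustering step can be illustrated with a small, dependency-free version of K-Means over word-count vectors. Here vectorize is a stand-in for scikit-learn's CountVectorizer, the descriptions are made up, and seeding the centroids from explicit row indices is an assumption made only to keep the example reproducible.

```python
from collections import Counter

# Hypothetical descriptions, not taken from the dataset.
descriptions = [
    "heist crew robs casino",
    "casino heist crew returns",
    "kids find monster in town",
    "monster hunts kids in town",
]

def vectorize(texts):
    """Turn texts into count vectors over a shared, sorted vocabulary."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors, vocab

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, max_iter=100):
    """Lloyd's algorithm with centroids seeded from the given row indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

vectors, vocab = vectorize(descriptions)
labels, centroids = kmeans(vectors, init_indices=[0, 2])
```

On this toy input the two heist descriptions share one cluster and the two monster descriptions share the other, because they overlap in most of their words.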

For example, the system provides top recommendations for:

Ocean’s Twelve → Movies with similar action and crime elements.
Stranger Things → TV shows with mystery and adventure themes.
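
A top-N lookup of this kind could work roughly as follows: each title's attributes are joined into one token string, turned into a bag of words, and ranked by cosine similarity against the query title. The catalogue, the token naming, and the recommend helper are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalogue: each title maps to a "soup" of attribute tokens
# (genres, director, actors, country).
catalogue = {
    "Heist One": "crime action director_a actor_x united_states",
    "Heist Two": "crime action director_a actor_x actor_y united_states",
    "Spooky Town": "mystery adventure actor_z united_states",
    "Cooking Hour": "reality food canada",
}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(title, catalogue, top_n=2):
    """Return the top_n titles most similar to the given one."""
    bags = {name: Counter(text.split()) for name, text in catalogue.items()}
    query = bags[title]
    scored = [
        (cosine(query, bag), name) for name, bag in bags.items() if name != title
    ]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]

top = recommend("Heist One", catalogue)
```

Because cosine similarity normalizes by vector length, a title with a long cast list is not favored merely for having more tokens.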

Network Analysis
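
As a rough sketch of the graph described in the methodology, the example below links each movie node to its director, actor, and genre nodes, then compares two movies by the cosine similarity of their neighbor sets. It uses plain adjacency sets rather than NetworkX so that it runs without dependencies; the movie records and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical movie records, not taken from the dataset.
movies = [
    {"title": "Heist One", "director": "Director A",
     "cast": ["Actor X"], "genres": ["Crime"]},
    {"title": "Heist Two", "director": "Director A",
     "cast": ["Actor X", "Actor Y"], "genres": ["Crime"]},
    {"title": "Spooky Town", "director": "Director B",
     "cast": ["Actor Y"], "genres": ["Mystery"]},
]

def build_graph(movies):
    """Undirected graph as adjacency sets; nodes are (kind, name) tuples."""
    graph = defaultdict(set)
    for m in movies:
        movie = ("movie", m["title"])
        attrs = [("director", m["director"])]
        attrs += [("actor", a) for a in m["cast"]]
        attrs += [("genre", g) for g in m["genres"]]
        for attr in attrs:
            graph[movie].add(attr)
            graph[attr].add(movie)
    return graph

def neighbour_cosine(graph, a, b):
    """Cosine similarity of two nodes' neighbor sets, treated as 0/1 vectors."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    if not na or not nb:
        return 0.0
    return len(na & nb) / (len(na) * len(nb)) ** 0.5

graph = build_graph(movies)
heist_one = ("movie", "Heist One")
heist_two = ("movie", "Heist Two")
spooky = ("movie", "Spooky Town")
similar = neighbour_cosine(graph, heist_one, heist_two)
unrelated = neighbour_cosine(graph, heist_one, spooky)
```

Tagging each node with its kind keeps a director and an actor with the same name from collapsing into a single node.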

Top Recommendation for Ocean's Twelve

Top Recommendation for Stranger Things

Model Evaluation

Model	Metric	Value
KNN	Accuracy	1.00
Decision Tree	Accuracy	0.8638
Random Forest	Accuracy	0.8383
Logistic Regression	Accuracy	0.3990
Naive Bayes	Davies-Bouldin Index	1.310
K-Means Clustering	Davies-Bouldin Index	1.451
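
For reference, the two metrics in the table can be computed as sketched below. Accuracy is the fraction of correct predictions, and the Davies-Bouldin index averages, over clusters, the worst ratio of within-cluster scatter to between-centroid distance, so lower values indicate tighter, better-separated clusters. The sample points and labels are made up, and the function assumes at least two clusters with distinct centroids.

```python
from math import dist  # available from Python 3.8

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def davies_bouldin(points, labels):
    """Davies-Bouldin index; assumes two or more clusters with distinct centroids."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid of each cluster: the per-coordinate mean of its members.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from its members to its centroid.
    scatter = {
        label: sum(dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

# Hypothetical example: two tight clusters, far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
db_score = davies_bouldin(points, labels)
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Accuracy requires ground-truth labels, whereas the Davies-Bouldin index needs only the data points and their cluster assignments.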
