Netflix Recommendation System: Analysis and Machine Learning Implementation

Introduction

In the digital age, streaming services like Netflix continue to evolve by offering diverse content. To enhance user experience, machine learning-based recommendation systems play a crucial role in providing content tailored to individual preferences.

This project develops a content-based recommendation system using both supervised and unsupervised learning algorithms to identify patterns in data and suggest relevant movies to users.

Methodology

1. Data Preprocessing

Before building the model, data preprocessing was performed to ensure data quality:

  • Handling missing values in columns such as director, cast, country, date_added, rating, and duration.
  • Removing irrelevant attributes like duration and date_added.
  • Encoding categorical variables using Label Encoding and Bag of Words to transform text into numerical representations.
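
The preprocessing steps above could look roughly like the following sketch. The sample rows are made up, the "Unknown" placeholder is an assumed fill strategy, and the `title`, `type`, and `listed_in` column names are assumptions (they do not appear in the list above); the project's actual code may differ.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Tiny made-up sample standing in for the Netflix catalogue (illustrative only).
df = pd.DataFrame({
    "title": ["Show A", "Movie B", "Movie C"],
    "type": ["TV Show", "Movie", "Movie"],
    "director": [None, "Jane Doe", "John Roe"],
    "cast": ["Actor One, Actor Two", None, "Actor Two"],
    "country": ["United States", None, "India"],
    "rating": ["TV-14", "PG-13", None],
    "listed_in": ["TV Dramas", "Action & Adventure", "Comedies"],
    "date_added": ["January 1, 2020", None, "March 3, 2021"],
    "duration": ["2 Seasons", "120 min", None],
})

# 1. Handle missing values with an explicit placeholder (assumed strategy).
gap_cols = ["director", "cast", "country", "date_added", "rating", "duration"]
df[gap_cols] = df[gap_cols].fillna("Unknown")

# 2. Remove the attributes treated as irrelevant.
df = df.drop(columns=["duration", "date_added"])

# 3a. Label Encoding for a low-cardinality categorical column.
df["rating_encoded"] = LabelEncoder().fit_transform(df["rating"])

# 3b. Bag of Words over the combined text attributes.
df["soup"] = df[["director", "cast", "listed_in", "country"]].agg(" ".join, axis=1)
vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(df["soup"])

print(df[["title", "rating_encoded"]])
print(bow_matrix.shape)  # (number of titles, vocabulary size)
```

Note that Label Encoding assigns integers in sorted label order, so the resulting numbers carry no meaningful ranking.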

2. Exploratory Data Analysis (EDA)

Initial analysis of the Netflix dataset revealed several interesting patterns:

  • Content distribution by type shows that 69.6% of Netflix content consists of movies, while 30.4% are TV shows.
  • The United States produces the highest number of titles, with 2,819 pieces of content.
  • Highly correlated genres include Action with TV Action & Adventure and Romantic Movies with Romantic TV Shows.
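
Figures like the type split and the per-country counts can be reproduced with a few pandas calls. This is a minimal sketch on made-up rows; the `type` and `country` column names are assumptions, and splitting co-productions into separate countries is one possible counting convention, not necessarily the one used for the numbers above.

```python
import pandas as pd

# Made-up rows for illustration; a real analysis would load the full dataset.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "country": ["United States", "India", "United States", "United States, Canada"],
})

# Share of each content type, as percentages.
type_share = (df["type"].value_counts(normalize=True) * 100).round(1)

# Titles per country: split co-productions so each listed country is counted.
country_counts = df["country"].str.split(", ").explode().value_counts()

print(type_share)
print(country_counts.head())
```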

3. Implementation of the Recommendation System

This project employs two primary approaches:

  1. Content-Based Filtering: Uses cosine similarity and bag-of-words to calculate movie similarity based on attributes such as directors, actors, genres, and country of origin.
  2. Clustering & Graph Representation:
    • K-Means Clustering groups movies with similar descriptions using CountVectorizer.
    • A NetworkX Graph is created where nodes represent movies, actors, directors, and genres, and relationships between entities are analyzed using cosine similarity.
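
As an illustration of the content-based approach, the sketch below builds a bag-of-words matrix from a combined attribute string per title and ranks other titles by cosine similarity. The catalogue, the `soup` column, and the `recommend` helper are hypothetical names introduced for this example only.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up catalogue: each "soup" joins director, actors, genres, and country.
catalog = pd.DataFrame({
    "title": ["Heist One", "Heist Two", "Space Drama", "Kids Cartoon"],
    "soup": [
        "director_a actor_x actor_y crime action united_states",
        "director_a actor_x actor_z crime action united_states",
        "director_b actor_q drama scifi united_kingdom",
        "director_c actor_r animation family japan",
    ],
})

# Bag of words: one row per title, one column per token.
counts = CountVectorizer().fit_transform(catalog["soup"])

# Pairwise cosine similarity between all titles.
similarity = cosine_similarity(counts)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to `title`, excluding itself."""
    idx = catalog.index[catalog["title"] == title][0]
    scores = pd.Series(similarity[idx], index=catalog["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


print(recommend("Heist One", top_n=1))
```

Joining multi-word names with underscores keeps each director or actor as a single token, so two people who share only a first name are not counted as a match.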

For example, the system provides top recommendations for:

  • Ocean’s Twelve → Movies with similar action and crime elements.
  • Stranger Things → TV shows with mystery and adventure themes.

Network Analysis

[Figure: Top recommendation for Ocean's Twelve]

[Figure: Top recommendation for Stranger Things]
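
A graph of this kind can be sketched with NetworkX as follows. The titles, people, and genres are invented, and `neighbor_cosine` is a hypothetical helper: it computes cosine similarity between two titles' binary neighbour sets, which is one plausible way to apply cosine similarity to the graph, not necessarily the one used in the project.

```python
import math

import networkx as nx

# Hypothetical titles and attributes, for illustration only.
titles = {
    "Heist One": {"director": "Director A", "cast": ["Actor X", "Actor Y"], "genres": ["Crime"]},
    "Heist Two": {"director": "Director A", "cast": ["Actor X", "Actor Z"], "genres": ["Crime"]},
    "Space Drama": {"director": "Director B", "cast": ["Actor Q"], "genres": ["Drama"]},
}

graph = nx.Graph()
for title, attrs in titles.items():
    graph.add_node(title, kind="title")
    graph.add_node(attrs["director"], kind="director")
    graph.add_edge(title, attrs["director"])
    for actor in attrs["cast"]:
        graph.add_node(actor, kind="actor")
        graph.add_edge(title, actor)
    for genre in attrs["genres"]:
        graph.add_node(genre, kind="genre")
        graph.add_edge(title, genre)


def neighbor_cosine(a, b):
    """Cosine similarity of the binary neighbour sets of nodes a and b."""
    na, nb = set(graph[a]), set(graph[b])
    if not na or not nb:
        return 0.0
    return len(na & nb) / math.sqrt(len(na) * len(nb))


print(neighbor_cosine("Heist One", "Heist Two"))
print(sorted(nx.common_neighbors(graph, "Heist One", "Heist Two")))
```

In this toy graph a person who is both an actor and a director would share one node, and the `kind` attribute would hold whichever role was added last; a real implementation might prefix node names by role to avoid that.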

Model Evaluation

Various machine learning models were tested to measure performance:

Model                  Metric                 Value
KNN                    Accuracy               1.00
Decision Tree          Accuracy               0.8638
Random Forest          Accuracy               0.8383
Logistic Regression    Accuracy               0.3990
Naive Bayes            Davies-Bouldin Index   1.310
K-Means Clustering     Davies-Bouldin Index   1.451
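
An accuracy comparison like the one in the table can be set up as in the sketch below. The prediction target and features are not specified in this write-up, so the example uses synthetic data from `make_classification` and assumed hyperparameters; its scores will not match the values reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the real features and labels differ).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in accuracy.items():
    print(f"{name}: {score:.4f}")
```

Scoring on a held-out split rather than on the training rows matters most for KNN, which can score perfectly on data it has already seen.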

Evaluation Insights:

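  • KNN achieved the highest accuracy (100%), which suggests it is highly effective for recommendation purposes, although a perfect score should be confirmed on held-out data to rule out overfitting or data leakage.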
  • Decision Tree and Random Forest performed well, with accuracy above 83%.
  • Logistic Regression performed poorly, likely due to the non-linearity of feature relationships.
  • Naive Bayes and K-Means Clustering produced well-structured clusters, with Davies-Bouldin Index values of 1.310 and 1.451 respectively (lower values mean better-separated clusters) indicating good cluster separation.
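
For reference, the Davies-Bouldin Index can be computed with scikit-learn as sketched below, here on K-Means clusters of CountVectorizer features. The descriptions and the choice of two clusters are assumptions made for the example, so the resulting score is not comparable to the values reported above.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import davies_bouldin_score

# Made-up descriptions standing in for the catalogue's description column.
descriptions = [
    "a crew of thieves plans a casino heist",
    "thieves plan a daring bank heist",
    "detectives chase thieves after a heist",
    "kids discover a mystery in a small town",
    "a small town hides a supernatural mystery",
    "friends investigate a mystery in their town",
]

# Bag-of-words features, ignoring common English stop words.
X = CountVectorizer(stop_words="english").fit_transform(descriptions)

# Group the descriptions into two clusters (assumed number of clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# davies_bouldin_score expects a dense array; lower values are better.
dbi = davies_bouldin_score(X.toarray(), labels)

print(labels)
print(f"Davies-Bouldin Index: {dbi:.3f}")
```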

Business Recommendations

Based on the findings, several business strategies can be implemented for the streaming industry:

  1. Personalized Content: Using KNN as the main model can improve user experience by providing more accurate movie recommendations.
  2. User Segmentation: Applying K-Means Clustering to group users by their viewing preferences allows for targeted content recommendations.
  3. Marketing Optimization: Leveraging recommendations based on similarities in actors, directors, genres, or country of origin to engage users more effectively.

Conclusion

The recommendation system developed in this project utilizes multiple machine learning methods to improve the accuracy of movie suggestions for Netflix users. By employing content-based filtering and cluster analysis, the system offers a more personalized and relevant streaming experience.
