Predicting Car Selling Prices with Machine Learning

In this project, I set out to predict car selling prices using a dataset of various car features such as engine capacity, fuel type, and mileage. This task can be particularly useful for individuals or dealerships looking to estimate the fair market value of vehicles based on their characteristics.

 

1. Introduction

The used car market is thriving, and with the rapid evolution of technology, we can leverage machine learning models to predict the selling price of a car. The goal of this project was to build a regression model that accurately predicts car prices based on key features like engine size, mileage, fuel type, and others.

2. Data Overview

The dataset used in this project contained multiple columns, such as:

  • Year: The manufacturing year of the car.
  • KM Driven: The total distance the car has been driven, in kilometers.
  • Fuel Type: Type of fuel used (Diesel, Petrol, CNG, etc.).
  • Seller Type: Whether the seller is an individual or a dealership.
  • Transmission: Whether the car has an automatic or manual transmission.
  • Seats: Number of seats in the car.
  • Engine Capacity: The engine’s capacity in cubic centimeters.
  • Max Power: The maximum power output of the engine in horsepower.

After cleaning the data, I moved on to exploring it and building the model.

3. Exploratory Data Analysis (EDA)

Before building the model, it’s crucial to understand the relationships between the features. Here’s a heatmap that visualizes the correlation between numeric columns:

In the heatmap, the selling price correlates most strongly with the car's max power output, indicating that more powerful cars tend to sell for higher prices. Even so, no feature was correlated strongly enough to justify dropping it.
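A minimal sketch of how a correlation matrix like this might be computed with pandas. The toy DataFrame, its values, and its column names are assumptions made for illustration, not the project's data, and `plot_heatmap` is a hypothetical helper rather than the code used for the original figure.

```python
import pandas as pd

# Toy data for illustration only; values and column names are assumptions.
df = pd.DataFrame({
    "year": [2012, 2014, 2015, 2017, 2018, 2019],
    "km_driven": [120000, 90000, 70000, 50000, 30000, 15000],
    "max_power": [67.0, 74.0, 88.5, 103.5, 118.0, 140.0],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol", "Diesel"],
    "selling_price": [150000, 260000, 350000, 600000, 750000, 1100000],
})

# Pearson correlation between the numeric columns only ("fuel" is skipped).
corr = df.select_dtypes(include="number").corr()

# Rank features by the absolute strength of their correlation with the target.
price_corr = corr["selling_price"].drop("selling_price")
ranked = price_corr.reindex(price_corr.abs().sort_values(ascending=False).index)
print(ranked)


def plot_heatmap(corr_matrix):
    """Draw a correlation matrix as a heatmap (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(corr_matrix.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr_matrix.columns)))
    ax.set_xticklabels(corr_matrix.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr_matrix.index)))
    ax.set_yticklabels(corr_matrix.index)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return fig
```

Calling `plot_heatmap(corr)` returns a matplotlib figure that can be shown or saved. Sorting by absolute value keeps strong negative correlations (such as distance driven) near the top of the ranking as well.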

4. Feature Engineering

To prepare the data for machine learning, I performed several transformations:

  • Converted categorical data, such as fuel type and seller type, into numerical values.
  • Scaled the features to standardize the data and improve model performance.

The final set of features used to train the model was:

['year', 'km_driven', 'fuel', 'seller_type', 'transmission', 'seats',
'torque_rpm', 'mil_kmpl', 'engine_cc', 'max_power_new', 'First Owner',
'Fourth & Above Owner', 'Second Owner', 'Test Drive Car', 'Third Owner']
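The transformations described in this section could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual preprocessing code: the raw column name `owner`, the toy rows, and the choice of integer codes plus `StandardScaler` are all guesses based on the feature names listed above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows for illustration; the raw column name "owner" is an assumption.
raw = pd.DataFrame({
    "year": [2014, 2017, 2019],
    "km_driven": [90000, 50000, 15000],
    "fuel": ["Diesel", "Petrol", "CNG"],
    "transmission": ["Manual", "Automatic", "Manual"],
    "owner": ["First Owner", "Second Owner", "First Owner"],
})

encoded = raw.copy()

# Integer-encode categories that stay in a single column
# (codes follow alphabetical order of the category labels).
for column in ["fuel", "transmission"]:
    encoded[column] = encoded[column].astype("category").cat.codes

# One-hot encode ownership so each owner category becomes its own 0/1 column.
owner_dummies = pd.get_dummies(encoded.pop("owner")).astype(int)
encoded = pd.concat([encoded, owner_dummies], axis=1)

# Standardize the numeric columns to zero mean and unit variance.
numeric_columns = ["year", "km_driven"]
scaler = StandardScaler()
encoded[numeric_columns] = scaler.fit_transform(encoded[numeric_columns])

print(encoded)
```

In a real pipeline the scaler would normally be fitted on the training split only and then reused on the test split and on new cars, so that no information leaks from unseen data.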

5. Building the Model

For this task, I chose a Random Forest Regressor, a robust ensemble algorithm that works well with numerical features and numerically encoded categorical features. Because it averages the predictions of many decision trees, a Random Forest is also less prone to overfitting than a single decision tree.
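A minimal sketch of training such a model with scikit-learn. The synthetic data stands in for the prepared car features, and the split ratio, `n_estimators`, and `random_state` values are illustrative assumptions rather than the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix and price target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(0, 0.05, size=500)

# Hold out 20% of the rows to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are illustrative choices, not the project's settings.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient of determination.
print("Train R^2:", model.score(X_train, y_train))
print("Test R^2:", model.score(X_test, y_test))
```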

6. Model Evaluation

The model was evaluated using the coefficient of determination (R² score), which measures how well it explains the variance in the target variable.
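For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below computes it by hand; the function name `r2_score_manual` and the example prices are hypothetical and chosen only to illustrate the formula.

```python
def r2_score_manual(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices (in INR) for illustration only.
actual = [300000, 450000, 600000, 900000]
predicted = [320000, 430000, 610000, 880000]
print(round(r2_score_manual(actual, predicted), 4))
```

A score of 1.0 means the predictions match the actual values exactly, while 0.0 means the model does no better than always predicting the average price.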

Model Performance Analysis

  • Training R²: 0.9912 (99.12%)
    → Excellent fit to the training data.
  • Test R²: 0.9713 (97.13%)
    → Strong performance on unseen data, though a slight overfitting tendency may exist.

Key Observations

  • High R² score, indicating that most of the variance in selling price is explained by the model.
  • Small gap between training and test R² (about two percentage points), suggesting good generalization with a minor risk of overfitting.
  • If overfitting is a concern, potential improvements include feature selection, hyperparameter tuning, or reducing model complexity.
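Hyperparameter tuning, one of the options mentioned above, might look like the following sketch using scikit-learn's `GridSearchCV`. The synthetic data and the candidate values in `param_grid` are assumptions for illustration; this project's write-up does not state which settings, if any, were tuned.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared training data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=200)

# Illustrative candidates: shallower trees and larger leaves mean simpler models.
param_grid = {
    "max_depth": [4, 8, None],
    "min_samples_leaf": [1, 5],
}

# Evaluate every combination with 3-fold cross-validation, scored by R^2.
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid,
    scoring="r2",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated R^2:", round(search.best_score_, 3))
```

Limiting `max_depth` or raising `min_samples_leaf` reduces model complexity, which typically narrows the gap between training and test scores at some cost to training fit.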

7. Making Predictions

To test the model, I used a new set of car data and made predictions. Here’s an example of a prediction for a 2017 car with 50,000 km driven:

The predicted selling price for this car is INR 1,259,990.
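A prediction for a single car can be sketched as below. The toy training data and the two-feature model are assumptions for illustration and will not reproduce the figure quoted above; the actual model uses the full feature list from the Feature Engineering section.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy training data; values are made up and only two features are used.
train = pd.DataFrame({
    "year": [2012, 2014, 2015, 2017, 2018, 2019],
    "km_driven": [120000, 90000, 70000, 50000, 30000, 15000],
    "selling_price": [150000, 260000, 350000, 600000, 750000, 1100000],
})
feature_columns = ["year", "km_driven"]

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[feature_columns], train["selling_price"])

# Describe the new car as a one-row DataFrame, then match the training column order.
new_car = pd.DataFrame([{"km_driven": 50000, "year": 2017}])
new_car = new_car.reindex(columns=feature_columns)

predicted_price = model.predict(new_car)[0]
print(f"Predicted selling price: INR {predicted_price:,.0f}")
```

Any encoding or scaling applied to the training data has to be applied to the new row in the same way before calling `predict`, otherwise the model sees values on a different scale from the ones it was trained on.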

8. Conclusion

By using machine learning, we can predict car selling prices with a good level of accuracy. This model could be deployed in dealerships or used by individuals to estimate the fair market value of a vehicle before making a purchase or sale.

If you’re interested in learning more or trying this out with your own data, check out the GitHub repository for this project, where I’ve uploaded all the code and data.
