Overview
Problem
Accurately predicting the prices of preowned cars is challenging due to the wide variation in car features (brand, model, year, mileage, fuel type, etc.) and the non-linear relationships between them. Traditional ML models often capture structured patterns well but struggle with temporal or sequential dependencies, while deep learning models excel at complex feature interactions but can overfit or require large datasets. The challenge was to design a solution that leverages the strengths of both.
Approach
- Data Collection & Preparation: Scraped large-scale car listing data from CarDekho. Performed extensive data cleaning, feature engineering, and categorical encoding to handle missing values, outliers, and categorical attributes (fuel type, transmission, owner type, etc.).
- Modeling:
- Implemented deep learning models (CNN, RNN, LSTM) to capture sequential and feature interactions.
- Trained machine learning models (Random Forest, XGBoost) for robust baseline predictions.
- Designed a hybrid ensemble that combined predictions from deep learning and ML models, balancing generalization and accuracy.
- Evaluation: Benchmarked models using metrics like MAE, RMSE, and R². The hybrid ensemble consistently outperformed individual models.
- Deployment: Integrated the best-performing model into a Flask-based web application for real-time, user-friendly predictions.
Uniqueness & Impact
- Performance Improvements: The hybrid model achieved a ~12–15% reduction in RMSE compared to standalone ML or DL models. This demonstrates the complementary strengths of both approaches.
- Accessibility: Deployment via Flask enabled real-time predictions through a lightweight web interface, making it accessible to non-technical users.
- Scalability and Constraints:
- Addressed technical constraints such as handling imbalanced categorical features, optimizing training time for deep networks, and ensuring the Flask app remained lightweight for deployment.
- Designed modular pipelines for easy retraining when new car data becomes available.
- Uniqueness: Unlike traditional price prediction systems, this solution integrates sequential learning (LSTM for mileage/year progression) with structured ML models (Random Forest), creating a hybrid ensemble that improves both interpretability and predictive power.
Screenshots
Homepage view
Detail view