Customer Churn Prediction System
MACHINE LEARNING

Production-ready ML system that predicts customer churn and optimises retention campaigns for maximum ROI, featuring a Flask API, Docker deployment, and a real-time monitoring dashboard.

August 19, 2025

Built a comprehensive machine learning system that identifies customers likely to cancel their telecommunications service and optimises retention campaign spending for maximum business value.

The Challenge


A telecommunications company was losing significant revenue to customer churn. They needed a system that could not only predict which customers would leave, but also optimise retention spending to maximise ROI.

Solution Approach


- Data Analysis: Analysed 7,000+ customer records to identify key churn indicators, discovering the "premium but unprotected" customer segment
- Feature Engineering: Created 30+ engineered features capturing customer behaviour patterns and value indicators
- Model Development: Tested multiple algorithms from logistic regression to XGBoost; the simpler model came out ahead, with logistic regression (AUC 0.840) slightly outperforming XGBoost (AUC 0.832)
- Business Optimisation: Built a custom optimiser that adjusts prediction thresholds based on retention costs and success rates
- Production System: Developed a RESTful API with comprehensive error handling, logging, and monitoring
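
The threshold optimisation step can be illustrated with a minimal sketch. It is not the project's actual optimiser: the `CampaignEconomics` fields, the flat per-customer value, and the grid search are all assumptions chosen to show how retention cost and success rate can move the decision threshold away from 0.5.

```python
from dataclasses import dataclass


@dataclass
class CampaignEconomics:
    """Hypothetical campaign parameters (not taken from the project)."""
    customer_value: float  # revenue kept if a churner is retained
    offer_cost: float      # cost of sending one retention offer
    success_rate: float    # fraction of targeted churners who stay


def campaign_value(y_true, churn_prob, threshold, econ):
    """Net value of targeting every customer with churn_prob >= threshold."""
    value = 0.0
    for is_churner, prob in zip(y_true, churn_prob):
        if prob >= threshold:
            value -= econ.offer_cost  # every targeted customer costs an offer
            if is_churner:
                # Only true churners can generate retained revenue.
                value += econ.success_rate * econ.customer_value
    return value


def best_threshold(y_true, churn_prob, econ, steps=99):
    """Grid-search thresholds in (0, 1) and return the most valuable one."""
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates,
               key=lambda t: campaign_value(y_true, churn_prob, t, econ))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]
    econ = CampaignEconomics(customer_value=1000.0, offer_cost=100.0,
                             success_rate=0.3)
    t = best_threshold(labels, scores, econ)
    print(t, campaign_value(labels, scores, t, econ))
```

In this toy setup a cheaper offer or a higher success rate makes it worthwhile to contact lower-probability customers, which lowers the chosen threshold. A real optimiser would likely use per-customer value rather than a single flat figure.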

Technical Implementation


The system uses a modular architecture with separate components for data processing, model training, prediction serving, and monitoring. All components are containerised for easy deployment and scaling.
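
As a rough illustration of the prediction-serving component, the sketch below shows stateless handlers for a health check and a prediction request. It is framework-agnostic so it can run on its own; in a Flask app these functions would sit behind routes. The field names, coefficients, and threshold are hypothetical placeholders, not the project's real schema or model.

```python
import math

# Hypothetical request schema: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "tenure_months": (int, float),
    "monthly_charges": (int, float),
}

# Placeholder coefficients; a real service would load a trained model once
# at startup and keep no per-request state.
WEIGHTS = {"tenure_months": -0.05, "monthly_charges": 0.02}
BIAS = -0.5
THRESHOLD = 0.5  # assumed decision threshold


def health():
    """Liveness check: returns (response body, HTTP status)."""
    return {"status": "ok"}, 200


def predict(payload):
    """Validate one request body and return (response body, HTTP status)."""
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400
    for name, types in REQUIRED_FIELDS.items():
        value = payload.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            return {"error": f"missing or non-numeric field: {name}"}, 400

    score = BIAS + sum(WEIGHTS[n] * payload[n] for n in WEIGHTS)
    score = max(-60.0, min(60.0, score))  # clamp to avoid overflow in exp
    prob = 1.0 / (1.0 + math.exp(-score))
    return {
        "churn_probability": round(prob, 4),
        "target_for_retention": prob >= THRESHOLD,
    }, 200


if __name__ == "__main__":
    print(health())
    print(predict({"tenure_months": 2, "monthly_charges": 90}))
    print(predict({"tenure_months": 2}))
```

Because each call depends only on its input and on read-only model parameters, any number of identical workers can serve requests side by side, which is what makes horizontal scaling straightforward.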

Key Results


- Achieved an AUC-ROC of 0.84, with decision thresholds tuned for business value rather than fixed at 0.5
- Delivers 245% average ROI on retention campaigns
- Processes predictions in under 100 ms
- Maintains 95% test coverage

Engineering Highlights


- Implemented MLOps best practices including experiment tracking with MLflow
- Built real-time monitoring dashboard using Streamlit
- Created comprehensive logging system for production debugging
- Designed for horizontal scaling with stateless API design

The system demonstrates end-to-end ML engineering capabilities from initial data analysis through production deployment, with a focus on delivering measurable business value.

Gallery

A two-part plot on customer tenure and churn. The box plot on the left shows churned customers have a shorter median tenure. The bar chart on the right shows the highest churn rate is in the 0-1 year tenure bucket.
An ROC curve comparing Logistic Regression and XGBoost models. Logistic Regression has a slightly higher AUC of 0.840, outperforming XGBoost's AUC of 0.832.
A learning curve plot for a Logistic Regression model. The training and validation scores, measured by AUC, converge as the training set size increases, suggesting a well-fit model without high variance.
A four-panel plot showing model performance vs. threshold. The top-left panel identifies an optimal threshold that maximizes total business value. Other panels show how precision, recall, number of targeted customers, and expected value per customer vary with the threshold.
A dark-themed dashboard for a churn prediction model. Key metrics include Model AUC of 0.840 and an online API status. The "Business Impact Analysis" section shows customers at risk, revenue at risk, and a bar chart highlighting the medium-value customer segment as having the highest revenue at risk.
A screenshot of API documentation for a Churn Prediction API, showing a GET /health endpoint and a POST /predict endpoint with a detailed JSON request example.

Tech Stack

Python, Scikit-learn, XGBoost, LightGBM, MLflow, Flask, Docker, Pandas, NumPy, Pytest, Streamlit, Plotly, PostgreSQL, Gunicorn