Machine Learning model focused on outcomes

When starting with machine learning (ML) in any organization, it’s good to begin with something tangible. Here at Showmax we really care about our customers, and we don’t want to lose a single one of them. Addressing the issue of churn struck us as a great way to introduce an ML model of real value.

We’ve been exploring the idea of linking the data we measure (proxy metrics) to business values (e.g. how long customers stay, how active they are) since mid-2019. In truth, it kept us awake at night. Most companies have some kind of churn-prediction dream: a model that shows how likely each user is to leave the platform. We wanted this too, but we also want to improve the user experience overall, and to do that we need to see if, when, and why our platform doesn’t resonate with our users.

Churn itself is easy to measure. The problem is that by the time a user churns, it’s already too late. We want to know when a user is becoming less happy with the service, so that we can predict churn, take action, and (if at all possible) prevent it.

Customer churn visualisation (source: https://towardsdatascience.com/how-do-you-measure-if-your-customer-churn-predictive-model-is-good-187a49a9eee3)

Around October 2019, we (the Showmax Analytics team) performed a correlation analysis: we collected numerous different metrics and ran simple correlations on them. The results weren’t very satisfactory, but the idea was still valuable, so we moved to a more complex solution: applying machine learning to a far bigger dataset.
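A minimal sketch of that first, correlation-based approach, assuming each metric is a per-user column and churn is a 0/1 label. The metric names and values are hypothetical, and plain Python stands in for whatever tooling the team actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    With a 0/1 label as ys this is the point-biserial correlation.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either column is constant
    return cov / (sx * sy)

def rank_metrics(metrics, churned):
    """Correlate each metric with the churn label, strongest |r| first."""
    scores = {name: pearson(values, churned) for name, values in metrics.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical per-user metrics; churned[i] == 1 means user i left.
metrics = {
    "hours_watched": [10.0, 2.0, 8.0, 1.0, 12.0],
    "searches": [3, 4, 2, 5, 3],
}
churned = [0, 1, 0, 1, 0]

for name, r in rank_metrics(metrics, churned):
    print(f"{name}: {r:+.2f}")
```

Each metric gets its own coefficient, and Pearson correlation only captures linear, one-metric-at-a-time relationships, which is one reason a model that combines many features can do better.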

From there, we got to work on a metric we called “happiness”, which combines multiple parameters into one number.

One metric to rule them all

Back in 2020, whenever we were asked to present something about churn or the likelihood of users being dissatisfied with our platform, we had to show dozens of charts. Every team had different charts, different priorities, and so on, and that is just not the way to get the results we want.

Our goal is to improve our services and tailor them to our customers. To do this, we constantly run AB tests and other experiments to improve content discovery, but we need to be able to measure their impact.

In a perfect world, we would have one golden metric that tells us whether our customers will stay or not. It would take into account a matrix of values capturing different behavior patterns across clusters of customers. That would simplify future AB tests and analysis. We call this metric Customer Happiness.

Customer Happiness combines a number of metrics and feeds them into an ML model that estimates the probability of the customer staying with us.
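Assuming the model is a binary classifier trained with a log-loss objective (the usual setup for gradient-boosted trees), its raw output for a user is a sum of per-tree contributions on the log-odds scale, and a logistic sigmoid turns that sum into a probability. The sketch below shows only this final step; `happiness_percent`, the bias term, and the leaf values are hypothetical, and “stays” is treated as the positive class.

```python
import math

def sigmoid(z):
    """Map a log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def happiness_percent(tree_outputs, bias=0.0):
    """Hypothetical happiness score: P(user stays), expressed in percent.

    tree_outputs: the leaf values one user lands in, one per tree, assumed
    to be log-odds contributions toward the "stays" class.
    """
    raw_score = bias + sum(tree_outputs)
    return 100.0 * sigmoid(raw_score)

print(round(happiness_percent([0.4, -0.1, 0.3]), 1))  # → 64.6
```

A raw score of zero maps to 50%, and each tree nudges the user above or below that point.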


Workflow & Technologies

The data we need is mostly in our on-premise data warehouse (PostgreSQL), or can be fetched from Elastic. The pipeline is automated with Airflow, with the heavy lifting done by Dask. Once all the preprocessing is done, the final dataset is fed into a CatBoost classifier.
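Airflow describes a workflow as a DAG of tasks and starts each task once its upstream tasks have finished. The sketch below is not Airflow code: it uses Python’s standard `graphlib` to show how a pipeline like the one above could be ordered, with hypothetical task names standing in for the extract, preprocess, and train steps.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the tasks it depends on.
pipeline = {
    "extract_postgres": set(),
    "extract_elastic": set(),
    "preprocess_with_dask": {"extract_postgres", "extract_elastic"},
    "train_catboost": {"preprocess_with_dask"},
    "score_users": {"train_catboost"},
}

def parallel_batches(graph):
    """Group tasks into batches whose dependencies are all satisfied.

    Tasks inside one batch could run concurrently, as parallel Airflow
    tasks would.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches

for step, batch in enumerate(parallel_batches(pipeline), start=1):
    print(step, batch)
```

The two extraction tasks land in the same batch because neither depends on the other; preprocessing only starts once both are done.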

Technologies used

Once the relevant data was fetched, we tried to come up with as many features as possible. We implemented all of them and then let the algorithm decide which ones were more or less relevant. These included things like the number of trailers watched or use of the search function.
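Features like these are usually per-user aggregates of raw events. A single-machine sketch, assuming events arrive as `(user_id, event_type)` pairs; the event types and `n_…` feature names are hypothetical, and at production scale this kind of aggregation would run in SQL or Dask rather than in a Python loop.

```python
from collections import Counter, defaultdict

# Hypothetical event types turned into count features.
FEATURES = ("trailer_view", "search", "playback_start")

def build_features(events, feature_names=FEATURES):
    """Count how often each user produced each event type.

    Every user gets every feature (missing counts default to 0), so the
    model always receives the same set of columns.
    """
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        counts[user_id][event_type] += 1
    return {
        user: {f"n_{name}": c[name] for name in feature_names}
        for user, c in counts.items()
    }

# Hypothetical raw events.
events = [
    ("u1", "trailer_view"), ("u1", "search"), ("u1", "trailer_view"),
    ("u2", "playback_start"), ("u2", "search"),
]
print(build_features(events))
```

Generating many such counts cheaply and letting the model weigh them matches the “implement everything, let the algorithm decide” approach.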

CatBoost

CatBoost is the gradient-boosting library we used to train our model. Its name comes from “Categorical Boosting”, and it has three main advantages:

1) It handles categorical data (e.g. platform, genre of the asset) much more conveniently than other boosting libraries, without us having to one-hot encode it by hand.

2) Boosting is a popular and powerful ML technique because it puts extra emphasis on the samples the model still gets wrong. A booster typically trains hundreds of decision trees in sequence, with each new tree focusing on the errors left by the trees before it.

3) CatBoost needs less hyperparameter optimization.
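To make point 2 concrete, here is a minimal least-squares gradient-boosting sketch in plain Python, using one-split trees (stumps) on a single numeric feature. Each new stump is fitted to the residuals the ensemble still gets wrong. This is the generic technique, not CatBoost’s implementation, which builds deeper symmetric trees, uses a log-loss objective for classification, and adds refinements such as ordered boosting; `fit_stump` and `gradient_boost` are illustrative names.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    if best is None:  # all xs equal: no split possible, predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Least-squares boosting: every new stump fits the current residuals."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # Residuals = what the ensemble built so far still gets wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
predict = gradient_boost(xs, ys)
print([round(predict(x), 3) for x in xs])
```

The small learning rate means no single tree fixes everything; the errors shrink a little with every tree that is added.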

Use-cases

The churn predictor tells us what to focus on next by showing which parameters are most strongly correlated with happy users. We also use the happiness metric in AB tests to check whether our changes affected the happiness of our users. One thing we noticed is that happiness is very hard to move because it takes so many things into account; in short, a single AB test rarely shifts it much.
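A sketch of such a comparison, assuming every user in a test arm has a happiness score from the model and that the arms are compared with Welch’s t-test. The test choice is ours for illustration; the post does not say which statistical test is used, and the scores below are made up.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic (b minus a) and its degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def approx_p_value(t):
    """Two-sided p-value via the normal approximation (for large arms)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical per-user happiness scores (in %) for the two arms.
control = [62.0, 70.0, 65.0, 68.0, 71.0]
variant = [66.0, 72.0, 69.0, 70.0, 74.0]
t, df = welch_t(control, variant)
print(f"t = {t:.2f}, df = {df:.1f}, p ~ {approx_p_value(t):.3f}")
```

With arms this tiny, the normal approximation understates the p-value; it is only reasonable with thousands of users per arm. Because happiness moves so little, a real effect is small relative to the spread between users, so detecting it needs large arms.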

We can also look at the SHAP values of the happiness model to see which features matter most, so we can focus our efforts and try to improve those metrics.
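SHAP values are Shapley values from game theory: a feature’s contribution to one prediction is its average marginal effect over all coalitions of the other features. CatBoost can compute them for its own trees (via `get_feature_importance` with `type="ShapValues"`) using a fast tree-specific algorithm. The brute-force sketch below only illustrates the definition. It uses a simplification: absent features are set to a single baseline point rather than averaged over a background dataset. `toy_model` and its feature names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, by enumerating coalitions.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_model(v):
    """Hypothetical happiness model over (hours_watched, searches, trailers)."""
    hours, searches, trailers = v
    return 0.02 * hours - 0.03 * searches + 0.01 * trailers + 0.005 * hours * trailers

x = [10, 2, 4]
baseline = [5, 3, 2]
phis = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phis])
```

The values always add up to the gap between the user’s prediction and the baseline prediction, which is what makes them easy to read as “how much each feature pushed happiness up or down”.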

Conclusion

We managed to put an ML model into production for the first time. It outputs a probability (in %) that each user is happy with the service, helps us evaluate AB tests, and simplifies other analyses. For now it is a churn-prediction model, but we are also thinking about extending it to other labels (for example, instead of churn, “watching tomorrow”).
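Switching the label mostly means changing how the target column is built. A hypothetical sketch of a “watching tomorrow” label, assuming we know the set of days on which each user played something; `watching_tomorrow_labels` and the data are illustrative only.

```python
from datetime import date, timedelta

def watching_tomorrow_labels(watch_days, day):
    """Hypothetical label: 1 if the user watched anything on the day after `day`.

    watch_days maps user_id -> set of dates with at least one playback.
    """
    tomorrow = day + timedelta(days=1)
    return {user: int(tomorrow in days) for user, days in watch_days.items()}

watch_days = {
    "u1": {date(2020, 3, 1), date(2020, 3, 2)},
    "u2": {date(2020, 3, 1)},
}
print(watching_tomorrow_labels(watch_days, date(2020, 3, 1)))  # {'u1': 1, 'u2': 0}
```

The features for such a label would need to be computed only from data up to `day`, so that the model never sees the outcome it is asked to predict.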

Please check the original version of this article at