In this post, we’ll look at your collected data and start asking the right questions to determine how important each identified “trigger” is in leading to customer attrition. Working from the collected data reveals which behaviors truly correlate with attrition.

Without a strong collection of data in place (Creating a Predictive Churn Model: part 1), it’s difficult to see correlations between customer attrition and specific data points for the model. On the plus side, diving into data can be a very fun process. Seeing more and more information unfold into habits is an engaging and eye-opening experience. But be careful: it becomes wasted time if you’re trying to answer questions that add no value to your problem.

Once you get past the general assumptions and queries we present here, you will surely have developed some of your own questions or queries that will further your research. The important part of this stage of developing the churn model is to ask as many questions as possible. This way you can test, retest, and qualify your assumptions and data before moving to implement a model.

Most questions have two core variables: quantity and time. Here are a few questions to help start the process of data investigation:

1. What is the % of correlation between available data points and attrition?

From this question we could get an answer like “of all our lost customers, 80% of them filed an online complaint”, or “of all of our lost customers, 60% of them never downloaded the free software.” The hope is to see a pattern of correlation between particular data points and attrition. Be sure to ask the same questions of your current customers, so you don’t mistake something for an attrition-only trigger. It could be that the same percentage of current customers share the same data values, in which case the data point does not distinguish the two groups. There are also many applications that will do this type of statistical research for you.
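As a rough illustration of that comparison, here is a minimal Python sketch. The records, the field names (`churned`, `filed_complaint`, `downloaded_software`), and the `attribute_rates` helper are all hypothetical, made up for this example; your own data will look different.

```python
# Hypothetical sample data: (churned, filed_complaint, downloaded_software)
rows = [
    (True, True, False), (True, True, False), (True, True, False),
    (True, True, True), (True, False, True),
    (False, False, True), (False, False, True), (False, True, False),
    (False, False, False), (False, False, False),
]
customers = [
    {"churned": c, "filed_complaint": f, "downloaded_software": d}
    for c, f, d in rows
]


def attribute_rates(records, attribute):
    """Return (share of lost customers, share of current customers)
    for whom the given boolean attribute is true."""
    lost = [r for r in records if r["churned"]]
    current = [r for r in records if not r["churned"]]

    def share(group):
        # Guard against an empty group to avoid dividing by zero.
        return sum(1 for r in group if r[attribute]) / len(group) if group else 0.0

    return share(lost), share(current)


for attribute in ("filed_complaint", "downloaded_software"):
    lost_rate, current_rate = attribute_rates(customers, attribute)
    print(f"{attribute}: lost={lost_rate:.0%} current={current_rate:.0%}")
```

In this made-up data, complaints are far more common among lost customers than current ones, while the download rate is identical in both groups, so only the first looks like a candidate trigger.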

2. At what point in the product lifecycle did they leave?

When did they leave? Was it when the subscription expired, or was it related to a holiday or other external event? How long did they stay before they left? Did usage slow down before they officially left?
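One way to start on the timing questions is sketched below in Python. The dates, the field names (`signup`, `cancelled`), the 365-day subscription term, and the 14-day tolerance are assumptions chosen for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical lost-customer records with signup and cancellation dates.
lost_customers = [
    {"signup": date(2023, 1, 10), "cancelled": date(2024, 1, 12)},
    {"signup": date(2023, 3, 1), "cancelled": date(2024, 3, 3)},
    {"signup": date(2023, 6, 15), "cancelled": date(2023, 12, 28)},
]


def tenure_days(record):
    """How long the customer stayed before leaving, in days."""
    return (record["cancelled"] - record["signup"]).days


def left_near_renewal(record, term_days=365, tolerance_days=14):
    """True if the customer completed at least one term and cancelled
    within tolerance_days after a term boundary (assumed fixed-length terms)."""
    tenure = tenure_days(record)
    return tenure >= term_days and tenure % term_days <= tolerance_days


def cancellations_by_month(records):
    """Count cancellations per calendar month to spot seasonal clusters."""
    return Counter(r["cancelled"].month for r in records)


for record in lost_customers:
    print(tenure_days(record), left_near_renewal(record))
print(cancellations_by_month(lost_customers))
```

A cluster of cancellations right after a term boundary points toward subscription expiry, while a cluster in a particular month points toward a holiday or other external event.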

3. What was the login pattern, or how many times did they use “X”?

What you are doing here is exploring each data point in more detail over time. In each case you are looking for something about the pattern and quantity that differs from a current subscriber’s pattern and quantity. In most cases we see a few highly correlated attributes, but many fall into an area where their link to attrition is unclear, or a guess at best. Finding those high-correlation triggers is key to the final step in the model creation.
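To make the pattern-over-time idea concrete, the Python sketch below counts events (logins, or uses of “X”) in the most recent window before a reference date and in the window before that. The `window_counts` helper, the 30-day window, and the sample dates are assumptions for illustration.

```python
from datetime import date, timedelta


def window_counts(event_dates, end_date, window_days=30):
    """Return (previous, recent) event counts for the two consecutive
    windows of window_days ending at end_date."""
    recent_start = end_date - timedelta(days=window_days)
    previous_start = end_date - timedelta(days=2 * window_days)
    # Each window excludes its start date and includes its end date.
    recent = sum(1 for d in event_dates if recent_start < d <= end_date)
    previous = sum(1 for d in event_dates if previous_start < d <= recent_start)
    return previous, recent


# Hypothetical login dates for one lost customer who cancelled on 2024-04-30.
logins = [date(2024, 3, 3), date(2024, 3, 5), date(2024, 3, 20), date(2024, 4, 15)]
previous, recent = window_counts(logins, end_date=date(2024, 4, 30))
print(f"previous window: {previous}, recent window: {recent}")
```

Running the same calculation for current subscribers, using today as the reference date, gives the baseline to compare against. A drop before leaving only matters if current subscribers do not show the same drop.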

Putting It Together

Once you have exhausted your inquiry into the data, you should start seeing a few interesting patterns of attrition. You may find one or two very highly correlated data points, or possibly a larger bucket that is not as differentiated as you’d like. If you cannot find at least three high-correlation attributes, you should keep digging: either you are not asking the right questions of your data, or you don’t have the right data points in your collections yet. But once you have those triggers mapped out, you are ready to start creating the final documented churn model, which we’ll explore in our third post.

Read Part Three of Creating a Predictive Churn Model