It's easier to climb the quality-scale of data-driven decisions than you may think

Data-driven decisions exist on a steeply sliding scale. This article expands on how this is especially true of growing SaaS businesses and how making a quantum leap to statistically robust predictive analytics might be easier than you expect.

SCIENCE, METHODOLOGY, DATA

Kelvin Claridge

8/26/2024 - 5 min read

Like most (all?) disciplines, customer success has become deeply data oriented. For me this is great because it brings my passions for data, storytelling and solutions together for the most noble of causes: customers. I have invested nearly two decades in measurement, market research and analytics, and in that time customer success has become the best there is when it comes to catalyzing success with advanced data analysis (Amazon and Netflix best prove this). In SaaS, the home of CS, however, most companies are not nearly so sophisticated.

I am a huge fan of CS platforms like Vitally.io and Totango as they improve visibility, efficiency and operational capability. But they also feed a common customer success addiction: descriptive analytics (I'm looking at you, compound health scores!).

Descriptive analytics use almost any observation to explore and describe data sets. Outputs include (among limitless possibilities) counts, means, ratios, segments and health scores (for example: time in app / clicks + positive interactions - tickets open for >3 days).
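To make the health score formula above concrete, here is a minimal sketch of one way to read it. The field names and the sample account are hypothetical, and the operator precedence (time in app divided by clicks, then the other terms added and subtracted) is an assumption, since the formula can be read more than one way.

```python
from dataclasses import dataclass


@dataclass
class AccountUsage:
    """Hypothetical per-account inputs; the field names are illustrative."""
    minutes_in_app: float
    clicks: int
    positive_interactions: int
    tickets_open_over_3_days: int


def compound_health_score(usage: AccountUsage) -> float:
    """One reading of: time in app / clicks + positive interactions - stale tickets."""
    # Guard against division by zero for accounts with no recorded clicks.
    engagement = usage.minutes_in_app / usage.clicks if usage.clicks else 0.0
    return engagement + usage.positive_interactions - usage.tickets_open_over_3_days


# Made-up account purely for illustration.
account = AccountUsage(
    minutes_in_app=120.0,
    clicks=60,
    positive_interactions=3,
    tickets_open_over_3_days=1,
)
score = compound_health_score(account)
print(score)  # 4.0
```

A score like this describes the account, but nothing in the calculation says whether a 4.0 is any less likely to churn than a 2.0. That is the hypothesis still waiting to be tested.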

Descriptive analytics are useful, ubiquitous, can be accurate and are the starting point for almost every meaningful piece of analysis and data-led decision making. Data scientists will gather, clean and explore all manner of metrics as part of the first stage of any data project, exploratory data analysis (EDA). The problem I see is that for many scaling SaaS companies this first stage is also the last. The proof of imminent churn, the smoking-gun cause or the justification for spend is then merely a hypothesis.

Businesses that go one step further and invest in advanced data management and analytics capitalize on the opportunity to eliminate false hypotheses quickly, reduce confirmation bias, identify deeper patterns in their data, predict outcomes accurately and game out hypothetical scenarios. Collectively, this grants them a significant competitive advantage over those that don't.

Faster and more accessible than you might think

It is true that you need robust data to produce trustworthy insights, but while I will always champion master data management (MDM) initiatives, data analytics can go a long way without them. Data cleaning and preparation is definitely the largest part of most analytics projects, but you are already contending with this: simple descriptive analytics are just as exposed to missing data and small sample sizes as more advanced methods, and they give you no measure of how much that uncertainty matters. For example, the t-test - which might be used to demonstrate (to a specific degree of certainty) that the incremental performance gains of customer segment A were not down to chance - was designed for small samples and is routinely applied to groups of fewer than 30.
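As a rough illustration of the t-test mentioned above, the sketch below compares two small customer segments using SciPy. The data is synthetic, and the segment names, the usage growth metric and the 0.05 threshold are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Synthetic monthly usage growth (%) for two small customer segments.
segment_a = rng.normal(loc=8.0, scale=3.0, size=20)
segment_b = rng.normal(loc=5.0, scale=3.0, size=20)

# Welch's t-test does not assume the two segments share the same variance.
result = stats.ttest_ind(segment_a, segment_b, equal_var=False)

print(f"t statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.4f}")

alpha = 0.05  # illustrative significance threshold, not a universal rule
if result.pvalue < alpha:
    print("The difference is unlikely to be down to chance alone.")
else:
    print("The data cannot rule out chance as the explanation.")
```

The p-value is what a simple comparison of averages lacks: it puts a number on how surprising the observed gap would be if the two segments really performed the same.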

Of course the benefits of building a master source of truth are numerous and weighty - agility in your tech stack (especially for operations), accuracy in day-to-day operations, cross-functional alignment, the simple pleasure of knowing that the number you are looking at is correct and doesn't exist elsewhere (in different orders of magnitude)... One of the most powerful benefits is unlocking what I call data catalytics, the ability to point analytics at a huge number of challenges and opportunities as a matter of routine. It is this frequency of using high quality analytics that enables what Thomas H. Davenport calls Analytics Competitors to dominate their categories. Having said this, if you have the data to make a hypothesis (be it a fully realised MDM solution or a clean and representative CSV), you have the data to go further.

The type of analytics I am committed to seeing more of in customer success is predictive analytics, the stage after EDA where the accuracy of a budding hypothesis (of which you might make several) is measured using data models. Worthy opportunities for tests come up every day: tracking a CSM's performance, deciding which customer to contact first, assigning (even automatically) a health score. These are all based on hypotheses which may be grounded in intuition, experience or descriptive analytics. Anything shy of a randomised, double-blind study is unlikely to demonstrate causation. For the purposes of making robust, fast decisions, however, predictive analytics are an enormously powerful tool, allowing us to get much closer to the accuracy of a blind study with <1% of the investment.

A competitive edge for customer success

Let's look at the not-so-humble health score that sits at the heart of many customer success operations, whether as a predictive tool, one that drives activity, or both. I remember asking four leading CS platforms about health scores as part of consolidating tools for Spotler. I was impressed with the depth of logic that each of the systems could cope with, but surprised to find that the purpose of the score - to flag customers likely to churn early enough to save them - was primarily met with common assumptions about churn. In effect they did a good job of mapping the customer experience and ensuring customers are being treated the way one might hope. But what I really wanted was an early, demonstrably reliable indicator of churn.

Using predictive analytics I have been able to satisfy most of my requirements for building a robust health score. A mixture of linear and logistic regression models (measuring how strongly a metric predicts an outcome), k-means clustering and decision trees has greatly elevated my confidence in identifying metrics that have significant predictive power.
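A minimal sketch of the logistic regression step, assuming scikit-learn and entirely synthetic data, might look like the following. The metric names (weekly_logins, open_tickets, seats_used_pct) and the way churn is generated are hypothetical. The point is only to show how a model can separate a metric that carries predictive signal from one that does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
n_accounts = 1000

# Synthetic candidate health score metrics (hypothetical names).
weekly_logins = rng.poisson(lam=5, size=n_accounts)
open_tickets = rng.poisson(lam=2, size=n_accounts)
seats_used_pct = rng.uniform(0, 100, size=n_accounts)  # unrelated to churn by construction

# Synthetic ground truth: fewer logins and more open tickets raise churn risk.
log_odds = -0.5 - 0.4 * weekly_logins + 0.6 * open_tickets
churned = (rng.random(n_accounts) < 1 / (1 + np.exp(-log_odds))).astype(int)

feature_names = ["weekly_logins", "open_tickets", "seats_used_pct"]
X = np.column_stack([weekly_logins, open_tickets, seats_used_pct])
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, random_state=0, stratify=churned
)

# Standardising first puts the coefficients on a comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Score the model on accounts it has never seen.
churn_probability = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_probability)
print(f"Held-out ROC AUC: {auc:.2f}")

coefficients = model.named_steps["logisticregression"].coef_[0]
for name, coef in zip(feature_names, coefficients):
    print(f"{name}: {coef:+.2f}")
```

Evaluating on held-out accounts is the design choice that matters here: a metric only earns its place in the health score if it helps predict churn for customers the model was not fitted on.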

Crucially, these methods quantify uncertainty and, due to the speed at which they can be applied, allow users to answer many questions and iterate endlessly.

With an increasingly achievable investment in skills (two of the most powerful statistical tools, Python and R, are open source) or tools (including common visualisation tools like Power BI and Qlik), customer success can avoid many errors of assumption caused by relying on purely descriptive analytics and capitalize on many opportunities.

A customer organisation able to deploy resources and forecast with laser accuracy, test hypothetical scenarios before committing to action and do this as a matter of routine has a formidable advantage externally and a powerful voice internally.

I have included some examples of what can be achieved with enormous speed and precision by turning to more advanced statistics below, along with further reading and suggestions on how to advance your analytics skills.

Python packages Seaborn and Pandas include tools that, respectively, produce a complete scatterplot matrix (plotting every variable in a dataset against every other) and quantify the correlation between two metrics as a score between -1 and 1, where the sign gives the direction and the magnitude gives the strength. Each is a single line of code.

Python package Scikit-Learn includes tools to implement Random Forest algorithms, which build many decision trees, each segmenting the data by choosing the most informative split (demarking attribute) at every step from root to leaf. 100k samples can be processed in as little as 20 seconds.
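A hedged sketch of the Random Forest approach with scikit-learn follows. The data comes from `make_classification`, so the features are anonymous and synthetic. In practice the columns would be your own candidate metrics and the label would be something like churned or retained. The sample size and parameters here are illustrative choices, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset: 6 features, of which 3 carry real signal.
X, y = make_classification(
    n_samples=5000,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# An ensemble of 100 decision trees, each trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# Importances sum to 1; higher values mark the features the trees lean on most.
importances = forest.feature_importances_
for index, importance in enumerate(importances):
    print(f"feature_{index}: {importance:.3f}")
```

The feature importances are the practical payoff for a health score: they rank which metrics the model actually relied on to separate the two classes.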

Ready to push your data-driven decisions up that scale? Next steps and further reading:

Speak to me, happy to discuss your needs and explore options or point you in the right direction
Google Data Analytics Professional Certificate - this and the advanced course are a great start
How to Measure Anything, Douglas Hubbard – calibrated estimates
Competing on Analytics, Thomas H Davenport
