Understanding Cohort Analysis

Cohort analysis is conceptually simple, yet it is one of the most important and powerful analysis approaches a startup can adopt. In an earlier post I discussed the importance of Lean Methodology for startups: minimizing wasted resources and getting to product/market fit before scaling up. Cohorts play a crucial role in helping us understand how user behavior changes with each iteration or improvement to the product. Plenty of other business questions can also be understood better using cohort analysis. Some examples:

1) How are the optimizations made to the product in a defined period affecting conversions?
2) Which traffic source is generating maximum conversions?
3) Which source tends to bring in users with maximum engagement on the platform?
4) Are customers acquired via email marketing more likely to make repeat purchases or to upgrade than those acquired via, e.g., AdWords marketing?

And more. Products such as Mixpanel and Kissmetrics make it easy to create and analyze cohorts. Google Analytics did not have a Cohort Analysis feature in its early days; however, in 2017 they introduced the Cohort reports and, in my opinion, this is one of the most powerful reports you can use in your analytics dashboard. And it’s FREE! 🙂

What is a Cohort?

A cohort is simply a group of people who share something in common and is time-bound, i.e., they had something in common at the time the grouping was first made. A cohort is very similar to a segment, and the difference between the two is often a source of confusion. To see the distinction, think of a segment as “Employees working in the Marketing Department”, while a cohort would be more like “Employees who joined in November 2013”.

Cohort Analysis
Cohort Analysis is very popular in medicine where it is used to study the long term effects of drugs and vaccines:

A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, are exposed to a drug or a vaccine, etc.). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort. The comparison group may be the general population from which the cohort is drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.
Source: Wikipedia

We can apply the same concepts to an online portal or startup to better understand the different types of users and their behavior on the platform. How we define the cohorts to compare, and what we compare about their behavior, depends on the business question we are trying to answer. In the case of a Lean Startup, the basic premise is that the product is constantly iterated to find product/market fit and then iterated further to optimize conversions and scale. This is one of the prime applications of cohort analysis: we can compare the users acquired during each iteration in terms of retention, engagement, conversions, etc. Joshua Porter’s excellent blog post on Twitter’s use of cohort analysis to track engagement across product improvements is a great example of this.


If you look at the figure, it has rows for cohorts (users acquired during each month are grouped as a separate cohort) and the columns give the engagement or retention figures for the cohort over a 12-month period. This layout makes it clear whether the iterations and product improvements that Twitter was rolling out on a regular basis were continually improving engagement on the platform. In an ordinary aggregate graph without cohorts, this picture often gets lost, because engagement from the early set of users masks the engagement metrics of any particular group, whether for better or for worse.
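To make the table layout concrete, here is a minimal sketch of how such a cohort table could be computed from a raw activity log. The event data, the function names and the choice of "month of first activity" as the cohort key are all assumptions made for illustration; this is not how Twitter or any particular analytics tool builds its report.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (user_id, date of an active session).
events = [
    ("u1", date(2013, 1, 5)), ("u1", date(2013, 2, 9)), ("u1", date(2013, 3, 2)),
    ("u2", date(2013, 1, 20)), ("u2", date(2013, 3, 15)),
    ("u3", date(2013, 2, 3)), ("u3", date(2013, 3, 8)),
    ("u4", date(2013, 2, 11)),
]

def month_index(d):
    """Count months from year 0 so that month differences are plain subtraction."""
    return d.year * 12 + d.month - 1

def retention_table(events):
    # Assumption: a user's cohort is the month of their first recorded activity.
    first_seen = {}
    for user, day in events:
        m = month_index(day)
        first_seen[user] = min(first_seen.get(user, m), m)

    # active[cohort][offset] = users active `offset` months after joining.
    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        cohort = first_seen[user]
        active[cohort][month_index(day) - cohort].add(user)

    table = {}
    for cohort, by_offset in active.items():
        size = len(by_offset[0])  # everyone is active in their own first month
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        table[label] = {off: len(users) / size
                        for off, users in sorted(by_offset.items())}
    return table

table = retention_table(events)
for cohort, row in sorted(table.items()):
    # One row per cohort, one column per month since acquisition.
    print(cohort, {off: f"{rate:.0%}" for off, rate in row.items()})
```

Each printed row corresponds to one row of the cohort table: the key is the acquisition month and the values are the share of that cohort still active N months later.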

The above example from Twitter represents just one application of cohort analysis. As discussed earlier, there are various business questions that can be answered using cohorts. Let’s first look at the various ways to define cohorts:

1. Cohorts defined by when the user first Visits:
Often a user does not sign up or engage the first time they visit a platform. Grouping users by the date of their first visit helps one understand how many touches are required before they sign up or engage, and which product iterations improve the conversion or engagement metric. The earlier case study of Twitter is a good example of using cohorts to understand user engagement for a product.

2. Cohorts defined by when the user Converts:
By Converts, I mean any type of conversion or micro-conversion on the platform. It could be signing up, registering, making a first purchase, subscribing to the list etc.

3. Cohorts defined by what channel the user was acquired on:
It’s really important to understand which channels of user acquisition work best, and how the users acquired through each channel behave, so that one can focus on the channels that yield the best results. Cohorts based on the channel of acquisition help with this.
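As a rough illustration of channel-based cohorts, the sketch below groups users by acquisition channel and compares conversion rates. The user records, channel names and the `conversion_by_channel` helper are hypothetical; in practice the data would come from your analytics tool or database.

```python
from collections import Counter

# Hypothetical user records: (user_id, acquisition channel, converted?).
users = [
    ("u1", "email", True), ("u2", "email", False), ("u3", "email", True),
    ("u4", "adwords", False), ("u5", "adwords", True),
    ("u6", "organic", False), ("u7", "adwords", False), ("u8", "adwords", False),
]

def conversion_by_channel(users):
    """Return the conversion rate of each acquisition-channel cohort."""
    acquired = Counter()
    converted = Counter()
    for _, channel, did_convert in users:
        acquired[channel] += 1
        if did_convert:
            converted[channel] += 1
    return {channel: converted[channel] / n for channel, n in acquired.items()}

rates = conversion_by_channel(users)
for channel, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {rate:.0%}")
```

The same grouping works for any other metric (repeat purchases, upgrades, engagement) by swapping the boolean for the value you want to aggregate.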

4. Cohorts based on User behavior:
Users can also be grouped based on the behavior they exhibit on the platform. For example, in the case of Zoomdeck, there are frequent visitors and infrequent visitors. Users can be grouped into cohorts based on their re-visit rate and engagement on the platform. This is important because looking at the other metrics each group exhibits helps us understand them better. An e-commerce company, for instance, needs a different strategy for frequent buyers than for infrequent buyers, and cohorts make that easier.

5. Cohorts based on Customer Lifecycle:
For a platform with a number of stages, it’s important to track metrics such as retention, Customer Lifetime Value and engagement at each stage. Take a simple game with various levels: classifying users by the level they are in and understanding the metrics exhibited by these cohorts helps one make better decisions about how to incentivize users to move up a level.

6. Cohorts based on User Characteristic:
There might be cases where one would also want to create cohorts based on user characteristics such as men vs. women, country of origin or age group, in order to create targeted campaigns or provide customized incentives that improve the engagement, retention or revenue metrics of each group.

We have covered the general kinds of cohorts that can be created, although there may well be a few specific ones related to the niche you are operating in. Creating cohorts is just one part of the puzzle; the most important part is to use various metrics to understand the behavior exhibited by these cohorts, which is what enables you to make business decisions. The metrics one needs to track depend on the niche, the type of product and the lifecycle stage the product is in.

Metrics most often tracked between cohorts are:

1. Measures of User Engagement:
During the early stage of a product, before validation, user engagement (including activation) and retention are two of the most important metrics. Cohorts based on the date of first visit or conversion enable us to understand how product iterations are improving user engagement, or whether any change made to the product has negatively affected it. The earlier example of Twitter was about tracking engagement on the platform. Depending on the product, you can define which user action counts as engagement or activation on your platform.

2. Retention:
Just as engagement is an important metric, any successful product should have good retention figures as well. I covered the importance of retention, and how it affects virality, cost of user acquisition and customer lifetime value, in my earlier posts on virality. Cohorts help us understand retention better by letting us pin down which features and user flows are improving the retention numbers. Funnel tools don’t help us track retention, which requires recording user activity over longer periods.

3. Customer Lifetime Value:
Customer Lifetime Value (CLV) is probably the most difficult metric to track. Questions we might want to answer include: which acquisition channels give us the highest CLV, which activity drives a user to upgrade plans, which of several split-tested pricing plans is optimal, and which feature or user-flow changes result in better CLV. All of these are best understood using cohorts, since a cohort lets us track a group over a period of time and observe its behavior on the platform.

4. Measuring long life-cycle events:
A product undergoes many iterations and feature roll-outs. It is very hard to measure long life-cycle events using funnels alone. Prime examples are revenue and retention, which typically play out over the long term.

Depending on your niche and the stage of growth your startup is in, you will have to choose which metrics to track, and for which of the cohorts described earlier. At the end of the day, for any product, things boil down to user growth, engagement, retention and revenue. Analytics enables us to improve each of those metrics, and cohort analysis is a technique that gives us great insight into metrics that typically have a long cycle.

Cohort Analysis Presentation (Example)

I love this presentation of cohort analysis (quoted from this blog post):


What you can see immediately is that the area on the right (Period 5) stacks up the current status with users from Period 1 to Period 4. The really interesting piece of the puzzle comes into play when you are considering what exactly your users represent: active, subscribers, etc. So here is what we can infer from the chart:

  • The height of the chart at Period 5 (at 280) is the number of users currently using (or paying for) our system/app.
  • The individual stacks have a drop-off. As we can see, the drop-off is high in the beginning and then starts to level out but does not go down to zero. Since this is homogeneous across all periods, we can infer that there is something we are doing right: user behavior becomes predictable.
  • For each period 1 to 4, new users were signing up and the number of users from Period 1 makes up 17.8% (50 out of 280) of the users in Period 5.
  • The fall off of users from one Period to the next is higher in subsequent Periods, leveling out at about 25%  of the original sign-ups after 3 periods.
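The arithmetic behind the stacked chart can be sketched in a few lines. Only two numbers below come from the quoted text (50 users remaining from Period 1 and a total of 280 in Period 5); the split of the remaining 230 users across Periods 2 to 4 is invented purely so the example adds up.

```python
# Users still active in Period 5, keyed by the period in which they signed up.
# Period 1 (50) and the total (280) are from the quoted chart; the values for
# Periods 2-4 are hypothetical placeholders chosen to sum to 280.
active_in_period_5 = {1: 50, 2: 60, 3: 75, 4: 95}

# The height of the stacked chart is the sum of all cohort layers.
total_active = sum(active_in_period_5.values())

# Share of the current user base contributed by each sign-up cohort.
shares = {period: count / total_active
          for period, count in active_in_period_5.items()}

for period, share in shares.items():
    print(f"Period {period} cohort: {share:.1%} of current users")
```

Reading the layers this way shows how much of today's active base still comes from early cohorts and how much depends on recent sign-ups.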


Customer Engagement

Customer Engagement – What are the key metrics to track and why?

B2B SaaS is extremely competitive, especially for horizontal SaaS products. If you are in the SMB space, it is even more challenging to survive and then grow. There are a few important metrics the product needs to track assiduously:

  • CAC (Customer Acquisition Cost)
  • LTV (Customer Lifetime Value)
  • Payback Period
  • Churn
  • NPS (Net Promoter Score)
  • Sales Velocity

I’m sure most SaaS companies do track these numbers. The key to success is to reduce churn and CAC and to increase LTV and NPS. One of the key factors that enables a SaaS product to achieve this is customer engagement. But how do you define and measure customer engagement?


What is Customer Engagement?

Customer engagement is the interaction or activity of your customer on the platform. Engagement can be positive or negative, and it’s important to understand which kind you are seeing.

  • A negative engagement increases the risk of Churn, so there are immediate actions that need to be taken to ensure the customer stays.
  • Similarly, a happy and engaged customer provides you with an opportunity to up-sell or cross-sell.


So, how do you measure Customer Engagement?

Measuring customer engagement inside the product follows the same process as lead scoring at the top of the funnel, which I covered earlier. A lead score is used to qualify leads based on their activity or interaction with the product’s various assets and touchpoints. You can measure customer engagement in either of two ways:

(1) Use third-party software tools that let you define and analyse various events inside the product.

(2) Set up your own system where you log various data points in your DB and run queries to analyse them.

In either case, you would have to define the important engagement events and assign points to them, which lets you calculate the all-important engagement score. The events that need to be tracked depend on the application. For example:

Helpdesk Software: Add support email, Set up forwarding rules, Set up DNS, Add agent

A/B Test SaaS App: Create Test, Start Test, End Test, Share Results

Online Billing App: Create Invoice, Send Invoice, Receive Payment

Once you have defined the events you can log them and also assign weights to each of these events to calculate your Customer engagement score.

Customer Engagement Score = (wt1 * e1) + (wt2 * e2) + … + (wt# * e#)

where wt is the weight assigned and e represents the event being tracked.
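As a minimal sketch of this weighted sum, the snippet below scores a helpdesk customer using the example events mentioned earlier. The weights, the event identifiers and the interpretation of e as "number of times the event occurred" are assumptions for illustration; you would tune them for your own product.

```python
# Hypothetical weights for the helpdesk events listed earlier (values assumed).
WEIGHTS = {
    "add_support_email": 5,
    "setup_forwarding_rules": 3,
    "setup_dns": 3,
    "add_agent": 2,
}

def engagement_score(event_counts, weights=WEIGHTS):
    """Weighted sum of event counts; events without a weight are ignored."""
    return sum(weights[name] * count
               for name, count in event_counts.items()
               if name in weights)

# One customer's logged events: event name -> number of occurrences.
customer_events = {"add_support_email": 1, "add_agent": 4, "page_view": 30}
score = engagement_score(customer_events)
print(score)  # 5*1 + 2*4 = 13
```

Keeping the weights in a single dictionary makes it easy to adjust them as you learn which events actually predict retention or upgrades.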

Along with the consolidated user engagement score, you could also monitor certain specific or low level metrics that again define user engagement. A few examples are:

  • Daily Active Users (DAU)
  • Weekly Active Users (WAU)
  • Monthly Active Users (MAU)
  • DAU/MAU Ratio
  • User Retention – Day 1, Day 7, Day 30
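The metrics above can be computed from a simple per-user activity log. The sketch below is one possible approach under stated assumptions: the data is made up, "monthly" is taken to mean a rolling 30-day window, and Day N retention is defined as "active exactly N days after the first active day". Other definitions (calendar months, "on or after day N") are equally common.

```python
from datetime import date, timedelta

# Hypothetical activity log: user_id -> set of dates on which the user was active.
activity = {
    "u1": {date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 8), date(2024, 3, 30)},
    "u2": {date(2024, 3, 1), date(2024, 3, 2)},
    "u3": {date(2024, 3, 10), date(2024, 3, 30)},
}

def active_users(activity, end, window_days):
    """Users with at least one active day in the window ending on `end` (inclusive)."""
    start = end - timedelta(days=window_days - 1)
    return {user for user, days in activity.items()
            if any(start <= d <= end for d in days)}

def day_n_retention(activity, n):
    """Share of users active exactly n days after their first active day."""
    retained = sum(1 for days in activity.values()
                   if min(days) + timedelta(days=n) in days)
    return retained / len(activity)

today = date(2024, 3, 30)
dau = len(active_users(activity, today, 1))
wau = len(active_users(activity, today, 7))
mau = len(active_users(activity, today, 30))
stickiness = dau / mau  # the DAU/MAU ratio

print(dau, wau, mau, round(stickiness, 2))
print(day_n_retention(activity, 1), day_n_retention(activity, 7))
```

A higher DAU/MAU ratio means a larger share of monthly users come back on any given day, which is why it is often used as a quick proxy for stickiness.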

The core metric that you need to track varies from product to product/ app to app. It’s for you to decide what numbers matter for your product.


What next?

Capturing and understanding the metrics defined above is the first step. Putting measures in place to improve them is the next. This entire process can be automated using a comprehensive automation tool such as Marketo, Autopilot or HubSpot Enterprise. The right set of messages at the right time goes a long way in optimizing each of the above metrics.

An example:

Pipefy is a great tool for workflow and process management. It lets you organize all your processes in one place. When you sign up with Pipefy, they send you a set of emails to increase engagement.

One of the first emails they send contains a library of pre-existing templates (the most used ones), which lets users get started immediately.


They track weekly retention and send out a mailer to engage inactive users. This is the second email they send out to inactive users:


Then they follow it up with this email within a few days:


Another example is how Groove improved customer activation using customer engagement data. Groove is helpdesk software, and one of the first things a user should do after signing up is set up a support email. They measure the average time it takes a user to set up the initial support email, and if that doesn’t happen they send an automated email. Here’s the template they use:


They also track user retention and send out mailers to inactive users to re-engage them. Here’s the template they use for that.


These are proactive measures you can take to increase engagement and retention. You can personalize these automated messages further by segmenting the data. For example, horizontal SaaS products get registrations from a range of industry verticals. You can segment the user data by industry vertical and send each prospect a relevant use case, using terminology they can relate to. At FieldEZ, we segment prospects by industry, and the use cases differ across industries: FieldEZ is used as a lead/sales management tool in industries such as BFSI and Pharma, while it is primarily used for ticket management in the Consumer Durables and Manufacturing segments.

Beyond this, customer segmentation also helps in:

  • Identifying which features matter most to a particular segment
  • Measuring LTV, CAC, Payback Period, Churn, NPS, etc. for each segment and working on optimizing them
  • Measuring the profitability of each segment
  • Testing separate user onboarding techniques for each segment, with messaging and core interactions based on what matters to that segment