In this article, we will learn about A/B testing, an important technique in Data Science for finding the best-performing variant in an experiment.
Table of contents
- What is A/B Testing?
- Designing an experiment
- Choosing the metrics
- Developing the experiment
- Analyzing results
What is A/B testing?
A/B testing is a popular technique used to compare two variants of a product and determine which of the two performs better based on user response. It is a randomized experimentation process: the two variants are generally labelled A and B, hence the name A/B testing. It is a methodology commonly used online when we want to test a new product or feature.
In this process, we take two sets of users and show the existing product to one set (the control group) and the experimental version to the other. Based on how the users respond to the two versions, we determine which version is better. A/B testing can be used for a wide range of changes, from a new UI element to a whole new look for your website, and many companies use this technique. When Amazon first started out with personalized recommendations, they ran an A/B test and found a significant increase in revenue from personalized recommendations.
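The splitting step described above can be sketched as a simple random assignment of users into the two groups (the function name, fixed seed, and 50/50 split below are illustrative assumptions, not a prescribed implementation):

```python
import random

def assign_groups(user_ids, seed=42):
    """Randomly split users into a control and an experimental group.
    The 50/50 split and fixed seed are illustrative choices."""
    rng = random.Random(seed)
    groups = {"control": [], "experiment": []}
    for uid in user_ids:
        key = "control" if rng.random() < 0.5 else "experiment"
        groups[key].append(uid)
    return groups

groups = assign_groups(range(1000))
print(len(groups["control"]) + len(groups["experiment"]))  # 1000 users in total
```

Each user lands in exactly one group, which is what lets us attribute any difference in behaviour to the version they saw.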
Now, the question that arises is: should we do an A/B test for every change? Let us answer this through an analogy. A/B testing is very useful for helping us climb to the peak of the current mountain, but if we are not sure whether to be on this mountain or another one, A/B testing does not help us. Some things are also difficult to measure through a short-term A/B test, such as whether more people are coming back to the website through referrals and how long it takes for them to refer your website to a friend.
Designing an experiment
Choosing the metrics
The first step while designing an A/B test is to choose and characterize the metrics. In other words, we need to choose a way to determine whether or not the control set is better than the experimental set. For this, we need an understanding of how the metrics are going to be used.
Metrics fall into two main categories:
- Invariant metrics

Invariant metrics are the ones that should not change across the two groups throughout the experiment. These act as sanity checks that make sure the experiment is run properly. For example: are there the same number of people in both groups? Is the population distribution the same in both sets? Are there comparable numbers of users across countries? These are invariant metrics.

- Evaluation metrics

Evaluation metrics are chosen based on the needs of the business case. These may be high-level business metrics, like how much revenue is made or what the company's market share is, or they may be detailed metrics that focus on the user experience, such as how long users stay on the website and click-through rates.
Developing the experiment
After choosing our metrics and understanding what to test, we next need to decide what we use as the subject in each group of our A/B test. In other words, how are we going to identify whether an individual person or subject is in our experiment? It may be based on cookies, a user ID, or some other (imperfect) proxy. This is called the unit of diversion.
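One common way to implement a cookie- or ID-based unit of diversion is to hash the identifier, so the same user always lands in the same group on every visit. A minimal sketch, where the experiment name and bucket counts are assumed values for illustration:

```python
import hashlib

def divert(unit_id, experiment="exp_001", n_buckets=100, treatment_buckets=50):
    """Deterministically map a unit of diversion (a cookie or user ID) to a
    group by hashing it; the same ID always gets the same variant."""
    digest = hashlib.md5(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % n_buckets
    return "experiment" if bucket < treatment_buckets else "control"

print(divert("user_42"))  # same user ID always gets the same answer
```

Salting the hash with the experiment name keeps assignments independent across different experiments run on the same users.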
Next, we choose the target population. We need to decide who is eligible to be a part of the experiment. Can we test it on everyone, or only on people in a particular country? If we are running a test on an evaluation metric, we need to make sure that the populations of the two groups are comparable. We also need to determine a reasonable sample size for the experiment.
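To get a rough idea of what a reasonable size is, the standard two-proportion power calculation can be used. A sketch under the common choices of alpha = 0.05 (two-sided) and 80% power; the baseline rate and minimum detectable effect passed in are hypothetical inputs:

```python
import math

def required_sample_size(p_baseline, min_detectable_effect):
    """Approximate per-group sample size for comparing two proportions,
    using the normal approximation with alpha=0.05 (z=1.96) and
    power=0.8 (z=0.84)."""
    z_alpha, z_beta = 1.96, 0.84
    p1 = p_baseline
    p2 = p_baseline + min_detectable_effect
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / min_detectable_effect ** 2)

# detecting a 2-point lift on a 10% baseline needs a few thousand per group
n = required_sample_size(0.10, 0.02)
print(n)
```

Note how the required size grows quickly as the effect we want to detect shrinks; this is usually what decides whether the planned experiment is realistic.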
Finally, taking all these factors into account, we need to figure out whether what we planned is realistic given the duration of the experiment and the variability of the metrics.
After we have figured all these out, we can deploy the features we need to test and start the actual A/B test.
Before analyzing our traffic, it is important to filter and segment it so that we do not include unwanted data in our results. For example, a competing company might have clicked through and viewed every single page of our website, or malicious bots might be crawling it; all of this inflates click-through rates and viewership. We do not want such data to tamper with the results of our A/B test, so it is essential to filter the traffic we get.
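A minimal filtering pass might drop events with bot-like user agents and users with implausibly high event counts. The event field names, user-agent markers, and threshold below are illustrative assumptions; real pipelines use much richer signals:

```python
BOT_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent markers

def filter_traffic(events, max_events_per_user=200):
    """Drop events from obvious bots and from users whose event volume is
    implausibly high (e.g. a scraper viewing every page)."""
    counts = {}
    for e in events:
        counts[e["user"]] = counts.get(e["user"], 0) + 1
    return [
        e for e in events
        if counts[e["user"]] <= max_events_per_user
        and not any(m in e["agent"].lower() for m in BOT_MARKERS)
    ]

events = [
    {"user": "u1", "agent": "Mozilla/5.0"},
    {"user": "u2", "agent": "GoogleBot/2.1"},
]
print(len(filter_traffic(events)))  # the bot event is dropped
```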
It is also important that we perform sanity checks after we get the data from our test, ensuring that the invariant metrics are the same for both the control and the experimental group.
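One such sanity check is a sample-ratio test: with a 50/50 diversion, the observed split should be statistically consistent with 0.5. A sketch using a one-sample z-test for a proportion, with made-up group counts:

```python
import math

def sample_ratio_z(n_control, n_experiment, expected=0.5):
    """Z statistic testing whether the observed control share is consistent
    with the expected diversion ratio (a sample-ratio-mismatch check)."""
    n = n_control + n_experiment
    observed = n_control / n
    se = math.sqrt(expected * (1 - expected) / n)
    return (observed - expected) / se

z = sample_ratio_z(5013, 4987)
print(abs(z) < 1.96)  # True: the split passes the usual 95% sanity check
```

A failed check here suggests a bug in the diversion or logging, and the evaluation metrics should not be trusted until it is explained.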
When analyzing our results, we decide whether we have observed a statistically significant difference between the two groups. We also estimate the magnitude and direction of the change it has on the business before we come to a conclusion.
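For a click-through-rate style metric, significance, magnitude, and direction can all be read off a pooled two-proportion z-test. A sketch with made-up click and visitor counts:

```python
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Pooled two-proportion z-test comparing the control (A) and
    experimental (B) click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, p_b - p_a  # the difference gives magnitude and direction

z, diff = two_proportion_ztest(200, 2000, 260, 2000)
print(z > 1.96)  # True: significant at the 5% level, a +3 point CTR lift
```

Here |z| > 1.96 corresponds to the usual two-sided 5% significance threshold; the sign and size of the difference tell us which way the metric moved and by how much.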
With this article at OpenGenus, you must have the complete idea of A/B Testing.