In a follow-up to their Seedcamp Firsts conversation on data, our Venture Partner Devin Hunt and Candice Ren, Founder of analytics agency 173Tech and a member of the Seedcamp Expert Collective, dive deep into A/B testing and good data science practices.
“It is good to be more specific towards the goal. You want a better algorithm. What does that mean?” – Candice Ren, Founder of 173Tech and Seedcamp Expert
With new and exciting AI technology emerging around recommendation engines, how can product leads evaluate which solution is better, and how do they actually measure a “better recommendation”?
Focusing on a specific case study (a furniture marketplace), Candice, who has worked on A/B testing and recommendation engines for Bumble, Plend Loans, MUBI, Treatwell and many others, shares her thoughts on:
– the intricacies of setting up and analyzing an A/B test experiment focused on comparing two different recommendation algorithms
– how you set your hypothesis
– the best way to segment your user base
– how to select what you are controlling for (e.g. click-through rate)
– how to interpret test results and consider the impact on broader business metrics.
Candice and Devin also emphasize the importance of granular testing, proper test design, and documentation of test results for informed decision-making within a company’s testing framework.
Expert tips on A/B Testing
Is It Worth Testing?
To avoid testing things with limited impact, zoom out and think about the customer journey as a whole. Map out the different touchpoints to identify areas with the largest drop-off. Focus on these areas as the right solution will give your product the biggest boost. If your test only affects 5% of users, then it is probably not worth it.
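To make the drop-off mapping concrete, here is a minimal Python sketch; the funnel steps and user counts are hypothetical placeholders, purely to show how to spot the step where a test would have the biggest reach.

```python
# A minimal sketch of mapping drop-off across the customer journey.
# The funnel steps and user counts below are hypothetical, purely to
# illustrate where a test would move the needle most.

funnel = [
    ("homepage_visit", 100_000),
    ("product_page_view", 42_000),
    ("recommendation_click", 9_000),
    ("add_to_cart", 3_500),
    ("checkout", 1_800),
    ("purchase", 1_500),
]

print(f"{'step':<22}{'users':>10}{'drop-off':>10}")
for (step, users), (_, next_users) in zip(funnel, funnel[1:]):
    drop = 1 - next_users / users  # share of users lost before the next step
    print(f"{step:<22}{users:>10,}{drop:>10.0%}")

# The step with the largest drop-off (here, product page view ->
# recommendation click) is where a better algorithm has the biggest reach.
```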
The Right Metrics
Your test metric should be the immediate lift resulting from the product change. If your algorithm is returning a suggested sofa, the first measure is whether users are interested in this recommendation, i.e. how many users click on it (CTR) to learn more. You should also keep an eye on “counter metrics” for every test.
These are KPIs with a direct impact on the business bottom line, e.g. the number of purchases and the revenue from the recommendations. The funnel between click and purchase is also important to analyse, e.g. add to cart and checkout. Perhaps users like the recommendations but they do not fit their budget, or certain recommended products are not available in their location.
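As an illustration, here is a minimal sketch of computing the test metric (CTR) alongside a couple of counter metrics from raw event logs; the event names and figures are hypothetical.

```python
# A minimal sketch of computing the test metric (CTR) and counter metrics
# from raw event data. Event names and values are hypothetical placeholders.

from collections import Counter

events = [
    # (user_id, event, revenue)
    (1, "recommendation_shown", 0), (1, "recommendation_click", 0),
    (1, "add_to_cart", 0),          (1, "purchase", 120.0),
    (2, "recommendation_shown", 0), (2, "recommendation_click", 0),
    (3, "recommendation_shown", 0),
]

counts = Counter(event for _, event, _ in events)

ctr = counts["recommendation_click"] / counts["recommendation_shown"]
click_to_purchase = counts["purchase"] / counts["recommendation_click"]
revenue = sum(value for _, event, value in events if event == "purchase")

print(f"CTR (test metric):        {ctr:.0%}")
print(f"Click-to-purchase rate:   {click_to_purchase:.0%}")
print(f"Revenue (counter metric): {revenue:,.2f}")
```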
Randomised User Allocation
When it comes to conducting the test, it is important to randomise the allocation of users into groups, e.g. a control and a test group. Randomisation matters because it stops you introducing bias into your results. For example, if one group has a higher percentage of loyal users than the other, it will likely return more favourable results regardless of the algorithm.
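One common way to get a stable, random split, sketched below with hypothetical names, is to hash the user ID together with the experiment name so the same user always lands in the same group.

```python
# A minimal sketch of deterministic, randomised user allocation. Hashing the
# user ID with the experiment name gives each user a stable group across
# sessions while keeping the split effectively random.

import hashlib

def assign_group(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Return 'test' or 'control' based on a hash of experiment + user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform value in [0, 1)
    return "test" if bucket < split else "control"

# The same user always lands in the same group for a given experiment:
print(assign_group("user_42", "sofa_recommendation_v2"))
print(assign_group("user_42", "sofa_recommendation_v2"))
```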
Prevent Overlap
While a 50/50 split for a single test is the example we used, in real life different departments will often need to test different elements at the same time: for example, the product team testing different recommendation algorithms while the billing team tests new pricing strategies. In this case, you will need a mechanism to prevent overlapping tests on the same users; otherwise your results will be contaminated. One way to do this is sketched below.
“Make sure when you test, you are changing one thing at a time for the targeted groups.”
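One possible mechanism, shown here with hypothetical experiment names and slot counts, is to partition users into mutually exclusive slots so that each user is only ever exposed to a single concurrent test.

```python
# A minimal sketch of one way to prevent overlapping tests: partition users
# into mutually exclusive slots and give each concurrent experiment its own
# slice, so no user sees both the recommendation test and the pricing test.
# Within each slice, users would still be split into control and test groups.

import hashlib

EXPERIMENT_SLOTS = {
    "recommendation_algorithm": range(0, 50),    # slots 0-49
    "pricing_strategy":         range(50, 100),  # slots 50-99
}

def experiment_for(user_id: str, n_slots: int = 100):
    """Map a user to at most one experiment via a stable hash."""
    slot = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % n_slots
    for name, slots in EXPERIMENT_SLOTS.items():
        if slot in slots:
            return name
    return None

print(experiment_for("user_42"))  # always the same single experiment
```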
Statistical Significance
Once you have results from your test, you have to ask yourself whether they are statistically significant. Significance tests help you rule out the possibility that any uplift is simply down to random chance.
Here is a useful link to calculate test statistics. We recommend setting a confidence level of 90% or above.
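As a rough illustration, here is a two-proportion z-test on CTR using only the Python standard library; the click and impression counts are hypothetical, and at 90% confidence you would look for a p-value below 0.10.

```python
# A minimal sketch of a two-proportion significance test on CTR using only
# the standard library. Click and impression counts are hypothetical.

from math import sqrt, erf

def z_test(clicks_a, shown_a, clicks_b, shown_b):
    p_a, p_b = clicks_a / shown_a, clicks_b / shown_b
    p_pool = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / shown_a + 1 / shown_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, p_value

ctr_control, ctr_test, p = z_test(clicks_a=400, shown_a=10_000,
                                  clicks_b=460, shown_b=10_000)
print(f"control CTR {ctr_control:.1%}, test CTR {ctr_test:.1%}, p = {p:.3f}")
print("significant at 90% confidence" if p < 0.10 else "not significant")
```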
Automate & Save Your Learnings
From a data perspective, all your test results should be automated into a dashboard listing test and counter metrics alongside their statistical significance. Once a test has concluded, summarise the learnings in a centralised place so everyone has visibility. You should NOT retest something that has already been tested without a good reason.
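For illustration, here is a minimal sketch of the kind of summary such a dashboard could surface, with every concluded test, its metrics and its p-value in one place; all entries are hypothetical placeholders.

```python
# A minimal sketch of a centralised test-results summary: each concluded test
# with its test metric, counter metric and significance. Entries are
# hypothetical placeholders.

results = [
    {"test": "sofa_recommendation_v2", "metric": "CTR",
     "lift": "+15%", "counter_metric": "revenue +3%", "p_value": 0.04},
    {"test": "pricing_strategy_b", "metric": "checkout rate",
     "lift": "+1%", "counter_metric": "revenue -2%", "p_value": 0.41},
]

for r in results:
    verdict = "significant" if r["p_value"] < 0.10 else "inconclusive"
    print(f"{r['test']:<26} {r['metric']:<14} {r['lift']:>5} "
          f"| {r['counter_metric']:<12} | p={r['p_value']:.2f} ({verdict})")
```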
If you want more info on how to run A/B tests across the product lifecycle or need advice in tying together event tracking, website and revenue data, then be sure to get in touch with Candice and her team at 173tech.
Check out our growing Seedcamp Firsts Content Library.