Imagine you’ve been working on optimizing a site for a while now, say 3, 6 or even 12 months.
You’ve had solid winners each month, and you’re confident in the test results. These are not imaginary lifts. But now your conversion rate looks the same as when you started. How do you explain this to the boss/client?
Another scenario: you’ve been optimizing for 12 months and your revenue per customer has increased by 2%. Same question: how can you justify your contribution? How can you tell what caused that – optimization, SEM, seasonality, word-of-mouth, or something else?
How do you measure the ROI of your optimization efforts? The question is actually more complicated than it sounds.
ROI is Difficult to Measure
In fact, it’s easy to project the predicted ROI of optimization (click here to download a conversion optimization ROI calculator). It’s just really hard to measure it post hoc.
Not really surprising, really. Measuring ROI of optimization is hard. If anything, I’m skeptical of the 38% that demonstrated positive ROI. How, indeed, did they demonstrate ROI?
There’s a quote in the article from Amelia Showalter, former Director of Digital Analytics for Obama for America, that explains how hard it is to track and measure everything, at least in the long term:
“When we’re working on the campaign, we’re actually working so hard to run all those tests that we didn’t always keep perfect track of exactly what results were long term. It’s hard to calculate this stuff out when we want to put all our resources into running more tests. So, we don’t actually ever have a perfect estimate of actually how much extra revenue was due to our testing, but I think that $200 million is a fairly reasonable estimate.”
The article also sums things up by saying, “You can also take heart that if you’re running valid tests, you are likely improving the bottom line.”
While that’s heartwarming, it’s not going to satisfy a neurotic boss or client. We’ve got to find a way to measure our impact. How can we possibly do that?
Time Period Comparison in Analytics (and Why It’s Wrong)
If asked to measure improvement in conversion rate due to optimization efforts, most people would point to Google Analytics. They would perform a time period comparison, looking back 6-12 months ago when you started the campaign and comparing with the conversion rate you have now (linear analysis).
This won’t tell the full story for a few reasons, the big one being the variability of your traffic quality.
Several things can affect your traffic quantity and quality, including but not limited to:
- Press (positive or negative)
Let’s say you’re at a 2% conversion rate with 100,000 monthly visitors to start. Over the course of a year, a lot can change the quality of your traffic. If you’re selling novelty gifts, the holidays might improve your conversion rate with negligible impact from your optimization efforts. Similarly, if you hit the front page of Hacker News, you’ll get a lot of traffic – but the quality might be really shitty, lowering your present average conversion rate.
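To make the traffic-mix effect concrete, here is a minimal sketch of the arithmetic. The 2% rate and 100,000 visitors come from the example above; the lifted rate, the size of the Hacker News spike, and its conversion rate are made-up numbers chosen only for illustration.

```python
def blended_conversion_rate(segments):
    """Overall conversion rate for a list of (visitors, conversion_rate) segments."""
    total_visitors = sum(visitors for visitors, _ in segments)
    total_conversions = sum(visitors * rate for visitors, rate in segments)
    return total_conversions / total_visitors

# Baseline from the text: 100,000 monthly visitors converting at 2%.
baseline = blended_conversion_rate([(100_000, 0.02)])

# Hypothetical later month: optimization lifted regular traffic to 2.2%,
# but a Hacker News spike added 50,000 visitors converting at only 0.2%.
later = blended_conversion_rate([(100_000, 0.022), (50_000, 0.002)])

print(f"baseline: {baseline:.2%}")
print(f"later:    {later:.2%}")
```

Under these assumed numbers the blended rate drops to roughly 1.5%, even though the regular traffic converts 10% better than before. A pre/post comparison would report a decline that has nothing to do with the optimization work.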
Conversion Rates Are Non-Stationary Data
A stationary time series is one whose statistical properties (mean, variance, autocorrelation, etc.) are constant over time. According to an article on the Duke University website, “A stationarized series is relatively easy to predict: you simply predict that its statistical properties will be the same in the future as they have been in the past!”
“Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecasted. The results obtained by using non-stationary time series may be spurious in that they may indicate a relationship between two variables where one does not exist.”
That’s essentially the nature of data. Whether because of seasonality, day of week, external factors, press, advertising, etc, data just fluctuates. Even if you didn’t change anything on your site for a month, you’re not going to get the same result every day. It will fluctuate – sometimes a little, sometimes a lot.
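As a rough illustration of how much a daily conversion rate moves when nothing changes, the sketch below computes the sampling noise alone under a simple binomial model. The 2% rate and roughly 3,300 daily visitors are assumptions for the example, and real data will usually fluctuate more than this, because seasonality and traffic mix come on top of pure sampling noise.

```python
import math

def daily_rate_noise(true_rate, daily_visitors):
    """Standard error of a daily conversion rate under a binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / daily_visitors)

# Assumed numbers: a true 2% rate and about 3,300 visitors a day
# (roughly 100,000 a month).
rate = 0.02
se = daily_rate_noise(rate, 3_300)

# About 95% of days should land within 1.96 standard errors of the true rate.
low, high = rate - 1.96 * se, rate + 1.96 * se
print(f"daily rate expected between {low:.2%} and {high:.2%}")
```

With these inputs, an unchanged site would still report daily rates anywhere from about 1.5% to 2.5%.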
Andrew cites this as the reason time period comparison in analytics won’t work for accurately measuring your ROI, and he gives a great example below:
“You can be costing your company millions and think that everything is better by relying on pre/post. Because of this it is less useful than just flipping a coin. Both have nothing to do with measuring the outcome of a change, but at least with the coin you won’t confidence yourself that the data means something.”
A Possible Exception
After talking to Craig Sullivan, I found out it is possible to do time period comparison. However, you have to have a predictable traffic stream (i.e., PPC), and even then it is rough. Craig explains it well:
If we were to assume that PPC traffic is “reliable”, we’d also have to assume that you haven’t changed daily budget, haven’t changed your keywords, and haven’t changed your ad copy. Three, six or twelve months is a very long period and there are too many variables. It’s not the same traffic anymore. In fact, the variables are constantly changing in AdWords, sometimes daily:
Leonardo Saraceni (@leosaraceni), in an August tweet: “High-volume accounts see daily bid/budget adjustments and monthly ad tests. Underlying structure might change 30-40% over 3-6mos.”
Also, you can’t draw broad conclusions from PPC data because you can’t assume that all traffic sources will behave similarly. What works for PPC traffic might not work for returning direct traffic, SEO traffic and so on.
Tests to Gauge Impact
“It can be extremely difficult to explain results when it looks like things are flat or overall down. The fundamental problem is that people are using a linear correlative data set instead of the comparative data that a test provides, or in other words you are saying that you are X percent better, not necessarily X percent better of a specific number. All data is sinusoidal, it goes up and it goes down, despite test results.”
If time period comparison won’t work, what will? There are a few ways to measure impact. None of them are perfect – and there are pros and cons of each – but nonetheless, they’re better than nothing.
1. Retest old versions of the site later on
One of the easiest ways to measure ROI is to retest old versions of the site as part of larger tests later on. Basically, all changes made during the testing period (combined into one metric) are tested against the old version.
Like most methods here, though, this one has pros and cons. According to Craig Sullivan, if you’ve been continuously improving and learning, it might not be worth the time to test an old version. Craig:
2. Weak Causal Analysis
Another method is weak causal analysis.
As Andrew Anderson said, “use weak causal analysis to get a read on estimated impact. In both cases (cause analysis and retesting old versions) you will often find that you are actually having a bigger impact than you imagine. It is important that you are doing this analysis without prompting and proactively giving others a full evaluation of the overall program.”
What is weak causal analysis? Basically this: Do a long term trend line with an estimated error rate. Take that based on prior data before the change and look at the outcome as compared to the expected outcome of the trend line. Make sure you are using independent variables as a basis (like users) so that you can get some read on where you would have been versus where you are.
“Anything that can approximate causal information is better than nothing but has a much higher chance of Type I or Type II errors (a ‘false positive’ and a ‘false negative,’ respectively),” according to Andrew.
Not perfect but better than nothing.
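The trend-line idea described above can be sketched in a few lines. This is only one plausible reading of it, not Andrew’s actual procedure: it fits a straight least-squares line to the pre-change periods, uses the spread of the residuals as a crude error band, and flags post-change periods that land outside that band. The function names and the weekly revenue-per-user figures are hypothetical.

```python
import math

def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept, residual standard deviation).
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    # n - 2 degrees of freedom because two parameters were estimated.
    resid_std = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, resid_std

def compare_to_trend(pre, post, z=1.96):
    """For each post-change period, return (expected, actual, outside_band)."""
    slope, intercept, resid_std = fit_trend(pre)
    results = []
    for i, actual in enumerate(post):
        expected = intercept + slope * (len(pre) + i)
        # Crude band: ignores the extra uncertainty of extrapolating the line.
        outside = abs(actual - expected) > z * resid_std
        results.append((expected, actual, outside))
    return results

# Hypothetical weekly revenue per user before and after a change.
pre_change = [4.10, 4.05, 4.20, 4.15, 4.25, 4.18, 4.30, 4.28]
post_change = [4.60, 4.55, 4.70]

for expected, actual, outside in compare_to_trend(pre_change, post_change):
    print(f"expected {expected:.2f}, actual {actual:.2f}, outside band: {outside}")
```

Using a per-user metric follows the advice to build on an independent variable like users, so a traffic spike alone does not look like a win. A result outside the band is a hint, not proof: anything else that changed in the same weeks would show up the same way.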
3. Measure Impact Through Various Stages of a Funnel
Say these are your stages:
- Step 1: click from email to site
- Step 2: add product to cart
- Step 3: go to checkout
- Step 4: buy
It’s possible you might not have enough data to actually measure a difference at step 4. But as Chris said:
“You can often infer data about step 4 from steps 1-3 (i.e. if you made a significant impact on the percentage of people reaching step 3, it is *likely* (though not guaranteed) that you increased conversions). There are rigorous ways to estimate this statistically, but they are again somewhat difficult to do.”
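A simple way to see why an earlier funnel step can show a difference that the final step cannot is to run the same significance test at both steps. The sketch below uses a standard two-proportion z-test with a normal approximation. It is not the more rigorous estimation Chris alludes to, and all counts are invented for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 5,000 email clicks per arm (step 1).
visitors = 5_000
step3_control, step3_variant = 400, 480  # reached checkout (step 3)
step4_control, step4_variant = 100, 118  # bought (step 4)

z3, p3 = two_proportion_z_test(step3_control, visitors, step3_variant, visitors)
z4, p4 = two_proportion_z_test(step4_control, visitors, step4_variant, visitors)
print(f"step 3: z={z3:.2f}, p={p3:.4f}")
print(f"step 4: z={z4:.2f}, p={p4:.4f}")
```

In this made-up case the variant clears the usual 5% threshold at step 3 but not at step 4, simply because far fewer people reach the purchase. The step 3 result makes a step 4 improvement likely, not guaranteed, exactly as the quote says.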
4. Send a small part of your traffic to a consistent base
Here’s what Lukas Vermeer said in a previous quote:
Sending a small part, 5-10% of your total traffic, to a consistent control seems to be the most accurate way to track impact of optimization. This is the method that I heard most consistently from expert optimizers, anyway.
Of course, the question then is about opportunity costs. If you’re not optimizing 10%, you’re (maybe) missing out on increased revenue. You’re also dealing with less optimizable traffic, so tests will take longer to reach significance.
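Here is a minimal sketch of how a holdback comparison might be read, assuming a 10% holdback on the original site and a simple normal approximation. The function name and every count are hypothetical, and the interval is deliberately rough.

```python
import math

def holdback_lift(conv_holdback, n_holdback, conv_optimized, n_optimized, z=1.96):
    """Relative lift of optimized over holdback, with an approximate 95% interval."""
    p_h = conv_holdback / n_holdback
    p_o = conv_optimized / n_optimized
    # Standard error of the absolute difference (unpooled, normal approximation).
    se = math.sqrt(p_h * (1 - p_h) / n_holdback + p_o * (1 - p_o) / n_optimized)
    diff = p_o - p_h
    # Dividing by p_h treats the holdback rate as fixed, which understates
    # the true uncertainty somewhat.
    return diff / p_h, (diff - z * se) / p_h, (diff + z * se) / p_h

# Hypothetical year: 1,200,000 visitors, 10% held back on the original site.
lift, low, high = holdback_lift(
    conv_holdback=2_400, n_holdback=120_000,
    conv_optimized=24_840, n_optimized=1_080_000,
)
print(f"lift: {lift:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

With these numbers the estimated lift is 15%, with an interval of roughly 11% to 19%. The small holdback is what makes the interval wide, which is the same trade-off as the opportunity cost: a larger holdback measures impact more precisely but leaves more traffic on the unoptimized site.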
Are There Opportunity Costs?
As Peep said in a previous article, “Testing something is an opportunity cost – means you can’t test something else. While I’m re-validating something here, I could be testing something else that gives me a lift (but of course, it’s not possible to know whether it would). It’s also questionable whether you should be re-testing it.”
This is a question of your specific goals and risk tolerance. Andrew Anderson explains that it’s always worth it to improve your performance, which might mean taking the time to test impact over the long term:
Here’s Craig’s take on opportunity costs:
Then again, optimization is more than just A/B testing and lifts. Matt Gershoff, CEO of Conductrics, put it well, saying that part of it is about information to inform decisions. In other words, optimization is about reducing uncertainty, and therefore risk aversion, in decision making. So you have to factor in everything else you gain from conversion optimization.
Craig also mentioned that conversion optimization isn’t just about the testing. It’s about the big picture:
Measuring ROI is hard. But there are a few ways to do it.
There are some statistically rigorous methods of calculating impact (GA Effect, weak causal analysis). Time period comparison is generally wrong because the data is non-stationary, but as Craig mentioned, there are a few exceptions where you can get a rough estimate: if you have stable and controllable traffic, as with PPC, though you might not be able to draw overall conclusions this way. Finally, one of the most common answers I found was to send a consistent amount of traffic to a small holdback set.
Keep in mind, too, that when done correctly, optimization and the insight you gain can be used in all of your marketing. It’s a process that leads to information that informs better decisions, so the return on investment compounds with the customer insight you gain.