How can reducing page memory size increase download times?

This post is part 1 of the "Simpson's Paradox" series:

  1. How can reducing page memory size increase download times?

Introduction

In 2009, Chris Zacharias thought that the 1.2MB video player YouTube was using was far larger than it needed to be. As he points out in his amusing blog post, entire Quake clones were being written in under 100kB.

Part of this is an engineering challenge, but there is business value to be derived as well. People still on 56kbps modems would have to wait approximately 3 minutes just to download the video player (let alone the video itself). In 2009, dial-up was uncommon in Silicon Valley and large urban areas like New York City, but it was still common in more rural areas and in many countries overseas.

Load time was important enough that YouTube tracked the average page load time as one of its key metrics. If this time got too long, users would lose patience and leave.

Chris spent three days improving the player and got the page size down to 98kB, roughly 8% of the original download size! Paradoxically, the average load time increased dramatically as a result. For most changes, this would be enough to do an immediate rollback (after checking for errors in the analysis), but in this case the idea that a smaller download could lead to a longer average load time was so bizarre that the team dug deeper.

tl;dr: The download size is 8% of the original, but the download time is dramatically up.

Everyone's result got better (but this wasn't captured in the metric)

Instead of taking a global average, which went up, we could look at the average download time per geographic region: North America, South America, Africa, Europe, Central Asia, South East Asia, and Oceania. In every one of those regions, the average time went down.

How can every region see a decrease in average time, but the overall average see an increase?

The answer lies in the participation rate. With the 1.2MB player, it would take people in Africa approximately 20 minutes to download it, and so not many people bothered. When the download was shrunk to 8% of its size, downloading the player took just under 2 minutes, and a lot more people from Africa used YouTube. This influx of views from people who had previously not bothered with the large download was enough to pull the overall average up!

Example numbers

Chris's post didn't include any viewer statistics, so let's make up some numbers to show how this could happen. Let's limit ourselves to North America, Europe, and Africa (the story generalizes if we include South America, Asia, and Oceania as well, but keeping just a few regions makes it easier to see what is going on).

When downloading something from the internet, there are three big factors that determine your waiting time:

  • how fast your connection is (both your modem and your provider),
  • how far away you are from the server,
  • how large the download is.
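
As a rough sketch, if we ignore distance and latency and treat bandwidth as a fixed effective rate, the waiting time is just the download size divided by the bandwidth. The rates below are illustrative assumptions chosen to reproduce the figures quoted in this post, not measurements from Chris's post:

```python
def download_seconds(size_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Rough waiting time: payload size divided by effective bandwidth."""
    return size_bytes * 8 / bandwidth_bits_per_sec

PLAYER_OLD = 1.2e6  # 1.2 MB player
PLAYER_NEW = 98e3   # 98 kB player

# A 56 kbps dial-up modem needs ~3 minutes for the old player
print(download_seconds(PLAYER_OLD, 56e3) / 60)  # ~2.9 minutes

# An effective rate of ~8 kbps reproduces the ~20 minute figure for the slowest regions
print(download_seconds(PLAYER_OLD, 8e3) / 60)   # ~20 minutes
print(download_seconds(PLAYER_NEW, 8e3) / 60)   # ~1.6 minutes
```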

Let's say we measure the average download time per region and the number of monthly views in 2009 for the 1.2MB page:

Region           Avg Download Time     Videos watched per month
North America    2.5 s                 6.3 million
Europe           1.8 s                 5.4 million
Africa           20 minutes            0.01 million

Averaging over all views (i.e. weighting each region's time by its number of views) gives an overall average download time of roughly 3.2 seconds.

Making the download 8% of its original size should cut the times to roughly 8% as well. North America and Europe see a mild increase in adoption, as not too many users were put off by the old wait time. Africa sees a huge increase in people watching videos under the reduced wait time:

Region           Avg Download Time     Videos watched per month
North America    0.3 s                 6.9 million
Europe           0.2 s                 5.7 million
Africa           1.6 minutes (~100 s)  1.0 million

Taking the view-weighted average of these times now gives roughly 7.6 seconds, more than double what we saw with the old download size. This is despite every region seeing shorter download times!
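
To check the arithmetic, here is a minimal sketch of the view-weighted average using the made-up numbers from the two tables above:

```python
# (average download time in seconds, monthly views in millions) per region
before = {"North America": (2.5, 6.3), "Europe": (1.8, 5.4), "Africa": (20 * 60, 0.01)}
after = {"North America": (0.3, 6.9), "Europe": (0.2, 5.7), "Africa": (100, 1.0)}

def average_download_time(regions):
    """Overall average over all views, i.e. each region weighted by its view count."""
    total_time = sum(time * views for time, views in regions.values())
    total_views = sum(views for _, views in regions.values())
    return total_time / total_views

print(f"{average_download_time(before):.1f} s")  # 3.2 s
print(f"{average_download_time(after):.1f} s")   # 7.6 s
```

Every region's time drops, but Africa's weight in the average jumps from 0.01 million views to 1.0 million, dragging the overall average up.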

Selection Bias

This resolution may still "feel" wrong: if everyone gets a better deal, how can the average go up? If we were dealing with a class of students and everyone got 10 more points on their test, that alone is enough to tell us the class average must go up.

In the class example, everyone had exactly one grade, so any student increasing their grade (with no changes to the other grades) raises the average. In the YouTube example, people can choose how many videos to watch (or whether to watch any at all). Because the change itself alters the number of views, our intuition about averages over a fixed population no longer applies. Worse, the regions with the longest download times saw the greatest increase in views.

This is a form of selection bias: shorter download times attract more views. We often encounter selection bias when our data only captures certain groups, such as

  • telephone polls for political surveys (only selects people with phones, not all voters)
  • a product launch in a major city (ignores the differing preferences of people in rural areas, or on the other coast)
  • product feedback (only registers users who feel strongly and are comfortable leaving feedback)

Each of these cases feels like it could be addressed with better-designed sampling: you could have polled voters in a representative fashion, you could have launched your product in representative geographic areas, and (in principle) you could imagine tracking down everyone who bought the product to try and account for the bias.

Why our example feels different

One way the YouTube example is qualitatively different is that we are not "selecting" a set of video views and happening to over-sample from North America and Europe. Instead, the clients with slow bandwidth in Africa are opting not to view as many videos, because each one takes a long time to download. Our clients' actions do the selecting, and we cannot determine how many videos they would have watched if our downloads had been smaller.

(We can make models that estimate it, however.)

Not being able to determine the "selection bias" is common when opening up your market to new segments. For example, if a delivery app that was SF-only starts supporting delivery in Arizona, we don't know ahead of time how much demand to expect from Arizona. The average value of an order may well go down overall, but we would probably segment by market, or by new vs. existing users.

The YouTube example is a little different from this example as well, as we didn't open up to new users. Even looking at new vs. old users wouldn't help here, as existing users with slower internet would probably watch more videos once downloads got faster. Even though there are some differences, hopefully these examples show why our intuition about averages over fixed populations leads us astray here. Even in fixed populations, the question is surprisingly subtle, as examples of Simpson's paradox show.

A common problem: "per user" metrics

We have seen that, instead of looking at "average load time", we get a much better metric by looking at "average load time per region". If we had looked at a metric like "number of videos watched", we would have seen that the smaller download size led to a significant increase in videos viewed.

We might be a little cautious about this metric, especially if our site is growing. If we are growing through marketing, word of mouth, and (in YouTube's case) a growth in content, a change that did nothing would still see "number of videos watched" increase.

One way of trying to solve this is to look at the number of videos watched per user, VWpU (as with any metric, there are issues with this, such as not controlling for increased content, but let's keep things simple for now). If our change in download speed kept the same set of users, and they increased the number of videos they watched now that downloading was quicker, VWpU is a reasonable metric. If the change encourages a lot of new users to join, however, VWpU becomes a problem.

To see why, let's say we have 10 people who each watch 5 videos a month at the old download speeds. When the new downloads come out, those 10 people each watch 25 videos a month, but 100 more casual viewers (who didn't watch anything when downloads took 20 minutes) decide to watch 2 videos a month each. A naive way of calculating VWpU gives

Download Time    10 engaged users    100 casual users    "Average"
20 minutes       5 VWpU              MISSING             5.0 VWpU
2 minutes        25 VWpU             2 VWpU              4.1 VWpU

Under the old system, the 100 casual users were not in my data set, so I didn't know I should zero them out; my data set only had engaged users. When the barrier to entry was lowered, my engaged users got more engaged, and some of the casual users started using my platform.
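
A minimal sketch of this naive calculation, using the made-up cohort numbers above:

```python
# Hypothetical cohorts: (number of users appearing in the data, videos watched per user)
old_cohorts = [(10, 5)]             # casual users watch nothing, so they never show up
new_cohorts = [(10, 25), (100, 2)]  # engaged users watch more; casual users now appear

def naive_vwpu(cohorts):
    """Videos watched per user, counting only users present in the data."""
    total_videos = sum(users * vwpu for users, vwpu in cohorts)
    total_users = sum(users for users, _ in cohorts)
    return total_videos / total_users

print(naive_vwpu(old_cohorts))            # 5.0
print(round(naive_vwpu(new_cohorts), 1))  # 4.1, even though every cohort watches more
```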

Summary and miscellaneous points

YouTube decreased the size of its player download by 92%, which should reduce download times by (approximately) the same factor. Instead, the average download time went up dramatically. Separating out the different geographic regions (which roughly align with "high bandwidth" and "low bandwidth"), every single region saw a decrease in average time. The reason for the global increase was that the regions with the slowest internet saw the greatest increase in the number of videos watched once download times improved.

  • Be careful of selection bias -- especially the bias of who chooses to use your product, and how frequently. It is hard to measure the people who are not interacting with your product.
  • When groups behave differently, it is often worthwhile to monitor the metrics on each group.
  • Make sure to measure what matters. If we looked at the number of videos watched, we would have seen a massive spike in this metric instead.
  • Be aware of "missing data" metrics. A metric like average number of videos watched per user can be misleading if you open up to more casual users.
  • Behavior of the overall average vs the behavior of subgroups is surprisingly subtle (even in cases where the population is fixed).

One thing not to take away from this article is the idea that stratifying your metrics (i.e. looking at the groups) is always the right approach. It was this time, but it isn't always! Gelman has an example of unemployment where the aggregate numbers are more relevant. How to decide whether to stratify will be discussed in the article on Simpson's paradox.
