The art of peak traffic estimation
"How much traffic should I simulate?" is one of the most common questions that we get on load testing. In our previous post, we have already covered how to calculate concurrent users from hourly visitors (you can refer to: http://loadimpact.com/blog/monthly-visitors-and-concurrent-users) but the question still remains - how much traffic should you simulate?
There are several approaches to do this and you can choose to simulate:
1) the average number of hourly visitors over a period of time
2) the peak number of hourly visitors over a period of time
3) increasing the number of visitors until the server breaks
Which method to choose depends on what you are trying to find out. With 1), the goal is to ensure that the average load time stays at or below a certain level for most visitors to the site. 2) lets you find out what will happen to your site when there is a sudden surge in traffic, which could happen if the website goes viral or if there is a marketing campaign. 3) finds the limits of the server, but knowing the point at which your server stops responding is not very useful on its own. The server could well still be responding to 100,000 concurrent visitors, but with an average load time of 30 seconds. Most visitors will not wait for the website to load in such a situation, and will simply leave.
Let's focus on finding out how to determine the number of visitors to simulate if you want to prepare for the event of a sudden surge in traffic.
To do this, I studied the relationship between peak and average traffic using statistics available at Quantcast.com.
Here is a typical traffic chart from Quantcast:

For the purpose of this study, we focus only on traffic from the USA, and we take the average and maximum daily traffic over a 6-month period. The data was retrieved on 12/02/2011. We also used only data that was MRC accredited. (For more information you can refer to: http://www.quantcast.com/how-we-do-it/mrc-accredited-traffic-measurement).
The information is summarized here:

*Factor = Maximum daily visitors/Average daily visitors
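The factor defined above can be computed from any list of daily visitor counts. Here is a minimal sketch in Python. The function name `peak_factor` and the sample numbers are made up for illustration and are not taken from the Quantcast data.

```python
def peak_factor(daily_visitors):
    """Return maximum daily visitors divided by average daily visitors."""
    counts = list(daily_visitors)
    if not counts:
        raise ValueError("need at least one day of data")
    average = sum(counts) / len(counts)
    if average <= 0:
        raise ValueError("average daily visitors must be positive")
    return max(counts) / average


# Hypothetical week of daily visitor counts with one spike (illustration only).
sample_week = [1000, 1200, 900, 1100, 5000, 1300, 1000]
print(f"Factor: {peak_factor(sample_week):.2f}")
```

A factor of 1.0 means the busiest day was no busier than an average day. The larger the factor, the further your peak sits above the traffic you see on a normal day.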
Notice that "Maximum daily visitors" is 2-8 times higher than "Average daily visitors". This is based on monthly traffic of 400,000 visitors. We compiled similar statistics for sites ranked in the 10,000s and 20,000s. The results for sites ranked in the 10,000s are as follows:

Note that maximum daily visitors are now 2-10 times higher than average daily visitors. Each site received about 195,000 visitors per month. Now let's look at sites ranked in the 20,000s. The results are as follows:

You may notice that maximum daily visitors are greater than average daily visitors by a factor of between 2 and 38. The spread has thus widened dramatically. One explanation is that higher-ranking sites with high average monthly visitors are less likely to be affected by a sudden surge in traffic. Large sites like www.cnn.com have high, stable visitorship, so the relative impact of a popular news story, such as the protests in Egypt, is much smaller than it would be on a site with low visitorship. A site like michaelmoore.com, on the other hand, can be flooded with traffic if a particular piece of news about Michael Moore goes viral. So, let's take a look at the statistics for michaelmoore.com:

It appears that on the 14th of December 2010, Michael Moore decided to contribute $20,000 to Julian Assange's bail.

This news was reported on many major news channels, causing traffic to spike to 38 times its average level.
According to Scott Galloway, Clinical Associate Professor of Marketing at NYU, there are 3 elements of viral content:
1) Authenticity
2) Humor
3) Social Debate
In Michael Moore's case we can see these elements coming into play. What is interesting, though, is that going viral usually catches people by surprise. They are not prepared for sudden fame. Neither are their websites. Imagine your website being hit with 100,000 visitors in an hour. You should be overjoyed, right? But most of those visitors actually get a bad experience because the page is slow to load or, worse, unavailable. We think that if your website falls into the category of having low, stable traffic with a chance of going viral, you should not hesitate to test more than 30 times average traffic.
There is also seasonal peak traffic, which is easier to estimate. Let's look at another site, www.holidayscentral.com:

Notice that peak traffic occurs in October and December, and this pattern probably repeats annually. If your site's traffic is similar to that of www.holidayscentral.com, with surges that are predictable and repeat year after year, then you just have to look at your past data to find peak traffic, and then add your expected % growth in traffic. For example, if peak traffic last year was 20,000 visitors and average traffic this year is 10% higher than last year, you should use 22,000 visitors to test for peak traffic this year.
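The arithmetic in this example can be written as a small helper. This is only a sketch. The function name `projected_peak` is our own, and it assumes that peak traffic grows by the same percentage as average traffic.

```python
import math


def projected_peak(last_year_peak, growth_percent):
    """Scale last year's peak by the expected year-over-year growth.

    Assumes the peak grows at the same rate as average traffic.
    The result is rounded up so the test never undershoots the estimate.
    """
    if last_year_peak < 0:
        raise ValueError("last_year_peak must not be negative")
    return math.ceil(last_year_peak * (100 + growth_percent) / 100)


# The example from the text: 20,000 visitors at last year's peak, 10% growth.
print(projected_peak(20000, 10))
```

Run with the numbers from the example, this gives the same 22,000 visitors.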
=====================================================================================
Conclusion
We can follow these guidelines for estimating peak traffic:
If you do not have prior peak traffic data and...
1) ...your site has low visitorship and contains content that could go viral, test up to 30 times average daily traffic.
2) ...your site has high and stable visitorship, test up to 5 times average daily traffic.
If you have prior peak traffic data and the timing of the peak is predictable (seasonal traffic), use past data and add a % growth in traffic to arrive at the final number.
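These guidelines can be sketched as a single function. The function and parameter names below are hypothetical, and deciding whether your site counts as "low visitorship with viral potential" is a judgment call that the code cannot make for you.

```python
import math


def visitors_to_test(average_daily, could_go_viral=False,
                     past_peak=None, growth_percent=0):
    """Suggest an upper bound on visitors to simulate for a peak test.

    average_daily  -- average daily visitors
    could_go_viral -- True for a low-traffic site with content that could go viral
    past_peak      -- peak visitors from past data, if the peak is predictable
    growth_percent -- expected traffic growth since that peak, in percent
    """
    if past_peak is not None:
        # Seasonal case: use past data and add the expected growth.
        return math.ceil(past_peak * (100 + growth_percent) / 100)
    # No prior peak data: apply the 30 times or 5 times rule of thumb.
    multiplier = 30 if could_go_viral else 5
    return average_daily * multiplier


# Hypothetical inputs for the three cases (illustration only).
print(visitors_to_test(2000, could_go_viral=True))
print(visitors_to_test(50000))
print(visitors_to_test(10000, past_peak=20000, growth_percent=10))
```

Treat the result as the top of the range to test, not as a forecast of the traffic you will actually receive.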
When uncertain, just remember that testing with more users is (almost) always better than testing with fewer.
/Jack Zhang
P.S.: Michael Moore, if you are reading this (you are probably too busy "taking on" the health insurance industry), we can help you run a load test on your website.















