Be prepared for Christmas Traffic

The Christmas season is usually the busiest and most profitable period for online retailers and travel-related sites. According to Comscore.com, in 2009, November's internet traffic to e-commerce sites grew by up to 47% against the base period (Aug 31st to Nov 1st). With a fast-recovering economy, website owners can expect a greater increase in internet traffic during the year-end holiday season.

 

This might sound like good news to e-retailers, but a sudden increase in traffic creates stress on a server and slows a website down. Moreover, poorly designed webpages that are not optimized for speed further prolong load times. Even a one-second increase in load time, from three to four seconds, will result in a dramatic decrease in customer conversion rate. Key findings from the "Consumer Response to Travel Site Performance" study conducted by PhoCusWright and Akamai showed that 57% of online shoppers will wait three seconds or less before abandoning a site and that 65% of 18-24 year olds expect a site to load in two seconds or less. This means e-retailers actually lose many potential customers without even realizing it.

 

In the e-commerce market, where profit margins are thin and competition is tough, minor tweaks that make a website load faster will give retailers an edge over their competitors. Webmasters can choose from a variety of free online tools to check their website's performance and get useful tips. Google's Page Speed and Yahoo's YSlow are two of the more popular choices. Both analyze web pages and suggest ways to improve their performance based on a set of rules for high-performance web pages. Both also provide a grade for the website, which webmasters can use to benchmark their website against their competitors'. Good practices and suggestions are also given, and the services are free to use.

 

The good news is that there is still some time left before the Christmas traffic hits and webmasters can further optimize their website to prevent a "winner takes all" scenario where a few fast loading e-retailers account for the bulk of the online sales.

 

P.S.: We have a Christmas promotion going on here

Real world Wordpress performance testing

The background

A few weeks ago, one of the companies I'm partly involved in purchased a very popular website in the Swedish real estate field. The website we bought was essentially a Wordpress site with some interesting customizations. The new website ranks very well in Google and the other major search engines, and the amount of organic traffic that flows into that site is simply fantastic.

As soon as the deal was done, the team copied the website to a new production environment and started working on the graphic appearance so that the new site would resemble the existing old site. Halfway through the work, the CEO started asking questions about the performance. In short, it was terrible. A page would take anything from 4 to 10 seconds to load, and sometimes it would even time out entirely. A few different efforts were made to explain and remedy the problem, but it turned out that there was a lot of guessing involved and none of the guesses was the right one. Yesterday, I did the right thing and took it to the test bench.


Take it to the lab

They say that once you know how to use a hammer really well, all problems start to look like nails. I guess that's starting to become true for me, only in my case the hammer is loadimpact.com. My idea was to copy the Wordpress blog in question to an environment where I could examine it under load, so I did just that. I looked in the back of my closet and found this old 2.1 GHz quad-core machine with 8 GB RAM that was just collecting dust (kidding, I use it all the time). The server has a complete LAMP stack, so both the web server and the MySQL server reside on the same hardware.

 

A few short words about how to move a Wordpress installation. It's not complicated, but you can end up with a few really non-intuitive problems if you don't get it right. However, if you do it right, it's a 5-minute task (plus the actual time to move large files across the Internet). Here's how I got it right the second time:

Move all Wordpress files to the test environment. There aren't really that many ways to fail, but you need to make sure that the web server (I'm assuming everyone uses Apache2 here) on the new host has read access to the destination folder.

Get a DNS entry for the new destination. In my case, all the customizations made to the blog assumed that the installation resided in the root folder of the web server; all images were referred to with src="/images/foo.jpg" etc. So I set up a subdomain that pointed to the web server and a new virtual host that pointed directly to the Wordpress installation on the new web server.

Copy the database. There's some room for error here. I used phpmyadmin to create a database dump from the production environment with the intention to restore it on a database accessible from the test environment. When restoring it in the test database, I used the command line:

mysql dbname -u username -p  < dumpfile.sql

That went really well, the database was recreated in no time and everything looked really good. But. That doesn't work. Wordpress uses UTF-8 as its default character set, and in the above line the mysql command line tool will assume that the file is ISO-Latin-1. Chances are that it won't really hurt you at all, but if it does, the errors will be far from intuitive. I ended up with error messages like:

Warning: array_keys() [function.array-keys]:
The first argument should be an array in wp-includes/widgets.php on line 657

The fact that you're looking at a character encoding problem here isn't really obvious, right? But the explanation is fairly simple. Wordpress uses the PHP internal functions serialize and unserialize to transform complex structures like arrays and objects into strings that are suitable for storing in a database. The format used will for instance describe a string like this: s:10:"räksmörgås". The number 10 tells PHP how long the following string is. When exporting the above example into a text file, the file on disk will look something like this: s:10:räksmörgÃ¥s. Now, if mysql thinks that the file you are importing is encoded in ISO-Latin-1, the content of the database will look exactly as it did on disk. Later, when Wordpress asks PHP to unserialize the expression s:10:räksmörgÃ¥s, PHP will kindly reply that it can't be done, because the string no longer has the declared length, and Wordpress will end up feeding funny-looking strings into functions that expect, for instance, an array as input parameter. The above error message was one of those cases. But the remedy is really simple: import the dumped file with this command instead:

 

mysql dbname -u username -p --default-character-set=utf8 < dumpfile.sql
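To make the length mismatch concrete, here is a minimal Python sketch of what goes wrong. It is an illustration only, not Wordpress or PHP code: the helper names are made up, and it models just the string case of the serialize format. One detail worth knowing is that PHP's length prefix counts bytes rather than characters, so for UTF-8 data the real prefix for this word is 13, not the simplified 10 used in the text.

```python
def php_serialize_string(text: str, encoding: str = "utf-8") -> str:
    """Mimic PHP's serialize() for a single string: s:<byte length>:"<value>";"""
    byte_length = len(text.encode(encoding))
    return f's:{byte_length}:"{text}";'


def length_prefix_matches(serialized: str, encoding: str = "utf-8") -> bool:
    """Check that the declared length equals the byte length of the stored value."""
    _, declared, rest = serialized.split(":", 2)
    value = rest[1:-2]  # drop the opening quote and the closing quote + semicolon
    return int(declared) == len(value.encode(encoding))


original = "räksmörgås"
serialized = php_serialize_string(original)

# Simulate the bad import: the UTF-8 bytes of the dump are read as ISO-8859-1,
# so every two-byte character turns into two separate characters.
mangled = serialized.encode("utf-8").decode("latin-1")

print("intact value valid: ", length_prefix_matches(serialized))
print("mangled value valid:", length_prefix_matches(mangled))
```

The prefix still says 13 after the bad import, but the stored value has grown, which is the point where unserialize gives up.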

 

Adjust to the new environment. The last thing that needs to be done when moving a Wordpress installation is to make sure Wordpress knows that it has moved. There are two important places to change. First, the file wp-config.php needs to be updated with the new database credentials. Second, the table wp_options (which could use a different prefix than "wp_" in your installation) stores the URL of the blog in two places. You will find them both using the following query:

SELECT * FROM wp_options WHERE option_name in('siteurl','home');
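Once the query has located the two rows, they need to be pointed at the new host. The sketch below shows the shape of that update in Python. It runs against an in-memory SQLite table as a stand-in for the real MySQL database, the table is reduced to two columns, and the URLs are placeholders, so treat it as an outline of the step rather than something to run against a live blog.

```python
import sqlite3

NEW_URL = "http://test.example.com"  # hypothetical hostname of the test environment

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL test database
conn.execute(
    "CREATE TABLE wp_options (option_name TEXT PRIMARY KEY, option_value TEXT)"
)
conn.executemany(
    "INSERT INTO wp_options VALUES (?, ?)",
    [
        ("siteurl", "http://www.example.com"),
        ("home", "http://www.example.com"),
        ("blogname", "Example blog"),
    ],
)

# Point both URL options at the new host; every other option stays untouched.
conn.execute(
    "UPDATE wp_options SET option_value = ? WHERE option_name IN ('siteurl', 'home')",
    (NEW_URL,),
)

rows = dict(conn.execute("SELECT option_name, option_value FROM wp_options"))
print(rows)
```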

So, with the above steps taken care of, the Wordpress installation now works as expected in my test environment. It could have been done in 5 minutes, but I ended up spending a little more due to the database character encoding experiment. At least I learned something from it, and hopefully so did you by reading about it.

 

Take it to the lab

Time to bring out the toolbox. In my case, it's a loadimpact.com account and an ssh account on the target machine. What I like to do is to put load on the application and wait for white smoke to appear somewhere. Typically, I'm looking for signs showing that the application is too hard on the database, is using too much CPU or is memory hungry. Off we go:

 

Yep, there's a problem with this blog all right. Even at 10 concurrent users, we have response times around 7 seconds. Interestingly though, the load times seem fairly constant as we put on additional load. At 50 concurrent users, we see about the same average response time as with 30 or 40 users. As this test was running, I was watching the server to see where the white smoke would appear. I didn't really have a favorite suspect, so I began by using top:

The screenshot shows what top looked like with 50 concurrent users (remember that the response times here were in the range of 10 seconds). There are a couple of apache2 processes that are actually not that hard on resources: 4% CPU each and 0.3% memory (on a fairly powerful machine though). The mysqld process seems calm; it shows up in the list but uses very little CPU and memory. This actually looks fairly good. But let's go over the usual suspects:

RAM actually looks really full, with only 200 MB free, but that's just how Linux works with memory (the same numbers on a Windows machine would be awful). Until I see the kswapd process showing up in top, I won't really consider available RAM to be a problem. So conclusion #1 is that we don't have a problem with RAM.

CPU also looks good. None of the visible processes in top uses that much CPU. Also, take a look at the header: 95.1%id means that the server CPU is idling 95.1% of the time. Conclusion #2 is also easy: we don't have a problem with CPU usage.

Database usage looks good as well. Most of the work is carried out by the mysqld process that we already saw was very quiet. Since we actually have the web server and the database server on the same hardware, excess database usage would also show up as higher total CPU usage. But to be sure, I'll have a look using the tool mytop:

 

mytop shows the queries that mysql is handling at the moment. If we had any queries that took a really long time without consuming a lot of CPU, we'd catch them here. The screenshot above was taken while there were 50 concurrent users hitting the blog and the response times were in the 10 second range. It's clear that the delay in response times has nothing to do with poor database performance.
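Going back to the RAM check for a moment: the reason a small "free" number is not alarming on Linux is that buffers and page cache are counted as used but can be handed back to applications when needed. As a rough sketch, the Python function below adds those fields together from text in /proc/meminfo format. The sample values are made up for illustration and are not the numbers from the test server.

```python
def reclaimable_free_mb(meminfo_text: str) -> float:
    """Return free memory plus buffers and page cache, in MB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])  # values are reported in kB
    usable_kb = fields["MemFree"] + fields["Buffers"] + fields["Cached"]
    return usable_kb / 1024


# Made-up sample in /proc/meminfo format, NOT taken from the server in this post.
SAMPLE = """\
MemTotal:        8192000 kB
MemFree:          204800 kB
Buffers:          409600 kB
Cached:          5120000 kB
"""

print(f"{reclaimable_free_mb(SAMPLE):.0f} MB usable once caches are counted")
```

On a real machine the same function could be fed the contents of /proc/meminfo; with the sample above, 200 MB of strictly free memory turns into several gigabytes of usable memory.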

So, none of my three usual suspects seems to be guilty this time. What else could there be?

 

The fourth suspect

The fourth suspect that I came to think about was network performance. Long response times paired with little activity on the server could indicate that there is some other resource on the network that is taking the load. My test server isn't showing any signs of activity, but what about other servers or the network itself? The next handy tool would be to bring out iftop. Here's what the screenshot looks like:

iftop will tell you what other hosts your server is connected to at the moment and how much bandwidth each of those connections is using. Looking at the list, I could explain the reason behind each and every one of the foreign hosts except for two. The last one on the list is an internal host on the same network, and that's simply a file server that shouldn't come into play at all. snik1.gatorhole.com is the server that's putting the load on my machine, so it's only natural that it shows up here. One of the other hosts is the one I'm using ssh from, so that one was also expected. The two I was curious about were lillamy.ballou.se and s114.loopia.se.

Ballou.se is the domain name of our hosting provider for the new site, so I find it interesting that we'd be hitting it with traffic. Loopia.se is the domain name of the OLD hosting provider for the site we're trying to test. That's even more interesting! Perhaps the application has some hard-coded configuration that makes it look up things in the old environment!

Any such configuration would be stored either in the database or in the file system, so I went there to look. First, I used grep to search the file system for any references to the old domain name or hosting provider, but it turned up blank. Next, I was a bit lucky. Knowing that Wordpress stores a lot of information in the wp_options table, I did a search for it in phpmyadmin:

SELECT * FROM `wp_lo_options` WHERE option_value like '%lokaler.nu%'

"lokaler.nu" is part of the domain name of this application, and the DNS records are actually still pointing to the old server. I got roughly 10 rows back, and after looking at them a bit, the one that caught my attention was the configuration for an RSS Widget. This widget would get the latest 6 news items from the blog itself. Instead of using a Widget such as "Recent Posts" or similar, the administrator setting this up had opted to get the news via RSS, which, since the DNS records still point to the old server, explains the traffic to s114.loopia.se. Replacing that Widget with the more natural "Recent Posts" was exactly the fix I was looking for. The next test was much better:

Removing the RSS Widget and getting the results directly from the internal database was the key. The graph is now much more normal, and for a normal 10-20 concurrent users we're in the comfortable 1-2 second range. We still have issues to look at, and the next step is to install one of the better Wordpress cache solutions, but the most critical performance issue is resolved. Done.

 

Conclusions

So, did we learn anything new today? Well, yes, I'd like to think so. First and foremost, using a load testing tool to find out what's wrong with an application is actually a very good idea. Putting load on a web application is a good way to make the performance problems stand out a lot more. In this case, the tools used to examine what's going on on the server all take a little time to produce interesting numbers. A single snapshot of top is useful, but looking at it for 60 seconds is an order of magnitude better. The same goes for all the other tools involved. Even getting to see the s114.loopia.se name show up in iftop would have been impossible, or at least required a lot of luck, if we had had to generate the traffic using manual refresh in Firefox/Firebug.

Second, if you're using an RSS Widget in your Wordpress blog, please be careful. If you're using it to draw news from your own blog, consider using a Widget that can get data directly from the database instead. If you are pulling news from an external source, make sure you use a working cache module.
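For the external feed case, the idea behind a cache is simply to keep the last result for a while so that most page views never wait for the network. Here's a small Python sketch of that idea. It is not how any particular Wordpress cache plugin works; the class, the fake fetch function and the 300 second lifetime are all assumptions chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched values per key for a fixed number of seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no network round trip
        value = self._fetch(key)  # slow path, e.g. an HTTP request for the feed
        self._entries[key] = (now, value)
        return value


calls = []


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for downloading an RSS feed."""
    calls.append(url)
    return f"<rss>items from {url}</rss>"


fake_now = [0.0]  # controllable clock so the example runs instantly
cache = TTLCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("http://example.com/feed")  # first request fetches the feed
cache.get("http://example.com/feed")  # second request is served from the cache
fake_now[0] = 301.0
cache.get("http://example.com/feed")  # entry has expired, so it is fetched again
print(len(calls))
```

Three page views result in only two fetches here; with real traffic the ratio would be far better, since every request within the lifetime is served from memory.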


100,000 load tests!

Load Impact executes 100,000th load test!

We are extremely happy to announce that we recently passed the magic number: 100,000 load tests have been executed since launch early 2009! This is an incredible achievement, and we want to thank all our users for the huge interest you have shown in Load Impact. A hundred thousand load tests is just incredible, and something we never dreamed we would reach this quickly when we launched the service just a little over a year ago.

This blog entry is actually a bit late (as usual), as we executed load test #100,000 this Wednesday (the 14th). This also means that we have a lucky winner of our iPad competition. All registered Load Impact users got the opportunity to guess when we would execute load test #100,000, and the one who got closest would win an iPad. The lucky winner is Ben Lamb from the UK (you will also be notified by email, Ben), who guessed that the test would start at 11:00 on the 14th. The test actually started at 7 pm, so Ben was only 8 hours off.

However, we also have another winner! We noticed that one contestant had actually made a better guess than Ben. Vladimir Mischenko from Ukraine also got the day right, and his guess was 11:07:23! (Only 7 minutes from Ben's guess. They must both have the same astrologer, or something.) The only problem with Vladimir's guess was that he thought the test would start on April the 14th, 2009!

Of course, one could argue that this is our fault for providing a web input form where the default year selected was 2009 - pretty silly for a contest that was launched in 2010. But we prefer to pretend that it was by design. We just wanted to see if people were alert. However, we feel a bit sorry for Vladimir also, so we have decided to give him an iPod touch as a consolation prize (you will also be notified by email, Vladimir).

To all our users we would like to say thank you for your support in making Load Impact the world's most popular load testing tool! Expect to see some nice new features and improvements to the service this year. We are, for instance, very excited about our new load generator application (contact support@loadimpact.com if you want to try it out early) that is being beta-tested right now. It is programmable using Lua, which will allow fully dynamic transactions. More about that later.

Thank you everybody, and don't forget: load testing makes your site better!


Increased quotas

Increased data transfer quotas for Load Impact premium accounts

The data transfer quotas for Load Impact Premium users (Basic, Professional, Advanced) have been increased to allow more testing using the premium accounts.

The reason for having quotas in the first place has been, on the one hand, to prevent abuse and, on the other, to restrict resource usage. As we have seen few abuse issues surrounding the service, and we have plenty of spare capacity to run tests, we have decided to increase the quotas in order to allow our premium users to get as much testing done as possible.

Previously, these were the data transfer limits:

Load Impact BASIC:

- 40 GB data transfer per 30 days
- 20 GB per target IP per 30 days
- 5 GB per target IP per 24 hours

Load Impact PROFESSIONAL:

- 100 GB data transfer per 30 days
- 50 GB per target IP per 30 days
- 15 GB per target IP per 24 hours

Load Impact ADVANCED:

- 500 GB data transfer per 30 days
- 200 GB per target IP per 30 days
- 50 GB per target IP per 24 hours

Now, the data transfer limits are:

Load Impact BASIC:

- 50 GB data transfer per 30 days
- 50 GB per target IP per 30 days
- 10 GB per target IP per 24 hours

Load Impact PROFESSIONAL:

- 300 GB data transfer per 30 days
- 300 GB per target IP per 30 days
- 50 GB per target IP per 24 hours

Load Impact ADVANCED:

- 1000 GB data transfer per 30 days
- 1000 GB per target IP per 30 days
- 200 GB per target IP per 24 hours

 

Read more about the usage quotas for the different account types

Wordpress load testing part 3 - Multi language woes

Understanding the effects of memory starvation.

This is the third part in a series of posts about Wordpress and performance. In part 1, we took a look at Wordpress in general. In part 2 and part 2.5 we reviewed a couple of popular caching plugins that can boost performance. In this part, we'll start looking at how various plugins can have a negative effect on performance and if anything can be done about it.

In the comments for one of the previous posts in this series, Yaakov Albietz asked us if we used our own service www.loadimpact.com for the tests. I realize that I haven't been that obvious about that, but yes, absolutely, we're exclusively using our own service. The cool thing is that so can you! If you're curious about how your own web site handles load, take it for a spin using our service. It's free.

We started out by looking for plugins that could have a negative effect on Wordpress performance, thinking, what are the typical properties of a bad performer plugin? Not so obvious as one could think. We installed, tested and tinkered with plenty of suspects without finding anything really interesting to report on. But as it happens, a friend of a friend had just installed the Wordpress Multi Language plugin and noted some performance issues. Worth taking a look at.

The plugin in question is Wordpress Multi Language (WPML). It's got a high rating among the Wordpress community, which makes it even more interesting to have a look at. Said and done, we installed WPML and took it for a spin.

The installation is really straightforward. As long as your file permissions are set up correctly and the Wordpress database user has permission to create tables, it's a 5-6 click process. Install, activate, select a default language and at least one additional language, and you're done. We were eager to test, so as soon as we had the software in place, we did our first test run on our 10-post Wordpress test blog. Here's the graph:

Average load times 10 to 50 users

Oops! The baseline tests we did for this Wordpress installation gave a 1220 ms response time when using 50 concurrent users. We're looking at something completely different here. At 40 concurrent users we're getting 2120 ms, and at 50 users we're all the way up to 5.6 seconds or 5600 ms. That needs to be examined a bit more.

Our first suspicion was that WPML would put additional load on the MySQL server. Our analysis was actually quite simple. For each page that needs to be rendered, Wordpress now has to check if any of the posts or pages that appear on that page have a translated version for the selected language. WPML handles that magic by hooking into the main Wordpress loop. The hook rewrites the MySQL query about to be sent to the database so that instead of a simple "select foo from bar" statement (oversimplified), it's a more complex JOIN that would typically require more work from the database engine. A prime performance degradation suspect, unless it's carefully written and matched with sensible indexes.

So we reran the test. While that test was running, we sat down and had a look at the server to see if we could easily spot the problem. In this case, looking at the server means logging in via ssh and running the top command (if it had been a Microsoft Windows box, we'd probably have used the Sysinternals Process Explorer utility) to see what's there. Typically, we'd want to know if the server is out of CPU power, RAM or some combination. We were expecting to see the mysqld process consume lots of CPU and confirm our theory above. By just keeping an unscientific eye on top and writing down the rough numbers while the test was running, we saw a very clear trend, but it was not related to heavy mysqld CPU usage:

20 users: 65-75% idle CPU 640 MB free RAM
30 users: 50-55% idle CPU 430 MB free RAM
40 users: 45-50% idle CPU 210 MB free RAM
50 users: 0%   idle CPU  32 MB free RAM

As more and more users were added, we saw CPU resource usage go up and free memory go down, as one would expect. The interesting thing is that at 50 users we noted that memory was extremely scarce and that the CPU had no idle time at all. Memory consumption increases in a linear fashion, but CPU usage suddenly peaks. That sudden peak in CPU usage was due to swapping. When the server comes to the point where RAM is running low, it's going to do a lot more swapping to disk, and that takes time and eats CPU. With this background information in place, we just had to see what happened when going beyond 50 users:

That's very consistent with what we could have expected. Around 50 concurrent users, the server is out of memory and there's a lot of swapping going on. Increasing the load above 50 users will make the situation even worse. Looking at top during the later stages of this test confirms the picture. The kswapd process is using 66% of the server CPU resources, and there's a steady queue of apache2 processes waiting to get their share. And let's also notice that mysqld is nowhere to be seen (yes, this image is only showing the first 8 processes, you just have to take my word for it).
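The rough numbers noted from top earlier are enough for a back-of-the-envelope check of where memory should run out. The Python sketch below takes the four observations of free RAM and extrapolates linearly. It assumes, as the text suggests, that memory use grows in a straight line with the number of users, and the function names are only for this example.

```python
# Rough observations from top: (concurrent users, free RAM in MB)
observations = [(20, 640), (30, 430), (40, 210), (50, 32)]


def mb_per_user(points):
    """Average drop in free RAM per added user between the first and last point."""
    (first_users, first_free), (last_users, last_free) = points[0], points[-1]
    return (first_free - last_free) / (last_users - first_users)


def users_until_exhausted(points):
    """Extrapolate from the first point to the load where free RAM reaches zero."""
    users, free = points[0]
    return users + free / mb_per_user(points)


rate = mb_per_user(observations)
print(f"about {rate:.1f} MB of RAM per additional user")
print(f"free RAM reaches zero near {users_until_exhausted(observations):.0f} users")
```

The estimate lands at roughly 20 MB per user and a breaking point just above 50 users, which matches what the test showed.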

 

 

The results from this series of tests are not WPML specific but universal. As we put more and more stress on the web server, both memory and CPU consumption will rise. At some point we will reach the limit of what the server can handle, and something has got to give. When it does, any linear behavior we may have observed will most likely change into something completely different.

There isn't anything wrong with WPML, quite the opposite. It's a great tool for anyone who wants a multi-language website managed by one of the easiest content management systems out there. But it adds functionality to Wordpress, and in order to do so, it uses more server resources. It seems WPML is heavier on memory than on CPU, so we ran out of memory first. It's also interesting to see that WPML is actually quite friendly to the database: at no point during our tests did we see MySQL consume noticeable amounts of CPU.

 

Conclusion 1: If you're interested in using WPML on your site, make sure you have enough server RAM. Experience of memory requirements from "plain" Wordpress will not apply. From the top screenshot above, we conclude that one apache2 instance running Wordpress + WPML will consume roughly 17 MB of RAM. We haven't examined how that differs with the number of posts, number of comments etc., so let's use 20 MB as an estimate. If your server is set up to handle 50 such processes at the same time, you're looking at 1000 MB just for Apache. So bring out your calculator and work out how much memory you will need for your server by multiplying the peak number of users you expect by 20 MB.
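If you'd rather not bring out a calculator, the same sizing arithmetic fits in a few lines of Python. This is only a sketch of the rule of thumb above: the 20 MB per process is our rough estimate, and the 2 GB server with 512 MB reserved for everything else is a made-up example.

```python
def apache_ram_mb(max_processes: int, mb_per_process: float = 20.0) -> float:
    """Estimate the RAM needed by Apache alone for a number of worker processes."""
    return max_processes * mb_per_process


def max_processes_for(ram_mb: float, reserved_mb: float,
                      mb_per_process: float = 20.0) -> int:
    """How many Apache processes fit after setting RAM aside for everything else."""
    return int((ram_mb - reserved_mb) // mb_per_process)


print(apache_ram_mb(50))             # the 50-process case from the text
print(max_processes_for(2048, 512))  # hypothetical 2 GB server, 512 MB reserved
```

Turning the calculation around like this gives a sensible upper limit for the number of Apache processes, so the server starts queueing requests before it starts swapping.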

Conclusion 2: This blog post turned out a little different than we first expected, and instead of blaming poor database design, we ended up realizing that we were watching a classic case of memory starvation. As it turned out, we also showed how we could use our load testing service to provide a reliable source of traffic volume to create an environment where we could watch the problem as it happens. Good stuff, and something that will appear as a separate blog post shortly.

 

Feedback

We want to know what you think. Are there any other specific plugins that you want to see tested? Should we focus on tests with more users, more posts in the blog, more comments? Please comment on this post and tell us what you think.

 
