Background, Wall of Text
This article details the major behind-the-scenes upgrade of the hardware that powers OCAU, which took place in mid-2009. For historical info about OCAU's various servers and a quick summary of the current configuration, check out the OCAU Server Hardware page in the Wiki. This article provides a lot more info about the decisions behind the upgrade and how it was carried out. It will be long and waffly and contain far too much detail, because it's largely a mechanism for me to organise my own thoughts and notes for future reference. But I don't get to play with enterprise-class hardware too often and I figure other enthusiasts might enjoy a peek at this unusual hardware too.
So, if you're just here for the nice pictures of shiny new hardware and don't want to read lots of text, go to page 2!
OCAU is, in simple terms, a collection of PHP scripts that access several MySQL databases. That's true of the forums, the news page, the Wiki, etc etc. PHP is a scripting language for making webpages, and is handled by the webserver software, in our case Apache. MySQL is a database system which stores, organises and retrieves information - for example the text of this article, or a private message in the forums. You can run Apache and MySQL on many operating systems including Windows, but we have always used Linux in various forms over the years.
So, you can split the "work" of producing an OCAU page into two separate tasks. One is Apache, which takes requests from the internet, runs the PHP scripts, talks to the MySQL task, gets the data it needs, builds the page that was requested and sends it back out to the internet. The second is MySQL, which answers requests from Apache and either stores information given to it by Apache (perhaps a new forum message), or retrieves the information that Apache has asked for (perhaps a search result, or the content of a thread). These two tasks are so separate that you can quite literally run Apache on one computer, and run MySQL on a separate computer, and they will, if configured correctly, continue to spit out webpages just as if they were running on the same PC.
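As a minimal sketch of that separation: from the PHP script's point of view, the only thing that changes when MySQL moves to another machine is the host it connects to. The hostnames, credentials and table names below are made up for illustration, not OCAU's real configuration (this uses the PHP 4-era mysql_* functions, matching the software described later in the article).

```php
<?php
// Same machine: connect to MySQL running locally.
$db = mysql_connect('localhost', 'forum_user', 'secret');

// Separate database server: connect over the network instead.
// Nothing else in the script needs to change.
$db = mysql_connect('db.example.internal', 'forum_user', 'secret');

mysql_select_db('forums', $db);

// Apache/PHP asks MySQL for data, then builds the page from the result.
$result = mysql_query("SELECT title FROM threads ORDER BY lastpost DESC LIMIT 10", $db);
while ($row = mysql_fetch_assoc($result)) {
    echo $row['title'], "\n";
}
?>
```

This is why the two roles can be split across machines so cleanly: Apache and MySQL only ever talk to each other through that one connection.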
That separation of tasks is historically how things were set up for OCAU. We would have a relatively powerful "database server", which primarily or exclusively handled the MySQL side of things. This would have lots of memory for caching the database contents, nice fast disks so that any information could be stored or retrieved very quickly, and powerful CPUs for searching and sorting through the data itself. Generally we had a less-powerful machine handling the Apache/PHP side of things. Because it doesn't deal with as large amounts of data as the database server, this "web server" could have less memory, less powerful CPUs and much slower disks. In historical generations of OCAU server hardware the "database server" role has been taken by bbq and previously pie, and the "web server" role has been taken by beer and previously chips. Before the "pie and chips" upgrade OCAU was small enough that a single server could handle both tasks (mostly) fine, so we were on thor, and before that, odin.
Things get Complicated
However, during the life of the "beer and bbq" servers, this separation of roles became quite blurred. Firstly, the "database server" role of handling MySQL spread onto the other, less-powerful server. This was because over time we found that the forum search engine was, due to file-locking and database update priority issues, causing the entire forum to have to wait for long searches to finish. The best solution to this is to employ a process called replication, which puts an identical, largely read-only, copy of the database onto another server. Any changes to the database, such as new forum posts, are written to the "master" database which is on the more powerful server, as usual. But those changes are "replicated" automatically and behind the scenes to the "slave" copy on the less-powerful server. These updates are given low priority on the slave server. The slave copy is used by the search engine and other forum tasks that don't need to modify the database. Obviously if we were changing both copies at the same time, but with different information, there would be a conflict and the databases would rapidly become out of sync. But, importantly, if a long search query is running on the slave server, it does not make everyone else wait until it's finished. People can continue using the forums without even realising that the slave server is chugging away doing a huge search for someone - or even several at the same time.
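For the curious, a replication setup like that boils down to a handful of configuration lines and one command on the slave. This is an illustrative sketch only - the server names, account and log positions are invented, not OCAU's actual settings:

```
# --- my.cnf on the master (the powerful server) ---
[mysqld]
server-id = 1
log-bin   = mysql-bin    # record every change so it can be replayed on the slave

# --- my.cnf on the slave (the less-powerful server) ---
[mysqld]
server-id = 2
low-priority-updates     # replicated writes yield to read queries such as searches

# --- run once on the slave, from the mysql client ---
CHANGE MASTER TO
    MASTER_HOST     = 'master.example.internal',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;
```

The master writes every change to its binary log; the slave fetches and replays that log in the background, which is why the copy stays in sync without the forum code having to know replication exists.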
The "web server" role was also blurred, and spread onto the server originally intended to only handle MySQL. This is because the webserver was being overwhelmed, both in terms of CPU power and available memory, by the sheer number of visitors during peak times - especially once it started performing more work for MySQL as well, as described in the previous paragraph. In fact, eventually the majority of the forum webserver load was moved over to the "database server" machine, because it had CPU time and memory to spare. So the nice neat roles of "web server" and "database server" became quite muddied, and the quite significant difference in specifications between the two servers was becoming an issue.
Things get Old
By the beginning of 2009, our previous servers "beer" and "bbq" were showing their age. We were regularly having over 1000 people online in the forums and there were sometimes issues with response times under heavy load, even with both the database and Apache load spread across the two servers. BBQ was outdated hardware when we got it, never mind nearly four years later - although we were and still are very grateful to Pluscorp (in a previous incarnation) for donating most of it. Beer was newer hardware, but less powerful, and not able to fully take on half of the load.
They were also outdated from a software perspective - we were running SLES 9 on both machines, which limited us to PHP 4 and MySQL 4, both now outdated. SLES 11 is out now, to give you an idea of how far out of date they were. Not being able to use PHP 5 or MySQL 5 also limited which performance-extending tweaks we could use, like PHP accelerators and caching programs. MySQL 5 also handles locking better, so would suffer less from the search-related issue described earlier. But to upgrade the operating system on either machine meant moving the website completely over onto only one machine, while the other was taken out of service and upgraded. As we've seen, it's unlikely either machine would have coped very well with the entire website running on it.
What to do?
So, I rolled a few ideas around in my head. I could move everything over to BBQ, the more powerful of the two servers. Maybe add more memory at the time, to help it cope. I could turn off the search engine during this transitional period, but that disables handy features like "Find Your Posts" and "Find New Posts", not to mention the ability to search for things!
Or, I could buy a new server, install the new software on it, and swap it for one of the live servers. That way there'd only be a short outage while we switched the hardware over. Then I could upgrade the software on the swapped-out server, and swap it with the second old server. But rack space is limited, and there are issues replicating between different versions of MySQL, so deciding which server to swap out first was difficult.
Another option would be to move to a fully managed setup, where the hardware is someone else's problem. You pay more because you're leasing their hardware, but if a hard drive dies, it's someone else's job to replace it. That's kinda tempting, because it would limit the expense of upgrading the hardware, and remove the worry of hardware failures from the back of my mind. But over time you do end up paying quite a lot for that kind of hosting. For example if you upgrade the memory from the default configuration - as we would certainly have to do - you pay that upgrade fee every month with most providers, even though the upgrade was of course only performed once.
Speaking of which, I should mention at this point that BBQ and Beer were very kindly hosted by AusGamers. Their sponsorship of our hosting for many years was a vital part of our success and growth. But if I wanted to go to a leased setup, I would have to move away from AusGamers to a much less economical arrangement.
So, I pondered these options for some time. I scribbled project plans on post-it notes, working out the best order in which to swap out and upgrade things. I bought a test server on eBay, bourbon, which was intended to be a drop-in replacement for beer, before I decided it wasn't powerful enough for the future. So in its place I bought a more powerful server from someone in the forums, again intending to do a quite complicated swap-and-upgrade project over the course of a month or so. But I felt no real rush and kept considering the best way to handle things.
But then, a decision was almost made for me, by external forces. For a variety of reasons, AusGamers were unable to continue sponsoring our hosting. It was out of their control and we have no hard feelings. They're a great bunch of guys and OCAU is forever in their debt. But we had 30 days to find a new home - hardware AND hosting.
I gave myself two weeks to find a new host in Australia. I figured I'd need the remaining two weeks to actually move everything over to the new host. A few feelers, both on a commercial and a sponsorship basis, failed to turn up anything too solid, or more importantly, affordable. So with time pressure building, I had to move us to the USA, land of cheap hosting, to give us more time to examine options closer to home. I leased a two-server "virtual rack" setup from ThePlanet in Dallas, Texas, and moved OCAU over to those servers.
Three months passed. BBQ and Beer were decommissioned. I felt quite strange pulling them out of the rack and bringing them home in the van. It seems bizarre to have an emotional attachment to hardware, but you have to remember I had seen these servers in the flesh maybe twice during their service life. They existed more as a concept I could SSH into, coax performance out of, and cross my fingers would respond to pings after a reboot, than as the cold silent metal objects in the back of the van. Not to mention that between them they had hosted OCAU, a huge part of my life, for years. It was strange that they weren't running anymore, but the website was, on some alien hardware I would never see and didn't own.
But ThePlanet's hosting was costing me a small fortune each month. I had a few options available from several Australian companies - and I'm sorry I couldn't accept them all! But in the end Internode came through with pretty much the perfect setup for us, and an offer I couldn't refuse. So now we had hosting, but no hardware. BBQ and Beer were retired, and it was time to upgrade.
Credit Card Time
The very basic finger-in-the-air goals for our two new servers were:
- Either server can run the entire website at its current level of traffic. This is obviously for redundancy reasons, but also makes for a pretty clear "this is where we are starting again from" line in the sand.
- The two-server configuration should at installation be able to handle 2,000 forum users during peak time - roughly double our current normal peak and 500 people more than we have ever had online, giving us room to grow.
- The two-server configuration should be able to meet our needs for the next three years at least, with minimal or preferably no upgrades during that time.
So, read on to see what hardware we ended up on, with lots of photos of nice shiny new hardware etc!