High-Availability Failover w/ Apache and Red Hat Enterprise Linux


A few days ago, I wrote about the beginning of a web site migration from IIS5 on Windows 2000 Server to Apache2 on Red Hat Enterprise Linux 4. For those who didn’t read the original article (above), the machine that will serve as the “new” web server is a dual Pentium 3 600MHz box — quite old, but it should serve us well.

Because the machine is so old and I value uptime and availability of our various web sites, I wanted to find a nice, easy, cheap way to provide redundancy in the event that something happens to the machine. I found this in “heartbeat”.

In my case, I’m using Red Hat Enterprise Linux. We have a site license for this, so the cost isn’t a factor. For others, the cost may very well be a factor. In that case, I’d encourage you to look into CentOS. CentOS is about as close as you can get to RHEL, without being Red Hat. Everything here should work perfectly on CentOS, as well.

I started out with two completely identical PCs. Both are dual P3 650MHz boxes with 512MB of RAM and 20GB IDE hard drives, down to the exact same model of RAM and the exact same model of hard drive. I then installed RHEL AS 4 on the first one and configured it how I wanted it (network settings, hostname, software packages, etc.). When everything was set up on what I’ll call the “primary” server (“zeus”), I shut it down and installed the 2nd of the two hard drives in the primary server as well. I then booted up the primary with a Knoppix CD. Once I got into a shell, I simply ran

dd if=/dev/hda of=/dev/hdb
This gave me a 2nd hard drive that was an exact, bit-for-bit copy of the first. I reinstalled the 2nd drive in the “secondary” server (“hera”). I booted up hera while it was disconnected from the network, so that I could change the hostname, IP address, etc. Once that was done, here’s what I had:
zeus - 192.168.43.51
hera - 192.168.43.52
I’ve obfuscated the actual IPs, because the real servers have public IP addresses, but that’s irrelevant at this point. Using heartbeat, the servers will “share” a “virtual IP address”, 192.168.43.53. This is the IP address that will be published in DNS as “www.example.com”. When a client surfs to “www.example.com”, they will actually be connecting to 192.168.43.53, the virtual IP address. This virtual IP address will be “passed back and forth” between the two servers if or when one goes down.

We need a way for the servers to monitor each other, to detect when the other goes down. In this case, I’ve done so via two different methods. First off, I’ve installed an extra network interface card (NIC) in each server and a crossover cable connects them. zeus is configured as 10.10.10.1 and hera is configured as 10.10.10.2 on these secondary NICs. In addition, a null modem cable connects /dev/ttyS1 on zeus to /dev/ttyS1 on hera. This provides us a secondary way to monitor, in case the crossover cable were to go bad, for example.
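
Before heartbeat is involved at all, it may be worth confirming that the null modem cable actually passes data. The following is only a sketch: the function names are hypothetical helpers (not part of heartbeat), the device path is a parameter so the functions can be tried against an ordinary file first, and it assumes both serial ports are already set to matching line settings.

```shell
# Hypothetical helpers for a quick link test; not part of heartbeat.

# Read one line from the given device (blocks until a line arrives).
serial_listen() {
    head -n 1 < "$1"
}

# Write one line of text to the given device.
serial_send() {
    printf '%s\n' "$2" > "$1"
}

# Usage (run the first command on hera, then the second on zeus):
#   serial_listen /dev/ttyS1
#   serial_send /dev/ttyS1 "hello from zeus"
```

If the text shows up on the listening side, the cable and ports are probably fine; swapping the roles checks the other direction.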

To get started, download the following files:

I obtained these files from the Ultra Monkey web site. I’ve mirrored them locally for your convenience, but feel free to grab them from the link above if you don’t trust me.

Download each of the above files and when you’re done, run “rpm -Uvh filename.rpm” to install each one. Helpful hint: place them all in one directory, run “rpm -Uvh *.rpm” and rpm will figure out which order they need to be installed in and install them in order.

First off, let’s set up a software watchdog. A “watchdog” runs in the background and monitors the server. If the server becomes “sick”, it reboots it after one minute. This can be very handy. Feel free to skip this part if you’re not interested — it’s optional.

Load the software watchdog module:

modprobe softdog
See if the device node is already there:
ls -l /dev/watchdog
If it’s already there (and looks like “crw------- 1 root root 10, 130 Apr 19 13:05 /dev/watchdog”), then you’re good to go. If not, then:
mknod /dev/watchdog c 10 130
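
The check-then-create steps above can be folded into one small function. This is a sketch, not something heartbeat ships: the name `ensure_watchdog_node` is hypothetical, and the major/minor numbers are simply the ones used above.

```shell
# Hypothetical helper: create the watchdog device node only if it is missing.
# The path is a parameter so the check can be tried on any device file.
ensure_watchdog_node() {
    dev="${1:-/dev/watchdog}"
    if [ -c "$dev" ]; then
        echo "present: $dev"
    else
        # Character device, major 10, minor 130 (the values used above).
        mknod "$dev" c 10 130 && chmod 600 "$dev" && echo "created: $dev"
    fi
}

# Usage (as root, after "modprobe softdog"):
#   ensure_watchdog_node
```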
Create the /etc/ha.d/ directory if it doesn’t already exist (“mkdir /etc/ha.d/”). Let’s start by creating /etc/ha.d/ha.cf using your favorite text editor. Here’s what my complete ha.cf file looks like:
serial /dev/ttyS1
watchdog /dev/watchdog
bcast eth1
keepalive 2
warntime 6
deadtime 12
initdead 150
baud 9600
udpport 694
auto_failback on
node zeus.example.com
node hera.example.com
respawn hacluster /usr/lib/heartbeat/ccm
Let’s go through it line by line:
serial /dev/ttyS1
This tells heartbeat that we’ll be using a serial port (/dev/ttyS1, in this case) to monitor the heartbeat.
bcast eth1
In addition, we’ll also be using broadcasts on the eth1 network device to monitor the heartbeat.
keepalive 2
We want heartbeats being sent every 2 seconds.
warntime 6
This is the time value (in seconds) before a “late heartbeat” will be logged.
deadtime 12
A node will be considered “dead” after 12 seconds.
initdead 150
On some systems, the network takes some time to come up after a reboot. This should be at least twice the normal deadtime.
baud 9600
This controls the speed we’ll be communicating at over the serial port.
udpport 694
This defines the UDP port we’ll be sending heartbeats over (on “eth1”, see above).
auto_failback on
With auto_failback set to on, the primary server will resume control if/when it comes back up (after “dying”). If this is off, the secondary server will stay “in charge” after the primary goes away.
node zeus.example.com
node hera.example.com
This defines the nodes in our cluster. This should be the exact names as reported by “uname -n”.
respawn hacluster /usr/lib/heartbeat/ccm
This tells heartbeat to run the command “/usr/lib/heartbeat/ccm” as the user “hacluster”, to monitor it, and restart it (“respawn”) if it dies.

Make sure to create this file on both nodes. It is exactly identical in my case, though it could potentially be different (different ethX devices, for example).
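
The timing values are easy to get out of order when editing by hand, so a quick sanity check can help. The sketch below is an assumption-laden helper, not a heartbeat tool: the name `check_ha_timing` is hypothetical, the "initdead at least twice deadtime" rule comes from the walkthrough above, and the other two orderings (keepalive below warntime, warntime below deadtime) are rules of thumb I am assuming rather than documented requirements.

```shell
# Hypothetical helper: sanity-check the timing directives in an ha.cf file.
check_ha_timing() {
    awk '
        # "+ 0" forces a numeric comparison later on.
        $1 == "keepalive" { k = $2 + 0 }
        $1 == "warntime"  { w = $2 + 0 }
        $1 == "deadtime"  { d = $2 + 0 }
        $1 == "initdead"  { i = $2 + 0 }
        END {
            ok = 1
            if (!(k < w))      { print "keepalive should be less than warntime"; ok = 0 }
            if (!(w < d))      { print "warntime should be less than deadtime"; ok = 0 }
            if (!(i >= 2 * d)) { print "initdead should be at least twice deadtime"; ok = 0 }
            if (ok) print "timing OK"
            # Exit status 0 when everything passed, 1 otherwise.
            exit !ok
        }
    ' "$1"
}

# Usage:
#   check_ha_timing /etc/ha.d/ha.cf
```

With the values shown above (2, 6, 12 and 150 seconds) all three checks pass.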

Next up is /etc/ha.d/haresources. This file defines what services we’re running and who the “owner” is of those services. In our case, the only service is “httpd”.

zeus.example.com 192.168.43.53 httpd
This file must be identical on all nodes! “zeus.example.com” refers to the primary server, 192.168.43.53 is the virtual IP address (the one that’s shared) and “httpd” refers to the service we’re monitoring.
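
Since the owner named in haresources has to be one of the nodes declared in ha.cf, a small cross-check between the two files can catch typos. This is a sketch only: `check_resource_owner` is a hypothetical name, and it assumes simple one-line haresources entries (no backslash continuation lines).

```shell
# Hypothetical helper: succeed if the first field of every haresources
# line is declared as a "node" in ha.cf.
check_resource_owner() {
    ha_cf="$1"
    haresources="$2"
    awk '
        # First file (ha.cf): remember every declared node name.
        NR == FNR {
            if ($1 == "node") for (n = 2; n <= NF; n++) nodes[$n] = 1
            next
        }
        # Second file (haresources): skip blank lines and comments.
        NF == 0 || $1 ~ /^#/ { next }
        !($1 in nodes) { print "unknown owner: " $1; bad = 1 }
        END { exit bad + 0 }
    ' "$ha_cf" "$haresources"
}

# Usage:
#   check_resource_owner /etc/ha.d/ha.cf /etc/ha.d/haresources
#   uname -n    # should print one of the node names from ha.cf
```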

Last up is /etc/ha.d/authkeys. This provides a bit of security to the cluster and controls the “secret keys” needed to join the cluster. There are a few different options here, but since we’re operating over a crossover cable (about as secure as we can get), we’re going to use “crc”.

This file must also be the same on all nodes, and looks something like this:

auth 42
42 crc
This file must also be readable and writable by root only. You can ensure this by running
chmod 600 /etc/ha.d/authkeys
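
To confirm the permissions took effect on each node, something like the following could be used. It is a sketch with a hypothetical function name, it relies on GNU stat’s `-c '%a'` format, and it checks only the mode bits, not the file’s owner.

```shell
# Hypothetical helper: report whether a file has exactly mode 600.
# Uses GNU stat (-c '%a' prints the octal permission bits).
check_authkeys_mode() {
    file="${1:-/etc/ha.d/authkeys}"
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $file is mode 600"
    else
        echo "fix: $file is mode $mode, expected 600"
        return 1
    fi
}

# Usage:
#   check_authkeys_mode
```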
Next up, let’s set up heartbeat to actually start when the system boots up. We want to make sure that the “softdog” module loads during system startup:
echo "/sbin/modprobe softdog" >> /etc/init.d/rc.sysinit
And activate the startup scripts:
/sbin/chkconfig --level 345 heartbeat on
These commands must be run on all nodes, by the way. Now, let’s start up heartbeat:
/sbin/service heartbeat start
You can now monitor the logs in /var/log/ha-log to verify proper startup. I had a small issue here that took me a while to figure out: the nodes couldn’t communicate with each other. This turned out to be due to my fairly tight firewall, which wasn’t allowing the UDP packets in on the eth1 interfaces. :/
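
The exact fix depends on how the firewall is set up, so the following is a sketch rather than the rule from the original setup. It assumes plain iptables with an INPUT chain, and uses the interface and port from ha.cf (eth1, UDP 694). The hypothetical helper only prints the command so it can be reviewed before anything is changed.

```shell
# Hypothetical helper: print (not run) an iptables rule that accepts
# heartbeat's UDP packets on the given interface and port.
heartbeat_fw_rule() {
    iface="${1:-eth1}"
    port="${2:-694}"
    echo "iptables -I INPUT -i $iface -p udp --dport $port -j ACCEPT"
}

# Usage (as root, on both nodes, after reviewing the output):
#   heartbeat_fw_rule eth1 694 | sh
```

A rule added this way lasts only until the firewall is reloaded, so it would still need to be saved in whatever way the firewall configuration is normally persisted.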

If everything starts up okay, the primary server should start up the Apache webserver, while it won’t be running on the secondary. As soon as the primary goes away, however, the secondary server will take over the virtual IP address and start up Apache.
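
One way to see which node currently holds the virtual IP address is to look for it in the output of `ip -o -4 addr show` on each machine. A minimal sketch, assuming the usual `inet ADDRESS/PREFIX` layout of that output; `has_addr` is a hypothetical helper name.

```shell
# Hypothetical helper: read "ip -o -4 addr show" output on stdin and
# succeed if the given address is configured on this machine.
has_addr() {
    # -F: treat the pattern as a fixed string, so the dots are literal.
    # The trailing "/" stops 192.168.43.5 from matching 192.168.43.53.
    grep -qF "inet $1/"
}

# Usage (run on each node):
#   ip -o -4 addr show | has_addr 192.168.43.53 && echo "this node holds the virtual IP"
```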

In my case, I can ping the virtual IP address, 192.168.43.53, from another PC. While this ping is running, I can run “/sbin/shutdown now -r” on the primary, and watch it shut down and the secondary take over. Monitoring the output from ping shows that the failover is so quick that not even a single ping packet is lost. SUCCESS!
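
For a repeatable version of this test, the loss figure can be pulled out of ping’s summary line instead of being read by eye. This is a sketch that assumes the common “N% packet loss” wording in the summary; `ping_loss_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: read ping's output on stdin and print the
# packet-loss percentage from its summary line.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage (from a third machine, while rebooting the primary):
#   ping -c 60 192.168.43.53 | ping_loss_percent
```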

Later articles will show how I keep the data on the servers (authentication files — /etc/{group,passwd,shadow} — and the web files — under /var/www) in sync using rsync.

No Responses to “High-Availability Failover w/ Apache and Red Hat Enterprise Linux”

  1. Elfshadow Says:

    Nice write up! Looking foward to seeing how this project moved forward.

    I did notice that you listed the command:

    dd if=/dev/hda of=/dev/hda

    Shouldn’t that be:

    dd if=/dev/hda of=/dev/hdb (or c, depending on which controller the drive was put on?)

Copyright © 2007 Jeremy L. Gaddis.