Building a Load-Balancing Cluster with LVS
This article is part 3 of a series about building a high-performance web cluster powerful enough to handle 3 million requests per second. For information on finding that perfect load-testing tool, and tuning your web servers, see the previous articles.
So you’ve tuned your web servers and their network stacks. ‘iperf’ and ‘netperf’ are showing the best raw network performance you’ve seen, and Tsung is reporting that your web servers are serving up 500,000+ static web pages per second. That’s great!
Now you’re ready to begin the cluster install.
Red Hat has some excellent documentation on this already, so I recommend checking that out if you get lost. Don’t worry though, this tutorial covers every step you need to build the cluster.
LVS Router installation
This single machine will act as a router, balancing TCP traffic evenly across all the nodes in your cluster. Choose a machine to act in this role, and complete the steps below. You can afford to use your weakest machine for this purpose, since routing IP traffic requires very few system resources.
1. Install LVS on the LVS Router machine.
yum groupinstall "Load Balancer"
chkconfig piranha-gui on
chkconfig pulse on
2. Set a password for the web UI.
piranha-passwd
3. Allow the Piranha web UI port (3636) in iptables, then reload the rules.
vim /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport 3636 -j ACCEPT
service iptables restart
4. Start the web gui
service piranha-gui start
-> Don’t start pulse until Piranha configuration is complete!
5. Turn on packet forwarding.
vim /etc/sysctl.conf
net.ipv4.ip_forward = 1
sysctl -p /etc/sysctl.conf
6. Configure services on the Real Servers (webservers).
service nginx start
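Before moving on, it can be worth confirming that the forwarding setting from step 5 actually took effect. The following is a minimal sketch, not part of the original setup: the function name ip_forward_enabled and its optional path argument are hypothetical, and it simply reads the kernel's live value from /proc.

```shell
#!/bin/sh
# Succeed if IPv4 forwarding is enabled.
# The optional argument (a file to read instead of /proc) is a hypothetical
# convenience for trying the function out; by default the live value is used.
ip_forward_enabled() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
    echo "IPv4 forwarding is on"
else
    echo "IPv4 forwarding is off; re-check /etc/sysctl.conf and run sysctl -p"
fi
```

Run it on the LVS Router after "sysctl -p". If it reports that forwarding is off, the line in /etc/sysctl.conf is probably missing or mistyped.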
Direct Routing Configuration
1. On the LVS Router, log in to the Piranha web ui to begin configuration.
In the Global Settings section, notice that Direct Routing is the default. This is the option we want for best performance, because it lets our web servers reply directly to requests sent to the cluster IP address (Virtual IP) instead of sending every response back through the router.
2. Click the VIRTUAL SERVERS tab to create the virtual web server. This “server” is actually your collective web cluster. It allows your nodes to act as one, responding together as if they were a single web server, hence the name “virtual server”.
Click ADD, then EDIT.
3. Editing the Virtual Server. Choose a cluster IP to use for the Virtual IP (not the IP of any real machine), and choose a device to attach that Virtual IP to.
Click ACCEPT when finished. The webpage will not refresh, but your data will be saved.
Click REAL SERVER to configure the next part.
4. Real Server configuration. This page allows you to define the physical machines, or Real Servers, behind the web cluster.
ADD all of your http servers here, then EDIT those Real Servers to insert the details.
Click ACCEPT when finished.
To get back to the previous page, click VIRTUAL SERVER, then REAL SERVER.
After all nodes are added to the REAL SERVER section, select each one and click (DE)ACTIVATE to activate them.
5. Now that all the Real Servers have been added and activated, return to the VIRTUAL SERVERS page.
Click (DE)ACTIVATE to activate the Virtual Server.
Router configuration complete! You can now exit the web browser, start up pulse, and continue on to configure the physical nodes.
service pulse start
Check ‘ipvsadm’ to see the cluster come online.
ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.122.10:http wlc
  -> 192.168.122.1:http           Route   1      0          0
  -> 192.168.122.2:http           Route   1      0          0
  -> 192.168.122.3:http           Route   1      0          0
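To make that table easier to read, here is a rough sketch of the manual ipvsadm commands that would describe the same virtual service. It is an illustration only: in this setup pulse manages the table from the Piranha configuration, the function name lvs_dr_commands is hypothetical, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Print (do not run) ipvsadm commands for one direct-routed virtual service.
# Usage: lvs_dr_commands VIP PORT REAL_IP...
lvs_dr_commands() {
    vip="$1"
    port="$2"
    shift 2
    # -A: add a virtual service, -t: TCP, -s wlc: weighted least-connection
    echo "ipvsadm -A -t ${vip}:${port} -s wlc"
    for rip in "$@"; do
        # -a: add a real server, -r: its address,
        # -g: direct routing (shown as "Route"), -w: weight
        echo "ipvsadm -a -t ${vip}:${port} -r ${rip}:${port} -g -w 1"
    done
}

# Addresses taken from the ipvsadm output above.
lvs_dr_commands 192.168.122.10 80 192.168.122.1 192.168.122.2 192.168.122.3
```

The "Route" entry in the Forward column corresponds to the -g flag, and "wlc" is the scheduler named after -s.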
Direct Routing – Real Server node configuration
Complete these steps on each of the web servers in the cluster.
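1. Create a Virtual IP address on each of the Real Servers. Use the same Virtual IP you configured in Piranha, and substitute your own network device for eth0.
ip addr add 192.168.122.10 dev eth0:1
This address is lost on reboot, so add the same command to /etc/rc.local to bring it up again at boot time.
vim /etc/rc.local
ip addr add 192.168.122.10 dev eth0:1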
2. Create an arptables entry for each Virtual IP address on each Real Server.
This will cause the Real Servers to ignore all ARP requests for the Virtual IP addresses, and change any outgoing ARP responses which might otherwise contain the Virtual IP, so that they contain the real IP of the server instead. The LVS Router is the only node that should respond to ARP requests for any of the cluster’s Virtual IPs (VIPs).
yum -y install arptables_jf
arptables -A IN -d <cluster-ip-address> -j DROP
arptables -A OUT -s <cluster-ip-address> -j mangle --mangle-ip-s <realserver-ip-address>
3. Once this has been completed on each Real Server, save the ARP table entries.
service arptables_jf save
chkconfig --level 2345 arptables_jf on
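If a Real Server carries more than one Virtual IP, the pair of rules from step 2 has to be repeated for each one. A small sketch of that loop follows; the function name arptables_rules and the second VIP (192.168.122.11) are made up for illustration, and the script only prints the rules so you can review them before running anything.

```shell
#!/bin/sh
# Print the arptables rules one Real Server needs for a list of VIPs.
# Usage: arptables_rules REAL_SERVER_IP VIP...
arptables_rules() {
    rip="$1"
    shift
    for vip in "$@"; do
        # Ignore incoming ARP requests for the VIP.
        echo "arptables -A IN -d ${vip} -j DROP"
        # Rewrite outgoing ARP packets so they carry the real IP instead.
        echo "arptables -A OUT -s ${vip} -j mangle --mangle-ip-s ${rip}"
    done
}

# Hypothetical example: Real Server 192.168.122.1 holding two VIPs.
arptables_rules 192.168.122.1 192.168.122.10 192.168.122.11
```

Once the printed rules look right, they can be run as root on that Real Server and then saved as in step 3.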
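If arptables is functioning as it should, the LVS Router is the only machine that answers for the Virtual IP. To test this, make sure pulse is shut off on the router, then ping the Virtual IP from a machine on the same network that does not hold that address itself (a Real Server with the VIP configured will simply answer its own ping). With pulse stopped, nothing should reply.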
If a machine does respond to ping, you can look in your arp table to find the misbehaving node.
ping 192.168.122.10
arp | grep 192.168.122.10
This will reveal the node’s MAC address and allow you to track it down.
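The lookup can also be scripted. This is a minimal sketch that assumes the column layout printed by the net-tools "arp -n" command; the function name mac_for_ip and the sample table are made up for illustration.

```shell
#!/bin/sh
# Print the MAC address that `arp -n` output (read from stdin) lists for one IP.
# Incomplete entries have no "ether" column and are skipped.
mac_for_ip() {
    awk -v ip="$1" '$1 == ip && $2 == "ether" { print $3 }'
}

# Made-up sample in the layout `arp -n` prints. On a live system you would
# run instead:  arp -n | mac_for_ip 192.168.122.10
mac_for_ip 192.168.122.10 <<'EOF'
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.122.2            ether   52:54:00:aa:bb:02   C                     eth0
192.168.122.10           ether   52:54:00:aa:bb:03   C                     eth0
EOF
```

Comparing the printed MAC address against the interfaces of each Real Server ("ip link show") identifies the node whose ARP rules are missing.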
Another useful test is to simply request a page from the cluster using ‘curl’, and then watch for the traffic on the LVS Router with ‘ipvsadm’.
On the LVS Router, as root:
watch ipvsadm
From another machine:
curl http://192.168.122.10/test.txt
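If you would rather have a one-line summary than watch the full table, the ipvsadm output can be condensed with awk. This is a sketch under the assumption that the output has the same columns as the listing shown earlier; the function name summarize_ipvs is hypothetical.

```shell
#!/bin/sh
# Summarize ipvsadm output read from stdin: count direct-routed real servers
# and add up their ActiveConn column.
summarize_ipvs() {
    awk '$1 == "->" && $3 == "Route" { n++; active += $5 }
         END { printf "real_servers=%d active_conns=%d\n", n, active }'
}

# Sample input copied from the ipvsadm listing earlier in this article.
# On the LVS Router you would run instead:  ipvsadm | summarize_ipvs
summarize_ipvs <<'EOF'
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.122.10:http wlc
  -> 192.168.122.1:http           Route   1      0          0
  -> 192.168.122.2:http           Route   1      0          0
  -> 192.168.122.3:http           Route   1      0          0
EOF
```

A real server count lower than expected suggests that a node was not activated in Piranha or has been marked as down.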
Cluster load testing with Tsung
Now that the cluster is up and running, you can see just how powerful it is by putting it through a strenuous load test. See this article for information on setting up Tsung to generate the right amount of traffic for your cluster.
tsung start
Starting Tsung
"Log directory is: /root/.tsung/log/20120421-1004"
Leave this for at least 2 hours. It takes a long time to ramp up all those connections to achieve the peak amount of http requests per second. During that time, you can watch the load of your cluster machines using htop, to see individual core utilization.
Assuming you have EPEL & RPMforge repos installed…
yum -y install htop cluster-ssh
cssh node1 node2 node3 ...
htop
You’ll be able to see that the LVS Router is actually doing very little work, while the http servers are chugging along at top speed, responding to requests as quickly as they can.
Be sure to keep your Load Average slightly less than the number of CPUs in the system. (For example, on my 24-core systems, I try to keep the load at 23 or less.) That will ensure that CPUs are being well-utilized without getting backed up.
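That rule of thumb is easy to check from a script while the test runs. The sketch below is one possible way to do it, assuming a Linux system with /proc/loadavg; the function name load_below_cpus is hypothetical.

```shell
#!/bin/sh
# Succeed if the given load average is strictly below the given CPU count.
# Usage: load_below_cpus LOAD CPUS
load_below_cpus() {
    awk -v l="$1" -v c="$2" 'BEGIN { exit !((l + 0) < (c + 0)) }'
}

# 1-minute load average and number of online CPUs on this machine.
load="$(cut -d' ' -f1 /proc/loadavg 2>/dev/null)"
cpus="$(getconf _NPROCESSORS_ONLN)"

if load_below_cpus "$load" "$cpus"; then
    echo "load $load is below $cpus CPUs"
else
    echo "load $load is at or above $cpus CPUs; requests may be backing up"
fi
```

On a 24-core node, a load of 23 passes this check and a load of 24 or more does not.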
After Tsung has finished, view the report to see the details of your cluster’s performance.
cd /root/.tsung/log/20120421-1004
/usr/lib/tsung/bin/tsung_stats.pl
firefox report.html