Performance tuning: Intel 10-gigabit NIC
By default, Linux networking is configured for best reliability, not performance. With a 10GbE adapter this is especially apparent: the kernel's send/receive buffers, TCP memory allocations, and packet backlog are far too small for optimal performance. This is where a little testing and tuning can give your NIC a big boost.
There are three performance-tuning changes you can make, as listed in the Intel ixgbe driver documentation. Here they are in order of greatest impact:
- Enabling jumbo frames on your local host(s) and switch.
- Using sysctl to tune kernel settings.
- Using setpci to tune PCI settings for the adapter.
Keep in mind that any tuning listed here is only a suggestion. Much of performance tuning consists of changing one setting, benchmarking, and seeing whether it helped, so your results may vary.
Before starting any benchmarks, you may also want to disable irqbalance and cpuspeed. Doing so maximizes network throughput and gives the most consistent benchmark results.
service irqbalance stop
service cpuspeed stop
chkconfig irqbalance off
chkconfig cpuspeed off
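On newer, systemd-based distributions these services are managed with systemctl instead. Assuming the service names are the same on your system (they may differ; cpuspeed in particular has been replaced on some distributions), the equivalent would be:

```shell
# Stop the services for the duration of the benchmark
systemctl stop irqbalance cpuspeed

# And prevent them from starting again at boot
systemctl disable irqbalance cpuspeed
```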
Method #1: jumbo frames
In Linux, setting up jumbo frames is as simple as running a single command, or adding a single field to your interface config.
ifconfig eth2 mtu 9000 txqueuelen 1000 up
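On distributions where ifconfig has given way to iproute2, the same change can be sketched with the ip command (eth2 again being a placeholder for your interface name):

```shell
# Raise the MTU and transmit queue length, then bring the link up
ip link set dev eth2 mtu 9000 txqueuelen 1000 up
```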
For a more permanent change, add this new MTU value to your interface config, replacing “eth2” with your interface name.
vim /etc/sysconfig/network-scripts/ifcfg-eth2

MTU="9000"
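To confirm that jumbo frames actually work end to end, one quick check is a non-fragmentable ping at the largest payload the new MTU allows. This is only a sketch; 10.0.0.2 stands in for another jumbo-enabled host on the same segment:

```shell
# Largest ICMP payload for a 9000-byte MTU:
# 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes
PAYLOAD=$((9000 - 20 - 8))
echo "max ICMP payload: $PAYLOAD bytes"

# -M do sets the Don't Fragment bit, so the ping fails unless
# every hop between the two hosts really accepts jumbo frames:
#   ping -M do -s "$PAYLOAD" 10.0.0.2
```

If the oversized ping succeeds, jumbo frames are working; if you see "Message too long", something in the path is still at a 1500-byte MTU.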
Method #2: sysctl settings
There are several important settings that impact network performance in Linux. These were taken from Mark Wagner’s excellent presentation at the Red Hat Summit in 2008.
Core memory settings:
- net.core.rmem_max – max size of rx socket buffer
- net.core.wmem_max – max size of tx socket buffer
- net.core.rmem_default – default size of rx socket buffer
- net.core.wmem_default – default size of tx socket buffer
- net.core.optmem_max – maximum amount of option memory
- net.core.netdev_max_backlog – how many unprocessed rx packets before kernel starts to drop them
# -- tuning -- #
# Increase system file descriptor limit
fs.file-max = 65535

# Increase system IP port range to allow for more concurrent connections
net.ipv4.ip_local_port_range = 1024 65000

# -- 10gbe tuning from Intel ixgb driver README -- #

# turn off selective ACK and timestamps
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0

# memory allocation min/pressure/max.
# read buffer, write buffer, and buffer space
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
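After saving these settings to /etc/sysctl.conf, they can be loaded (as root) without a reboot. As a sanity check, any single key can also be read back or tried temporarily:

```shell
# Load everything from /etc/sysctl.conf
sysctl -p

# Read one value back to confirm it took effect
sysctl net.core.rmem_max

# Or try a single setting on the fly before committing it to the file
sysctl -w net.core.netdev_max_backlog=300000
```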
Method #3: PCI bus tuning
If you want to take your tuning even further, here's an option to adjust the PCI bus settings for the slot the NIC is plugged into. First, find the NIC's PCI address, as shown by lspci:
$ lspci
07:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Here 07:00.0 is the PCI bus address. Now we can grep for that (written as 0700, without separators, in /proc/bus/pci/devices) to gather even more information.
$ grep 0700 /proc/bus/pci/devices
0700 808610fb 28 d590000c 0 ecc1 0 d58f800c 0 0 80000 0 20 0 4000 0 0 ixgbe
Various information about the PCI device is displayed, as you can see above. The number we're interested in is the second field, 808610fb: the Vendor ID (8086) and Device ID (10fb) together. You can use these values to tune the MMRBC, or Maximum Memory Read Byte Count, on the PCI bus.
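If you'd rather not parse /proc by hand, lspci can print the same numeric IDs directly: -n shows the vendor and device IDs, -nn adds the names alongside them, and -s limits the output to one address (07:00.0 here; your slot will likely differ):

```shell
# Show just the vendor:device IDs for the NIC at 07:00.0
lspci -n -s 07:00.0

# Same, with the human-readable names alongside the [8086:10fb] IDs
lspci -nn -s 07:00.0
```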
This will increase the MMRBC to 4k reads, increasing the transmit burst lengths on the bus.
setpci -v -d 8086:10fb e6.b=2e
About this command:
- The -d option selects the NIC by its Vendor and Device IDs (8086:10fb);
- e6.b is the address of the PCI-X Command Register (read as a single byte);
- 2e is the value to be set.
These are the other possible values for this register (although the one listed above, 2e, is recommended by the Intel ixgbe documentation).
|MMRBC value|Read size in bytes|
|---|---|
|22|512|
|26|1024|
|2a|2048|
|2e|4096|
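Before and after making the change, you can read the register back to see which value is currently set; with setpci, giving the register without =value performs a read instead of a write:

```shell
# Read the current PCI-X Command Register byte for the NIC
setpci -d 8086:10fb e6.b
```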
And finally, testing
Testing should really be done between each configuration change, but for brevity I'll just show the before and after results. The benchmarking tools used were iperf and netperf.
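For reference, a typical iperf throughput test looks like the sketch below; 10.0.0.1 is a placeholder for the server's address, and -t 100 matches the 100-second runs shown in the results:

```shell
# On the server side, listen for incoming TCP tests:
iperf -s

# On the client side, run a 100-second test against the server:
iperf -c 10.0.0.1 -t 100
```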
Here’s how your 10GbE NIC might perform before tuning…
iperf:

[ 3]  0.0-100.0 sec  54.7 GBytes  4.70 Gbits/sec

netperf:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5012.24
…and here's how it might perform after tuning:

iperf:

[ 3]  0.0-100.0 sec   115 GBytes  9.90 Gbits/sec

netperf:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

10000000 10000000 10000000    30.01    9908.08
Wow! What a difference a little tuning makes. I've seen great results on my Hadoop HDFS cluster after just a couple of hours spent getting to know my server's network hardware. Whatever your application for 10GbE might be, this tuning is likely to benefit you as well.