As I mentioned on Friday, I’ve recently built a multi-server web hosting environment around lighttpd, MySQL, and Ubuntu Linux. Ironically, my lighttpd web server slowed to a crawl that very evening! It turns out that I had not properly tuned lighttpd to function in a Linux environment. I was surprised to find that the Ubuntu package did not include basic Linux settings! I referred to the lighttpd performance documentation for help.
Anyone familiar with its internals will tell you that everything is a file to a UNIX operating system. It’s the philosophy behind the system: Network connections, storage systems, system parameters, and processes all have file interfaces, and each of these pseudo-files needs a unique file descriptor.
What does this mean for lighttpd? Well, every time a visitor accesses a page, lighttpd uses three file descriptors: an IP socket to the client, a FastCGI process socket, and a filehandle for the document accessed. Lighttpd stops accepting new connections when 90% of the available file descriptors are in use, resuming when usage has fallen to 80%. With the default setting of 1024 file descriptors, lighttpd can handle a maximum of 307 connections (90% of 1024 is 921 descriptors, divided by three per connection). This is a lot, but it is possible to exceed this number during times of high load.
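To make that arithmetic concrete, here is a minimal shell sketch of the calculation. The function name max_connections is my own invention, and the figures of three descriptors per request and a 90% cutoff are simply the ones quoted above, so treat the result as a rough estimate rather than a hard limit.

```shell
# Rough estimate of how many simultaneous requests fit within a
# file descriptor limit. max_connections is a hypothetical helper.
# Usage: max_connections FDS [FDS_PER_REQUEST] [CUTOFF_PERCENT]
max_connections() {
    mc_fds=$1
    mc_per_request=${2:-3}    # client socket + FastCGI socket + file handle
    mc_cutoff=${3:-90}        # lighttpd stops accepting at 90% usage
    echo $(( mc_fds * mc_cutoff / 100 / mc_per_request ))
}

max_connections 1024    # prints 307
max_connections 2048    # prints 614
```

Doubling the descriptor limit doubles the estimate, which is why raising server.max-fds is the first thing to try.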
To prevent this from happening, we can double the limit without any trouble. Simply set “server.max-fds” to 2048 in /etc/lighttpd/lighttpd.conf.
Contrary to much of the advice I found on the Internet, lighttpd spawned by root does not appear to use the “nofile” limits set in /etc/security/limits.conf, since these are for PAM and only apply to full interactive logins. There is a system-wide limit that can be set in /etc/sysctl.conf, however. Check your default with “cat /proc/sys/fs/file-max” and make sure it’s over 10,000. Mine was set to 12640 so I left that alone.
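If you want to script that check, something like the following should do. It is only a sketch: file_max_ok is a hypothetical helper name, and the 10,000 threshold is just the rule of thumb mentioned above.

```shell
# file_max_ok is a hypothetical helper: it succeeds when VALUE exceeds MIN.
# Usage: file_max_ok VALUE [MIN]
file_max_ok() {
    [ "$1" -gt "${2:-10000}" ]
}

# Compare the running kernel's system-wide limit against the threshold.
if [ -r /proc/sys/fs/file-max ]; then
    current=$(cat /proc/sys/fs/file-max)
    if file_max_ok "$current"; then
        echo "fs.file-max is $current: fine"
    else
        echo "fs.file-max is $current: consider raising fs.file-max in /etc/sysctl.conf"
    fi
fi
```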
One reason that file descriptors get used up so quickly is HTTP keep-alive. To improve performance, modern web servers keep client connections alive to handle multiple requests instead of building up and tearing down connections for each item in a page. Keep-alive is tremendously beneficial to performance, but tends to keep unnecessary connections alive, too. By default, lighttpd allows 16 keep-alive requests per connection, allows idle sessions to remain alive for 5 seconds, and gives reads and writes 1 minute and 6 minutes to complete, respectively.
Although lighttpd has pretty aggressive defaults (especially compared to Apache), a period of heavy traffic and a few slow clients could see many unused connections sticking around. The server.max-keep-alive-idle setting default of 5 seconds can be reduced to as low as 2, if you assume your clients are reasonably quick about requesting data, but a value of 3 or 4 is probably realistic. You may want to increase the server.max-keep-alive-requests value from the default of 16, but you probably don’t need to. The server.max-read-idle and server.max-write-idle settings are tempting targets, but these situations are usually fairly rare so let’s not monkey with them.
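As a back-of-the-envelope illustration of why the idle timeout matters, the sketch below multiplies the rate of new connections by the idle timeout to estimate how many idle sockets are hanging around at any moment. It assumes the worst case, where every connection sits idle for the full timeout after its last request, and the rate of 50 connections per second is a made-up number, not a measurement from my server. The name idle_connections is hypothetical too.

```shell
# idle_connections is a hypothetical helper for a worst-case estimate.
# Usage: idle_connections NEW_CONNECTIONS_PER_SECOND IDLE_TIMEOUT_SECONDS
idle_connections() {
    echo $(( $1 * $2 ))
}

idle_connections 50 5    # default 5 second timeout, prints 250
idle_connections 50 3    # reduced to 3 seconds, prints 150
```

Each of those idle connections holds at least a client socket, so trimming the timeout by a second or two frees descriptors for visitors who are actually requesting something.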
Mechanics: Polling and Sending
The best bang for your lighttpd buck is to tune the server to use better kernel resources to check for file changes and write data to the network. There are three critical items here, each of which is set to a conservative universal setting by default.
One of the major areas of UNIX development over the last decade was how to handle the tens of thousands of connections experienced by Internet servers. This “C10K Problem” is documented in excruciating detail if you’re interested, but the net of it is that each version of UNIX has an advanced mechanism to handle I/O events. Since kernel version 2.6, Linux has sys_epoll, a polling mechanism that supports edge-triggered notification and whose cost grows with the number of active connections rather than the total number being watched. But lighttpd runs on many different flavors of UNIX, so it has to default to the older and less-scalable “level-triggered” poll system, which rescans every connection on each call. To remedy this, set “server.event-handler” to “linux-sysepoll”.
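Before flipping that setting, it doesn't hurt to confirm the kernel is new enough. Here is a small sketch that parses a release string such as the output of uname -r; kernel_has_epoll is a hypothetical helper name, and the check only looks at the version number, not at how the kernel or lighttpd was actually built.

```shell
# kernel_has_epoll is a hypothetical helper: it succeeds when the given
# kernel release string is version 2.6 or newer.
# Usage: kernel_has_epoll RELEASE
kernel_has_epoll() {
    ke_major=${1%%.*}
    ke_rest=${1#*.}
    ke_minor=${ke_rest%%.*}
    ke_minor=${ke_minor%%[!0-9]*}    # drop suffixes such as "-rc1"
    [ "$ke_major" -gt 2 ] || { [ "$ke_major" -eq 2 ] && [ "${ke_minor:-0}" -ge 6 ]; }
}

if kernel_has_epoll "$(uname -r)"; then
    echo "kernel $(uname -r): linux-sysepoll should be available"
else
    echo "kernel $(uname -r): stick with the default event handler"
fi
```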
Another mechanism that varies widely across UNIX systems is how to actually read and write data from the disk to the network. All systems include basic read() and write() calls, which transfer data into and out of system memory. Lighttpd defaults to using these to move data around. But Linux includes a more advanced call, sendfile, which lets the kernel move data from a file straight to a socket without first copying it into the web server’s own memory. We can enable this by setting “server.network-backend” to “linux-sendfile”, which ought to improve performance for larger (multi-megabyte) files without impacting smaller ones.
Lighttpd attempts to improve performance further by caching the output of the UNIX stat() command. It includes a basic (“simple”) cache which keeps the result of file system calls in memory for one second. But many Linux distributions include more advanced accelerators: FAM was the original, and a lighter-weight workalike called Gamin is now included by default in Ubuntu’s lighttpd install. Therefore, we can improve stat calls simply by allowing lighttpd to use Gamin: Set “server.stat-cache-engine” to “fam” and you’re rolling!
One more useful tweak to consider, although it’s not included in the official lighttpd performance document, is not updating the “atime” parameter on served pages. This is a bit of a religious issue among some UNIX administrators, but I feel safe in saying that since my web server logs all accesses and I’m not using any kind of hierarchical storage system to store them, I don’t care when each php, html, and png file was last accessed. We can stop writing atime values by mounting the entire filesystem with “noatime”, but I like the more granular approach offered by lighttpd: Simply set “server.use-noatime” to “enable” and it won’t bother keeping this updated for the files it accesses. Everything else will continue as it always has but with reduced disk I/O.
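If you're not sure whether a filesystem is already mounted with noatime, the mount options in /proc/mounts will tell you. The sketch below checks the root filesystem as a stand-in for wherever your document root lives; has_noatime is a hypothetical helper name.

```shell
# has_noatime is a hypothetical helper: it succeeds when a comma-separated
# mount option string (as found in /proc/mounts) contains "noatime".
# Usage: has_noatime OPTIONS
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on every entry mounted at "/", if /proc/mounts is readable.
if [ -r /proc/mounts ]; then
    while read -r _dev mountpoint _type options _rest; do
        if [ "$mountpoint" = "/" ]; then
            if has_noatime "$options"; then
                echo "/ is already mounted noatime"
            else
                echo "/ still updates atime: server.use-noatime may help"
            fi
        fi
    done < /proc/mounts
fi
```

If the filesystem is already mounted noatime, the lighttpd setting is redundant but harmless.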
Lighttpd has pretty good default settings, but a few might be tweaked if we need to respond to higher server loads. The more important area of tuning is simply enabling the advanced features of the 2.6.x Linux kernel and Ubuntu system we are using: Enable sys_epoll, sendfile, and Gamin and disable atime updates.
I’ll post more information as I stumble across it. I’m still learning, but my server performance has improved dramatically: Pingdom tools reports that it used to take upwards of half a minute to load my blog’s home page and it now loads in under seven seconds! That’s progress!
# Maximum number of file descriptors, default = 1024
server.max-fds = 2048

# Maximum number of requests within a keep-alive session before the server terminates the connection, default = 16
server.max-keep-alive-requests = 16

# Maximum number of seconds until an idling keep-alive connection is dropped, default = 5
server.max-keep-alive-idle = 4

# Maximum number of seconds until a waiting, non keep-alive read times out and closes the connection, default = 60
server.max-read-idle = 60

# Maximum number of seconds until a waiting write call times out and closes the connection, default = 360
server.max-write-idle = 360

# Which event handler to use, default = poll
server.event-handler = "linux-sysepoll"

# How to handle network writes, default = writev
server.network-backend = "linux-sendfile"

# Requires FAM or Gamin to be installed, default = simple
server.stat-cache-engine = "fam"

# Whether to update the atime setting on file access, default = disable
server.use-noatime = "enable"