raw IO performance
February 03, 2007
In lighttpd 1.5.0 we support several network backends. Their job is to fetch static data from disk and send it to the client.
We want to compare the performance of these backends and work out when you want to use which.
- writev
- linux-sendfile
- gthread-aio
- posix-aio
- linux-aio-sendfile
The impact of the stat-threads shall also be checked.
We use a minimal configuration file:
server.document-root = "/home/jan/wwwroot/servers/grisu.home.kneschke.de/pages/"
server.port = 1025
server.errorlog = "/home/jan/wwwroot/logs/lighttpd.error.log"
server.network-backend = "linux-aio-sendfile"
server.event-handler = "linux-sysepoll"
server.use-noatime = "enable"
server.max-stat-threads = 2
server.max-read-threads = 64
iostat, vmstat and http_load
We used iostat and vmstat to see how the system is handling the load.
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 506300 12900 20620 437968 0 0 14204 17 5492 3323 4 23 9 63
0 1 506300 11212 20620 439888 0 0 17720 4 6713 3966 3 29 3 66
0 1 506300 11664 20632 440356 0 0 14460 8 5416 3120 2 24 2 71
1 0 506300 18916 20612 433168 0 0 13180 50 5505 3088 2 23 2 72
0 1 506300 11960 20628 440188 0 0 15860 6 5485 3307 2 24 3 71
$ iostat -xm 5
avg-cpu: %user %nice %system %iowait %steal %idle
2.20 0.00 24.40 70.40 0.00 3.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 67.40 0.40 68.00 0.80 13600.00 14.40 6.64 0.01 197.88 12.87 176.83 11.23 77.28
sdb 82.80 0.40 84.60 0.80 17000.00 14.40 8.30 0.01 199.23 23.18 280.16 11.61 99.12
md0 0.00 0.00 302.20 0.80 30520.00 6.40 14.90 0.00 100.75 0.00 0.00 0.00 0.00
Our http_load process returned:
$ http_load -verbose -timeout 40 -parallel 100 -seconds 60 urls.100k
9117 fetches, 100 max parallel, 9.33581e+008 bytes, in 60 seconds
102400 mean bytes/connection
151.95 fetches/sec, 1.55597e+007 bytes/sec
msecs/connect: 5.47226 mean, 31.25 max, 0 min
msecs/first-response: 144.433 mean, 3156.25 max, 0 min
HTTP response codes:
code 200 -- 9117
We will use the same benchmark and the same configuration to compare the different backends.
Comparison
As a comparison I tried to add other web-servers to the ring. As always, benchmarks have to be taken with a grain of salt. Don’t trust them, try to repeat them yourself.
- lighttpd 1.5.0-svn, config is above
- litespeed 3.0rc2
- epoll and sendfile are enabled. All the other options are defaults.
- Apache 2.2.4 event-mpm from the OpenSUSE 10.2 packages
- MinSpareThreads = 25
- MaxSpareThreads = 75
- ThreadLimit = 64
- ThreadsPerChild = 25
I’ll try to get a shrunk-down, text-based config file which contains only the options necessary for others to repeat the benchmark.
Expectations
The benchmark is supposed to show that async file-IO is a real win for single-threaded webservers. We expect that:
- the blocking network-backends are slow
- Apache 2.2.4 offers the best performance as it is threaded + event-based
- lighttpd + async file-io gets into the range of Apache2
The problem with blocking file-IO is that a single-threaded server can do nothing else while it waits for the syscall to finish.
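To make the difference concrete, here is a minimal sketch (not lighttpd’s actual code) of a read via POSIX AIO, the mechanism behind the posix-aio backend; the file name and buffer size are invented for illustration:

/* sketch: submit a read with POSIX AIO instead of blocking in read().
 * compile with: gcc -o aio-demo aio-demo.c -lrt */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    static char buf[64 * 1024];
    int fd = open("/tmp/example.dat", O_RDONLY); /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    /* blocking variant: read(fd, buf, sizeof(buf)) would stall the
     * whole event loop until the disk returns the data */

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* submit the read; the call returns immediately */
    if (aio_read(&cb) < 0) { perror("aio_read"); return 1; }

    /* ... a real server would serve other connections here ... */

    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000); /* stand-in for one event-loop iteration */

    printf("read %zd bytes asynchronously\n", aio_return(&cb));
    close(fd);
    return 0;
}

glibc implements POSIX AIO with an internal thread pool, which may explain why the posix-aio and gthread-aio results below end up so close to each other.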
Benchmarks
Running http_load against the different backends shows the impact of async-io vs. sync-io.
100k files
backend | throughput | requests/s
---|---|---
lighttpd 1.5.0-svn | |
writev | 6.11MByte/s | 59.77
linux-sendfile | 6.50MByte/s | 63.62
posix-aio | 12.88MByte/s | 125.75
gthread-aio | 15.04MByte/s | 147.08
linux-aio-sendfile | 15.56MByte/s | 151.95
others | |
litespeed 3.0rc2 (writev) | 4.35MByte/s | 42.78
litespeed 3.0rc2 (sendfile) | 5.49MByte/s | 53.68
apache 2.2.4 | 15.04MByte/s | 146.93
For small files you can gain around 140% more throughput (15.56MByte/s with linux-aio-sendfile vs. 6.50MByte/s with linux-sendfile).
with and without noatime
To show the impact of server.use-noatime = "enable" we compare the vmstat output for the gthread-aio backend with and without noatime:
With O_NOATIME enabled:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 62 506300 9192 20500 470064 0 0 12426 5 7005 6324 3 27 7 63
0 63 506300 10188 19768 469732 0 0 14154 2 8252 7614 3 30 0 67
0 64 506300 10488 19124 470492 0 0 13589 0 8261 7483 3 27 0 69
0 64 506300 10196 17952 473092 0 0 13062 8 7388 6560 3 25 8 65
0 64 506300 10656 16836 474720 0 0 11790 0 6378 5074 2 23 11 64
With O_NOATIME disabled:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 21 506300 10408 15452 491680 0 0 10515 326 5362 1619 2 22 19 57
3 7 506300 11116 17888 487588 0 0 11020 493 6056 2400 7 25 10 58
0 0 506300 10200 19704 488004 0 0 8840 365 4506 1622 2 20 29 49
2 14 506300 10460 21624 485288 0 0 12422 428 6464 1986 2 26 11 60
0 1 506300 9436 24116 485316 0 0 12640 513 7159 2109 3 28 2 67
0 21 506300 11864 25768 481588 0 0 8760 5 4436 1571 2 19 39 41
0 21 506300 10352 24892 483412 0 0 11941 339 6005 1913 3 24 12 6
You can see how bo (blocks out) goes up and bi (blocks in) goes down in the same way: without O_NOATIME every read of a file causes a write to update its atime. As you usually don’t need the atime (file access time), you should either mount the file-system with noatime, nodiratime or use the setting server.use-noatime = "enable". By default this setting is disabled to be backward compatible.
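The setting essentially maps to the O_NOATIME flag of open(). A minimal sketch, assuming Linux (the flag is a GNU extension and is only permitted if you own the file or have CAP_FOWNER, hence the fallback):

/* sketch: open a file for reading without touching its atime; fall
 * back to a plain open if the kernel refuses the flag */
#define _GNU_SOURCE /* O_NOATIME is a GNU/Linux extension */
#include <errno.h>
#include <fcntl.h>

int open_noatime(const char *path) {
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0 && errno == EPERM) /* not our file: accept the atime update */
        fd = open(path, O_RDONLY);
    return fd;
}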
10MByte files
backend | throughput | requests/s | %disk-util | user + sys
---|---|---|---|---
lighttpd 1.5.0-svn | | | |
writev | 17.59MByte/s | 2.35 | 50% | 25%
linux-sendfile | 33.13MByte/s | 3.77 | 70% | 30%
posix-aio | 50.61MByte/s | 5.69 | 98% | 60%
gthread-aio | 47.97MByte/s | 5.51 | 100% | 50%
linux-aio-sendfile | 44.15MByte/s | 4.95 | 90% | 40%
others | | | |
litespeed 3.0rc2 (sendfile) | 22.18MByte/s | 2.72 | 65% | 35%
Apache 2.2.4 | 42.81MByte/s | 4.73 | 95% | 40%
For larger files the win from async-io is still around 50%.
stat() threads
For small files the performance is largely influenced by the stat() sys-call. Before a file is open()ed for reading, we first check if the file exists, if it is a regular file and if we can read from it. This sys-call is not async by itself.
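A minimal sketch of those checks (not the exact lighttpd code; the helper name is invented):

/* sketch: the checks done before serving a file -- exists, regular,
 * readable (helper name invented for illustration) */
#include <sys/stat.h>
#include <unistd.h>

int can_serve(const char *path) {
    struct stat st;

    if (stat(path, &st) < 0) return 0;    /* does not exist or not reachable */
    if (!S_ISREG(st.st_mode)) return 0;   /* directory, device, ... */
    if (access(path, R_OK) < 0) return 0; /* no read permission */
    return 1;
}

If the inode is not in the cache each of these checks can block on a disk seek, which is exactly what the stat-threads are there to hide.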
We will use the gthread-aio backend and the set of 100k files and run the benchmark again, this time changing server.max-stat-threads from 0 to 16.
threads | throughput |
---|---|
0 | 8.55MByte/s |
1 | 13.60MByte/s |
2 | 14.18MByte/s |
4 | 12.33MByte/s |
8 | 12.62MByte/s |
12 | 13.10MByte/s |
16 | 12.71MByte/s |
For optimal performance you should set the number of stat-threads equal to the number of disks (here: two threads for the two disks behind md0).
read-threads
You can also tune the number of read-threads. Each disk-read request is queued and then executed in parallel by a pool of readers. The goal is to keep the disk utilized at 100% and to hide the seeks for a stat() and a read() in the time that lighttpd spends sending the data to the network.
threads | throughput |
---|---|
1 | 6.83MByte/s |
2 | 11.61MByte/s |
4 | 13.02MByte/s |
8 | 13.61MByte/s |
16 | 13.81MByte/s |
32 | 14.04MByte/s |
64 | 14.87MByte/s |
It looks like 2 reads per disk are already a good value.
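Such a reader pool might look roughly like the sketch below (all names are invented; the hand-off of the finished buffer back to the event loop, e.g. through a pipe, is left out):

/* sketch of a reader pool: the event loop enqueues read requests and a
 * fixed number of threads performs the blocking pread() calls in
 * parallel (invented names, not lighttpd's internals) */
#define _XOPEN_SOURCE 500 /* for pread() */
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct read_req {
    int fd;
    off_t offset;
    size_t length;
    struct read_req *next;
};

static struct read_req *pending; /* LIFO for brevity */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* called from the event loop */
void enqueue_read(struct read_req *req) {
    pthread_mutex_lock(&lock);
    req->next = pending;
    pending = req;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

static void *reader_thread(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending == NULL)
            pthread_cond_wait(&cond, &lock);
        struct read_req *req = pending;
        pending = req->next;
        pthread_mutex_unlock(&lock);

        char *buf = malloc(req->length);
        ssize_t n = pread(req->fd, buf, req->length, req->offset);
        (void)n; /* a real server hands buf back to the event loop here */
        free(buf);
        free(req);
    }
    return NULL;
}

/* nthreads would come from server.max-read-threads */
void start_readers(int nthreads) {
    for (int i = 0; i < nthreads; i++) {
        pthread_t t;
        pthread_create(&t, NULL, reader_thread, NULL);
        pthread_detach(t);
    }
}

That throughput still creeps up slightly towards 64 threads is presumably because the kernel's IO scheduler can merge and reorder more outstanding requests.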