Symptom: The unbound caching DNS server can be optimized for various workloads. While most performance optimizations can be made at runtime through the configuration file or the unbound-control tool, some more fundamental decisions that influence performance are made at compile time of the unbound binaries.
Problem: This article explains the technical differences between the possible "flavors" of unbound and recommends the type of DNS workload under which each flavor performs best.
The performance differences between these flavors depend on the DNS query traffic patterns. To select the right flavor for a real-world deployment of unbound, it is necessary either to have good knowledge of the DNS traffic patterns or to run careful benchmark tests with real data.
Men & Mice unbound binaries package flags:
- "" - compiled in threaded mode with buildin-eventlib
- e - compiled against LibEvent in /usr/local/libevent
- f - compiled in forked mode (non-threaded)
- v - compiled against LibEv in /usr/local/libev
Forked vs. threaded operation

To be able to utilize modern multi-core and multi-processor systems, the unbound DNS server can run in multiple processes. These can be either lightweight processes (called threads) or full processes. To create full processes on a Unix/POSIX system, the "fork" system call is used; the unbound version using full processes is therefore called the "forked" mode (see
http://en.wikipedia.org/wiki/Thread_(computer_science) and http://en.wikipedia.org/wiki/Fork_(operating_system)).
The default configuration setting for the unbound compilation process will compile unbound in threaded mode. In threaded mode, all threads will share the same cache memory. Once a certain query has been resolved and is stored in the cache, it can be accessed by every other thread.
In "forked" mode, each unbound process has its own cache memory. Cache entries are not shared. For example if there are four unbound processes running on a system, and one process resolves "www.example.com. IN A", it will store the answer in its private cache. If the same query comes in a second later but is given to a different unbound process, this process can not make use of the data in the cache of the other process, it will need to resolve the same query again and save the data independently in its own cache.
Because every process has its own private cache, memory consumption grows linearly with the number of processes running. An unbound server in threaded mode configured with 4GB of cache memory will use 4GB of memory, regardless of the number of threads running. An unbound server in forked mode configured with 4GB of cache memory will allocate 4GB per process, so with four processes it will use 4 x 4GB = 16GB of RAM.
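As an illustration, a threaded setup sized as in the example above might use an unbound.conf fragment like the following. The values are illustrative, not recommendations; unbound splits cache memory between the message cache and the RRset cache, so the two sizes together make up the 4GB.

```
server:
    num-threads: 4
    # In threaded mode this cache memory is shared by all threads.
    # In forked mode each process would allocate this much on its own.
    msg-cache-size: 1g
    rrset-cache-size: 3g
```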
The benefit of running in forked mode is the performance of cache write and delete operations. Because the caches are not shared, every process has full control over its cache memory and does not need to coordinate with the other processes. In a threaded unbound configuration, the threads must coordinate write access to the shared cache. For example, one thread might be about to delete a cache entry (because its TTL has expired) while another tries to update the very same entry. These situations are resolved with locks that block parts of the cache memory. Other threads that want to access the same part of the cache have to wait until the lock is released.
The performance difference between threaded and forked mode depends on the likelihood of concurrent accesses to the cache.
"forked" mode unbound is used in situations where memory is plenty and the cache entries have rather long TTLs, so the caches in all processes will over time "learn" the same data and return this to the clients. Or the "forked" scheme is used in situations where the delegation information is being cached, the actual query data not. This can be in DNS systems that resolve for mail server anti-spam blacklist systems.
To compile unbound in forked mode, use the configure switches "--without-pthreads" and "--without-solaris-threads".
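A build of the forked flavor might look like the following sketch. The source directory name is an example (the article mentions unbound 1.4.6); use the release you actually deploy.

```shell
# Download and unpack an unbound source release first, then:
cd unbound-1.4.6    # example version, adjust as needed

# Disable both POSIX and Solaris threading so unbound runs
# as full forked processes instead of threads.
./configure --without-pthreads --without-solaris-threads

make
make install        # may require root privileges
```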
Internal event lib vs. Libevent/Libev

Modern DNS servers like unbound randomize the UDP source port of every new outgoing query for security. There are certain performance costs associated with TCP and UDP communication. Unbound uses event-based I/O so that incoming queries can be processed while the server is waiting for data to be sent out.
By default unbound uses a built-in event system that is very fast but limited to 512 file descriptors (outgoing ports and thus outgoing queries, as of unbound 1.4.6). The limit of 512 outgoing queries works fine in small- to medium-sized installations, or where most incoming queries can be answered from the cache (as the limit applies to outgoing DNS queries). For installations where a higher number of outgoing queries is required, an unbound binary compiled against Libevent or Libev will help. Both Libevent and Libev are slower per query, but they can manage much more outgoing traffic, so if more than 512 simultaneous queries are needed, the perceived performance of unbound is higher, because no query needs to go into the request list to wait for a UDP socket to become available.
To compile unbound with Libevent or Libev, install a known-good version of one of these libraries (from a binary package or from source) and then use the configure switch "--with-libevent=/path/to/libevent/or/libev".
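As a sketch, a build against a Libevent installed under /usr/local/libevent (the path used by the "e" flavor above) might look like this; the source directory name is an example.

```shell
cd unbound-1.4.6    # example version, adjust as needed

# Point the configure switch at the Libevent installation prefix.
./configure --with-libevent=/usr/local/libevent

make
make install

# For Libev, point the same switch at the Libev prefix instead,
# e.g. --with-libevent=/usr/local/libev (the "v" flavor above).
```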
Libevent vs. Libev

The author of Libev has benchmarked Libev against Libevent and found Libev faster: http://libev.schmorp.de/bench.html
However, this benchmark was a snapshot of the Libevent and Libev versions available in May 2008, and the performance seen in unbound can differ. Performance can also differ between event "backends" and operating systems. Men & Mice recommends careful benchmark testing of Libevent against Libev to find the best-performing combination of unbound, operating system, and DNS traffic.
Using unbound statistics

The unbound statistics can help in deciding which unbound binary to use. The statistics can be printed with the "unbound-control stats" command. First test "threaded" vs. "forked", then "buildin-eventlib" vs. "Libevent/Libev".
To decide between "threaded" and "forked" mode, test both flavors and look at the "total.num.cachehits" and "total.num.cachemiss" values. If they are similar between forked and threaded mode, and you have enough RAM memory, use the "forked" mode. If "cachemiss" is significantly higher in forked mode, use the threaded mode.
To decide between "buildin-eventlib" and "libevent/libev", test both flavors and monitor the "total.requestlist.max" and "total.requestlist.exceeded" values. If "total.requestlist.max" reaches 512, or if "total.requestlist.exceeded" is higher zero, use Libevent/Libev.
Packages: Unbound packages for Debian Linux, RedHat Linux and Solaris 10 can be