Mass-blocking IP addresses with ipset
Using ipset to block many IP addresses
I was sponsoring an upload of ipset to Debian the other day. This reminded me of ipset, a very cool program as I am going to show. It makes administering related netfilter (that is: firewall) rules easy along with a good performance. This is achieved by changing how rules match in iptables. Traditionally, an iptables rule matches a single network identity, for example a single IP address or a single network only. With ipsets you can operate on a bunch of (otherwise unrelated) addresses at once easily. If you happen to need bulk actions in your firewall, for example you want to blacklist a long list of IP addresses at once, you will love IP sets. I promise.
Drawbacks of netfilter
IP sets do exist for a longer time, however they made it into the upstream Linux kernel as of version 2.6.39. That means Debian Wheezy will support it as is; for Debian Squeeze you can either use a backported kernel or compile yourself the module shipped in the ipset-source package. In any case you additionally need the command line utilities named ipset. Thus, install that package before you can start. Having that said, Squeeze users should note the ipset syntax I am demonstrating below slightly differs from the syntax supported by the Squeeze utilities. The big picture remains the same, though.
IP utils do not conflict with iptables, but extend it in a useful way. You can combine both as you like. In fact, you still need iptables to turn IP sets into something really useful. Nonetheless you will be hitting iptables‘ limitation soon if you exceed a certain number of rules. You can combine as many conditions within a filter rule as you like, however you can only specify a single pattern for each condition. You figure, this does not scale very well if a pattern to match against does not follow a very tight definition such as a CIDR pattern.
This means you can happily filter whole network blocks such as 192.168.0.0/24 (which translates to 255 hosts) in iptables, but there is no way to specify a particular not specially connected set of IP addresses within this range if it cannot be expressed with a CIDR prefix. For example, there is no way to block, say, 192.168.0.5, 192.168.0.100 and 192.168.0.220 in a single statement only. You really need to declare three rules which only differ by the IP address. Pretend, you want to prevent these three addresses from accessing your host. You would probably do something like this:
iptables -A INPUT -s 192.168.0.5 -p TCP -j REJECT iptables -A INPUT -s 192.168.0.100 -p TCP -j REJECT iptables -A INPUT -s 192.168.0.220 -p TCP -j REJECT
Alternatively you could do
iptables -A INPUT -s 192.168.0.0/24 -p TCP -j REJECT
but this would block 251 unrelated hosts, too. Not a good deal I’d say. Now, while the former alternative is annoying, what’s the problem with it? The problem is: It does not scale. Netfilter rules work like a fall-through trapdoor. This means whenever a packet enters the system, it passes through several chains and in each of these chains, netfilter checks all rules until either one rule matches or there are no rules left. In the latter case the default action applies. In netfilter terminology a chain determines when an interception to the regular packet flow occurs. In the example above the specified chain is INPUT, which applies to any incoming packet.
In the end this means every single packet which is sent to your host needs to be checked whether it matches the patterns specified in every rule of yours. And believe me, in a server setup, there are lots of packets flowing to your machine. Consider for example a single HTTP requests, which requires at very least four packets sent from a client machine to your web server. Thus, each of these packets is passing through your rules in your INPUT chain as a bare minimum, before it is eventually accepted or discarded.
This requires a substantial computing overhead for every rule you are adding to your system. This isn’t so much of a problem if you haven’t many rules (for some values of “many” as I am going to show). However, you may end up in a situation where you end up with a large rule set. For example, if you suffer from a DDoS attack, you may be tempted to block drone clients in your firewall (German; Apache2. Likewise: for Lighttpd). In such a situation you will need to add thousands of rules easily.
Being under attack, the performance of your server is poor already. However, by adding many rules to your firewall you are actually further increasing computing overhead for every request significantly. To illustrate my point, I’ve made some benchmarks. Below you find the response times of a HTTP web server while doing sequential requests for a single file of 10 KiB in size. I am explaining my measurement method in detail further below. For now, look the graph. It shows the average response time of an Apache 2 web server, divided into four parts:
- connect time: this is the time passed by until the server completed the initial TCP handshake
- send time: this is the time passed by which I needed to reliably send a HTTP request over the established TCP connection reliably (that means: one packet sent and waiting for acknowledged by the server)
- first byte: this is time passed by until the server sent the first byte from the corresponding HTTP response
- response complete: this is time passed by until the server sent all of the remaining bytes of the corresponding HTTP response (remaining HTTP header + 10 KiB of payload)