Getting Data In

Splunk Universal/Heavy Forwarder with compression has low CPU utilisation

Explorer

I'm planning a Splunk deployment that will involve 2 indexers, 1 search head and 4 forwarders spread across various networks in various geographic locations. We are using a forwarder (UF or HWF, we're not settled on which one yet) to collect syslog traffic on private networks and forward this traffic over a VPN to a remote site where the indexers and search head will be located.

In our labs we've spent time testing the forwarding speed we can expect to achieve and found that compression severely reduces the maximum EPS we can reach; however, Splunk shows very low CPU utilisation when compression is enabled.

Servers:

  • Dell R410
  • 2x Xeon E5606 Quad-core processors
  • 16GB DDR3 RAM
  • 2x 600GB SAS 15k RAID-1 disks
  • 2x Bonded gigabit network cards
  • Centos 6.2 64-bit minimal
  • Connected via dedicated gigabit switch

Testing scenario:

  • 2 servers generating UDP syslog datagrams at an EPS level we can control, which forwards to:
  • 1 Splunk forwarder, we've tried both UF and HWF with the same results, which forwards to:
  • 1 full Splunk instance indexing the events, showing us the incoming EPS on a 30-second-average graph

The most representative example I can give of the problem is this:

  • Set the two syslog generators to 20k EPS each (40k EPS total)
  • Disable compression on the forwarder
  • End Splunk server receives 40k EPS
  • top reports CPU usage around 250-300%, roughly what we expect for this hardware based on other Splunk community posts. We top out at around 100k EPS in our testing.

However:

  • Keep the two syslog generators at 20k EPS each (40k EPS total, still)
  • Enable compression on the forwarder
  • End Splunk server receives around 13.5-14.5k EPS, never above 15k EPS
  • top reports CPU usage around 120%, rarely above 150%

The concerning factor for us is the relatively light CPU usage. We will be deploying forwarders on dedicated hardware and we want Splunk to utilise as much of their power as possible.

Are there any settings in Splunk that we can tune or tweak to make the Splunk UF/HWF more "greedy" and help increase our EPS rate with compression? We've tweaked all the options we can find in Splunk inputs.conf, outputs.conf, and at the kernel level for UDP buffers and queues. However, these changes only delay the onset of the ~14k EPS ceiling when compression is enabled, while resource utilisation remains relatively low.
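For context, the kernel-side UDP tuning we've been experimenting with is along these lines (the values are illustrative examples, not our exact settings):

```conf
# Illustrative sysctl settings for absorbing large UDP bursts -- example values only
net.core.rmem_max = 16777216         # max socket receive buffer size
net.core.rmem_default = 8388608      # default socket receive buffer size
net.core.netdev_max_backlog = 10000  # packets queued per NIC before the kernel drains them
```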

Thanks!

1 Solution

Splunk Employee

Suggestions to improve throughput:

  • Make sure you have increased the maxKBps setting in the [thruput] stanza of limits.conf. The Light and Universal Forwarders cap at 256 kilobytes/sec by default unless overridden, and I believe that cap is applied pre-compression.
  • Do not capture UDP directly with a Splunk forwarder. Instead, capture using syslogd, rsyslog, syslog-ng, or similar, write to a file, and have the forwarder monitor the file. (If you have a very high data rate, have syslog split the data into multiple files, preferably by host/source.)
  • Use SSL to forward. SSL forwarding compresses the stream, while standard compression compresses each "chunk" of data read by a forwarder. When receiving UDP, each syslog packet is its own chunk, so compression is rather inefficient.
  • Use LWF or UF, not HWF. (However, if you have multiple Splunk indexers and a very high data rate through a single forwarder, you might want to make that one forwarder a HWF. You should do further testing in that case to decide.)
  • Don't worry so much about low CPU utilization. You're correct to worry about throughput, but the solution is not to try to use more CPU; it's to eliminate or improve whatever is using it.
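As a rough sketch of the first and third points (server names, ports, and certificate paths below are placeholders; verify the exact setting names against the docs for your Splunk version):

```conf
# limits.conf on the forwarder -- raise or remove the default 256 KBps thruput cap
[thruput]
maxKBps = 0          # 0 = unlimited; or set an explicit ceiling, e.g. 10240

# outputs.conf on the forwarder -- forward over SSL so the whole stream is compressed
[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
sslCertPath = $SPLUNK_HOME/etc/auth/server.pem
sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem
sslPassword = changeme
useClientSSLCompression = true

# inputs.conf on the forwarder -- monitor the files syslog writes instead of raw UDP
[monitor:///var/log/remote/*.log]
sourcetype = syslog
```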


Explorer

Thank you! SSL forwarding did the trick: event rates were back up toward 100k EPS, using slightly less bandwidth than standard compression.

Based on your recommendation I've also put rsyslog in place to capture the events. Thanks for the tip; it should save us from losing events if Splunk crashes.
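In case it helps anyone else, the rsyslog capture is essentially this (legacy v5 syntax since we're on CentOS 6; the path and port are just what we chose, adjust as needed):

```conf
# /etc/rsyslog.conf additions -- receive UDP syslog and write one file per sending host
$ModLoad imudp
$UDPServerRun 514

$template PerHostFile,"/var/log/remote/%HOSTNAME%.log"
*.* ?PerHostFile
```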


Explorer

Thank you very much for the detailed reply; I'll be sure to try each of these points when I'm back in the office on Monday. We have tried rsyslog to capture UDP syslog, which dropped our top EPS to ~85k, but we still couldn't exceed 15k EPS with compression.

It sounds like SSL could be a good option for us. I'm aware that UDP syslog -> TCP forwarding has a big overhead with the 64k blocks, and SSL could help a lot.
