Getting Data In

How to improve universal forwarder performance

stamstam
Explorer

Hi all,
We forward about 300 GB per day from a single forwarder instance to an indexer cluster.
The forwarder runs on a strong machine (24 cores, 130 GB RAM, SSD), and we have already configured limits.conf and server.conf for "unlimited" throughput (800 MB/s) and parallelIngestionPipelines = 2. We also increased the size of the parsingQueue and structuredParsingQueue.
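Roughly, the settings described above look like this (a sketch of what we changed; the queue sizes shown are illustrative, not our exact values):

limits.conf:
[thruput]
# 0 means unlimited; ~800 MB/s would be maxKBps = 819200
maxKBps = 0

server.conf:
[general]
parallelIngestionPipelines = 2

[queue=parsingQueue]
maxSize = 10MB

[queue=structuredParsingQueue]
maxSize = 10MB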
Normally we keep up, but sometimes, due to some kind of problem, data backs up on the machine and the forwarder can't seem to close the gap afterwards.
The problem is that the universal forwarder lists all of the files first, and only then forwards them to the indexers. What I want is for it to be lazy: to list files and forward them at the same time.

Is there any config setting I haven't changed yet that would help? That would be great.
Also, if anyone from Splunk is reading this: this feature would drastically improve the universal forwarder and would change our lives forever.
Thanks in advance,

0 Karma

MuS
SplunkTrust

Hi stamstam,

Some time ago a customer had a similar problem when they used a universal forwarder on a 2-CPU server to read over 10 million directories. Long story short: because it took the UF too long to list all the directories, the actual log files never got read, because they hit the ignoreOlderThan option that was set.

The main reason for this is that the universal forwarder uses a so-called breadth-first search to discover files and directories (technical details can be found at https://en.wikipedia.org/wiki/Breadth-first_search). An easy-to-understand image of the process looks like this (it is from the Wikipedia page):

[Image: breadth-first search traversal order, from the Wikipedia page]
The cause of this behaviour was the use of a single wildcard monitor stanza on the top-level directory.
The solution was to use very specific monitor stanzas instead; performance increased immediately and the customer never had a similar issue again.
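As a rough illustration (the paths are made up), the difference is between one top-level wildcard stanza that forces the UF to walk the whole tree, and several specific stanzas that limit the search:

inputs.conf:
# single wildcard stanza on the top-level directory - the UF has to discover everything underneath
[monitor:///data/*]

# specific stanzas - the UF only walks the directories that actually contain logs
[monitor:///data/app1/logs]
[monitor:///data/app2/logs]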

Hope this helps ...

cheers, MuS

stamstam
Explorer

We already increased the open file descriptor limit, we use a batch input, and we can't use a more specific root directory.
We have many, many sub-directories and more are added constantly; we can't write an input stanza for each one.
Will it actually increase our performance? In the end it will list the same folders and files.

0 Karma

MuS
SplunkTrust

Okay, this sounds like a conceptual/design problem rather than a real Splunk problem.

There are options that could help you increase the performance of the universal forwarder, such as splitting the inputs stanzas to be more specific about the path and adding a specific sourcetype and index name to each stanza. Using one single input stanza for such an amount of data raises a lot of other concerns: is all the data going into the same index? If not, who is doing the parsing and re-writing the index name? Setting these per stanza is something the UF can handle in a very efficient way.
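For example (hypothetical paths, index and sourcetype names), split stanzas let the UF assign index and sourcetype directly in inputs.conf:

inputs.conf:
[monitor:///data/webserver/access]
index = web
sourcetype = access_combined

[monitor:///data/app/backend]
index = app
sourcetype = backend_log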

Also, by using a batch stanza you rely on the IOPS performance of the disk, because each file will be deleted after it has been read, which adds wait time to the overall process.
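A batch input of the kind described looks roughly like this (hypothetical path); move_policy = sinkhole is what makes the UF delete each file after reading it:

inputs.conf:
[batch:///data/staging/*.log]
move_policy = sinkhole
index = app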

The current setup makes it really hard for the universal forwarder to perform at its best, and adding more and more data will not make it work better.

Question whether this is the best way for the universal forwarder to read the amount of data you produce, and then change the way the data gets ingested.

cheers, MuS

0 Karma

richgalloway
SplunkTrust

Making your monitor paths as specific as possible by pushing wildcards to the end will help the forwarder spend less time looking for files and more time forwarding them.
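For instance (hypothetical paths), a wildcard in the middle of a path forces the forwarder to enumerate every directory that could match, while a wildcard only at the end keeps the search shallow:

inputs.conf:
# wildcard in the middle: every directory under /data has to be listed
[monitor:///data/*/logs/app.log]

# wildcard only at the end: the forwarder goes straight to one directory
[monitor:///data/app1/logs/app*.log]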

---
If this reply helps you, Karma would be appreciated.
0 Karma

ddrillic
Ultra Champion

Right, we had similar cases when running the forwarder on HDFS with a huge number of files. ulimit -n should be very high. Btw, how large is your parsing queue? - Universal Forwarder ParsingQueue KB Size
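For reference, the parsing queue size on the forwarder is set in server.conf (the 10MB below is only an example value); the ulimit -n change is an OS-level setting outside Splunk's configuration:

server.conf:
[queue=parsingQueue]
maxSize = 10MB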

0 Karma