For the most part my universal forwarders are working as expected.
But when I try to monitor this one log, it only sends the stream to one indexer, and I think it's due to the hourly batch. It's not real-time like the other logs; it's an aggregate of several different syslogs consolidated into one huge log every hour. So when it streams at the top of the hour, it spews a good amount of data.
My question is: how can I configure Splunk to load-balance this type of logging so that I can take advantage of the distributed search model? Do I have to separate the logging by filtering a field to a specific index?
Any info /direction would be really appreciated. Thanks.
To understand why this happens, one has to know that a universal or lightweight forwarder does not parse the data it forwards. This means that it is not aware of event boundaries in the data it gathers.
As a result, a sustained incoming data stream can only be broken up (and therefore load-balanced across the indexers) when the forwarder detects an interruption in that stream.
For a file, this is typically an EOF.
As you can imagine, when you set Splunk to read a large static file rather than a live one, it reads all the way to EOF in one uninterrupted pass and sends the entire file to a single indexer. If the forwarder attempted to respect the autoLB interval, it would risk cutting mid-event.
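For reference, the autoLB behavior is set in outputs.conf on the forwarder; a minimal sketch, where the target-group name and indexer hostnames are placeholders:

```ini
# outputs.conf on the forwarder -- group name and hostnames are examples
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# how often (in seconds) the forwarder tries to switch to another indexer
autoLBFrequency = 30
```

As described above, a universal forwarder can still hold onto one indexer well past this interval when it is mid-stream in a large file, because it cannot tell where it is safe to cut.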
The workaround to this problem is to use a heavy forwarder (which parses the data it reads, and is therefore aware of event boundaries) instead of a universal forwarder.
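On a heavy forwarder, event boundaries come from its parsing pipeline, so giving the aggregated log an explicit sourcetype definition in props.conf helps it break events (and thus load-balance) cleanly. A sketch, where the sourcetype name and timestamp format are assumptions about your data:

```ini
# props.conf on the heavy forwarder -- sourcetype name and TIME_FORMAT
# are guesses; adjust to match the actual aggregated log
[hourly_syslog_aggregate]
# treat each line as its own event (typical for syslog-style data)
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 32
```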
Do note that in the long run, data should be distributed fairly evenly over your indexers, given that every time a large file is read, there is a 1-in-N chance (N being your number of indexers) that it is sent to any given indexer.