From forwarder to index to search is taking too long -- roughly 10 to 15 minutes

Communicator

Hey all,

I have a system that generates a log I need indexed in Splunk. The system runs on several individual boxes, and each one was set up with a 3.4.5 forwarder to send its output to our central server.

The only problem is that it now takes 10+ minutes for entries to get from the system to the saved searches on the Splunk server. The system times are in sync, and there are no time zone differences to skew the timing. The entries appear in order, so it doesn't look like entries are being lost.

I am thinking that by adjusting some of the configuration, I can help reduce the problem. I saw these things that looked somewhat promising:

  • Increasing maxKBps
  • Making sure that we are tailing the file rather than reading the entire file
  • Setting indexAndForward = false so the local 3.4.5 forwarder doesn't index the data
  • Sending raw data instead of cooked data
  • Using tcpout....

If anyone else has any suggestions on how we might improve the speed from soup to nuts I'm very interested in hearing about it.

Thanks!

Splunk Employee

By default, light and universal forwarders limit themselves to sending data at a maximum of 256 KBps (the maxKBps setting). If the forwarder is hitting that cap, increasing the limit may get you closer to real-time results.
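
As a rough sketch of that change: the cap lives in the [thruput] stanza of limits.conf on the forwarder. The value 1024 below is only an illustrative assumption, not a recommendation, and it is worth confirming against the documentation for your version that a 3.4.5 forwarder reads the same stanza.

```
# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
# Illustrative value (assumption). The forwarder default is 256; 0 means unlimited.
maxKBps = 1024
```

Restart the forwarder after editing the file so the new limit takes effect.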

You should first measure the true delay of the data (the gap between each event's timestamp and the time it was actually indexed) and check for any indexing problems.
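
One way to measure that delay, sketched here, is to compare each event's parsed timestamp (_time) with the time it was written to the index (_indextime). The index and sourcetype names are placeholders (assumptions), so substitute your own:

```
index=main sourcetype=my_app_log
| eval delay_sec = _indextime - _time
| stats avg(delay_sec) AS avg_delay max(delay_sec) AS max_delay count by host
```

A delay that stays at a few minutes suggests a forwarding or indexing throughput problem. A delay close to a whole number of hours more often suggests a timestamp or time zone interpretation issue.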

If Splunk is behind with respect to indexing, you will see a delay like this. To check if Splunk is behind on indexing, look for blocked or filled queues:

index=_internal source=*metrics.log blocked

OR

index=_internal source=*metrics.log group=queue | timechart avg(current_size) by name

If you have consistently blocked queues, or they are full (1000 is the max value), then you will need to debug why Splunk is queuing data.

Communicator

Back again with an update.....

We're down to about 5 to 7 minutes of delay in getting from the log to the forwarder to the index. Our times are all synchronized, so there are no issues from there.

We are looking at settings and tweaking. Ideally, we want it to go down to about 3 minutes.

Thanks for allowing me to pick your great brains.
Wwhitener

Communicator

Thanks everyone.

We're looking at the issue of possible time lags and time zone difficulties right now.

Champion

Is there any reason why you couldn't update the system to a more up-to-date universal forwarder?
The system footprint will be smaller (although it clearly shouldn't take as long as you have indicated even now).
The number of files shouldn't slow down the forwarder, given the way it CRC-checks each file. Have you looked in splunkd.log to see if there are any errors or issues?

I guess with stuff like this you want to verify how often these logs are being written, and be sure that they are updating very frequently. Perhaps even trace a single log entry through the system manually?
If it is a large log file you can play around with maxKBps. This setting shouldn't stop events from showing up, but it can delay them, and with a large file a value that is too low could have a noticeable adverse effect.

Are you doing extractions on the data before sending? That is going to slow things down. I believe the speed of this has been improved in 4+, but in most circumstances it is best to simply define a target index for the data and let the indexer handle the rest.

Esteemed Legend

You probably have too many files on the forwarder and Splunk is getting bogged down in the housekeeping of checking each one of them for changes (changes that will probably never happen). Try moving/deleting the old files and see if this helps.
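
If moving or deleting the old files is not practical, a possible alternative is the ignoreOlderThan setting in inputs.conf, which tells a monitor input to stop checking files whose modification time is older than the given window. This is only a sketch: the monitored path and the 7d window are hypothetical, and the setting may not be available on a 3.4.5 forwarder, so check the inputs.conf spec for your version first.

```
# inputs.conf on the forwarder
# Hypothetical path (assumption); use the directory your system logs to.
[monitor:///var/log/myapp]
# Skip files not modified within the last 7 days (illustrative window)
ignoreOlderThan = 7d
```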
