I am trying to determine how often Splunk reads log files. I have Data Inputs set up against 5 web servers in a farm, reading W3C logs. It seems that it can take hours for Splunk to get to some of the logs, while one server is read pretty much instantly. I don't understand why there is a difference.
The log files are created hourly and rotated on a ~monthly basis. When I originally set up the Inputs I didn't specify to tail the file; I wanted all of the history with the initial load. It's been 2.5 hours and I still only have one server's data for the thirty-minute time span I am looking at.
I guess my question is: does Splunk have the ability to tail multiple files at once? Is there a way to configure Splunk so that more up-to-date log files are indexed across my web farms?
It has been five hours and I still only have the one server's logs. These Inputs have been set up for more than a month, so it is not a case of an initial load delay. I have verified the logs are all in GMT, and they appear to be set up the same on the Data input (files) screen.
Can more than one file be tailed at a time?
It could be that Splunk just hasn't gotten through the volume of data that you're asking it to go through. Do you have an idea of how much data actually exists within these folders, in terms of size and number of files?
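If you don't have those numbers handy, a small script can gather them. This is only a rough sketch, not anything Splunk provides: the function name `summarize_logs` and the `.log` suffix are assumptions on my part, so adjust them to match how your W3C logs are actually named.

```python
import os
import sys
from pathlib import Path


def summarize_logs(root, suffix=".log"):
    """Return (file_count, total_bytes) for files under root ending in suffix.

    The suffix match is case-insensitive. The default ".log" is an assumption.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffix):
                continue
            try:
                total_bytes += (Path(dirpath) / name).stat().st_size
            except OSError:
                # The file was rotated or removed during the scan; skip it.
                continue
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # Pass one or more log directories on the command line.
    for root in sys.argv[1:]:
        count, size = summarize_logs(root)
        print(f"{root}: {count} files, {size / 1024 ** 2:.1f} MiB")
```

Run it once per server's log directory and compare the results. If the server that indexes instantly has far fewer or smaller files than the others, backlog volume is the likely explanation.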
You may be able to get some additional information from the file input status monitor:
It may give you an idea of what has been read so far and help you determine how things are progressing.
Splunk does have the capability to tail multiple files at one time, but its throughput is limited, since it takes time to process the data.
I am not sure what your input configuration looks like. Could you perhaps paste it?
Sometimes it is helpful to run a Real-Time (All Time) search on the hosts, selected via the time selector, to see which data is coming into Splunk in real time. The first thing I do when I don't see the data I think I should be seeing is an all-time, real-time search for the source/sourcetype/host.