We noticed that, right after a log rotation, the data is not indexed until the next log rotation. That is, let's say one file was rotated at 8 AM (up to which point the data had already been indexed). The next file is written from 8 AM to 7 PM, but this file is not indexed until around 7 PM.
We are on Universal Forwarder 7.0.3.
Below is the monitoring stanza:
[monitor:///opt/mapr/hadoop/hadoop/logs/*nodemanager*]
sourcetype = my_st
index = my_index
disabled = 0
ignoreOlderThan = 2h
We added ignoreOlderThan = 2h recently to see if it helps, but the issue still persists.
The latest file is named yarn-mapr-nodemanager-host_name.log and the most recently archived file is yarn-mapr-nodemanager-host_name.log.1.
What is interesting is that, intermittently, on certain servers the current file gets indexed only at the time it is rolled/archived (i.e., roughly 10-11 hours later), but under its actual file name rather than the archive file name. And the issue of the live/current file not being indexed on time does not happen every time; the next live file might get indexed on time. There should be an ideal set of settings to avoid this.
Any insights on this will be helpful.
Whatever Splunk documents about handling rotated log files seems to have some bug. Are we missing anything here? Please suggest.
There was a point at the beginning where everything was working fine, right? And if you restart Splunk, it starts to catch up but then falls behind again, right? That is what happens when there are thousands of files in the directory that Splunk has to dig through. You can either install housekeeping rules that move/delete files that have not been modified for X days/hours, or create soft links. Check out my answer here:
https://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html
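For illustration only, a housekeeping rule along those lines could be as simple as a cron entry like the one below. The archive destination and the 2-day retention window are placeholders, not values from the linked answer, so adjust them to your own retention policy.
# hourly: move rotated nodemanager logs untouched for 2+ days out of the monitored directory
0 * * * * find /opt/mapr/hadoop/hadoop/logs -maxdepth 1 -name '*nodemanager*.log.*' -mtime +2 -exec mv {} /var/log/splunk-archive/ \;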
So in this case, will there be a delay of 5 minutes, or maybe not? Please clarify. We will have to check/work with a different team to put this cron on 500+ nodes.
So you do have thousands of files in that same directory?
No. We monitor around 500+ nodes/hosts. Each node has 20 archived .log.* files + 1 latest .log + 1 latest .out, i.e. a total of 22 files per node/host. We still feel there should be a straightforward setting/solution for this. It would be very difficult to put a workaround (soft links) on 500+ nodes, let alone convince the other team to do it.
The problem occurs on around 5-10 nodes each day.
It doesn't matter how many files are being monitored there; it matters how many files exist there in total. Are your 22 the only files there, or are there hundreds/thousands of others?
Yeah, sure. Just 22 files with nodemanager in the name in the monitored directory. There is a sub-directory that contains around 57 sub-directories, but their names do not contain nodemanager. So probably around 57*5*3 = 855 unwanted files.
Should we try adding recursive = false, just to avoid scanning the sub-directories?
Definitely add that setting, but it should not be necessary because you have no wildcards in your path, right?
We have wildcards in the file name. But I think Splunk adds a WATCH on that path, which means it might look at sub-directories by default? Anyway, we will add the setting recursive = false and monitor for a few days.
The ignoreOlderThan setting is definitely not going to help and will certainly cause other problems, so definitely take it out.
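With that removed and recursive = false added, the stanza would look roughly like this (same path, sourcetype, and index as in your post; this is just a sketch of the two changes, not a guaranteed fix):
[monitor:///opt/mapr/hadoop/hadoop/logs/*nodemanager*]
sourcetype = my_st
index = my_index
disabled = 0
recursive = false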
Okay. We wanted to see if it would help reduce the load on the forwarder, since there are 20 archived files.
Also, I forgot to mention that there is another file with the .out extension, yarn-mapr-nodemanager-host_name.out, which seems to be ingesting fine under the same sourcetype while the other file(s) have the issue.