I have just added two new logs to be monitored on one of my servers but the data is not coming back for those files. A sample of the inputs.conf file follows:
[monitor:///Path/SubPath/App_logs/*]
whitelist=(?:App1_Name_*.log|App2_Name_*.log)$
sourcetype=SourceType
index=Index
The serverclass.conf file follows:
[serverClass:Apps]
restartSplunkd=true
whitelist.0=hostname
[serverClass:Apps:app:inputs_apps]
[serverClass:Apps:app:props_apps]
The log files are named with the current date_time in the part of the name where '*' appears in the stanza above. The 'interesting' thing about these log files is that they do not roll. A new file is created every 30 minutes, and each one has a fixed format and size. Since these files are only created and never change afterwards, I am wondering whether Splunk is somehow skipping them because of the 'unique' way they are created?
Please advise.
UPDATE: I came to the conclusion that the expression I was using may work OK to list the files in a directory listing (ls -al /Path/SubPath/App_logs/App1Name_*.log), but the "*" works differently in a RegEx than in a directory listing. So I expanded the expression to an explicit date pattern as follows:
[monitor:///Path/SubPath/Applogs/*]
whitelist=(?:App1Name_[0-9]{4}-[0-9]{2}-[0-9]{2}.log|App2Name_[0-9]{4}-[0-9]{2}-[0-9]{2}.log)$
sourcetype=SourceType
index=Index
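The glob-versus-RegEx difference described in this update can be checked outside Splunk. Below is a minimal Python sketch, not a Splunk tool: the file name is hypothetical (the real naming scheme is not shown here), and Python's re module is only standing in for Splunk's regex engine, which should behave the same way for these simple constructs.

```python
import fnmatch
import re

# Hypothetical file name, for illustration only.
name = "App1Name_2011-11-29.log"

# Shell-style glob: '*' matches any run of characters.
glob_hit = fnmatch.fnmatchcase(name, "App1Name_*.log")

# RegEx: '_*' means "zero or more underscores", so the date part
# of the name is never consumed and the match fails.
naive = re.compile(r"(?:App1Name_*.log|App2Name_*.log)$")
naive_hit = naive.search(name) is not None

# RegEx with an explicit date pattern (the dot is escaped here so it
# matches only a literal '.').
dated = re.compile(
    r"(?:App1Name_[0-9]{4}-[0-9]{2}-[0-9]{2}\.log"
    r"|App2Name_[0-9]{4}-[0-9]{2}-[0-9]{2}\.log)$"
)
dated_hit = dated.search(name) is not None

print(glob_hit, naive_hit, dated_hit)  # True False True
```

The same check can be repeated with a real file name from the log directory before deploying a whitelist change.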
I tested that change and I get the results I expected, so I feel confident that this part of the monitoring config is correct. But I still am not getting my data pulled back. There are no error messages in the splunkd.log. In another, similar instance on a different Forwarder, I have similar results. The inputs.conf looks like this:
[monitor:///usr/Path/logs/*]
Whitelist=(?:[^_]+_access_log.[0-9]{4}-[0-9]{2}-[0-9]{2}-00_00_00|[^_]+_error_log.[0-9]{4}-[0-9]{2}-[0-9]{2}-00_00_00|admin_access_log|admin_error.log)$
sourcetype=SourceType
index=Index
NOTE: the sourcetype and index parms are faked to protect the innocent.
After making the date pattern change for this group of files and getting it deployed to the Forwarders, I saw this in the splunkd.log:
11-29-2011 15:00:38.248 INFO TailingProcessor - Parsing configuration stanza: monitor:///usr/Path/logs/*.
11-29-2011 15:00:38.685 INFO WatchedFile - Will begin reading at offset=13528 for file='/usr/Path/logs/com_access_log.2011-11-29-00_00_00'.
11-29-2011 15:00:38.783 INFO TcpOutputProc - Connected to idx=10.175.229.188:8002
11-29-2011 15:00:38.946 INFO WatchedFile - Will begin reading at offset=73 for file='/usr/Path/logs/mob_access_log.2011-11-29-00_00_00'.
The log files listed above were updated throughout the day, but the Forwarder never sent the data back to the indexer. Reading deeper into the doc, I noticed that the date/time stamp is not the only thing Splunk uses to recognize that a file has changed. In at least one of these logs the messages are VERY similar, so it is possible that the hash Splunk generates does not lead it to conclude that the file has changed. The other log, although it also has duplicated messages, grows quite large with each update, so I would expect its hash to calculate differently. But maybe not.
I am not clear on the crcSalt parameter and how it works. Would someone provide an example of how I might use it to resolve this problem?
I resolved this by 'upping' the RegEx in the whitelist phrase. I converted the '*' to '[0-9]{8}' and it picked up the logs.
I suspect the files are too similar. Splunk checks the beginning and end of a file. If Splunk thinks the file is the same as one it has already indexed, it will not re-index it, even if the name and timestamp are different.
You can override this by adding the line below to your inputs.conf stanza:
crcSalt = <SOURCE>
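To illustrate why this setting helps, here is a rough Python sketch of the idea, not Splunk's actual code. It assumes the documented default of a CRC over the first 256 bytes of the file, and that `<SOURCE>` mixes the full source path into that calculation. The zlib.crc32 call, the helper name initial_crc, the file contents, and the paths are all stand-ins made up for the example.

```python
import zlib

def initial_crc(content: bytes, salt: str = "", length: int = 256) -> int:
    """CRC over the salt plus the first `length` bytes of the content."""
    return zlib.crc32(salt.encode("utf-8") + content[:length])

# Two hypothetical files that share a fixed header longer than 256 bytes
# and differ only further down.
header = b"#fixed header line\n" * 20
file_a = header + b"events written at 15:00\n"
file_b = header + b"events written at 15:30\n"

# Without a salt the two files look identical to the check.
same_without_salt = initial_crc(file_a) == initial_crc(file_b)

# With the (hypothetical) source path as salt, they no longer collide.
same_with_salt = (
    initial_crc(file_a, "/Path/SubPath/App_logs/a.log")
    == initial_crc(file_b, "/Path/SubPath/App_logs/b.log")
)

print(same_without_salt, same_with_salt)  # True False
```

One thing to keep in mind with the path used as salt: a file that gets renamed looks like a new file, so its contents may be indexed again.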
Adding a missing path does cause a slight overhead, as Splunk has to try to scan it and wait for a response, and this is worse if the drive is not local. If it is only one folder, I wouldn't worry, but if there are more, I would consider having a separate inputs.conf file for each server type.
If you are going down that path, I'd consider using the deployment server to manage which devices get which config.
Thank you for your response Bob. I started suspecting that as I read more on this topic in the doc. I was also wondering if having a path and files that will not be found on some of the servers to which this inputs.conf is deployed can cause a problem. I am using this one file to go to two sets of servers, each with its own logs but which may be consolidated later and all would be on the servers at that point. Can including a missing path and log files cause a problem?