We are using Splunk to monitor the server.log file from a JBoss instance. The log rolls over daily (we use the logrotate utility to gzip server.log every day).
The folder looks like this:
/var/log
    server.log
    server.log.June-12.gz
    server.log.June-13.gz
    server.log.June-14.gz
    server.log.June-15.gz
We use the universal forwarder on this Linux box to push data out to the indexer.
Currently, our configuration in inputs.conf on the forwarder side looks like this:
[monitor://var/log/jboss_logs/server*]
disabled=0
index=os
sourcetype=serverlog
Unfortunately, this picks up the daily server.log (which it's supposed to, because of the server* wildcard), but every day it also indexes the uncompressed content of the server.*.gz files that are out there.
Based on what is described here, log rotation handling apparently does not apply to the .gz and .tar file formats, because they are treated as new files:
http://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorFilesAndDirectories
Does this mean that we will definitely see duplicates? Has anybody seen a problem like this before?
You can add whitelists/blacklists to your inputs.conf to filter out unwanted files:
blacklist = \.(gz)$
This should filter out anything in the folder with a .gz extension. (Or you could whitelist only the .log files to get the same result; it depends on what else is in there.)
http://docs.splunk.com/Documentation/Splunk/4.3.2/Data/Whitelistorblacklistspecificincomingdata
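Putting that together with the stanza from the question, the forwarder's inputs.conf might look something like the sketch below. The monitor path, index, and sourcetype are copied from the question as posted. The commented-out whitelist line is only an illustration of the alternative and assumes the live file always ends in .log. Both settings are regular expressions matched against the full file path.

```ini
[monitor://var/log/jboss_logs/server*]
disabled=0
index=os
sourcetype=serverlog
# Skip the rotated, compressed copies so only the live server.log is indexed
blacklist = \.gz$
# Alternative (assumption: the live file always ends in .log):
# whitelist = \.log$
```

A change to inputs.conf generally only takes effect after the forwarder is restarted, and it will not remove events from .gz files that have already been indexed.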