Our log path looks like this:
/var/www/webapp/application/logs/2014/09/13/03.log
where 2014 is the year, 09 is the month, 13 is the day, and 03 is the hour.
How can I capture this path pattern in inputs.conf so that all auto-generated files under the year, month, day, and hour directories are picked up and the logs are sent to a Splunk Storm index?
Use a whitelist regex under the monitor stanza:
[monitor:///var/www/webapp/application/logs]
whitelist = /var/www/webapp/application/logs/\d{4}/\d{2}/\d{2}/\d{2}\.log
Please adjust the regex if it does not work 🙂
This will definitely limit the stanza to matching only filenames like that (though I recommend anchoring the regex with ^ and $), but it won't make the numbers available elsewhere.
EDIT: I may have misunderstood your goal, and perhaps the other answer is the one you want.
If you just want to index those files, a wildcard or a regex whitelist will do the job.
If you want to find out the times from the path, the rest of my answer is relevant.
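To sanity-check an anchored pattern before putting it in inputs.conf, a quick test outside Splunk can help. The sketch below is only an approximation: it uses Python's `re` module rather than Splunk's own regex engine, and the pattern (which also covers the hour segment from the example path in the question) and the helper name `is_whitelisted` are my own assumptions, not anything Splunk provides.

```python
import re

# Anchored whitelist candidate: .../YYYY/MM/DD/HH.log
# (assumed pattern; test it against your real paths before using it)
WHITELIST = re.compile(
    r"^/var/www/webapp/application/logs/\d{4}/\d{2}/\d{2}/\d{2}\.log$"
)

def is_whitelisted(path):
    """Return True if the full path matches the anchored pattern."""
    return WHITELIST.search(path) is not None

if __name__ == "__main__":
    samples = [
        "/var/www/webapp/application/logs/2014/09/13/03.log",
        "/var/www/webapp/application/logs/2014/09/13.log",
        "/var/www/webapp/application/logs/2014/09/13/03.log.gz",
    ]
    for sample in samples:
        print(sample, is_whitelisted(sample))
```

Without the `$` anchor, the third sample (the `.gz` file) would also match, which is the main reason to anchor.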
Splunk will attempt to guess the date from the filename, first by TIME_FORMAT and then by falling back to regexes, and will use the result as an initial seed/guess value before running the per-event time extraction logic. In other words, the filename can influence timestamping.
However, I'm unclear whether the full path is passed into this logic. I think it is not.
The remaining options are:
Timestamps in the file are definitely the best outcome, but that might not be an available choice for you.
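If the timestamps cannot be written into the file and the hour has to be derived from the path, the idea looks roughly like this. This is a standalone illustration in Python, not Splunk's internal logic; the function name `hour_from_path` is hypothetical, and the layout `.../YYYY/MM/DD/HH.log` is taken from the question.

```python
import re
from datetime import datetime

# Trailing .../YYYY/MM/DD/HH.log portion of the path (layout from the question)
PATH_TIME = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/(\d{2})\.log$")

def hour_from_path(path):
    """Return the datetime encoded in the path, or None if it does not fit."""
    match = PATH_TIME.search(path)
    if match is None:
        return None
    year, month, day, hour = (int(group) for group in match.groups())
    return datetime(year, month, day, hour)

if __name__ == "__main__":
    print(hour_from_path("/var/www/webapp/application/logs/2014/09/13/03.log"))
```

Note that this only yields hour resolution; anything finer still has to come from the events themselves.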
Have you looked at the wildcard characters? Either of the following should work; take a look at the docs for inputs.conf in the Admin manual.
[monitor:///var/www/webapp/application/logs/.../*.log]
[monitor:///var/www/webapp/application/logs/2014/*/*/*.log]
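To get a feel for what these wildcards select, here is a rough sketch. As I understand the documentation, `...` recurses through any number of directory levels (roughly `.*` as a regex) and `*` matches anything within a single path segment (roughly `[^/]*`). The converter below is my own approximation in Python, not Splunk code; `wildcard_to_regex` is a made-up name, and the paths use the `application/logs` directory from the question.

```python
import re

def wildcard_to_regex(monitor_path):
    """Approximate a monitor path with wildcards as an anchored regex."""
    parts = []
    i = 0
    while i < len(monitor_path):
        if monitor_path.startswith("...", i):
            parts.append(".*")        # any number of directory levels
            i += 3
        elif monitor_path[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(monitor_path[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")

if __name__ == "__main__":
    target = "/var/www/webapp/application/logs/2014/09/13/03.log"
    for pattern in (
        "/var/www/webapp/application/logs/.../*.log",
        "/var/www/webapp/application/logs/2014/*/*/*.log",
    ):
        print(pattern, bool(wildcard_to_regex(pattern).search(target)))
```

The second form only covers 2014, so the `...` variant is the one that keeps working when the year rolls over.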