We are using Hunk in a POC. Our HDFS file structure has one folder per date; for example, our firewall logs are laid out like this:
/logs/fwsm (parent dir)
--/2015-11-06
--/2015-11-05
--/2015-11-04
…
--/2015-10-31
We set up a main virtual index at the parent, so we're searching all logs under /logs/fwsm. The issue is that we need to search per day, so I find myself creating a virtual index for every date. That leaves me with two questions:
• Is there any other way to search by date using the virtual indexes?
• Is there any limit to the number of virtual indexes that can be created (as one can imagine, this will get real ugly when we start creating virtual indexes by date for multiple sourcetypes)?
Thx
Have you tried to use the Time Capturing Regex as shown in this document?
http://docs.splunk.com/Documentation/Hunk/latest/Hunk/Addavirtualindex
Working with Splunk Support, the solution was to change the 'Time Range' setting under the Time section to 1 day. Once this change was applied, the date/time picker worked.
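One way to picture why a 1 day 'Time Range' matters: each dated directory has to cover a span of time before a time picker window can overlap it. The sketch below is only an illustration in Python. The function names and the overlap rule are assumptions made for this example, not Hunk's actual implementation.

```python
from datetime import datetime, timedelta

def directory_window(date_str, time_range=timedelta(days=1)):
    """Return the [earliest, latest) window assumed for a dated directory."""
    earliest = datetime.strptime(date_str, "%Y-%m-%d")
    return earliest, earliest + time_range

def overlaps(window, search_start, search_end):
    """True if the directory window intersects the search window."""
    earliest, latest = window
    return earliest < search_end and search_start < latest

# Search window chosen in the time picker: 2015-11-05, 08:00 to 12:00
search_start = datetime(2015, 11, 5, 8, 0)
search_end = datetime(2015, 11, 5, 12, 0)

for day in ["2015-11-04", "2015-11-05", "2015-11-06"]:
    selected = overlaps(directory_window(day), search_start, search_end)
    print(day, "searched" if selected else "skipped")
```

In this sketch only 2015-11-05 is searched. If the range were zero, that directory's window would collapse to midnight and a mid-day search would skip it as well.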
Thx for everyone's feedback and help
I did not see that option/document - I assume the time capturing regex means I'd be able to search by date/time within the main virtual index? Am I basing the regex on the file structure, or the log's date/time format?
Thx
The time capturing regex option is part of the Virtual Index UI; select the Customize Timestamp Format button.
Your assumption is correct: once you set it up, you can use the search time picker to select a specific day within the HDFS data.
Here is an example:
path = /logs/fwsm/...
accept =
regex = .?/fwsm/(\d+)-(\d+)-(\d+)/.
format = yyyyMMdd
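To see why the format is yyyyMMdd even though the directory names contain hyphens: the hyphens sit outside the capturing groups, so the captured pieces join into a string like 20151106. Here is a rough Python sketch of that idea. It assumes the intended pattern uses `.*` wildcards and that the captured groups are concatenated before parsing; the file name is hypothetical.

```python
import re
from datetime import datetime

# Assumed intended pattern, written with .* wildcards
TIME_REGEX = re.compile(r".*?/fwsm/(\d+)-(\d+)-(\d+)/.*")

def directory_time(path):
    """Concatenate the captured groups and parse them as yyyyMMdd."""
    match = TIME_REGEX.fullmatch(path)
    if match is None:
        return None
    stamp = "".join(match.groups())            # e.g. "20151106"
    return datetime.strptime(stamp, "%Y%m%d")  # Python equivalent of yyyyMMdd

# "part-00000.log" is a made-up file name for illustration
print(directory_time("/logs/fwsm/2015-11-06/part-00000.log"))
print(directory_time("/logs/other/2015-11-06/part-00000.log"))
```

The first call yields a datetime for 2015-11-06 and the second yields None, because the path does not contain /fwsm/.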
Thx
Apologies, the actual directory structure is /LogCentral/Firewall, so I set my 'Time capturing regex' to ?/Firewall/(\d+)-(\d+)-(\d+)/. and left Time Format, Time Adjustment, and Time Zone untouched. When I run the query index=fwsm using the Date/Time picker (I'm selecting Date Range | Before 11/3/2015), I get 'No results found'.
Thx
First, make certain there is a '.' char in front of your leading '?' char. (I realize that may just be a typo.)
Also, try setting the format to yyyyMMdd.
I added the '.' in front of the leading '?' and set the time format to yyyyMMdd, and it worked! I can't thank you enough!!
Greatly appreciated!
Happy it worked!
Was hoping to revisit this issue if possible as I'm seeing some weirdness with the time regex.
We have three directories on HDFS:
• /LogCentral/Firewall
• /LogCentral/ISE
• /LogCentral/WindowsEvent
I have the following regex applied to our Firewall virtual index and I can use the time picker no problem (slightly modified from the original recommendation):
.?/Firewall/(\d+)-(\d+)-(\d+)/.?)
However, when I apply the same pattern to the other two virtual indexes (below), I get no events at all, no matter what dates I select in the time picker.
.?/ISE/(\d+)-(\d+)-(\d+)/.?)
.?/WindowsEvent/(\d+)-(\d+)-(\d+)/.?)
I tried the following regex and got a match on regex101.com:
.+ISE/(\d+)-(\d+)-(\d+)
Yet when I enter that and try to run a search, it errors out:
[cdhprovider] Error while running external process, return_code=255. See search.log for more info
[cdhprovider] IOException - No input paths specified in job.
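The "No input paths specified in job" message suggests that no paths were selected for the job. One thing worth checking is the difference between a pattern being found somewhere in a path (which is what regex101 reports) and a pattern covering the whole path. Whether Hunk requires a whole-path match is an assumption here, not something confirmed in this thread. The sample path and file name below are hypothetical.

```python
import re

SAMPLE_PATH = "/LogCentral/ISE/2015-11-06/events.log"  # hypothetical file name

PATTERNS = {
    "no backslash": r".+ISE/(d+)-(d+)-(d+)",         # (d+) matches literal d's only
    "partial": r".+ISE/(\d+)-(\d+)-(\d+)",           # stops before the file name
    "whole path": r".*?/ISE/(\d+)-(\d+)-(\d+)/.*",   # assumed intended form
}

def check(pattern, path=SAMPLE_PATH):
    """Report whether a pattern matches part of the path and all of it."""
    found = re.search(pattern, path) is not None
    whole = re.fullmatch(pattern, path) is not None
    return found, whole

for name, pattern in PATTERNS.items():
    print(name, check(pattern))
```

Running candidate patterns against a few real HDFS paths this way, before pasting them into the virtual index settings, makes it easier to tell a regex problem apart from a provider problem.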
Thx
Yes, this will allow you to efficiently search by time within a single virtual index. The capturing regex lets Hunk choose which files to search based on the directories they are in, so it should match the directory path, not the date/time format inside the logs.
Thx for the reply and info