So I am trying to configure Hunk 6.3.1 to search my avro files in Hadoop. Here is an example of these .avro files in HDFS
/user/root/avro/customer/2015/06/24/20150624.avro
/user/root/avro/customer/2015/06/25/20150625.avro
/user/root/avro/customer/2015/6/24/20150624.avro
/user/root/avro/customer/2015/6/25/20150625.avro
Notice that some months have the leading zero (06), and some don't.
Below is my current setting, but it's not grabbing all the months I need:
[avrodata]
vix.provider = hdp23provider
vix.input.1.path = /user/root/avro/customer/...
vix.input.1.accept = \.avro$
vix.input.1.et.regex = .*?/customer/(\d+)/(\d+)/(\d+)/.*
vix.input.1.et.format = yyyyMMdd
vix.input.1.lt.regex = .*?/customer/(\d+)/(\d+)/(\d+)/.*
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400
Try to include the HDFS forward slash / as part of the Regex and then also include them in the format.
For example,
The below can deal with single and multiple digits in the path
[avrodata2]
vix.input.1.accept = \.avro$
vix.input.1.path = /user/root/avro/customer/...
vix.provider = hdp23provider
vix.input.1.et.format = y/M/d
vix.input.1.et.regex = .*?/customer/(\d+/\d+/\d+)/.*
vix.input.1.lt.format = y/M/d
vix.input.1.lt.offset = 86400
vix.input.1.lt.regex = .*?/customer/(\d+/\d+/\d+)/.*
Try to include the HDFS forward slash / as part of the Regex and then also include them in the format.
For example,
The below can deal with single and multiple digits in the path
[avrodata2]
vix.input.1.accept = \.avro$
vix.input.1.path = /user/root/avro/customer/...
vix.provider = hdp23provider
vix.input.1.et.format = y/M/d
vix.input.1.et.regex = .*?/customer/(\d+/\d+/\d+)/.*
vix.input.1.lt.format = y/M/d
vix.input.1.lt.offset = 86400
vix.input.1.lt.regex = .*?/customer/(\d+/\d+/\d+)/.*
Thanks @rdagan. That appears to work. I also looked at this doc: http://docs.splunk.com/Documentation/Hunk/6.3.1/Hunk/Addavirtualindex
Which points to this oracle page: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
So the time format used appeared to be java simple date format, slightly different than the strptime() format used in splunk: http://docs.splunk.com/Documentation/Splunk/6.2.0/Data/Configuretimestamprecognition