Hunk is not filtering files based on timestamp

mik_cox
Explorer

I have a Hunk installation that is successfully (albeit slowly) pulling data from an s3:// filesystem. However, I can't get Hunk to search only the relevant directories in S3. A search over a specific time range in the Hunk UI returns the correct results, but Hunk still scans every file in Hadoop to produce them, which is slow and wasteful.

For instance, my data is stored in S3 directories that follow this format:
s3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log
which corresponds to the logs collected from my app on August 9th, 2016 during the minute starting at 22:30.

I have correspondingly set up my provider with the following properties:

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
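A regex like this can be sanity-checked outside Hunk. The sketch below is only an approximation: it assumes the capture groups are concatenated and then parsed with the configured format, that the offset is in seconds, and that timestamps are UTC. It also uses Python's `re` module in place of Java's regex engine, and the helper name `extract_time` is made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Regex and sample path copied from the post above.
ET_REGEX = r".*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?"
PATH = "s3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log"


def extract_time(regex, path, fmt="%Y%m%d%H%M", offset_seconds=0):
    """Hypothetical helper: join the capture groups, parse them, add the offset.

    "%Y%m%d%H%M" is the strptime counterpart of yyyyMMddHHmm.
    UTC is an assumption; the real time zone depends on configuration.
    """
    match = re.search(regex, path)
    if match is None:
        return None
    stamp = "".join(group or "" for group in match.groups())
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return parsed + timedelta(seconds=offset_seconds)


earliest = extract_time(ET_REGEX, PATH, offset_seconds=0)
latest = extract_time(ET_REGEX, PATH, offset_seconds=60)
print(earliest.isoformat(), latest.isoformat())
```

If the groups do not add up to a 12-digit string here, the pattern itself is the problem. If they do, the pattern is probably fine and the cause lies elsewhere.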

When running searches, I've noticed in my search.log that I get lines like this...

DEBUG ERP.s3-emr -  VirtualIndex - File meets time heuristic path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log, search.et=1470009600, search.lt=1470268800, file.et=0, file.lt=9223372036854775807, file.mtime=1470766383
08-09-2016 20:24:02.879
DEBUG ERP.s3-emr -  VirtualIndex - File meets the search criteria. Will consider it, path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log

...which indicate to me that the regex isn't doing its job, as file.et and file.lt are not set properly.
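To make the symptom easier to spot across a large search.log, the key=value pairs in these debug lines can be pulled out with a few lines of Python. This is a minimal sketch, not a Splunk tool: the function names are hypothetical, and treating file.et=0 together with file.lt=2^63-1 (the largest signed 64-bit value) as "no time range extracted" is my reading of the log, not documented behaviour.

```python
import re
from datetime import datetime, timezone

LONG_MAX = 2**63 - 1  # 9223372036854775807, the file.lt value seen in the log

# First debug line from the post above.
LINE = (
    "DEBUG ERP.s3-emr -  VirtualIndex - File meets time heuristic "
    "path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log, "
    "search.et=1470009600, search.lt=1470268800, file.et=0, "
    "file.lt=9223372036854775807, file.mtime=1470766383"
)


def time_fields(line):
    """Return the search.* and file.* epoch fields of a debug line as ints."""
    pairs = re.findall(r"\b((?:search|file)\.(?:et|lt|mtime))=(\d+)", line)
    return {key: int(value) for key, value in pairs}


def file_range_is_unset(fields):
    """Assumed interpretation: et=0 and lt=LONG_MAX mean nothing was extracted."""
    return fields.get("file.et") == 0 and fields.get("file.lt") == LONG_MAX


fields = time_fields(LINE)
search_start = datetime.fromtimestamp(fields["search.et"], tz=timezone.utc)
search_end = datetime.fromtimestamp(fields["search.lt"], tz=timezone.utc)
print(search_start, search_end, file_range_is_unset(fields))
```

With a range of 0 to 2^63-1, every file overlaps every search window, which would explain why no file is ever skipped.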

Does anyone have any idea as to why this might be happening?

Thanks in advance!!

1 Solution

mik_cox
Explorer

Answering my own question:

My major problem was that I had put the following properties...

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

...on the provider, NOT on the virtual index (in indexes.conf), where they belong. Setting these properties through the Virtual Index editing page in the Hunk web interface would have configured them correctly.
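For anyone who wants to check their own indexes.conf for the same mistake, here is a rough Python sketch that flags vix.input.* keys sitting in a provider stanza. It assumes provider stanzas are named [provider:<name>] and that Python's configparser is close enough to Splunk's .conf syntax for a simple file. The stanza name appname_vix, the provider name s3-emr (guessed from the ERP.s3-emr log prefix), and the sample contents are illustrative, not taken from a real configuration.

```python
import configparser

# Illustrative indexes.conf contents reproducing the mistake described above.
SAMPLE_CONF = r"""
[provider:s3-emr]
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.et.format = yyyyMMddHHmm

[appname_vix]
vix.provider = s3-emr
"""


def misplaced_input_keys(conf_text):
    """Map each provider stanza to the vix.input.* keys found inside it."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # split on "=" only; values may contain ":"
        interpolation=None,  # leave characters such as "%" untouched
        strict=False,
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        if not section.startswith("provider:"):
            continue
        keys = [key for key in parser[section] if key.startswith("vix.input.")]
        if keys:
            found[section] = keys
    return found


print(misplaced_input_keys(SAMPLE_CONF))
```

Any key it reports would need to move to the virtual index stanza, which is what the Virtual Index editing page does for you.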
