Solved: Splunk Analytics for Hadoop: Why is Hunk searching... - Page 2

EricLloyd79 · ‎01-11-2017

We are new to Hunk (or now called Splunk Analytics for Hadoop).
I am attempting to run a query on our HDFS directories for the last 5 mins.
Here is the query: index=foo | sort 0 _time
So just return all the entries from the last 5 mins in the index foo sorted without truncation.

But it searches through all 8 million + events in our HDFS directories even after it seems to have found the complete list for the last 5 mins.

Any reasons why it might be doing this?

kschon_splunk · ‎01-11-2017

It sounds like an issue with your "et" (earliest time) configurations. When you give a search a time range, Splunk Analytics for Hadoop (formerly called Hunk) decides whether to read a particular file on HDFS based on the earliest and latest times for that file, as read from it's path. (It may also skip files based on other field values, if you have configured other path field extractions.) The relevant configurations for your virtual index are:

vix.input.1.et.regex
vix.input.1.et.format
vix.input.1.et.offset
vix.input.1.lt.regex
vix.input.1.lt.format
vix.input.1.lt.offset

You can get more information about these properties here:
http://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Indexesconf

If you've already set these props and you don't know what's going wrong, please post the provider and vix stanzas for this vix from your indexes.conf file, and an example HDFS file path, after anonymizing any confidential portions.

View solution in original post

Splunk Analytics for Hadoop: Why is Hunk searching all of the HDFS files instead of restricting it to the selected the time range?

Introducing the Splunk Community Dashboard Challenge!

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...