We are new to Hunk (now called Splunk Analytics for Hadoop).
I am attempting to run a query on our HDFS directories for the last 5 mins.
Here is the query:
index=foo | sort 0 _time
So: just return all the entries from the last 5 minutes in index foo, sorted, without truncation.
But it searches through all 8 million+ events in our HDFS directories, even after it seems to have found the complete list for the last 5 minutes.
Any reasons why it might be doing this?
It sounds like an issue with your "et" (earliest time) configurations. When you give a search a time range, Splunk Analytics for Hadoop (formerly called Hunk) decides whether to read a particular file on HDFS based on the earliest and latest times for that file, as read from its path. (It may also skip files based on other field values, if you have configured other path field extractions.) The relevant configurations for your virtual index are:
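Based on the property names used later in this thread and in the indexes.conf spec, the settings in question follow this pattern (N is the input number; these are property names only, not suggested values):

```
vix.input.<N>.et.regex
vix.input.<N>.et.format
vix.input.<N>.et.offset
vix.input.<N>.lt.regex
vix.input.<N>.lt.format
vix.input.<N>.lt.offset
```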
You can get more information about these properties here:
If you've already set these props and you don't know what's going wrong, please post the provider and vix stanzas for this vix from your indexes.conf file, and an example HDFS file path, after anonymizing any confidential portions.
I don't actually see those vix.input.* properties when browsing the properties listed under Additional Settings in Virtual Indexes in Splunk Web. Are these properties somewhere else, or do we need to add them, and what should they be set to?
You need to find a file called indexes.conf within the directory Splunk is installed in. The vix.input.* settings are inside. Post the contents here.
vix.command.arg.3 = $SPLUNKHOME/bin/jars/SplunkMR-hy2.jar
vix.env.HADOOPHOME = /usr/hdp/188.8.131.52-1245/hadoop
vix.env.HUNKTHIRDPARTYJARS = $SPLUNKHOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNKHOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNKHOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNKHOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNKHOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNKHOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNKHOME/bin/jars/thirdparty/common/snappy-java-184.108.40.206.jar,$SPLUNKHOME/bin/jars/thirdparty/hive12/hive-exec-1.2.1.jar,$SPLUNKHOME/bin/jars/thirdparty/hive12/hive-metastore-1.2.1.jar,$SPLUNKHOME/bin/jars/thirdparty/hive12/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/lib/jvm/jre-1.8.0
vix.family = hadoop
vix.fs.default.name = hdfs://10.x.x.x.:xxxx
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /tmp/splunk
vix.yarn.resourcemanager.address = hdfs://10.x.x.x:xxxx
vix.input.1.path = /topics/firewall/...
vix.provider = XXX
OK, so you are missing a bunch of vix.input definitions, particularly vix.input.1.et.regex and vix.input.1.lt.regex. These tell Splunk how to extract the date/time from file paths.
Yup, I figured that out - thanks for the heads up.
They don't get added by default when you create a new index, but you can add them via the "New Setting" link. Might be easier though to edit your indexes.conf file directly. It's probably in your /etc/apps/search/local/ directory (assuming you were in the search app when you created the vix).
As for what they need to be set to, there is a lot of detail on the page I linked to before, and there is an example here:
Briefly, "et" means "earliest time" and "lt" means "latest time". Each one is extracted from the HDFS path via the regex and interpreted via the date format. The offset does just what it says: it shifts the et/lt by a fixed amount from the value obtained from the path. BTW, another useful config is "timezone", as in:
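Something along these lines, assuming the documented vix.input.*.timezone property names (the timezone value here is just a placeholder):

```
vix.input.1.et.timezone = America/Los_Angeles
vix.input.1.lt.timezone = America/Los_Angeles
```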
Thanks for the insight.
I tried to change my indexes.conf accordingly (below) and it still searches through all the files. Perhaps my regex or wording is wrong. The file path structure I'm using is /topics/foo/01-12-2017/
This is what I added to indexes.conf:
vix.input.1.et.regex = /topics/foo/(\d+)-(\d+)-(\d+)
vix.input.1.et.format = MMddyyyy
Then I ran this query: index=foo earliest=-5m | sort 0 _time
And unfortunately it still ran through all the files before finishing the search. Any ideas?
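One thing that stands out: only the "et" side is configured above, and as described earlier, file skipping is based on both the earliest and latest times for a file. A complete et/lt pair for a daily path like /topics/foo/01-12-2017/ might look like the sketch below (untested; it assumes offsets are expressed in seconds, so the lt is pushed one day past the et to cover the whole daily directory):

```
vix.input.1.et.regex = /topics/foo/(\d+)-(\d+)-(\d+)
vix.input.1.et.format = MMddyyyy
vix.input.1.lt.regex = /topics/foo/(\d+)-(\d+)-(\d+)
vix.input.1.lt.format = MMddyyyy
vix.input.1.lt.offset = 86400
```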