We are running Splunk Analytics for Hadoop v6.5.1 with Hortonworks HDP v2.5.
Searches return results within the time range EXCEPT for the current hour's file: if I search for events in the current hour, nothing comes back. I'm not sure what's different about that file. Can someone help me troubleshoot?
The file is being written to using webhdfs:
http://docs.fluentd.org/articles/out_webhdfs
A new file is created on the hour; the HDFS structure is as follows:
/syslogs/yyyy/yyyy-MM-dd_HH_datacollectorhostname.txt
e.g.
/syslogs/2017/2017-01-11_16_datacollector2.txt
Here is some sample data:
hdfs@mynn1:~$ hadoop dfs -tail /syslogs/2017/2017-01-11_16_datacollector2.txt
2017-01-11T21:59:59Z syslog.tcp {"message":"<167>2017-01-11T21:59:59.976Z myhost.internal Vpxa: verbose vpxa[259FBB70] [Originator@6876 sub=hostdstats] Set internal stats for VM: 878 (vpxa VM id), 314 (vpxd VM id). Is FT primary? false","client_host":"10.0.0.30"}
Here are the contents of my indexes.conf:
[provider:myprovider]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.env.HADOOP_HOME = /usr/hdp/2.5.0.0-1245/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/lib/jvm/java-8-oracle
vix.family = hadoop
vix.fs.default.name = hdfs://mynn1.internal:8020
vix.mapred.child.java.opts = -server -Xmx1024m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunk/splunk-srch/
vix.yarn.resourcemanager.address = mynn2.internal:8050
vix.yarn.resourcemanager.scheduler.address = mynn2.internal:8030
[hdp-syslog]
vix.input.1.et.format = yyyyMMddHH
vix.input.1.et.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.et.offset = 3600
vix.input.1.lt.format = yyyyMMddHH
vix.input.1.lt.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.lt.offset = 3600
vix.input.1.path = /syslogs/...
vix.provider = myprovider
Here are the contents of my props.conf:
[source::/syslogs/...]
sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ = UTC
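As a sanity check on the timestamp settings, the leading timestamp of the sample event does parse with that format. Here's a quick illustration in Python (strptime happens to use the same conversion specifiers as Splunk's TIME_FORMAT here; treat it as illustrative only):

from datetime import datetime

# Leading timestamp from the sample event above. TIME_PREFIX = ^ anchors
# extraction to the start of the line, and the 20-character timestamp is
# well within MAX_TIMESTAMP_LOOKAHEAD = 30.
ts = "2017-01-11T21:59:59Z"

# Mirrors TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ; the trailing "Z" is matched
# as a literal character, and TZ = UTC supplies the zone.
parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
print(parsed)  # 2017-01-11 21:59:59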

Did you mean for vix.input.1.et.offset and vix.input.1.lt.offset to be equal? I'm guessing vix.input.1.et.offset should be "0". It's possible that the VIX is interpreting each split as only having events for the minute "on the hour", and for any query that does not include such a minute, it's rejecting all splits. For queries that span more than an hour, it will read each split, and correctly interpret the timestamp for each event.
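To make that concrete, here's a rough Python sketch of how I understand the et/lt extraction to work (the real logic lives in SplunkMR, so this is illustrative only): the regex capture groups are concatenated, parsed with the format string, and the offset, in seconds, is added.

import re
from datetime import datetime, timedelta

# Same regex as vix.input.1.et.regex / vix.input.1.lt.regex above.
PATH_RE = re.compile(r"/syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt")

def extracted_time(path, offset_seconds):
    # Concatenate the capture groups ('2017', '01', '11', '16') and parse
    # them with the Python equivalent of yyyyMMddHH, then add the offset.
    groups = PATH_RE.search(path).groups()
    t = datetime.strptime("".join(groups), "%Y%m%d%H")
    return t + timedelta(seconds=offset_seconds)

path = "/syslogs/2017/2017-01-11_16_datacollector2.txt"

# Original config (et.offset = lt.offset = 3600): a zero-width window,
# so the split appears to hold events only at exactly 17:00.
print(extracted_time(path, 3600))  # 2017-01-11 17:00:00 (et)
print(extracted_time(path, 3600))  # 2017-01-11 17:00:00 (lt)

# With et.offset = 0, the window spans the full hour the file covers.
print(extracted_time(path, 0))     # 2017-01-11 16:00:00 (et)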

For some reason I was thinking et.offset was subtracted from the earliest time. Thanks, that was the fix!

Glad it worked!

I shut down the hosts writing to HDFS, suspecting the files were somehow being locked. The problem still persists.
