
Splunk Analytics for Hadoop: Why is Splunk not reading current active HDFS file?

Builder

We are running Splunk Analytics for Hadoop v6.5.1 with Hortonworks HDP v2.5.

I can search, and results are returned within the time range EXCEPT for the current file. No results are returned if I search for events in the current hour. I'm not sure what the difference is. Can someone help me troubleshoot?


The file is being written using webhdfs (the Fluentd out_webhdfs output plugin):
http://docs.fluentd.org/articles/out_webhdfs

A new file is created on the hour; the HDFS structure is as follows:

/syslogs/yyyy/yyyy-MM-dd_HH_datacollectorhostname.txt
eg.
/syslogs/2017/2017-01-11_16_datacollector2.txt

Here is some sample data:

hdfs@mynn1:~$ hadoop dfs -tail /syslogs/2017/2017-01-11_16_datacollector2.txt


2017-01-11T21:59:59Z    syslog.tcp  {"message":"<167>2017-01-11T21:59:59.976Z myhost.internal Vpxa: verbose vpxa[259FBB70] [Originator@6876 sub=hostdstats] Set internal stats for VM: 878 (vpxa VM id), 314 (vpxd VM id). Is FT primary? false","client_host":"10.0.0.30"}

Here are the contents of my indexes.conf:

[provider:myprovider] 
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.env.HADOOP_HOME = /usr/hdp/2.5.0.0-1245/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/lib/jvm/java-8-oracle 
vix.family = hadoop
vix.fs.default.name = hdfs://mynn1.internal:8020
vix.mapred.child.java.opts = -server -Xmx1024m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr 
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunk/splunk-srch/
vix.yarn.resourcemanager.address = mynn2.internal:8050
vix.yarn.resourcemanager.scheduler.address = mynn2.internal:8030

[hdp-syslog] 
vix.input.1.et.format = yyyyMMddHH 
vix.input.1.et.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.et.offset = 3600
vix.input.1.lt.format = yyyyMMddHH
vix.input.1.lt.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.lt.offset = 3600
vix.input.1.path = /syslogs/... 
vix.provider = myprovider
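To illustrate how the et/lt settings above work, here is a small sketch (in Python, not Splunk's actual code) of my understanding: the regex's capture groups are concatenated and parsed with the configured format (Java yyyyMMddHH, i.e. %Y%m%d%H) to derive a timestamp from each file's path.

```python
import re
from datetime import datetime

# Mirrors vix.input.1.et.regex and vix.input.1.et.format from the stanza above.
ET_REGEX = r"/syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt"
ET_FORMAT = "%Y%m%d%H"  # Java yyyyMMddHH

def extract_time(path):
    """Concatenate the capture groups and parse them as a timestamp."""
    m = re.match(ET_REGEX, path)
    if m is None:
        return None
    return datetime.strptime("".join(m.groups()), ET_FORMAT)

print(extract_time("/syslogs/2017/2017-01-11_16_datacollector2.txt"))
# 2017-01-11 16:00:00
```

So each hourly file maps cleanly to the top of its hour; the offsets then shift that value, which turns out to be the crux of the problem below.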

Here are the contents of my props.conf:

[source::/syslogs/...]
sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^ 
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ 
TZ=UTC
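As a quick sanity check (plain Python, not Splunk itself), the configured TIME_FORMAT does match the leading timestamp of the sample event, and the 20-character timestamp fits comfortably within MAX_TIMESTAMP_LOOKAHEAD = 30:

```python
from datetime import datetime

# TIME_FORMAT from props.conf above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Leading portion of the sample event shown earlier.
event = '2017-01-11T21:59:59Z    syslog.tcp  {"message":"..."}'

# The timestamp is the first 20 characters of the event.
ts = datetime.strptime(event[:20], TIME_FORMAT)
print(ts)
# 2017-01-11 21:59:59
```

So timestamp extraction is not the issue here; the problem lies in the virtual index time-window settings.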

Re: Splunk Analytics for Hadoop: Why is Splunk not reading current active HDFS file?

Builder

I shut down the hosts writing to HDFS, suspecting the files were somehow being locked. The problem still persists.


Re: Splunk Analytics for Hadoop: Why is Splunk not reading current active HDFS file?

Splunk Employee

Did you mean for vix.input.1.et.offset and vix.input.1.lt.offset to be equal? I'm guessing vix.input.1.et.offset should be "0". It's possible that the VIX is interpreting each split as only having events for the minute "on the hour", and for any query that does not include such a minute, it's rejecting all splits. For queries that span more than an hour, it will read each split, and correctly interpret the timestamp for each event.
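A sketch of this reading (my interpretation of the answer, not official Splunk code): the timestamp extracted from the path is shifted by et.offset and lt.offset (in seconds) to form each split's [earliest, latest) window. With both offsets at 3600, the window collapses to a single instant at the top of the next hour.

```python
from datetime import datetime, timedelta

def split_window(path_time, et_offset, lt_offset):
    """Compute a split's [earliest, latest) window from the path timestamp."""
    return (path_time + timedelta(seconds=et_offset),
            path_time + timedelta(seconds=lt_offset))

# Timestamp extracted from /syslogs/2017/2017-01-11_16_datacollector2.txt
t = datetime(2017, 1, 11, 16)

# Original config: et.offset = lt.offset = 3600 -> zero-width window at 17:00,
# so a search over 16:00-17:00 rejects the split entirely.
print(split_window(t, 3600, 3600))

# Suggested fix: et.offset = 0 -> window spans the whole hour, 16:00-17:00.
print(split_window(t, 0, 3600))
```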


Re: Splunk Analytics for Hadoop: Why is Splunk not reading current active HDFS file?

Builder

For some reason I was thinking et.offset was subtracted from the earliest time. Thanks, that was the fix!


Re: Splunk Analytics for Hadoop: Why is Splunk not reading current active HDFS file?

Splunk Employee

Glad it worked!
