
Splunk Analytics for Hadoop: Why is Splunk not reading the current active HDFS file?

suarezry
Builder

We are running Splunk Analytics for Hadoop v6.5.1 with Hortonworks HDP v2.5.

I can search and results are returned within the time range, EXCEPT from the current file: no results are returned if I search for events in the current hour. I'm not sure what is different about that file. Can someone help me troubleshoot?


The file is being written via WebHDFS, using Fluentd's out_webhdfs plugin:
http://docs.fluentd.org/articles/out_webhdfs

A new file is created on the hour; the HDFS structure is as follows:

/syslogs/yyyy/yyyy-MM-dd_HH_datacollectorhostname.txt
e.g.
/syslogs/2017/2017-01-11_16_datacollector2.txt

Here is some sample data:

hdfs@mynn1:~$ hadoop dfs -tail /syslogs/2017/2017-01-11_16_datacollector2.txt


2017-01-11T21:59:59Z    syslog.tcp  {"message":"<167>2017-01-11T21:59:59.976Z myhost.internal Vpxa: verbose vpxa[259FBB70] [Originator@6876 sub=hostdstats] Set internal stats for VM: 878 (vpxa VM id), 314 (vpxd VM id). Is FT primary? false","client_host":"10.0.0.30"}

Here are the contents of my indexes.conf:

[provider:myprovider] 
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.env.HADOOP_HOME = /usr/hdp/2.5.0.0-1245/hadoop
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/lib/jvm/java-8-oracle 
vix.family = hadoop
vix.fs.default.name = hdfs://mynn1.internal:8020
vix.mapred.child.java.opts = -server -Xmx1024m -XX:ParallelGCThreads=4 -XX:+UseParallelGC -XX:+DisplayVMOutputToStderr 
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunk/splunk-srch/
vix.yarn.resourcemanager.address = mynn2.internal:8050
vix.yarn.resourcemanager.scheduler.address = mynn2.internal:8030

[hdp-syslog] 
vix.input.1.et.format = yyyyMMddHH 
vix.input.1.et.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.et.offset = 3600
vix.input.1.lt.format = yyyyMMddHH
vix.input.1.lt.regex = /syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt
vix.input.1.lt.offset = 3600
vix.input.1.path = /syslogs/... 
vix.provider = myprovider
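
As I understand virtual index time extraction (my reading of the docs, not Splunk source), the capture groups from et.regex/lt.regex are concatenated and parsed with et.format/lt.format. A rough Python sketch of just that extraction step, applied to the sample file:

import re
from datetime import datetime

# Assumed behavior: concatenate the regex capture groups, then parse them
# with the et/lt format; yyyyMMddHH corresponds to strptime "%Y%m%d%H".
path = "/syslogs/2017/2017-01-11_16_datacollector2.txt"
match = re.search(r"/syslogs/(\d+)/\d+-(\d+)-(\d+)_(\d+)_\w+\.txt", path)
print(match.groups())                                         # ('2017', '01', '11', '16')
print(datetime.strptime("".join(match.groups()), "%Y%m%d%H")) # 2017-01-11 16:00:00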

Here are the contents of my props.conf:

[source::/syslogs/...]
sourcetype = hadoop
priority = 100
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_PREFIX = ^ 
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ 
TZ = UTC
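
As a sanity check, the TIME_FORMAT does match the leading timestamp of the sample event. Splunk's TIME_FORMAT is strptime-style, so the same format string parses in Python:

from datetime import datetime

# First 30 characters (MAX_TIMESTAMP_LOOKAHEAD = 30) of the sample event,
# with the timestamp anchored at line start (TIME_PREFIX = ^).
event = '2017-01-11T21:59:59Z    syslog.tcp  {"message":"..."}'
stamp = event[:30].split()[0]                          # '2017-01-11T21:59:59Z'
print(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))  # 2017-01-11 21:59:59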
1 Solution

kschon_splunk
Splunk Employee

Did you mean for vix.input.1.et.offset and vix.input.1.lt.offset to be equal? I'm guessing vix.input.1.et.offset should be "0". It's possible that the VIX is interpreting each split as only having events for the minute "on the hour", and for any query that does not include such a minute, it's rejecting all splits. For queries that span more than an hour, it will read each split, and correctly interpret the timestamp for each event.
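
Concretely (assuming each offset is a number of seconds added to the time extracted from the file name, which is my reading of the docs):

from datetime import datetime, timedelta

base = datetime(2017, 1, 11, 16)  # extracted from .../2017-01-11_16_datacollector2.txt

# As configured (et.offset = 3600, lt.offset = 3600): the split's window
# collapses to the single instant 17:00, so a search over 16:00-17:00 misses it.
print(base + timedelta(seconds=3600), "..", base + timedelta(seconds=3600))

# With et.offset = 0 and lt.offset = 3600: the window covers the whole
# hour the file is written, 16:00 .. 17:00.
print(base + timedelta(seconds=0), "..", base + timedelta(seconds=3600))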


suarezry
Builder

For some reason I was thinking et.offset was subtracted from the earliest time. Thanks, that was the fix!
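
For reference, the working stanza now differs from the original only in et.offset; lt.offset stays at 3600 since each file spans a full hour:

[hdp-syslog]
vix.input.1.et.offset = 0
vix.input.1.lt.offset = 3600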


kschon_splunk
Splunk Employee

Glad it worked!


suarezry
Builder

I shut down the hosts writing to HDFS, suspecting the files were somehow being locked. The problem still persists.
