Splunk Analytics for Hadoop: How do you make a connection between Splunk and Hadoop?
I'm trying to send Splunk index data to Hadoop using Hadoop Data Roll.
However, I'm not able to establish a connection between Splunk and Hadoop at all. I get the error below on my Splunk indexer:
bash-4.1$ /opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://hadoopnamenode.company.com:8020/ /user/splunkdevuser/
17/12/01 08:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Warning: fs.defaultFS is not set when running "ls" command.
ls: `/user/splunkd1/': No such file or directory
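For context, the "SIMPLE authentication is not enabled" message usually means the request reached a Kerberized cluster without a valid ticket. A minimal sketch of obtaining one first, using the keytab and principal from the provider stanza below:
kinit -kt /export/home/splunkdev/splunkdev.keytab splunkdev@TSS.company.COM
# confirm a ticket was issued before retrying the hdfs command
klist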
Can someone kindly help?
Below is my indexes.conf:
[hadoopidx]
coldPath = $SPLUNK_DB/hadoopidx/colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB/hadoopidx/db
maxTotalDataSizeMB = 20480
thawedPath = $SPLUNK_DB/hadoopidx/thaweddb
[provider:eihadoop]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.dfs.namenode.kerberos.principal = hdfs/_HOST@HADOOP.company.COM
vix.env.HADOOP_HOME = /user/splunkdev
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/java/jdk1.8.0_102
vix.family = hadoop
vix.fs.default.name = hdfs://SLPP02.HADOOP.company.COM:8020
vix.hadoop.security.authentication = kerberos
vix.hadoop.security.authorization = 1
vix.javaprops.java.security.krb5.kdc = SLP013.HADOOP.company.COM
vix.javaprops.java.security.krb5.realm = HADOOP.company.COM
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunkdev/hadoopanalytics/
vix.yarn.nodemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.address = https://SLPP08.HADOOP.company.COM:8090/cluster
vix.yarn.resourcemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.scheduler.address = https://SLPP015.HADOOP.company.COM:8090/cluster/scheduler
vix.mapreduce.jobtracker.kerberos.principal = mapred/_HOST@HADOOP.company.COM
vix.kerberos.keytab = /export/home/splunkdev/splunkdev.keytab
vix.kerberos.principal = splunkdev@TSS.company.COM
[splunk_index_archive]
vix.output.buckets.from.indexes = hadoopidx
vix.output.buckets.older.than = 172800
vix.output.buckets.path = /user/splunkdev/splunk_index_archive
vix.provider = eihadoop
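For reference, the HDFS paths named in these stanzas can be pre-created and verified once authentication works; a sketch using the values above:
# vix.output.buckets.path: destination for archived buckets
hdfs dfs -mkdir -p /user/splunkdev/splunk_index_archive
# vix.splunk.home.hdfs: Splunk's working directory on HDFS
hdfs dfs -mkdir -p /user/splunkdev/hadoopanalytics
hdfs dfs -ls /user/splunkdev/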

I have noticed that you may have some configuration issues:
- vix.env.HADOOP_HOME = /user/splunkdev: this should be set to /opt/hadoop-2.6.0-cdh5.9.1, the Hadoop client install directory.
- vix.yarn.resourcemanager.scheduler.address: this should not have https at the beginning, and the port is normally set to 8030.
- vix.yarn.resourcemanager.address: this should not have https at the beginning, and the port is normally set to 8032 or 8050.
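Put together, the corrected lines would look something like this (a sketch; the ports are common defaults and should be confirmed against your cluster's yarn-site.xml):
vix.env.HADOOP_HOME = /opt/hadoop-2.6.0-cdh5.9.1
vix.yarn.resourcemanager.address = SLPP08.HADOOP.company.COM:8032
vix.yarn.resourcemanager.scheduler.address = SLPP015.HADOOP.company.COM:8030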
Instead of this:
/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/ /user/splunkdevuser
try this:
/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/user/splunkdevuser
Note the space after 8020/: it makes the shell pass two separate path arguments (the cluster root and a bare /user/splunkdevuser), which is why your output shows two different errors.
Splunk is a normal Hadoop client, so at a minimum, validating all of your vix.* configurations against the normal Hadoop conf files is recommended.
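For example, these should echo back the same values you put in the provider stanza if the local client config files are consistent (a sketch; hdfs getconf reads the client's core-site.xml and hdfs-site.xml):
hdfs getconf -confKey fs.defaultFS                    # should match vix.fs.default.name
hdfs getconf -confKey hadoop.security.authentication  # should print kerberos
hdfs getconf -namenodes                               # should list your NameNode host(s)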
Hi @rdagan,
Thank you for pointing out my mistakes.
I did make a few errors. We have an HA CDH cluster, so I modified my configs as below:
[provider:eihadoop]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.dfs.namenode.kerberos.principal = hdfs/_HOST@HADOOP.company.COM
vix.env.HADOOP_HOME = /user/splunkdev
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive_1_2/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/java/jdk1.8.0_102
vix.family = hadoop
vix.fs.default.name = hdfs://nameservice1
vix.yarn.resourcemanager.ha.rm-ids = rm570,rm780
vix.yarn.resourcemanager.address.rm570 = A.company.com:8032
vix.yarn.resourcemanager.address.rm780 = B.company.com:8032
vix.yarn.resourcemanager.scheduler.address.rm570 = A.company.com:8030
vix.yarn.resourcemanager.scheduler.address.rm780 = B.company.com:8030
vix.yarn.resourcemanager.ha.enabled = true
vix.yarn.resourcemanager.cluster-id = yarnRM
vix.yarn.application.classpath = $HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/,$HADOOP_COMMON_HOME/lib/,$HADOOP_HDFS_HOME/,$HADOOP_HDFS_HOME/lib/,$HADOOP_YARN_HOME/,$HADOOP_YARN_HOME/lib/
vix.hadoop.security.authentication = kerberos
vix.hadoop.security.authorization = 1
vix.javaprops.java.security.krb5.kdc = SLP013.HADOOP.company.COM
vix.javaprops.java.security.krb5.realm = HADOOP.company.COM
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunkdev/hadoopanalytics/
vix.yarn.nodemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.mapreduce.jobtracker.kerberos.principal = mapred/_HOST@HADOOP.company.COM
vix.kerberos.keytab = /export/home/splunkdev/splunkdev.keytab
vix.kerberos.principal = splunkdev@TSS.company.COM
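A quick sanity check that the client-side Hadoop config agrees with these HA settings (a sketch, assuming the CDH client binaries and config files are installed on the Splunk host):
hdfs getconf -confKey dfs.nameservices   # should print nameservice1, matching vix.fs.default.name
yarn node -list                          # should reach whichever ResourceManager is currently active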
However, I'm unable to run Hadoop commands on my Splunk host:
bash-4.1$ hdfs dfs -ls /
17/12/07 03:04:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ExceptionInInitializerError
...
Caused by: java.lang.RuntimeException: Bailing out since native library couldn't be loaded
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.
... 30 more
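One way to confirm whether the native libhadoop library is actually loadable (a sketch, using the client install path from earlier in the thread; if libhadoop.so is not found, the JniBasedUnixGroupsMapping initialization will keep failing):
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.9.1
# prints whether libhadoop, zlib, snappy, etc. were found
$HADOOP_HOME/bin/hadoop checknative -a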
Since I'm quite new to Cloudera, kindly clarify the following doubts:
1. I copied all the config files manually onto the Splunk server. Since Splunk is a normal Hadoop client, does that mean the Splunk libraries need to be installed on the CDH cluster's edge node?
2. Do I need to get the Cloudera client parcels onto the Splunk host? If yes, how do I get them onto the Splunk server? Do they need to be added in Cloudera Manager?

Yes, normally people install the Hadoop client binaries and the Hadoop client configs on the Splunk search head.
In many cases you can use Cloudera Manager, or you can just install Hadoop manually following these steps: http://hadoop.apache.org/docs/r2.7.4/
In your configuration, the Hadoop home still seems wrong: vix.env.HADOOP_HOME = /user/splunkdev looks like you are pointing to Splunk binaries, not Hadoop binaries.
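If you go the manual route, a minimal sketch (the tarball name follows Cloudera's archive naming and is an assumption; adjust to however you obtain the CDH client):
cd /opt
tar xzf hadoop-2.6.0-cdh5.9.1.tar.gz    # CDH client tarball (assumed name)
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.9.1
export PATH=$HADOOP_HOME/bin:$PATH
# then copy the cluster's core-site.xml, hdfs-site.xml, and yarn-site.xml into $HADOOP_HOME/etc/hadoop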

It looks like you need to fix the reference to the NameNode location.
In the provider I see: hdfs://SLPP02.HADOOP.company.COM:8020
But in your test I see: hdfs://hadoopnamenode.company.com:8020
They need to be the same in both places.
Sorry, that's a typo.
Yes, I used the same:
/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/ /user/splunkdevuser/
Should I update or add any Hadoop configuration files (core-site.xml, etc.) on the Splunk servers as well?
