
Splunk Analytics for Hadoop: How do you make a connection between Splunk and Hadoop?

Contributor

I'm trying to send Splunk index data to Hadoop using Hadoop Data Roll.

However, I'm not able to establish a connection between Splunk and Hadoop at all. I get the error below on my Splunk indexer:

bash-4.1$ /opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://hadoopnamenode.company.com:8020/ /user/splunkdevuser/

17/12/01 08:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Warning: fs.defaultFS is not set when running "ls" command.
ls: `/user/splunkd1/': No such file or directory

Can someone kindly help?

Below is my indexes.conf:

[hadoopidx]
coldPath = $SPLUNK_DB/hadoopidx/colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB/hadoopidx/db
maxTotalDataSizeMB = 20480
thawedPath = $SPLUNK_DB/hadoopidx/thaweddb

[provider:eihadoop]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.dfs.namenode.kerberos.principal = hdfs/_HOST@HADOOP.company.COM
vix.env.HADOOP_HOME = /user/splunkdev
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/java/jdk1.8.0_102
vix.family = hadoop
vix.fs.default.name = hdfs://SLPP02.HADOOP.company.COM:8020
vix.hadoop.security.authentication = kerberos
vix.hadoop.security.authorization = 1
vix.javaprops.java.security.krb5.kdc = SLP013.HADOOP.company.COM
vix.javaprops.java.security.krb5.realm = HADOOP.company.COM
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunkdev/hadoopanalytics/
vix.yarn.nodemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.address = https://SLPP08.HADOOP.company.COM:8090/cluster
vix.yarn.resourcemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.scheduler.address = https://SLPP015.HADOOP.company.COM:8090/cluster/scheduler
vix.mapreduce.jobtracker.kerberos.principal = mapred/_HOST@HADOOP.company.COM
vix.kerberos.keytab = /export/home/splunkdev/splunkdev.keytab
vix.kerberos.principal = splunkdev@TSS.company.COM

[splunkindexarchive]
vix.output.buckets.from.indexes = hadoopidx
vix.output.buckets.older.than = 172800
vix.output.buckets.path = /user/splunkdev/splunkindexarchive
vix.provider = eihadoop


Splunk Employee

I noticed a few possible configuration issues:
vix.env.HADOOP_HOME = /user/splunkdev
This should be set to your local Hadoop installation, /opt/hadoop-2.6.0-cdh5.9.1.

vix.yarn.resourcemanager.scheduler.address
This should not start with https://. It takes a plain host:port value, and the port is normally 8030.

vix.yarn.resourcemanager.address
This should not start with https:// either, and the port is normally 8032 or 8050.
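As a rough illustration of the two ResourceManager points above, the Python sketch below checks a provider stanza that has already been loaded into a dict. The function name `check_rm_addresses` is hypothetical, and the "usual" ports are only the common defaults mentioned above (8032/8050 and 8030); your cluster may use others.

```python
# Hypothetical check for the ResourceManager settings discussed above.
# The "usual" ports are common defaults, not a hard requirement.
USUAL_PORTS = {
    "vix.yarn.resourcemanager.address": {8032, 8050},
    "vix.yarn.resourcemanager.scheduler.address": {8030},
}


def check_rm_addresses(settings: dict) -> list:
    """Return warnings for RM address values in a provider stanza dict."""
    warnings = []
    for key, ports in USUAL_PORTS.items():
        value = settings.get(key)
        if value is None:
            continue  # setting not present, nothing to check
        if "://" in value:
            # These settings take host:port, not a web UI URL.
            warnings.append(f"{key} should be host:port, not a URL: {value}")
            continue
        _, _, port = value.rpartition(":")
        if not port.isdigit() or int(port) not in ports:
            warnings.append(f"{key} uses an unusual port: {value}")
    return warnings


# Values taken from the original provider stanza in this thread.
provider = {
    "vix.yarn.resourcemanager.address": "https://SLPP08.HADOOP.company.COM:8090/cluster",
    "vix.yarn.resourcemanager.scheduler.address": "https://SLPP015.HADOOP.company.COM:8090/cluster/scheduler",
}
for warning in check_rm_addresses(provider):
    print(warning)
```

Both original values are flagged as URLs; a value such as `A.company.com:8032` passes.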

Instead of this:
/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/ /user/splunkdevuser
try this:
/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/user/splunkdevuser
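The only difference between the two commands is the space. A POSIX shell splits the first form into two separate path arguments, so `-ls` receives `hdfs://...:8020/` plus a bare `/user/splunkdevuser`, and a path without a scheme is resolved against `fs.defaultFS` (which, per the warning in the original output, was not set on this host). A small Python sketch that mimics the shell split with the standard `shlex` module; `ls_command` is a hypothetical helper:

```python
import shlex

HDFS = "/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs"


def ls_command(namenode_uri: str, path: str) -> list:
    """Build argv for `hdfs dfs -ls` with one fully qualified HDFS URI."""
    # Join without a space so the NameNode URI and the path form a single argument.
    uri = namenode_uri.rstrip("/") + "/" + path.lstrip("/")
    return [HDFS, "dfs", "-ls", uri]


# The first command line, split the way a POSIX shell would split it:
broken = shlex.split(
    f"{HDFS} dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/ /user/splunkdevuser"
)
print(broken[3:])  # two separate path arguments
print(ls_command("hdfs://SLPP02.HADOOP.company.com:8020", "/user/splunkdevuser"))
```

On the Splunk host, the list returned by `ls_command` could be passed to `subprocess.run(cmd, check=True)`.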

Splunk acts as a normal Hadoop client, so it is recommended to at least validate your configuration first with the standard Hadoop command-line client and the normal Hadoop configuration files.
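One way to do that validation is to compare the cluster's own client configuration with the provider stanza. The sketch below assumes you have a copy of the cluster's core-site.xml; it reads `fs.defaultFS` and `hadoop.security.authentication` (which Hadoop treats as `simple` when unset) and reports mismatches against `vix.fs.default.name` and `vix.hadoop.security.authentication`. The function names and the `/etc/hadoop/conf` path in the usage comment are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_conf(xml_text: str) -> dict:
    """Turn the <property><name/><value/></property> entries of a *-site.xml into a dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def compare_with_provider(core_site: dict, provider: dict) -> list:
    """List differences between core-site.xml and the Splunk provider stanza."""
    problems = []
    default_fs = core_site.get("fs.defaultFS")
    provider_fs = provider.get("vix.fs.default.name")
    if default_fs is None:
        problems.append("fs.defaultFS is not set in core-site.xml")
    elif default_fs != provider_fs:
        problems.append(f"fs.defaultFS is {default_fs}, provider has {provider_fs}")
    # Hadoop uses "simple" authentication when the property is absent.
    cluster_auth = core_site.get("hadoop.security.authentication", "simple")
    provider_auth = provider.get("vix.hadoop.security.authentication", "simple")
    if cluster_auth.lower() != provider_auth.lower():
        problems.append(f"authentication: cluster={cluster_auth}, provider={provider_auth}")
    return problems


# Usage, assuming core-site.xml was copied from the cluster's client config:
# with open("/etc/hadoop/conf/core-site.xml") as f:
#     core_site = parse_hadoop_conf(f.read())
```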


Contributor

Hi @rdagan,
Thank you for pointing out my mistakes.
I did make a few errors. We have an HA CDH cluster, so I modified my configuration as below:

[provider:eihadoop]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.dfs.namenode.kerberos.principal = hdfs/_HOST@HADOOP.company.COM
vix.env.HADOOP_HOME = /user/splunkdev
vix.env.HUNK_THIRDPARTY_JARS = $SPLUNK_HOME/bin/jars/thirdparty/common/avro-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/avro-mapred-1.7.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-compress-1.10.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/commons-io-2.4.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/libfb303-0.9.2.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/parquet-hive-bundle-1.6.0.jar,$SPLUNK_HOME/bin/jars/thirdparty/common/snappy-java-1.1.1.7.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-exec-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-metastore-1.2.1.jar,$SPLUNK_HOME/bin/jars/thirdparty/hive12/hive-serde-1.2.1.jar
vix.env.JAVA_HOME = /usr/java/jdk1.8.0_102
vix.family = hadoop
vix.fs.default.name = hdfs://nameservice1
vix.yarn.resourcemanager.ha.rm-ids = rm570,rm780
vix.yarn.resourcemanager.address.rm570 = A.company.com:8032
vix.yarn.resourcemanager.address.rm780 = B.company.com:8032
vix.yarn.resourcemanager.scheduler.address.rm570 = A.company.com:8030
vix.yarn.resourcemanager.scheduler.address.rm780 = B.company.com:8030
vix.yarn.resourcemanager.ha.enabled = true
vix.yarn.resourcemanager.cluster-id = yarnRM
vix.yarn.application.classpath = $HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
vix.hadoop.security.authentication = kerberos
vix.hadoop.security.authorization = 1
vix.javaprops.java.security.krb5.kdc = SLP013.HADOOP.company.COM
vix.javaprops.java.security.krb5.realm = HADOOP.company.COM
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/splunkdev/hadoopanalytics/
vix.yarn.nodemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.yarn.resourcemanager.principal = yarn/_HOST@HADOOP.company.COM
vix.mapreduce.jobtracker.kerberos.principal = mapred/_HOST@HADOOP.company.COM
vix.kerberos.keytab = /export/home/splunkdev/splunkdev.keytab
vix.kerberos.principal = splunkdev@TSS.company.COM
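With ResourceManager HA, each ID listed in `yarn.resourcemanager.ha.rm-ids` is expected to have its own per-ID address properties, which is the pattern the settings above follow (`...address.rm570`, `...address.rm780`). A minimal Python sketch that flags a missing per-ID entry, assuming the stanza is loaded into a dict; `check_rm_ha` is a hypothetical name and only checks the two address keys used in this thread.

```python
# Hypothetical check: every RM id listed in rm-ids should have its own
# per-id address and scheduler address, as in the stanza above.
PER_RM_KEYS = (
    "vix.yarn.resourcemanager.address",
    "vix.yarn.resourcemanager.scheduler.address",
)


def check_rm_ha(settings: dict) -> list:
    """Return per-RM keys that are missing from an RM HA provider stanza."""
    if settings.get("vix.yarn.resourcemanager.ha.enabled", "false").lower() != "true":
        return []  # HA not enabled, nothing to check
    raw_ids = settings.get("vix.yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [rm_id.strip() for rm_id in raw_ids.split(",") if rm_id.strip()]
    if not rm_ids:
        return ["vix.yarn.resourcemanager.ha.rm-ids"]
    return [
        f"{base}.{rm_id}"
        for rm_id in rm_ids
        for base in PER_RM_KEYS
        if f"{base}.{rm_id}" not in settings
    ]


provider = {
    "vix.yarn.resourcemanager.ha.enabled": "true",
    "vix.yarn.resourcemanager.ha.rm-ids": "rm570,rm780",
    "vix.yarn.resourcemanager.address.rm570": "A.company.com:8032",
    "vix.yarn.resourcemanager.address.rm780": "B.company.com:8032",
    "vix.yarn.resourcemanager.scheduler.address.rm570": "A.company.com:8030",
    "vix.yarn.resourcemanager.scheduler.address.rm780": "B.company.com:8030",
}
print(check_rm_ha(provider))  # empty list: every RM id is covered
```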

However, I'm unable to run Hadoop commands on my Splunk host:

bash-4.1$ hdfs dfs -ls /
17/12/07 03:04:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ExceptionInInitializerError
...................................................................................
Caused by: java.lang.RuntimeException: Bailing out since native library couldn't be loaded
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.<clinit>(JniBasedUnixGroupsMapping.java:46)
... 30 more

Since I'm quite new to Cloudera, could you kindly clarify the following?
1. I copied all the config files manually onto the Splunk server. Since Splunk is a normal Hadoop client, does that mean the Splunk libraries need to be installed on the edge node of our CDH cluster?

2. Do I need to get the Cloudera client parcels onto the Splunk host? If yes, how do I get them onto the Splunk server? Do they need to be added in Cloudera Manager?


Splunk Employee

Yes. Normally people install the Hadoop client binaries and the Hadoop client configuration files on the Splunk search head.
In many cases you can use Cloudera Manager for this, or you can simply install Hadoop by following these steps: http://hadoop.apache.org/docs/r2.7.4/
In your configuration, the Hadoop home also looks wrong: vix.env.HADOOP_HOME = /user/splunkdev looks like it points to a Splunk location rather than to the Hadoop binaries.
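A quick sanity check for the `HADOOP_HOME` point is to confirm that the directory actually contains the Hadoop launcher. The Python sketch below assumes a standard layout with `bin/hadoop` under `HADOOP_HOME` (and `bin/java` under `JAVA_HOME`); the helper names are hypothetical.

```python
import os


def has_launcher(home: str, name: str) -> bool:
    """True if <home>/bin/<name> exists and is executable."""
    if not home:
        return False
    launcher = os.path.join(home, "bin", name)
    return os.path.isfile(launcher) and os.access(launcher, os.X_OK)


def check_env_homes(settings: dict) -> list:
    """Flag vix.env.*_HOME values that don't look like local installations."""
    problems = []
    if not has_launcher(settings.get("vix.env.HADOOP_HOME", ""), "hadoop"):
        problems.append("vix.env.HADOOP_HOME has no executable bin/hadoop")
    if not has_launcher(settings.get("vix.env.JAVA_HOME", ""), "java"):
        problems.append("vix.env.JAVA_HOME has no executable bin/java")
    return problems


# Run on the Splunk search head; an empty list means both launchers were found.
print(check_env_homes({
    "vix.env.HADOOP_HOME": "/opt/hadoop-2.6.0-cdh5.9.1",
    "vix.env.JAVA_HOME": "/usr/java/jdk1.8.0_102",
}))
```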


Splunk Employee

It looks like you need to fix the reference to the NameNode location.
In the provider I see: hdfs://SLPP02.HADOOP.company.COM:8020
But in your test I see: hdfs://hadoopnamenode.company.com:8020
Both need to point to the same NameNode.
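A small Python sketch of that comparison, using `urllib.parse.urlsplit` from the standard library. It compares scheme, host and port only; `urlsplit` lowercases `.hostname`, so `...company.COM` and `...company.com` compare equal, just as they do in DNS. `same_namenode` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def same_namenode(a: str, b: str) -> bool:
    """True if two HDFS URIs name the same scheme, host and port."""
    ua, ub = urlsplit(a), urlsplit(b)
    # .hostname is lowercased by urlsplit; DNS names are case-insensitive anyway.
    return (ua.scheme, ua.hostname, ua.port) == (ub.scheme, ub.hostname, ub.port)


provider_uri = "hdfs://SLPP02.HADOOP.company.COM:8020"
print(same_namenode(provider_uri, "hdfs://hadoopnamenode.company.com:8020/"))  # False
print(same_namenode(provider_uri, "hdfs://SLPP02.HADOOP.company.com:8020/user/splunkdevuser"))  # True
```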


Contributor

Sorry, that's a typo.

Yes, I used the same one:

/opt/hadoop-2.6.0-cdh5.9.1/bin/hdfs dfs -ls hdfs://SLPP02.HADOOP.company.com:8020/ /user/splunkdevuser/

Should I also update or add any Hadoop configuration files (core-site.xml, etc.) on the Splunk servers?
