I followed the instructions in [the documentation for archiving to S3 in 6.5.0](http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/ArchivingSplunkindexestoS3), but Splunk still can't find the jars it wants. How do I properly configure the jars for searching S3-archived buckets?
I ran the | archivebuckets command and it archived the buckets fine, but the search errors out saying it can't find the jars:
[HadoopProvider] Error in 'ExternalResultProvider': Hadoop CLI may not be set correctly. Please check HADOOP_HOME and Default Filesystem in the provider settings for this virtual index. Running /opt/hadoop/bin/hadoop fs -stat s3a://bucketname/prefix/ should return successfully, rc=255, error=-stat: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
I ran that command myself, and I could only get it to work by providing the -libjars option:
$ /opt/hadoop/bin/hadoop fs -libjars $HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar -Dfs.s3a.access.key=value -Dfs.s3a.secret.key=value -stat s3a://bucketname/prefix/
1970-01-01 00:00:00
I also tried exporting HADOOP_CLASSPATH instead (comma-separated, like the -libjars list):
$ export HADOOP_CLASSPATH=$HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
$ /opt/hadoop/bin/hadoop classpath
/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar,/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-databind-2.2.3.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-core-2.2.3.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-annotations-2.2.3.jar:/opt/hadoop/contrib/capacity-scheduler/*.jar
$ /opt/hadoop/bin/hadoop fs -Dfs.s3a.access.key=value -Dfs.s3a.secret.key=value -stat s3a://bucketname/prefix/
-stat: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
... 16 more
Here is my provider configuration:
[provider:HadoopProvider]
vix.family = hadoop
vix.splunk.setup.package = /opt/splunk_package.tgz
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-openjdk-amd64
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.HADOOP_TOOLS = /opt/hadoop/share/hadoop/tools/lib
vix.splunk.home.datanode = /opt/splunk
vix.splunk.home.hdfs = /working-dir
vix.splunk.jars = $HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = <%= ENV['HADOOP_MASTER'] %>:8032
vix.yarn.resourcemanager.scheduler.address = <%= ENV['HADOOP_MASTER'] %>:8030
vix.fs.s3a.access.key = <%= ENV['S3_ARCHIVE_ACCESS_KEY'] %>
vix.fs.s3a.secret.key = <%= ENV['S3_ARCHIVE_SECRET_KEY'] %>
vix.fs.default.name = s3a://<%= ENV['SPLUNK_HADOOP_BUCKET'] %>/prefix
[main_archive]
vix.provider = HadoopProvider
vix.output.buckets.from.indexes = main
vix.output.buckets.older.than = 1
vix.output.buckets.path = s3a://<%= ENV['SPLUNK_HADOOP_BUCKET'] %>/prefix
I'm running against a vanilla Apache Hadoop tarball, version 2.7.2. I'm not sure which commands are run against the Hadoop cluster itself, but I'm pointing at an AWS EMR cluster running the same Hadoop version.
Are you sure the NameNode setting vix.fs.default.name is correct?
Normally you will see vix.fs.default.name = hdfs://<master-private-ip>:8020
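In other words, the provider's default filesystem points at HDFS while the archive path stays in S3. A minimal sketch of the relevant settings (the master hostname/port and bucket name here are placeholders, not values from the original config):

```ini
[provider:HadoopProvider]
vix.family = hadoop
vix.env.HADOOP_HOME = /opt/hadoop
# Default FS points at the EMR cluster's HDFS NameNode (hostname/port assumed)
vix.fs.default.name = hdfs://master-private-ip:8020

[main_archive]
vix.provider = HadoopProvider
# The archived buckets themselves can still live in S3
vix.output.buckets.path = s3a://bucketname/prefix
```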
This Splunk blog post indicated that I could use S3 as the default FS, but switching the default FS to HDFS did solve the problem:
http://blogs.splunk.com/2013/11/13/analyze-data-with-hunk-on-amazon-emr/
For anybody who comes looking, I also had to add the following to my provider config to get Splunk to use the Hadoop 2-compatible SplunkMR jars:
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-hy2.jar
vix.splunk.impersonation = 0
I fixed the HADOOP_CLASSPATH to be colon-separated and it works fine now. So the command that the search error says should work does work, but the search still doesn't.
$ export HADOOP_CLASSPATH=$HADOOP_TOOLS/hadoop-aws-2.7.2.jar:$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar:$HADOOP_TOOLS/jackson-databind-2.2.3.jar:$HADOOP_TOOLS/jackson-core-2.2.3.jar:$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
$ /opt/hadoop/bin/hadoop classpath
/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar:/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar:/opt/hadoop/share/hadoop/tools/lib/jackson-databind-2.2.3.jar:/opt/hadoop/share/hadoop/tools/lib/jackson-core-2.2.3.jar:/opt/hadoop/share/hadoop/tools/lib/jackson-annotations-2.2.3.jar:/opt/hadoop/contrib/capacity-scheduler/*.jar
$ /opt/hadoop/bin/hadoop fs -Dfs.s3a.access.key=value -Dfs.s3a.secret.key=value -stat s3a://bucketname/prefix/
1970-01-01 00:00:00
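One way to avoid mixing up the two separator conventions (comma for -libjars, colon for HADOOP_CLASSPATH) is to keep a single comma-separated jar list and derive the classpath from it. This is just a sketch I use, not anything Splunk- or Hadoop-provided:

```shell
# One jar list in the comma-separated format that -libjars expects
HADOOP_TOOLS=/opt/hadoop/share/hadoop/tools/lib
LIBJARS="$HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar"

# Derive the colon-separated form that HADOOP_CLASSPATH expects
# (safe here because none of the paths contain commas)
export HADOOP_CLASSPATH=$(printf '%s' "$LIBJARS" | tr ',' ':')
echo "$HADOOP_CLASSPATH"
```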