I followed the instructions in the documentation for archiving to S3 in 6.5.0 (http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/ArchivingSplunkindexestoS3),
but Splunk still can't find the jars it wants. How do I properly configure the jars for searching S3 archived buckets?
I ran the | archivebuckets command and it worked fine and archived the buckets, but the search errors out saying it can't find the jars:
[HadoopProvider] Error in 'ExternalResultProvider': Hadoop CLI may not be set correctly. Please check HADOOP_HOME and Default Filesystem in the provider settings for this virtual index. Running /opt/hadoop/bin/hadoop fs -stat s3a://bucketname/prefix/ should return successfully, rc=255, error=-stat: Fatal internal error java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
I ran the command from the error message by hand, and I could only get it to work when I provided the -libjars option:
$ /opt/hadoop/bin/hadoop fs -libjars $HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar -Dfs.s3a.access.key=value -Dfs.s3a.secret.key=value -stat s3a://bucketname/prefix/
1970-01-01 00:00:00
Setting HADOOP_CLASSPATH to the same jars and dropping -libjars does not work:
$ export HADOOP_CLASSPATH=$HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
$ /opt/hadoop/bin/hadoop classpath
/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar,/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-databind-2.2.3.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-core-2.2.3.jar,/opt/hadoop/share/hadoop/tools/lib/jackson-annotations-2.2.3.jar:/opt/hadoop/contrib/capacity-scheduler/*.jar
$ /opt/hadoop/bin/hadoop fs -Dfs.s3a.access.key=value -Dfs.s3a.secret.key=value -stat s3a://bucketname/prefix/
-stat: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
... 16 more
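Note that the `hadoop classpath` output above still shows the five jars joined by commas. A Java classpath on Linux is colon-separated, whereas -libjars takes a comma-separated list, so the two values are not interchangeable. The following is a minimal sketch of building the colon-separated form, assuming the jars live in the same tools directory as in the provider settings; the variable name JARS_COMMA is made up for illustration, and this has not been verified to resolve the Splunk search error on its own:

```shell
# Assumed location of the tools jars (same path as vix.env.HADOOP_TOOLS).
HADOOP_TOOLS=/opt/hadoop/share/hadoop/tools/lib

# The jar list in the comma-separated form that -libjars accepts.
# JARS_COMMA is a hypothetical helper variable, not a Hadoop setting.
JARS_COMMA="$HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar"

# Classpath entries are separated by colons, so swap each comma for a colon.
HADOOP_CLASSPATH=$(printf '%s' "$JARS_COMMA" | tr ',' ':')
export HADOOP_CLASSPATH

# Show the resulting value for inspection.
printf '%s\n' "$HADOOP_CLASSPATH"
```

With HADOOP_CLASSPATH exported this way, running `/opt/hadoop/bin/hadoop classpath` again should list each jar as its own colon-delimited entry, which makes it easy to see whether the value was picked up.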
Here is my provider configuration:
[provider:HadoopProvider]
vix.family = hadoop
vix.splunk.setup.package = /opt/splunk_package.tgz
vix.env.JAVA_HOME = /usr/lib/jvm/java-7-openjdk-amd64
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.HADOOP_TOOLS = /opt/hadoop/share/hadoop/tools/lib
vix.splunk.home.datanode = /opt/splunk
vix.splunk.home.hdfs = /working-dir
vix.splunk.jars = $HADOOP_TOOLS/hadoop-aws-2.7.2.jar,$HADOOP_TOOLS/aws-java-sdk-1.7.4.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = <%= ENV['HADOOP_MASTER'] %>:8032
vix.yarn.resourcemanager.scheduler.address = <%= ENV['HADOOP_MASTER'] %>:8030
vix.fs.s3a.access.key = <%= ENV['S3_ARCHIVE_ACCESS_KEY'] %>
vix.fs.s3a.secret.key = <%= ENV['S3_ARCHIVE_SECRET_KEY'] %>
vix.fs.default.name = s3a://<%= ENV['SPLUNK_HADOOP_BUCKET'] %>/prefix
[main_archive]
vix.provider = HadoopProvider
vix.output.buckets.from.indexes = main
vix.output.buckets.older.than = 1
vix.output.buckets.path = s3a://<%= ENV['SPLUNK_HADOOP_BUCKET'] %>/prefix
I'm running against a vanilla Apache Hadoop tarball, version 2.7.2. I'm not sure which commands actually run against the Hadoop cluster, but I'm working against an AWS EMR cluster of the same Hadoop version.