What are the requirements from the CDH Team to set up Hadoop data roll?
1) No need for Cloudera Manager agent on the Splunk Search Head and Indexers since they are just Hadoop clients. You will need the Hadoop binaries and Java to be distributed to all of the Splunk nodes.
Here is the link to the documentation: http://docs.splunk.com/Documentation/Splunk/latest/Indexer/ArchivingindexestoHadoop
2) Using Cloudera Manager to generate and install Hadoop client on all the Splunk servers (Indexers and Search Heads) could make your life much easier from a Hadoop management point of view. However, it is not required.
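As a sanity check for the distribution step above, a short script can confirm that the Java and Hadoop binaries are actually visible on each Splunk node. This is a minimal sketch; run it as the same user the Splunk processes run as:

```shell
#!/bin/sh
# Verify the binaries Hadoop data roll depends on are on this node's PATH.
for cmd in java hadoop hdfs; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found at $(command -v "$cmd")"
  else
    echo "$cmd: MISSING - distribute the Java/Hadoop binaries to this node"
  fi
done
```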
Hi @rdagan,
I followed the same steps: installed the Hadoop binaries and copied the configs onto the Splunk hosts.
But I'm unable to run simple Hadoop commands on the Splunk hosts. I get the error below.
bash-4.1$ hdfs dfs -ls /
17/12/07 03:04:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ExceptionInInitializerError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2223)
at org.apache.hadoop.security.Groups.&lt;init&gt;
at org.apache.hadoop.security.Groups.&lt;init&gt;
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:420)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:284)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:806)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:776)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:649)
at org.apache.hadoop.fs.FileSystem$Cache$Key.&lt;init&gt;
at org.apache.hadoop.fs.FileSystem$Cache$Key.&lt;init&gt;
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:184)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.lang.RuntimeException: Bailing out since native library couldn't be loaded
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.&lt;clinit&gt;
... 30 more
What could be the reason?
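For reference, the "Bailing out since native library couldn't be loaded" line is thrown by JniBasedUnixGroupsMapping, which apparently means the copied core-site.xml sets hadoop.security.group.mapping to the JNI-based implementation while libhadoop.so is not available on the Splunk host (the NativeCodeLoader warning at the top of the session points the same way). A quick check; the HADOOP_HOME default here is a placeholder, not a known path:

```shell
#!/bin/sh
# HADOOP_HOME is an assumption -- point it at your Hadoop client install.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
if [ -f "$HADOOP_HOME/lib/native/libhadoop.so" ]; then
  echo "libhadoop.so present"
else
  echo "libhadoop.so missing under $HADOOP_HOME/lib/native"
  echo "either install the Hadoop native libraries on this host, or set"
  echo "hadoop.security.group.mapping to"
  echo "org.apache.hadoop.security.ShellBasedUnixGroupsMapping in core-site.xml"
fi
```

ShellBasedUnixGroupsMapping resolves groups by shelling out to the OS, so it avoids the JNI dependency entirely.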
I would recommend that, at least for the Splunk Search Head, you have your Hadoop team set up a full Hadoop client environment. That will eliminate many configuration issues.
Once that is working, you will know the right configuration to replicate across all the indexers.
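Once a working client is in place, the indexer-side wiring for data roll lives in indexes.conf as a provider stanza plus an archive stanza. A rough sketch only; all hostnames, paths, and values below are placeholders, and the setting names should be verified against the archiving documentation linked above:

```
# indexes.conf (sketch -- every value is a placeholder)
[provider:cdh]
vix.family = hadoop
vix.env.JAVA_HOME = /usr/java/latest
vix.env.HADOOP_HOME = /opt/hadoop
vix.fs.default.name = hdfs://namenode.example.com:8020

# which index to roll, how old buckets must be (seconds), and where in HDFS
[main_archive]
vix.provider = cdh
vix.output.buckets.from.indexes = main
vix.output.buckets.older.than = 2592000
vix.output.buckets.path = /user/splunk/archive/main
```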