All Apps and Add-ons

Hadoop executable and classpath issues

dneth
Engager

This is my first time trying to set up Hadoop Connect, so I may be making some rookie mistakes, but I've hit two different issues that I can't seem to get around while configuring a new HDFS cluster.

The first issue looks to be some kind of classpath issue while running the hadoop command:

Could not find or load main class org.apache.hadoop.fs.FsShell.

That class is provided by one of the jars Cloudera installs alongside the CLI, and the same command works when run in a terminal, so it looks like a classpath issue. None of the Python code appears to set the classpath differently on purpose, but I'm not that familiar with Python, so I could be missing some minutiae.
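One guess (purely speculation on my part): if the app launches the hadoop CLI via subprocess with an explicit env= mapping, any variables it omits, such as HADOOP_CLASSPATH, would never reach the child process even though they're set in my shell. A minimal sketch of that behavior; the classpath value is just an example:

```python
import os
import subprocess
import sys

# Example value only; a real CDH install would set something similar.
os.environ["HADOOP_CLASSPATH"] = "/opt/cloudera/parcels/CDH/jars/*"

probe = "import os; print(os.environ.get('HADOOP_CLASSPATH', ''))"

# A child launched without env= inherits the parent's full environment.
inherited = subprocess.run(
    [sys.executable, "-c", probe], capture_output=True, text=True
).stdout.strip()

# A child launched with an explicit env= mapping sees ONLY those keys,
# so HADOOP_CLASSPATH silently disappears.
stripped = subprocess.run(
    [sys.executable, "-c", probe], capture_output=True, text=True,
    env={"PATH": os.environ.get("PATH", "")},
).stdout.strip()

print(inherited)  # the classpath set above
print(stripped)   # empty: the variable never reached the child
```

If the app builds its own environment like the second call, that would explain why FsShell resolves from my shell but not from Hadoop Connect.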

Unfortunately, I can no longer reproduce that failure, because the second issue now prevents Hadoop Connect from getting that far in the process...

The second issue is that Hadoop Connect can't seem to find the hadoop executable:

Unable to connect to Hadoop cluster 'hdfs://metroid/' with principal 'None': Invalid HADOOP_HOME. Cannot find Hadoop command under bin directory HADOOP_HOME=' /opt/cloudera/parcels/CDH'.

I've configured HADOOP_HOME on the configuration screen to be /opt/cloudera/parcels/CDH. On the same node, these work:

14:17:35 $ ls -l  /opt/cloudera/parcels/CDH/bin/hadoop
-rwxr-xr-x 1 root root 621 Aug 30 16:02 /opt/cloudera/parcels/CDH/bin/hadoop

14:17:42 $  /opt/cloudera/parcels/CDH/bin/hadoop
Usage: hadoop [--config confdir] COMMAND

So the executable is there with appropriate permissions, and it works. Just in case the log message was misleading, I looked in hadooputils.py; on line 35 it pieces the path together as follows:
hadoop_cli = os.path.join(env["HADOOP_HOME"], "bin", "hadoop")
That looks correct as well, so I'm not sure what's going on. The CDH folder is actually a symlink, so just in case Python was getting confused there I tried the direct path and got the same failure.
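One thing I noticed while poking at this (using /tmp as a stand-in, since I can't test on the cluster right now): the error message quotes HADOOP_HOME=' /opt/cloudera/parcels/CDH' with what looks like a leading space, and os.path.join would preserve that space, making exactly this kind of existence check fail:

```python
import os

# Reproduction sketch: a stray leading space in the configured value
# survives os.path.join and points at a path that does not exist.
clean = "/tmp"    # stand-in for a real HADOOP_HOME
padded = " /tmp"  # same path with a leading space, as the error message hints

print(os.path.join(clean, "bin", "hadoop"))   # /tmp/bin/hadoop
print(os.path.join(padded, "bin", "hadoop"))  # " /tmp/bin/hadoop" (space kept)

print(os.path.exists(clean))   # True on any Unix system
print(os.path.exists(padded))  # False: " /tmp" is a different, nonexistent path
```

I can't confirm whether the configuration screen is preserving a stray space, but it would match the quoted error exactly.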

Does anyone have a suggestion for how to solve either or (preferably) both of those?

rdagan_splunk
Splunk Employee

In the file core-site.xml, what is the value of fs.defaultFS? Normally we see something like hdfs://<ip>:8020.
Are you able to access HDFS from the command line? For example, are you able to run the command hadoop fs -ls hdfs://<ip>:8020/users ?
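For reference, the property normally looks like this in core-site.xml (host and port below are placeholders, not values from your cluster):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>
```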
