All Apps and Add-ons

Hadoop executable and classpath issues

dneth
Engager

This is my first time trying to set up Hadoop Connect, so I may be making some rookie mistakes, but I've hit two different issues that I can't get past while configuring a new HDFS cluster.

The first issue looks to be some kind of classpath issue while running the hadoop command:

Could not find or load main class org.apache.hadoop.fs.FsShell.

That class is provided by one of the jars Cloudera installs alongside the CLI, and the same command works when run from a terminal, so it looks like a classpath issue. None of the Python code appears to set the classpath differently on purpose, but I'm not that familiar with Python, so there may be some minutiae I'm missing.
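My working theory, sketched below: the environment the app hands to its hadoop subprocess may be stripped down compared to an interactive shell. The variable names and values here are just my assumptions about what the Cloudera wrapper script needs in order to build its classpath, not anything I've confirmed in the app's code:

```python
# Hypothetical: an interactive shell where the Cloudera setup scripts
# have populated the classpath-related variables.
shell_env = {
    "PATH": "/usr/bin:/bin",
    "CLASSPATH": "/opt/cloudera/parcels/CDH/jars/*",  # jars that provide FsShell
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
}

# If the app launches hadoop with a minimal environment like this...
subprocess_env = {"PATH": shell_env["PATH"], "HADOOP_HOME": "/opt/cloudera/parcels/CDH"}

# ...then everything the wrapper script needs to locate FsShell is gone:
dropped = sorted(set(shell_env) - set(subprocess_env))
print(dropped)  # ['CLASSPATH', 'HADOOP_CONF_DIR']
```

If that's what's happening, it would explain why the same command succeeds from my terminal but fails when launched by the app.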

Unfortunately, I can no longer reproduce that failure, because the second issue now prevents Hadoop Connect from getting that far in the process...

The second issue is that Hadoop Connect can't seem to find the hadoop executable:
Unable to connect to Hadoop cluster 'hdfs://metroid/' with principal 'None': Invalid HADOOP_HOME. Cannot find Hadoop command under bin directory HADOOP_HOME=' /opt/cloudera/parcels/CDH'.

I've set HADOOP_HOME to /opt/cloudera/parcels/CDH on the configuration screen. On the same node, these work:

14:17:35 $ ls -l  /opt/cloudera/parcels/CDH/bin/hadoop
-rwxr-xr-x 1 root root 621 Aug 30 16:02 /opt/cloudera/parcels/CDH/bin/hadoop

14:17:42 $  /opt/cloudera/parcels/CDH/bin/hadoop
Usage: hadoop [--config confdir] COMMAND

So the executable is there with appropriate permissions, and it works. Just in case the log message was misleading, I looked in hadooputils.py, where line 35 pieces the path together as follows:
hadoop_cli = os.path.join(env["HADOOP_HOME"], "bin", "hadoop")
That looks correct as well, so I'm not sure what's going on. The CDH folder is actually a symlink, so in case Python was getting confused by that, I also tried the resolved path and got the same failure.
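One detail I did notice while staring at this: the error message quotes HADOOP_HOME as ' /opt/cloudera/parcels/CDH', with what looks like a leading space inside the quotes. A minimal sketch of what that would do to the path check (the env dict here is just my stand-in for whatever the app builds internally):

```python
import os

# HADOOP_HOME exactly as the error message quotes it, leading space and all.
env = {"HADOOP_HOME": " /opt/cloudera/parcels/CDH"}

# The same join the app performs on line 35 of hadooputils.py.
hadoop_cli = os.path.join(env["HADOOP_HOME"], "bin", "hadoop")

# os.path.join preserves the leading space, so the resulting path
# doesn't exist, even though the real binary does.
print(repr(hadoop_cli))            # ' /opt/cloudera/parcels/CDH/bin/hadoop'
print(os.path.exists(hadoop_cli))  # False, despite ls finding the file
```

I can't tell whether that space is really in the configured value or is just an artifact of how the error message is formatted, but it would fit the symptom.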

Does anyone have a suggestion for how to solve either (or, preferably, both) of these issues?

rdagan_splunk
Splunk Employee

In the file core-site.xml, what is the value of fs.defaultFS? Normally we see something like hdfs://ip:8020.
Are you able to access HDFS from the command line? For example, are you able to run the command hadoop fs -ls hdfs://ip:8020/users ?
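For reference, the relevant entry in core-site.xml typically looks like this (the hostname and port below are placeholders; use your own NameNode's address):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```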
