
Hadoop executable and classpath issues

dneth
Engager

This is my first time trying to set up Hadoop Connect, so I may be making some rookie mistakes, but I've hit two different issues that I can't seem to get around while configuring a new HDFS cluster.

The first issue looks to be some kind of classpath issue while running the hadoop command:

Could not find or load main class org.apache.hadoop.fs.FsShell.

That class is provided by one of the jars installed by Cloudera alongside the CLI, and it loads fine when I run the command in a terminal, so it seems to be a classpath issue. It doesn't look like any of the Python code is intentionally setting the classpath differently, but I'm not that familiar with Python, so there could be some detail I'm missing.
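One way to narrow down a "works in the terminal, fails from the app" classpath problem is to compare the environment the terminal sees with the environment the calling process sees. The sketch below is only an illustration: the list of watched variables is an assumption (it is not taken from Hadoop Connect), and the sample values in the usage lines are made up.

```python
# Variables that commonly influence how the hadoop wrapper script builds its
# classpath. This list is an assumption, not something read from Hadoop Connect.
WATCHED = ("HADOOP_HOME", "HADOOP_CONF_DIR", "HADOOP_CLASSPATH",
           "JAVA_HOME", "CLASSPATH", "PATH")


def env_diff(shell_env, app_env, names=WATCHED):
    """Return {name: (shell_value, app_value)} for watched variables that differ."""
    diff = {}
    for name in names:
        shell_value = shell_env.get(name)
        app_value = app_env.get(name)
        if shell_value != app_value:
            diff[name] = (shell_value, app_value)
    return diff


# Made-up example values, for illustration only.
shell_env = {"HADOOP_HOME": "/opt/cloudera/parcels/CDH", "JAVA_HOME": "/usr/java/default"}
app_env = {"HADOOP_HOME": "/opt/cloudera/parcels/CDH"}
print(env_diff(shell_env, app_env))
# {'JAVA_HOME': ('/usr/java/default', None)}
```

In practice the two dictionaries would come from `os.environ` captured in each context (for example, dumped to a file from the terminal and from the process that launches the hadoop command).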

Unfortunately, I'm not seeing that failure any longer, as the second issue has presented and doesn't allow Hadoop Connect to get this far in the process...

The second issue is that Hadoop Connect can't seem to find the hadoop executable:

Unable to connect to Hadoop cluster 'hdfs://metroid/' with principal 'None': Invalid HADOOP_HOME. Cannot find Hadoop command under bin directory HADOOP_HOME=' /opt/cloudera/parcels/CDH'.

I've configured HADOOP_HOME on the configuration screen to be /opt/cloudera/parcels/CDH. On the same node, these work:

14:17:35 $ ls -l  /opt/cloudera/parcels/CDH/bin/hadoop
-rwxr-xr-x 1 root root 621 Aug 30 16:02 /opt/cloudera/parcels/CDH/bin/hadoop

14:17:42 $  /opt/cloudera/parcels/CDH/bin/hadoop
Usage: hadoop [--config confdir] COMMAND

So the executable is there, with appropriate permissions, and it works. In case the log message was misleading, I looked in the hadooputils.py file; on line 35 it pieces the path together as follows:
hadoop_cli = os.path.join(env["HADOOP_HOME"], "bin", "hadoop")
That looks correct as well, so I'm not sure what's going on. The CDH folder is actually a symlink, so just in case Python was getting confused there I tried the direct path and got the same failure.
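One detail worth ruling out: the HADOOP_HOME value quoted in the error message appears to start with a space (`' /opt/cloudera/parcels/CDH'`). That may just be an artifact of how the message was pasted, but if the configured value really carries leading whitespace, the joined path becomes a relative path that will not exist. The sketch below reuses the same `os.path.join` call quoted above; the helper names `hadoop_cli_path` and `check_hadoop_home` are hypothetical and are not part of hadooputils.py.

```python
import os


def hadoop_cli_path(hadoop_home):
    # Same join as the line quoted from hadooputils.py.
    return os.path.join(hadoop_home, "bin", "hadoop")


def check_hadoop_home(hadoop_home):
    """Return a list of human-readable problems with a HADOOP_HOME value."""
    problems = []
    if hadoop_home != hadoop_home.strip():
        problems.append("value has leading or trailing whitespace: %r" % hadoop_home)
    cli = hadoop_cli_path(hadoop_home)
    if not os.path.isfile(cli):
        problems.append("no file at %r" % cli)
    elif not os.access(cli, os.X_OK):
        problems.append("%r is not executable by this user" % cli)
    return problems


# repr() makes a stray leading space visible in the joined path.
print(repr(hadoop_cli_path(" /opt/cloudera/parcels/CDH")))
for problem in check_hadoop_home(" /opt/cloudera/parcels/CDH"):
    print(problem)
```

Running `check_hadoop_home` as the same user that runs Splunk, once with the value exactly as entered on the configuration screen and once with it stripped, should show whether whitespace or permissions for that user are the cause.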

Does anyone have a suggestion for how to solve either or (preferably) both of those?

rdagan_splunk
Splunk Employee

In the file core-site.xml, what is the value of fs.defaultFS? Normally we see something like hdfs://ip:8020
Are you able to access HDFS from the command line? For example, are you able to run the command hadoop fs -ls hdfs://ip:8020/users ?
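If it helps, the fs.defaultFS value can be read programmatically as well as by eye. Below is a minimal sketch that assumes the standard Hadoop configuration layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>` children). The function name `read_hadoop_property` and the sample XML are illustrative only; the sample is not the poster's actual file, and the location of core-site.xml on a given node is left to the reader.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, wanted="fs.defaultFS"):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == wanted:
            return prop.findtext("value", "").strip()
    return None


# Illustrative sample, not a real core-site.xml from the cluster in question.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://metroid</value>
  </property>
</configuration>"""

print(read_hadoop_property(SAMPLE))
# hdfs://metroid
```

For a real file, pass its contents in, for example `read_hadoop_property(open(path_to_core_site).read())`, where `path_to_core_site` is whatever path the cluster actually uses.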
