
Hadoop Connect with Apache Hadoop 1.0.3

ramonpin
New Member

I'm configuring my cluster with the latest version of Hadoop Connect, following the video on the application's Splunkbase page: http://www.splunk.com/view/SP-CAAAHBZ

Even though the Hadoop version I'm using is the same as the one in the video, I get an error when trying to save the cluster configuration.

After filling in all the cluster information I get the error: "Failed to get remote Hadoop version (namenode=headnode, port=50070): 'Version' keyword is not found."

I'm running on CentOS 5.5.

Is there any known reason for this?

Thanks,
Ramón Pin


ramonpin
New Member

Hi everyone. This problem finally seems to be resolved. Our Hadoop machines are not listed in our DNS; we use /etc/hosts to assign names to them. It seems the application issues a DNS request for the machine name instead of resolving it from /etc/hosts, even though all the Hadoop commands and processes use /etc/hosts normally. Once we configured the Hadoop URL with the headnode's IP address, the cluster registered correctly.
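
In case it helps anyone else, a quick way to confirm this kind of resolution mismatch (standard Linux tools, nothing specific to Hadoop Connect) is to compare an NSS lookup, which reads /etc/hosts, against a pure DNS query:

 $ getent hosts headnode    # resolves through nsswitch.conf, so /etc/hosts entries are returned
 $ nslookup headnode        # asks the DNS servers directly and ignores /etc/hosts

If the first command returns an address and the second one fails, the name is only visible through /etc/hosts.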


Ledion_Bitincka
Splunk Employee

Can you please file a support case and include a diag so we can take a look at the log files as well?


kosako2007
New Member

We'll do it as soon as we can. Thank you for your support.


Ledion_Bitincka
Splunk Employee

Hadoop Connect tries to determine the cluster's version by using JMX or a fallback mechanism. What does this URL return in your environment:

 http://[namenode-host]:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo
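
For example, if curl is available on the Splunk server, something like this (using the namenode host from your error message) should print just the Version entry if JMX is responding:

 $ curl -s 'http://headnode:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo' | grep '"Version"'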

ramonpin
New Member

{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeInfo",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
    "Threads" : 27,
    "HostName" : "headnode",
    "Used" : 362116714496,
    "Version" : "1.0.3, r1335192",
    "Total" : 570697924608,
    "UpgradeFinalized" : true,
    "Free" : 175745486848,
    "Safemode" : "",
    "NonDfsUsedSpace" : 32835723264,
    "PercentUsed" : 63.451557,
    "PercentRemaining" : 30.794834,
    "TotalBlocks" : 4495,
    "TotalFiles" : 7345,
    ...}

I cut the result to fit within the comment size limit.


kosako2007
New Member

I tried nc from the Splunk machine:

$ nc -vz headnode 50070
Connection to headnode 50070 port [tcp/*] succeeded!

I also tried HDFS access as described in the video tutorial:

$ /hadoop-1.0.3/bin/hadoop dfs -ls /
drwxr-xr-x - hadoop supergroup 0 2012-05-21 13:29 /_distcp_logs_y8txnu
drwxr-xr-x - hadoop supergroup 0 2012-07-25 09:00 /benchmarks
drwxr-xr-x - hadoop supergroup 0 2013-03-11 16:05 /user


csharp_splunk
Splunk Employee

Have you verified that you have connectivity to that host on that port? No firewall issues? Setup is generally pretty straightforward, and I haven't seen that error before, so I'm guessing it's down to a basic environment issue like connectivity. By verified connectivity, I mean using telnet or nc to confirm that you can actually open a TCP connection to that host and port.
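
For example, from the Splunk server (substituting your namenode host and HTTP port), either of these should show whether a TCP connection can be opened:

 $ telnet headnode 50070
 $ nc -vz headnode 50070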
