How to set up an ERP for Hadoop in Hunk

Explorer

Hi,

I am trying to set up an ERP for a remotely connected Apache Hadoop cluster.
I have followed the steps described in the docs.
When I run a basic search on the virtual index, I get the following error:

10-10-2014 05:13:52.111 ERROR ERP.EbgProvider - /bin/bash: /home/hduser/hadoop-1.0.2/bin/hadoop: No such file or directory
10-10-2014 05:13:52.115 ERROR SearchOperator:stdin - Cannot consume data with unset stream_type
10-10-2014 05:13:52.115 ERROR ExternalResultProvider - Error in 'SearchOperator:stdin': Cannot consume data with unset stream_type

Hunk version: 6.1.2
Apache Hadoop version: 1.0.2

1 Solution

Splunk Employee

Can you do some basic testing to make sure that the Hunk node has connectivity to HDFS? For example:

/home/hduser/hadoop-1.0.2/bin/hadoop fs -ls hdfs://namenode-host:port/ 

If that works, then you should set the provider's "Hadoop Home" to "/home/hduser/hadoop-1.0.2/".
If not, please let us know what command works.

> How does Hunk connect to remote hdfs?
For your use case I would recommend that you simply run Hunk as the 'hduser' user.
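
The error in the original post means bash could not find bin/hadoop under the configured Hadoop Home. Before running the fs -ls test, a candidate directory can be checked with something like the sketch below. The function name check_hadoop_home is made up for illustration; it is not part of Hunk or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch only: check that a candidate "Hadoop Home" contains an executable
# bin/hadoop launcher. check_hadoop_home is a hypothetical helper name.

check_hadoop_home() {
    local home="${1%/}"               # strip one trailing slash, if any
    local launcher="$home/bin/hadoop"

    if [ ! -d "$home" ]; then
        echo "missing directory: $home"
        return 1
    fi
    if [ ! -x "$launcher" ]; then
        echo "missing or non-executable launcher: $launcher"
        return 2
    fi
    echo "ok: $launcher"
}
```

Run it on the Hunk node as the user that runs Hunk, for example: check_hadoop_home /home/hduser/hadoop-1.0.2. If it prints "ok", the same directory is a reasonable value for "Hadoop Home", and the fs -ls command above is the next thing to try.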

Explorer

Thank you very much for your help.

Explorer

The basic testing worked as per your suggestion.

What I learned:
In the provider, Hadoop Home was set to /home/hkuser/hadoop-1.0.2. This is the Hadoop location on the local system, where Hunk is also installed. Java Home likewise points to the local machine, not to the remote Hadoop system.

My initial understanding was that only Hunk needed to be installed on the local machine and that Hadoop was not needed there. Based on what I've done, I can see that local Hadoop and Java installations are needed to access the remote HDFS system.

So now I'm able to query a virtual index whose data sits on an HDFS system separate from the machine where Hunk runs.

Thank you for those inputs. They came in handy.

Now the question is: when I run a search on a virtual index in Hunk, does the MR job run on the local Hadoop system? If so, I'm not able to see the job in the JobTracker. Does Hunk hide this information?

Splunk Employee

Right, we need the Hadoop libraries and Java installed on the system where Hunk is installed so we can communicate with the Hadoop cluster.

Hunk submits MR jobs to the Hadoop cluster you have configured it to connect to (i.e. the provider), not to the local machine. Not all Hunk searches submit MR jobs, though. The following should: index=<vix> | stats count by source
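
To see whether such a search actually produced a job, one option is to ask the JobTracker directly from the Hunk node using the local Hadoop client. This is a rough sketch, assuming a Hadoop 1.x client where "hadoop job -list all" and the generic -jt option are available. list_cluster_jobs is a made-up helper name, and the two arguments are whatever Hadoop Home and JobTracker address you entered in the provider.

```shell
#!/usr/bin/env bash
# Sketch only: list all jobs known to a JobTracker via the local Hadoop client.
# list_cluster_jobs is a hypothetical helper, not a Hunk or Hadoop command.

list_cluster_jobs() {
    if [ "$#" -ne 2 ]; then
        echo "usage: list_cluster_jobs <hadoop_home> <jobtracker_host:port>" >&2
        return 64
    fi
    local hadoop_home="${1%/}"   # same directory as the provider's Hadoop Home
    local jobtracker="$2"        # same host:port as the provider's JobTracker

    # Generic options such as -jt go before the command-specific -list option.
    "$hadoop_home/bin/hadoop" job -jt "$jobtracker" -list all
}
```

For example: list_cluster_jobs /home/hduser/hadoop-1.0.2 abc:9001. Run it right after the stats search above. If no new job shows up, the search may not have needed MapReduce, or the provider may be pointing at a different JobTracker than the one you are looking at.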

Splunk Employee

Try /home/hduser/hadoop-1.0.2 without the bin. Hunk appends bin/hadoop to the path itself.
Hunk connects to the remote HDFS using the Hadoop client on the Hunk node. No need for user impersonation.
The best way to confirm you are all set is from the command line: run 'hadoop fs -ls hdfs://Name_Node_Machine:PORT/'. If that works, use the same values inside the provider.
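
Since bin/hadoop gets appended to whatever is entered, a value that already ends in /bin or /bin/hadoop will produce a path that does not exist. A small sketch of trimming such a value back to the installation directory is shown here. normalize_hadoop_home is a made-up name for illustration, not something Hunk provides.

```shell
#!/usr/bin/env bash
# Sketch only: reduce a path that mistakenly includes bin or bin/hadoop
# to the Hadoop installation directory. Hypothetical helper name.

normalize_hadoop_home() {
    local path="${1%/}"          # drop a trailing slash, if present
    path="${path%/bin/hadoop}"   # drop a full launcher suffix, if present
    path="${path%/bin}"          # drop a bare bin suffix, if present
    echo "$path"
}
```

With the paths from this thread, both /home/hduser/hadoop-1.0.2/bin and /home/hduser/hadoop-1.0.2/bin/hadoop reduce to /home/hduser/hadoop-1.0.2, which is the value to put in "Hadoop Home".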

Explorer

The path /home/hduser/hadoop-1.0.2/bin/hadoop was not present, so I changed it to /home/hduser/hadoop-1.0.2/bin. It is still not working.

Another question: how does Hunk connect to the remote HDFS? Does it have anything to do with user impersonation? My remote HDFS is set up with hduser as the user name, and I've used these settings in the user impersonation tab: Provider --> Providername -- Users -- admin -- Hadoop User --> hduser -- Queue --> 1. Is there any way to confirm whether these settings are right?

Explorer

Yes, it is already set. The following are set:
Namenode: xyz:9000
Jobtracker: abc:9001
HADOOP_HOME=/home/hduser/hadoop-1.0.2
JAVA_HOME=/home/hduser/jdk1.7.0_21
/user/hduser/hunk
/user/hduser/hunksearchhead

Splunk Employee

Does the file /home/hduser/hadoop-1.0.2/bin/hadoop exist? That is what bash is complaining about.

Splunk Employee

Where is Hadoop installed? In the provider you need to set "Hadoop Home" to the location where Hunk can find Hadoop, so that we can call $HADOOP_HOME/bin/hadoop.