Anyone ever tried using Hunk with Datastax Enterprise?

dcparker
Path Finder

Hello,

I was playing around and trying to set up my Datastax Enterprise Analytics nodes with Hunk.
Info about DSE: DSE Hadoop

I got as far as creating the index, but unfortunately Datastax uses CFS rather than HDFS. I tried setting up the provider as HDFS anyway, and that didn't work. When I try CFS I get:

 [test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: No FileSystem for scheme: cfs. Advice: Verify that your vix.fs.default.name is correct and available. 

Using HDFS:

 Failed to create a virtual index filesystem connection: Call to hostname/192.168.31.1:9160 failed on local exception: java.io.EOFException.

Not sure if this is really possible but was curious if anyone else had tried this. Thanks for any help/advice!

Ledion_Bitincka
Splunk Employee

Thanks for clarifying what's happening. Can you try:

(1) add the following to the provider stanza

vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem

(2) use vix.fs.default.name = cfs://cassandrahost/
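
Merged into the provider stanza posted elsewhere in this thread, the two suggestions above might look like the sketch below. The host name cassandrahost comes from the suggestion itself. The jar path in vix.env.HADOOP_CLASSPATH is a hypothetical placeholder: the thread identifies dse.jar as the jar that holds the class, but not where it lives. Nothing in the thread confirms that this configuration connects successfully, so treat it as a starting point only.

```ini
[provider:test hadoop]
vix.family = hadoop
vix.env.JAVA_HOME = /usr
vix.env.HADOOP_HOME = /usr/share/dse/hadoop/
# Hypothetical path: point this at wherever dse.jar lives on your node.
vix.env.HADOOP_CLASSPATH = /path/to/dse.jar
# Register the cfs:// scheme with the Hadoop FileSystem layer.
vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem
vix.fs.default.name = cfs://cassandrahost/
vix.mapred.job.tracker = hostname:8012
vix.splunk.home.hdfs = /data1/hunk/
```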

Ledion_Bitincka
Splunk Employee

Hmm, unfortunately we don't log the entire stacktrace so we'd have to guess at this point - maybe http://stackoverflow.com/questions/19534811/cassandra-startup-java-lang-reflect-invocationtargetexce...?

I would recommend that you first get the Hadoop CLI tools working with the cfs:// filesystem directly (i.e. without going through dse hadoop fs ...). We can then apply those conf changes to Hunk, which is filesystem-agnostic: for example, we work with Amazon's s3 and s3n filesystems out of the box.

dcparker
Path Finder

Here's search.log
http://pastebin.com/5ucVrKrb

Ledion_Bitincka
Splunk Employee

Can you provide search.log?

dcparker
Path Finder

Actually, I found it: dse.jar.

I put it in there and now see:

[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: java.lang.reflect.InvocationTargetException. Advice: Verify that your vix.fs.default.name is correct and available.

dcparker
Path Finder

Thanks. I briefly checked but couldn't find any jar with that class in it. I will look more closely tomorrow. I might just be able to put the Cassandra jars in there, but I doubt it would work since grep didn't find anything.

I did find this too: http://www.datastax.com/support-forums/topic/how-can-we-enable-hdfs-and-cfs-too

It looks like I can make HDFS the default, but that sort of defeats the purpose of having CFS.

Thanks for your help! I will let you know if I find the jar.

Ledion_Bitincka
Splunk Employee

Correct - indexes.conf. Now you're running into a classpath issue. Can you try to find the Cassandra jar in which the class com.datastax.bdp.hadoop.cfs.CassandraFileSystem is defined, and then add that jar to the following field in the provider: vix.env.HADOOP_CLASSPATH?

Command to list the contents of the jar: unzip -l [jar-file] | grep CassandraFileSystem
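
When it is not obvious which jar to inspect, the same check can be run over every jar under a directory. The following is a minimal sketch, not a Splunk or DataStax tool: find_class_jar is a hypothetical helper name, and the directory /usr/share/dse in the usage comment is an assumption based on the HADOOP_HOME value posted in this thread. It relies only on find, unzip, and grep.

```shell
#!/bin/sh
# Print every jar under a directory that contains the given class.
# Usage (the directory is an assumption; adjust to your install):
#   find_class_jar /usr/share/dse com.datastax.bdp.hadoop.cfs.CassandraFileSystem
find_class_jar() {
  search_dir=$1
  # Jar entries use slashes, e.g. com/datastax/.../CassandraFileSystem.class
  entry="$(printf '%s' "$2" | tr '.' '/').class"
  find "$search_dir" -type f -name '*.jar' 2>/dev/null |
  while IFS= read -r jar; do
    # List the archive contents and look for the class entry as a fixed string.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

Any path it prints is a candidate for vix.env.HADOOP_CLASSPATH. If it prints nothing, the class is not in any jar under that directory.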

dcparker
Path Finder

Ok. Assuming I did this right, here's what it shows now:

[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: java.lang.ClassNotFoundException: com.datastax.bdp.hadoop.cfs.CassandraFileSystem. Advice: Verify that your vix.fs.default.name is correct and available.

I added
vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem

to indexes.conf. Is that correct?

Ledion_Bitincka
Splunk Employee

Can you access CFS from the hadoop CLI? E.g.:

hadoop fs -ls cfs://....

dcparker
Path Finder

search.log with cfs:
http://pastebin.com/3y2pFd8s

search.log with hdfs:
http://pastebin.com/STUEHxnV

dcparker
Path Finder

I tried to use CFS in the indexes config but got the error "[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: No FileSystem for scheme: cfs. Advice: Verify that your vix.fs.default.name is correct and available." That's when I changed to hdfs to see if it worked.

Running that command returns: failed on local exception: java.io.EOFException

Ledion_Bitincka
Splunk Employee

Hmmm, so you're not using cfs:// - I wonder where the cfs is coming from. Can you also share the contents of search.log? Also, does "hadoop fs -ls hdfs://hostname:9160/" return anything?

dcparker
Path Finder

Here is indexes.conf:
[provider:test hadoop]
vix.env.HADOOP_HOME = /usr/share/dse/hadoop/
vix.env.JAVA_HOME = /usr
vix.family = hadoop
vix.fs.default.name = hdfs://hostname:9160
vix.mapred.job.tracker = hostname:8012
vix.splunk.home.hdfs = /data1/hunk/

[testing]
vix.input.1.path = /...
vix.provider = test hadoop

Here is core-site:
http://pastebin.com/LL69Hn4a

Ledion_Bitincka
Splunk Employee

Can you share the contents of indexes.conf and core-site.xml? This is most likely a config issue: cfs is not a registered filesystem in the Hadoop conf.

dcparker
Path Finder

Oh, sorry, I misunderstood.

If I go to the dir where the hadoop binary is and try to run it I get:
bash-4.1$ ./hadoop fs -ls cfs://
ls: No FileSystem for scheme: cfs

However, I can do:
./hadoop fs -ls /data1
drwx------ - root root 16384 2014-02-20 17:04 /data1/lost+found
drwxr-xr-x - cassandra cassandra 4096 2014-03-05 18:30 /data1/hunk
drwxrwxr-x - cassandra cassandra 4096 2014-03-05 18:01 /data1/cassandra

Ledion_Bitincka
Splunk Employee

Great! What about calling the CLI directly as in: "hadoop fs -ls cfs://" ?

dcparker
Path Finder

Yes, here's a sample:

bash-4.1$ dse hadoop fs -ls cfs://
Found 2 items
drwxrwxrwx - cassandra cassandra 0 2014-03-05 18:09 /data1
drwxrwxrwx - cassandra cassandra 0 2014-03-05 18:01 /tmp
