Hello,
I was playing around and trying to set up my DataStax Enterprise Analytics nodes with Hunk.
Info about DSE: DSE Hadoop
I got as far as creating the index, but unfortunately DataStax uses CFS rather than HDFS. I tried setting up the provider with HDFS anyway, which didn't work, and when I try CFS I get:
[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: No FileSystem for scheme: cfs. Advice: Verify that your vix.fs.default.name is correct and available.
Using HDFS:
Failed to create a virtual index filesystem connection: Call to hostname/192.168.31.1:9160 failed on local exception: java.io.EOFException.
Not sure if this is really possible but was curious if anyone else had tried this. Thanks for any help/advice!
Thanks for clarifying what's happening. Can you try:
(1) add the following to the provider stanza
vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem
(2) use vix.fs.default.name = cfs://cassandrahost/
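Putting (1) and (2) together, the provider stanza in indexes.conf would pick up something like the following (cassandrahost is a placeholder for your Cassandra node's hostname):

```
[provider:test hadoop]
vix.fs.default.name = cfs://cassandrahost/
vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem
```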
Hmm, unfortunately we don't log the entire stacktrace so we'd have to guess at this point - maybe http://stackoverflow.com/questions/19534811/cassandra-startup-java-lang-reflect-invocationtargetexce...?
I would recommend that you first get the Hadoop CLI tools working with the cfs:// filesystem directly (i.e. plain "hadoop fs ...", not "dse hadoop fs ...") and then we can apply those conf changes to Hunk, which is filesystem-agnostic - for example, we work with Amazon's S3/S3N out of the box.
Here's search.log
http://pastebin.com/5ucVrKrb
Can you provide search.log?
actually I found it, dse.jar.
I put it in there and see:
[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: java.lang.reflect.InvocationTargetException. Advice: Verify that your vix.fs.default.name is correct and available.
Thanks, I briefly checked but couldn't find any jar with that in it. I will look more closely tomorrow. I might just be able to put the cassandra jars in there, but I doubt it would work since grep didn't find anything.
I did find this too: http://www.datastax.com/support-forums/topic/how-can-we-enable-hdfs-and-cfs-too
It looks like I can make HDFS the default, but that sort of defeats the purpose of having CFS.
Thanks for your help! I will let you know if I find the jar.
Correct - indexes.conf. Now you're running into a classpath issue. Can you try to find the Cassandra jar where the class com.datastax.bdp.hadoop.cfs.CassandraFileSystem is defined, and then add that jar to the following field in the provider: vix.env.HADOOP_CLASSPATH
Command to list the contents of the jar: unzip -l [jar-file] | grep CassandraFileSystem
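If you're not sure which jar to inspect, you can sweep the whole install in one go. Zip/jar central directories store entry names as plain text, so grep -l finds the class without unpacking anything (the /usr/share/dse root below is an assumption for a default DSE install - adjust to yours):

```shell
# List every jar under the DSE install that contains the CFS class.
# Entry names in a jar are stored uncompressed, so grep can see them.
find /usr/share/dse -name '*.jar' 2>/dev/null \
  | xargs grep -l 'hadoop/cfs/CassandraFileSystem' 2>/dev/null
```

Whatever path this prints is the value to put in vix.env.HADOOP_CLASSPATH.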
Ok. Assuming I did this right, here's what it shows now:
[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: java.lang.ClassNotFoundException: com.datastax.bdp.hadoop.cfs.CassandraFileSystem. Advice: Verify that your vix.fs.default.name is correct and available.
I added
vix.fs.cfs.impl = com.datastax.bdp.hadoop.cfs.CassandraFileSystem
to indexes.conf. Is that correct?
Can you access CFS from the hadoop CLI? e.g.:
hadoop fs -ls cfs://....
search.log with cfs:
http://pastebin.com/3y2pFd8s
search.log with hdfs:
http://pastebin.com/STUEHxnV
I tried to use CFS in the indexes config but got the "[test hadoop] RuntimeException - Failed to create a virtual index filesystem connection: No FileSystem for scheme: cfs. Advice: Verify that your vix.fs.default.name is correct and available." error. That's when I changed to HDFS to see if it worked.
Trying that command returns: failed on local exception: java.io.EOFException
Hmmm, so you're not using cfs:// - I wonder where the cfs is coming from. Can you also share the contents of search.log? Also, does "hadoop fs -ls hdfs://hostname:9160/" return anything?
Here is indexes.conf
[provider:test hadoop]
vix.env.HADOOP_HOME = /usr/share/dse/hadoop/
vix.env.JAVA_HOME = /usr
vix.family = hadoop
vix.fs.default.name = hdfs://hostname:9160
vix.mapred.job.tracker = hostname:8012
vix.splunk.home.hdfs = /data1/hunk/
[testing]
vix.input.1.path = /...
vix.provider = test hadoop
Here is core-site:
http://pastebin.com/LL69Hn4a
Can you share the content of indexes.conf and core-site.xml? This is most likely a config issue wrt cfs not being a registered filesystem in the hadoop conf
oh - sorry, I misunderstood.
If I go to the dir where the hadoop binary is and try to run it I get:
bash-4.1$ ./hadoop fs -ls cfs://
ls: No FileSystem for scheme: cfs
However, I can do:
./hadoop fs -ls /data1
drwx------ - root root 16384 2014-02-20 17:04 /data1/lost+found
drwxr-xr-x - cassandra cassandra 4096 2014-03-05 18:30 /data1/hunk
drwxrwxr-x - cassandra cassandra 4096 2014-03-05 18:01 /data1/cassandra
Great! What about calling the CLI directly as in: "hadoop fs -ls cfs://" ?
Yes, here's a sample:
bash-4.1$ dse hadoop fs -ls cfs://
Found 2 items
drwxrwxrwx - cassandra cassandra 0 2014-03-05 18:09 /data1
drwxrwxrwx - cassandra cassandra 0 2014-03-05 18:01 /tmp
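To close the loop for anyone following along: "dse hadoop fs" works while the bare hadoop binary does not, presumably because the dse wrapper puts the CFS classes (dse.jar) on the classpath. A sketch of what making the bare CLI work might look like (the dse.jar path and hostname are assumptions for a default install; fs.cfs.impl may also need to be registered in core-site.xml, per the DataStax forum link above):

```
export HADOOP_CLASSPATH=/usr/share/dse/dse.jar
hadoop fs -ls cfs://cassandrahost/
```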