Does Hunk take .snappy files from Hadoop as input? When we try to do so, we get the following error message:
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:185)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:131)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:91)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.SplunkLineRecordReader.vixInitialize(SplunkLineRecordReader.java:17)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:76)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.JobSubmitterInputFormat.createRecordReader(JobSubmitterInputFormat.java:64)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:319)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:604)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:616)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:613)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.FileSplitGenerator.sendSplitToAcceptor(FileSplitGenerator.java:27)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.FileSplitGenerator.generateSplits(FileSplitGenerator.java:81)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:992)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:970)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:269)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:381)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1050)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VixSplitGenerator.generateSplits(VixSplitGenerator.java:55)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:634)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:850)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:695)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1295)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1087)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
Yes, Hunk supports Snappy and all the other compression formats supported by Hadoop. In your particular case, however, the Hadoop libraries are having an issue loading the native code for Snappy:
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
What version of Hadoop are you using?
Are the Snappy native libs in your $HADOOP_HOME/lib/native?
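A quick way to check (assuming a standard native-library layout) is to list that directory and confirm both the Hadoop and Snappy native libraries are present, for example:
# paths are illustrative - adjust to wherever your distro installs the native libs
ls -l $HADOOP_HOME/lib/native/
# expect to see libhadoop.so* and libsnappy.so* here; on newer Hadoop 2.x releases,
# `hadoop checknative -a` also reports whether Snappy support is available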
Hi,
We still have not resolved the issue. Please help us resolve this Snappy issue.
Is there a way for us to test one of your Snappy files? Do you have an email address we can use to connect?
We are still facing issues reading the Snappy files. When we got the block decompressor error, we added the following property to core-site.xml to register the SnappyCodec class:
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.SnappyCodec
  </value>
</property>
Now we are getting the error below. It says the org.apache.hadoop.io.compress.SnappyCodec class was not found, but we do have the actual hadoop-common-xxx.jar in the classpath, so we are not sure why we are getting this error. Any help on this is greatly appreciated.
09-18-2014 20:42:58.184 INFO ERP.Hadoop_Provider - ClusterInfoLogger - Hadoop cluster spec: provider=Hadoop_Provider, tasktrackers=2, map_inuse=1, map_slots=20, reduce_inuse=1, reduce_slots=4
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - SplunkMR - Compression codec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - org.apache.hadoop.io.compress.SnappyCodec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - not found.
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - java.lang.IllegalArgumentException: Compression codec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - org.apache.hadoop.io.compress.SnappyCodec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - not found.
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.CompressionCodecFactory.
What Hadoop version are you using? What does the following command return:
jar -tf $HADOOP_HOME/hadoop-core-*.jar | grep Snappy
We are using Hadoop 2.0.0-cdh4.7.0.
We are using Hadoop YARN-based MapReduce. There is no hadoop-core-*.jar in our CDH distribution, but we found Snappy in hadoop-common-*.jar. Below are the results:
$ jar -tf $HADOOP_HOME/lib/hadoop-common-*.jar | grep Snappy
org/apache/hadoop/io/compress/SnappyCodec.class
org/apache/hadoop/io/compress/snappy/SnappyCompressor.class
org/apache/hadoop/io/compress/snappy/SnappyDecompressor.class
What version of Hunk are you using?
This definitely seems like a classpath issue. What is your HADOOP_HOME pointing to? I just downloaded CDH 4.7.0, and Snappy seems to be in HADOOP_HOME/share/hadoop/common/hadoop-common-2.0.0-cdh4.7.0.jar.
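If it helps, you can see exactly which jars the Hadoop launcher itself puts on the classpath with something like:
# prints Hadoop's own classpath, one entry per line, filtered to the common jar
hadoop classpath | tr ':' '\n' | grep hadoop-common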
We are using Hunk version 6.1.2.
HADOOP_HOME is pointing to the installation directory of Hadoop, and as you mentioned, Snappy is included only in the hadoop-common-*.jar file. We have also copied that jar into the lib directory of the Hadoop installation. Could you please suggest how to set up the classpath so that the Snappy codec is found when Hunk runs?
I'm sure you've checked this, but double-checking: are you sure the file is not corrupt? i.e., can the file be fully read by hadoop fs -text ...?
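For example, something like the following (the path is just a placeholder for one of your actual .snappy files):
# placeholder path - substitute one of your real .snappy files
hadoop fs -text /data/sample/events.snappy > /dev/null && echo "read OK"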
This issue is also resolved; now we are getting the following error:
09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - SplunkMR$SearchHandler$1 - Unexpected end of block in input stream
09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - java.io.EOFException: Unexpected end of block in input stream
09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
We have tried increasing the Hadoop heap size and the Java heap size to almost 4-5 GB, but the issue remains the same.
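For reference, we bumped the heap roughly along these lines (the standard Hadoop client environment knobs; the exact values and variable names on your setup may differ, so treat this only as a sketch):
# illustrative only - common client-side Hadoop heap settings
export HADOOP_HEAPSIZE=4096              # in MB
export HADOOP_CLIENT_OPTS="-Xmx4g"       # JVM heap for client-side hadoop commands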
This issue is solved: we moved the Snappy native libs to $HADOOP_HOME/lib/native. But now we are getting the following error:
09-09-2014 12:21:50.492 INFO ERP.Hadoop_Provider - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
09-09-2014 12:21:50.493 INFO ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:115)
09-09-2014 12:21:50.493 INFO ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
Yes, the native libs are in our $HADOOP_HOME/lib/native, and in our Hadoop environment "hadoop fs -text *.snappy" works fine. We are using hadoop-2.0.0-cdh4.7.0.