Getting Data In

Does Hunk take .snappy files from Hadoop as an input?

balagovardhan
Explorer

Does Hunk takes .snappy files from hadoop as input...when we are trying to do so...we are getting the following error message

09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:185)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:131)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:91)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.SplunkLineRecordReader.vixInitialize(SplunkLineRecordReader.java:17)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:76)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.JobSubmitterInputFormat.createRecordReader(JobSubmitterInputFormat.java:64)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:319)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:604)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:616)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:613)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.FileSplitGenerator.sendSplitToAcceptor(FileSplitGenerator.java:27)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.FileSplitGenerator.generateSplits(FileSplitGenerator.java:81)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:992)
09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:970)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:269)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:381)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1050)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.input.VixSplitGenerator.generateSplits(VixSplitGenerator.java:55)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:634)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:850)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:695)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1295)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1087)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
09-08-2014 19:28:25.531 INFO ERP.MyHadoopProvider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

Tags (3)
0 Karma

Ledion_Bitincka
Splunk Employee
Splunk Employee

Yes, Hunk supports snappy and all other compressions supported by Hadoop. In your particular case however, the Hadoop libraries are having an issue with loading the native code for snappy

09-08-2014 19:28:25.530 INFO ERP.MyHadoopProvider - Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

What version of Hadoop are you using?

Are the snappy native libs in your: $HADOOP_HOME/lib/native ?

0 Karma

balagovardhan
Explorer

HI

We have still not resolved the issue...please help us in resolving this snappy connectivity issue....

0 Karma

rdagan_splunk
Splunk Employee
Splunk Employee

Is there a way for us to test one of your snappy files? Do you have an email we can connect?

0 Karma

balagovardhan
Explorer

We are still facing with snappy conversion....actually when we got the Block decompressor error...we added the following property in core-site.xml to add snappycodec class

<name>io.compression.codecs</name>
<value>
  org.apache.hadoop.io.compress.SnappyCodec
</value>

and now we are getting the below error now....the error is saying org.apache.hadoop.io.compress.SnappyCodec class not found...but we have the actuall hadoop-commnsxxx.jar in the classpath...not sure why we are getting this error...Any help on this is greatly appreciated...

09-18-2014 20:42:58.184 INFO ERP.Hadoop_Provider - ClusterInfoLogger - Hadoop cluster spec: provider=Hadoop_Provider, tasktrackers=2, map_inuse=1, map_slots=20, reduce_inuse=1, reduce_slots=4
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - SplunkMR - Compression codec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - org.apache.hadoop.io.compress.SnappyCodec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - not found.
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - java.lang.IllegalArgumentException: Compression codec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - org.apache.hadoop.io.compress.SnappyCodec
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - not found.
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
09-18-2014 20:42:58.246 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.CompressionCodecFactory.(CompressionCodecFactory.java:174)

0 Karma

Ledion_Bitincka
Splunk Employee
Splunk Employee

What Hadoop version are you using? What does the following command return:

 jar -tf $HADOOP_HOME/hadoop-core-*.jar | grep Snappy
0 Karma

balagovardhan
Explorer

We are using Hadoop version as: Hadoop 2.0.0-cdh4.7.0

We are using hadoop yarn based mapreduce, There is no hadoop-core-.jar in our Cdh distribution but we found Snappy in hadoop-common-.jar. Below are the results...

$ jar -tf $HADOOP_HOME/lib/hadoop-common-*.jar | grep Snappy

org/apache/hadoop/io/compress/SnappyCodec.class
org/apache/hadoop/io/compress/snappy/SnappyCompressor.class
org/apache/hadoop/io/compress/snappy/SnappyDecompressor.class

0 Karma

Ledion_Bitincka
Splunk Employee
Splunk Employee

What version of Hunk are you using?

This definitely seems like a classpath issue - what is your HADOOP_HOME pointing to? I just downloaded CDH4.7.0 and snappy seems to be in HADOOP_HOME/share/hadoop/common/hadoop-common-2.0.0-cdh4.7.0.jar

0 Karma

balagovardhan
Explorer

We are using hunk 6.1.2 version.
HADOOP_HOME is pointing to installation directory of Hadoop. and as you mentioned, snappy is included in hadoop-common-*.jar file only. We have also copied the same jar into lib directory of hadoop installation. So, could you please suggest how to setup the classpath for accessing this snappy codec during run time of hunk.

0 Karma

Ledion_Bitincka
Splunk Employee
Splunk Employee

I'm sure you've checked this, but double checking - are you sure the file is not corrupt? ie can the file be fully read by hadoop fs -text ...

0 Karma

balagovardhan
Explorer

This issue is also resolved...now getting the following error

09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - SplunkMR$SearchHandler$1 - Unexpected end of block in input stream
09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - java.io.EOFException: Unexpected end of block in input stream
09-09-2014 12:46:34.704 ERROR ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)

0 Karma

balagovardhan
Explorer

We have tried to increase hadoop heap size and java heap size to almost 4 to 5 GB issue still remains the same

0 Karma

balagovardhan
Explorer

This issue is solved as we moved snappy native libs to $HADOOP_HOME/lib/native. But now we are getting the error

09-09-2014 12:21:50.492 INFO ERP.Hadoop_Provider - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
09-09-2014 12:21:50.493 INFO ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:115)
09-09-2014 12:21:50.493 INFO ERP.Hadoop_Provider - at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)

balagovardhan
Explorer

Yes native libs are in our $HADOOP_HOME/lib/native and in our hadoop env "hadoop fs -text *.snappy" is working fine. We are using hadoop-2.0.0-cdh4.7.0 version.

0 Karma
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...