Hunk Job get OutOfMemory Error

techdiverdown · ‎06-25-2014

I have an existing virtual index with some data and it works fine. I decided to compress the data with snappy and I moved this data to another directory in HDFS. I then created a new virtual index to read the compressed data and I get the following:

06-25-2014 10:03:22.810 INFO ERP.psb_cloudera - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
06-25-2014 10:03:22.811 INFO ERP.psb_cloudera - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)

Please advise how to increase the memory in Hunk. I assume the setting to use is vix.env.HADOOP_HEAPSIZE; I raised it from 512 to 1024 and still get the same error. I could increase it further, but I am wondering why more memory is needed at all if compression/decompression is mostly CPU-bound.

My original files were 2014-05-24-16-00.01.csv, the new files are 2014-05-24-16-00.01.csv.snappy. The size was about 300 MB per file; now it is approx. 60-80 MB per file.

By the way, the job seems to die within a few seconds even if I remove all but one of the files in the directory.

**ADDITIONAL INFO**
If I use gzip compression, everything works fine. I will try bzip2 and lzo as well. I believe I want a splittable compression format for file storage in HDFS, as described at this link:

http://comphadoop.weebly.com/index.html

Ledion_Bitincka · ‎06-25-2014

Can you please provide the (scrubbed) contents of search.log as well as indexes.conf?

Also can you test to see if the following command throws the same error:

hadoop fs -text hdfs://host:port/path/to/file.snappy

Also, what version of Hadoop and Snappy libraries are you using?

bosburn_splunk · ‎06-29-2014

Can you open a ticket up and email [email protected] the ticket number?

Bosley

Ledion_Bitincka · ‎06-27-2014

You can either email them to support or maybe post them on pastebin and provide a link

techdiverdown · ‎06-27-2014

Snappy lib 1.0.2, using python-snappy from github.
$ hadoop version
Hadoop 2.3.0-cdh5.0.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-06-09T16:20Z
Compiled with protoc 2.5.0
From source with checksum 75596fe27f833e512f27fbdaaa7b0ab
This command was run using /usr/lib/hadoop/hadoop-common-2.3.0-cdh5.0.2.jar
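
Because the files were compressed with python-snappy rather than by Hadoop itself, one thing worth ruling out is a container-format mismatch. This is an assumption, not something confirmed in this thread. The stack trace points at BlockDecompressorStream.getCompressedData, which reads a 4-byte big-endian length and sizes its buffer from it; a file that does not start with Hadoop-style length prefixes could therefore yield a huge bogus length regardless of the real file size. The sketch below interprets the first 8 bytes of a file the way such a length-prefixed reader would, and flags the Snappy framing-format stream identifier. The function name `inspect_header` is made up for illustration.

```python
import struct

# Stream identifier chunk that opens a file in the Snappy framing format.
FRAMING_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def inspect_header(header: bytes) -> dict:
    """Interpret the first 8 bytes as a length-prefixed block reader would.

    Hypothetical helper: it only looks at the header, it does not decompress.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    # Two consecutive big-endian signed 32-bit integers: the uncompressed
    # block size, then the length of the first compressed chunk.
    block_size, chunk_len = struct.unpack(">ii", header[:8])
    return {
        "framing_format": header.startswith(FRAMING_MAGIC),
        "block_size": block_size,
        "chunk_len": chunk_len,
    }


# Example: what a length-prefixed reader would see at the start of a
# framing-format file.
print(inspect_header(FRAMING_MAGIC))
```

To check a real file, read the first 16 bytes of a local copy (for example one fetched with hadoop fs -get) and pass them to `inspect_header`. For a framing-format header, `chunk_len` comes out at roughly 1.9 GB, which would exceed a 512 MB or 1024 MB heap no matter how small the file is. If the header instead shows small, plausible lengths, this explanation does not apply.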

techdiverdown · ‎06-27-2014

Dumb question - How do I upload these logs? I cannot paste them into this window.

techdiverdown · ‎06-27-2014

The above command works fine.

hadoop fs -ls hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy
14/06/27 15:02:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x 3 netflow netflow 67108864 2014-06-27 14:54 hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy
