Hunk Job Gets OutOfMemory Error

techdiverdown
Path Finder

I have an existing virtual index with some data, and it works fine. I decided to compress the data with Snappy and moved it to another directory in HDFS. I then created a new virtual index to read the compressed data, and now I get the following error:

06-25-2014 10:03:22.810 INFO ERP.psb_cloudera - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
06-25-2014 10:03:22.811 INFO ERP.psb_cloudera - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)

Please advise how to increase the memory in Hunk. I assume the setting to use is vix.env.HADOOP_HEAPSIZE; I raised it from 512 to 1024 and still get the same error. I could increase it further, but I am wondering why more heap is needed at all if compression and decompression are mostly CPU-bound.

My original files were named like 2014-05-24-16-00.01.csv; the new files are named like 2014-05-24-16-00.01.csv.snappy. The size was about 300 MB per file; now it is approximately 60-80 MB per file.

By the way, the job seems to die within a few seconds even if I remove all but one of the files in the directory.

**ADDITIONAL INFO**
If I use gzip compression, everything works fine. I will try bzip2 and LZO as well. I believe I want a splittable compression format for the files stored in HDFS, as described at this link:

http://comphadoop.weebly.com/index.html

Ledion_Bitincka
Splunk Employee

Can you please provide the (scrubbed) contents of search.log as well as indexes.conf?

Also can you test to see if the following command throws the same error:

hadoop fs -text hdfs://host:port/path/to/file.snappy

Also, what version of Hadoop and Snappy libraries are you using?

bosburn_splunk
Splunk Employee

Can you open a support ticket and email the ticket number to bosburn@splunk.com?

Ledion_Bitincka
Splunk Employee

You can either email them to support or post them on pastebin and provide a link.

techdiverdown
Path Finder

Snappy lib 1.0.2, using python-snappy from github.
$ hadoop version
Hadoop 2.3.0-cdh5.0.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-06-09T16:20Z
Compiled with protoc 2.5.0
From source with checksum 75596fe27f833e512f27fbdaaa7b0ab
This command was run using /usr/lib/hadoop/hadoop-common-2.3.0-cdh5.0.2.jar
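
One possible explanation, not confirmed anywhere in this thread, is a container-format mismatch rather than a genuine shortage of heap. The sketch below assumes that Hadoop's block decompressor expects each block to begin with two big-endian 32-bit integers (the uncompressed block length, then the compressed chunk length), and that the files may have been written by python-snappy in the Snappy framing format, which opens with a fixed stream identifier. If both assumptions hold, the decompressor would read the identifier bytes as a chunk length of roughly 1.9 GB and try to allocate a buffer of that size, which no realistic heap setting would satisfy. The function names here are hypothetical helpers, not part of Hunk, Hadoop, or python-snappy.

```python
import struct

# Stream identifier that opens a file in the Snappy framing format:
# chunk type 0xff, a 3-byte little-endian length of 6, then "sNaPpY".
SNAPPY_FRAMING_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def describe_header(header: bytes) -> dict:
    """Interpret the first bytes of a .snappy file in two ways.

    Assumption: a Hadoop block-compressed stream starts with two
    big-endian unsigned 32-bit integers (uncompressed block length,
    compressed chunk length). This helper only reports what those
    fields would decode to; it does not validate the file.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    raw_len, chunk_len = struct.unpack(">II", header[:8])
    return {
        "is_framing_format": header.startswith(SNAPPY_FRAMING_MAGIC),
        "raw_len": raw_len,
        "chunk_len": chunk_len,
    }


def describe_file(path: str) -> dict:
    """Read the first 16 bytes of a local file and describe them."""
    with open(path, "rb") as f:
        return describe_header(f.read(16))
```

To try it, copy one file out of HDFS (for example with `hadoop fs -get`) and call `describe_file` on the local copy. If `is_framing_format` is true, or `chunk_len` is far larger than the file itself, the file is probably not in the layout the Hadoop codec expects, and recompressing it with Hadoop's own Snappy codec would be the thing to test next.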

techdiverdown
Path Finder

Dumb question - How do I upload these logs? I cannot paste them into this window.

techdiverdown
Path Finder

The above command works fine.

hadoop fs -ls hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy
14/06/27 15:02:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x 3 netflow netflow 67108864 2014-06-27 14:54 hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy
