ERP.hive-tt - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

tsunamii
Path Finder

What setting should I try increasing to avoid getting this Hunk + Hive error?

10-08-2015 00:31:55.626 INFO  ERP.hive-tt -  Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readDiskRanges(RecordReaderImpl.java:3085)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:3194)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:2796)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:3213)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:3255)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:322)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:534)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:234)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:166)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1133)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.HiveRecordReader.vixInitialize(HiveRecordReader.java:206)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:95)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.JobSubmitterInputFormat.createRecordReader(JobSubmitterInputFormat.java:66)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:323)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:644)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:656)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:653)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.HiveSplitGenerator.sendSplitToAcceptor(HiveSplitGenerator.java:80)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.FileSplitGenerator.generateSplits(FileSplitGenerator.java:68)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1418)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1396)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:576)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:609)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$Splitter.generateSplits(VirtualIndex.java:1566)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1485)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1437)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VixSplitGenerator.generateSplits(VixSplitGenerator.java:55)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:674)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:936)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:771)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1518)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR.run(SplunkMR.java:1300)

rdagan_splunk
Splunk Employee

The failure happened while the ORC record reader was trying to read a stripe of data, so increasing the heap size should help.
In the Hunk provider, what is the value of vix.env.HADOOP_HEAPSIZE?
By default it is 512 (MB), and increasing it should fix the OOM error.
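For context, here is a minimal shell sketch of how a Hadoop 2.x launcher typically turns HADOOP_HEAPSIZE, a number of megabytes, into the JVM's -Xmx flag. It assumes Hunk exports vix.env.HADOOP_HEAPSIZE to the search process as HADOOP_HEAPSIZE, as the vix.env. prefix suggests. The function name heap_flag is hypothetical, and the real logic in your distribution's launcher scripts may differ.

```shell
#!/bin/sh
# Simplified sketch (hypothetical helper) of how a Hadoop 2.x launcher script
# converts HADOOP_HEAPSIZE (in MB) into the JVM -Xmx flag. Check your
# distribution's bin/hadoop-config.sh for the actual logic.
heap_flag() {
  # Fall back to 1000 MB when HADOOP_HEAPSIZE is unset or empty.
  size="${HADOOP_HEAPSIZE:-1000}"
  printf '%s\n' "-Xmx${size}m"
}

# Assumption: vix.env.HADOOP_HEAPSIZE reaches the process as HADOOP_HEAPSIZE.
HADOOP_HEAPSIZE=512
heap_flag   # prints -Xmx512m (the Hunk default mentioned above)
HADOOP_HEAPSIZE=2048
heap_flag   # prints -Xmx2048m
```

So raising vix.env.HADOOP_HEAPSIZE in the provider from 512 to, say, 2048 should give the JVM a 2 GB heap, unless something later on the command line overrides it.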

splunkIT
Splunk Employee

Keep in mind that vix.env.HADOOP_HEAPSIZE applies to each Hunk search. If you set it too high and run many concurrent searches, you can exhaust the available physical memory.
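To put numbers on that warning, a rough lower bound on total heap demand is the per-search heap multiplied by the number of concurrent searches. The figures below (2048 MB per search, 8 searches) are hypothetical, and so is the helper name. Actual memory use will be higher, because each JVM also needs non-heap memory (metaspace, thread stacks and so on).

```shell
#!/bin/sh
# Rough lower bound on heap needed by concurrent Hunk searches, given that
# vix.env.HADOOP_HEAPSIZE applies per search. Non-heap JVM memory is ignored.
total_heap_mb() {
  heap_mb="$1"      # per-search heap, i.e. vix.env.HADOOP_HEAPSIZE in MB
  concurrent="$2"   # number of searches running at the same time
  echo $((heap_mb * concurrent))
}

# Hypothetical example: 2048 MB per search, 8 concurrent searches.
total_heap_mb 2048 8   # prints 16384 (about 16 GB of heap alone)
```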

tsunamii
Path Finder

Thanks @rdagan for the suggestion.

I have found the problem:
We recently upgraded the grid package, which also replaced hadoop-env.sh. That undid my change: the line export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}", which I had commented out, was active again, so I was not getting the Hadoop heap size I thought I was.

I fixed that and now my job runs for a long time.
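Because a package upgrade can silently restore that line, a quick check like the sketch below may help after each upgrade. It assumes the default line looks like the one quoted above; the function name and the fallback path ./hadoop-env.sh are placeholders, since where hadoop-env.sh lives depends on the distribution.

```shell
#!/bin/sh
# Succeeds if the given hadoop-env.sh contains an active (uncommented)
# HADOOP_CLIENT_OPTS export that hard-codes -Xmx128m.
has_active_128m_line() {
  # Commented lines start with "#" and therefore do not match.
  # -s suppresses the error message if the file does not exist.
  grep -Eqs '^[[:space:]]*export[[:space:]]+HADOOP_CLIENT_OPTS=.*-Xmx128m' "$1"
}

env_file="${1:-./hadoop-env.sh}"   # placeholder path; pass your own
if has_active_128m_line "$env_file"; then
  echo "WARNING: $env_file sets -Xmx128m in HADOOP_CLIENT_OPTS again"
fi
```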

burwell
SplunkTrust

I am in a situation where I cannot easily comment out the line in hadoop-env.sh (it gets overwritten whenever I update any packages).

Is it possible to set these Hadoop variables with an export of HADOOP_CLIENT_OPTS instead of setting them in indexes.conf? Does that work?

kschon_splunk
Splunk Employee

Yes, that should work, and I believe it's the preferred way to do this. The line mentioned above puts any user-defined -Xmx value after the default -Xmx128m, and since the JVM uses the last -Xmx flag it is given, the user-defined value should win.
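The ordering argument can be illustrated in shell. The helper last_xmx is hypothetical; it only mimics the JVM's rule that the last -Xmx on the command line takes effect. The same rule would also explain the original problem, assuming the launcher places the HADOOP_HEAPSIZE-derived -Xmx before HADOOP_CLIENT_OPTS (as the Hadoop 2.x bin/hadoop script does): an active -Xmx128m in HADOOP_CLIENT_OPTS then overrides HADOOP_HEAPSIZE.

```shell
#!/bin/sh
# Return the last -Xmx flag in a whitespace-separated option string,
# mimicking how the JVM resolves duplicate -Xmx flags.
last_xmx() {
  result=""
  for opt in $1; do            # intentional word splitting on whitespace
    case "$opt" in
      -Xmx*) result="$opt" ;;  # remember the most recent -Xmx flag
    esac
  done
  printf '%s\n' "$result"
}

HADOOP_CLIENT_OPTS="-Xmx2048m"                        # exported by the user
HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"   # what hadoop-env.sh does
last_xmx "$HADOOP_CLIENT_OPTS"                        # prints -Xmx2048m
```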

burwell
SplunkTrust

Thanks Keith.

So to increase the heap size to 2048 MB, would that be export HADOOP_CLIENT_OPTS="-Xmx2048m"?

kschon_splunk
Splunk Employee

Yes, that should work.

burwell
SplunkTrust

Thanks. Is there a good way to check what the setting actually is, i.e. to make sure the export is working?

kschon_splunk
Splunk Employee

A few points:

--I just realized we're ignoring the obvious way to handle this: change the vix.env.HADOOP_CLIENT_OPTS setting in the provider. That will overwrite any value inherited from the user environment.

--To test what value is being used, you can edit your hadoop script. This will usually be in /bin/hadoop. You can insert a line that looks like:

echo "$HADOOP_CLIENT_OPTS" > ~/my_hadoop_client_opts.txt

Then run a search in Hunk, and see what that file contains.

--Depending on your Hadoop distribution, you may need to set $HADOOP_OPTS instead.
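Expanding on the echo line above, a slightly more informative variant appends a timestamped record containing both HADOOP_CLIENT_OPTS and HADOOP_OPTS, so you can compare several searches and cover distributions that use either variable. The function name log_hadoop_opts and its default output path are illustrative only.

```shell
#!/bin/sh
# Temporary debug helper for the hadoop launcher script: appends one
# timestamped line with both option variables to a file.
log_hadoop_opts() {
  out="${1:-$HOME/my_hadoop_client_opts.txt}"   # illustrative default path
  printf '%s HADOOP_CLIENT_OPTS=[%s] HADOOP_OPTS=[%s]\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" "$HADOOP_CLIENT_OPTS" "$HADOOP_OPTS" >> "$out"
}
```

Paste the function and a call to log_hadoop_opts into the hadoop script (after the point where hadoop-env.sh is sourced, if you want the final values), run a search in Hunk, and inspect the file. $HOME is the home directory of the user that runs Splunk. Remove the lines again afterwards, since they run on every hadoop invocation.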

Hope that helps.
