ERP.hive-tt - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

tsunamii
Path Finder

What setting should I try increasing to avoid getting this Hunk + Hive error?

10-08-2015 00:31:55.626 INFO  ERP.hive-tt -  Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readDiskRanges(RecordReaderImpl.java:3085)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:3194)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:2796)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:3213)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:3255)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:322)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:534)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:234)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:166)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1133)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.HiveRecordReader.vixInitialize(HiveRecordReader.java:206)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:95)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.JobSubmitterInputFormat.createRecordReader(JobSubmitterInputFormat.java:66)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:323)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:644)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:656)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:653)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.HiveSplitGenerator.sendSplitToAcceptor(HiveSplitGenerator.java:80)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.FileSplitGenerator.generateSplits(FileSplitGenerator.java:68)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1418)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1396)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:576)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:609)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex$Splitter.generateSplits(VirtualIndex.java:1566)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1485)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1437)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.input.VixSplitGenerator.generateSplits(VixSplitGenerator.java:55)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:674)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:936)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:771)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1518)
10-08-2015 00:31:55.626 INFO  ERP.hive-tt -     at com.splunk.mr.SplunkMR.run(SplunkMR.java:1300)

rdagan_splunk
Splunk Employee

The failure happened while the ORC Record Reader was trying to read a stripe of data. Increasing the heap size should help.
In the Hunk provider, what is the value of vix.env.HADOOP_HEAPSIZE?
It defaults to 512 (MB), and increasing it should fix the OOM error.
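
For reference, a minimal sketch of where that setting lives in indexes.conf. The stanza name and the 2048 value here are illustrative, not from this thread:

[provider:my-hadoop-provider]
vix.family = hadoop
# Heap size in MB for the Hadoop client JVMs that Hunk launches per search;
# the default of 512 can be too small when reading wide ORC stripes.
vix.env.HADOOP_HEAPSIZE = 2048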

splunkIT
Splunk Employee

Keep in mind that vix.env.HADOOP_HEAPSIZE applies per Hunk search; if you raise it too high and run many concurrent searches, you can exhaust your available physical memory.
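
To put rough numbers on that: at 2048 MB per search, ten concurrent Hunk searches could claim on the order of 20 GB of heap on the node that launches them.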


tsunamii
Path Finder

Thanks @rdagan for the suggestion.

I have found the problem:
We recently upgraded the grid package, which also overwrote hadoop-env.sh and restored the export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}" line I had commented out, so I was not getting the Hadoop heap size I thought I was.

I fixed that, and now my job runs for a long time without hitting the OOM.
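
For anyone who hits the same thing, a sketch of the hadoop-env.sh fragment in question (the 128m cap is the stock value; the fix is simply to comment the line out again so the heap size you set elsewhere takes effect):

export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"  # stock line the upgrade restored
# export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"  # the fix: comment it out again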


burwell
SplunkTrust

I am in a situation where I cannot easily comment out the line in hadoop-env.sh (it gets overwritten whenever I update packages).

Is it possible to set these Hadoop variables by exporting HADOOP_CLIENT_OPTS instead of setting them in indexes.conf? Does that work?


kschon_splunk
Splunk Employee

Yes, that should work, and I believe it's the preferred way to do this. The hadoop-env.sh line quoted above puts any user-defined -Xmx value after the default 128m, and since the JVM honors the last -Xmx flag it sees, the user-defined value should win.
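
A quick sketch of why the ordering works out, assuming the stock hadoop-env.sh line quoted earlier:

# 1. Set by you before launching Hadoop (e.g. in the splunk user's environment):
export HADOOP_CLIENT_OPTS="-Xmx2048m"

# 2. The stock hadoop-env.sh line then prepends its default:
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"

# 3. The variable is now "-Xmx128m -Xmx2048m"; HotSpot applies the
#    last -Xmx it sees, so the 2048m heap wins.
echo "$HADOOP_CLIENT_OPTS"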


burwell
SplunkTrust

Thanks, Keith.

So to increase the heap size to 2048 MB, would that be:

export HADOOP_CLIENT_OPTS="-Xmx2048m"


kschon_splunk
Splunk Employee

Yes, that should work.


burwell
SplunkTrust

Thanks. Is there a good way to test what the setting is, i.e. to make sure the export is working?


kschon_splunk
Splunk Employee

A few points:

--I just realized we're ignoring the obvious way to handle this: set vix.env.HADOOP_CLIENT_OPTS in the provider. That will override any value inherited from the user environment (see the sketch after this post).

--To test what value is actually being used, you can edit your hadoop launcher script, which usually lives at bin/hadoop under your Hadoop installation. Insert a line that looks like:

echo "$HADOOP_CLIENT_OPTS" > ~/my_hadoop_client_opts.txt  # dump the final value each time hadoop runs

Then run a search in Hunk, and see what that file contains.

--Depending on your Hadoop distribution, you may need to set $HADOOP_OPTS instead.

Hope that helps.
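
Following up on the first point above, a minimal sketch of the provider-side setting in indexes.conf (the stanza name and the 2048m value are illustrative):

[provider:my-hadoop-provider]
# Passed into the environment of the Hadoop processes Hunk launches,
# taking precedence over anything inherited from the shell:
vix.env.HADOOP_CLIENT_OPTS = -Xmx2048m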
