What setting should I try increasing to avoid getting this Hunk + Hive error?
10-08-2015 00:31:55.626 INFO ERP.hive-tt - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readDiskRanges(RecordReaderImpl.java:3085)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:3194)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:2796)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:3213)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:3255)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:322)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:534)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:234)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:166)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1133)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.HiveRecordReader.vixInitialize(HiveRecordReader.java:206)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:95)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.JobSubmitterInputFormat.createRecordReader(JobSubmitterInputFormat.java:66)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkBaseMapper.stream(SplunkBaseMapper.java:323)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:644)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:656)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:653)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.HiveSplitGenerator.sendSplitToAcceptor(HiveSplitGenerator.java:80)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.FileSplitGenerator.generateSplits(FileSplitGenerator.java:68)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1418)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex$FileSplitter.accept(VirtualIndex.java:1396)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:576)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:609)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex$Splitter.generateSplits(VirtualIndex.java:1566)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1485)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VirtualIndex.generateSplits(VirtualIndex.java:1437)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.input.VixSplitGenerator.generateSplits(VixSplitGenerator.java:55)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:674)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:936)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:771)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1518)
10-08-2015 00:31:55.626 INFO ERP.hive-tt - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1300)
The failure happened while the ORC RecordReader was trying to read a stripe of data, so increasing the heap size should help.
In the Hunk provider, what is the value of vix.env.HADOOP_HEAPSIZE?
By default it is 512 (MB), and increasing it should fix the OOM error.
Keep in mind that the value of vix.env.HADOOP_HEAPSIZE applies per Hunk search, so setting it too high while running many concurrent searches can exhaust your available physical memory.
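For example, a minimal sketch of the provider stanza in indexes.conf on the Hunk search head (the provider name "my-hadoop-provider" is hypothetical and 2048 is just an example value, in MB):

[provider:my-hadoop-provider]
vix.family = hadoop
# Heap size (MB) for the Hadoop client JVM that Hunk launches for each search
vix.env.HADOOP_HEAPSIZE = 2048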
Thanks @rdagan for the suggestion.
I have found the problem:
We recently upgraded the grid package, which also updated hadoop-env.sh and wiped out my edit: the line export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}", which I had commented out, was back, and thus I was not getting the Hadoop heap size I thought I was.
I fixed that and now my job runs for a long time.
I am in a situation where I cannot easily comment out the line in hadoop-env.sh (it gets overwritten whenever I update packages).
Is it possible to set these Hadoop variables with an export of HADOOP_CLIENT_OPTS instead of setting them in indexes.conf? Does that work?
Yes, that should work, and I believe it's the preferred way to do this. The line mentioned above puts any user-defined -Xmx value after the default 128m, and since the JVM honors the last -Xmx it is given, the user-defined value should win.
Thanks Keith.
So, to increase the heap size to 2048 MB, would that be: export HADOOP_CLIENT_OPTS="-Xmx2048m"?
Yes, that should work.
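To make the interaction concrete, here is a rough sketch of how the two pieces combine (the 2048m value is just an example; the second line is the hadoop-env.sh line quoted above):

# Set before the Hadoop client is invoked, e.g. in the splunk user's profile:
export HADOOP_CLIENT_OPTS="-Xmx2048m"

# hadoop-env.sh then prepends its default:
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"

# The client JVM is launched with "-Xmx128m -Xmx2048m"; the JVM honors the
# last -Xmx it sees, so the effective heap is 2048m.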
Thanks. Is there a good way to test what the setting is, i.e. to make sure the export is actually taking effect?
A few points:
--I just realized we're ignoring the obvious way to handle this: change the vix.env.HADOOP_CLIENT_OPTS setting in the provider. That will override any value inherited from the user environment (see the sketch after these points).
--To test what value is being used, you can edit your hadoop script, which will usually be at $HADOOP_HOME/bin/hadoop. You can insert a line that looks like:
echo $HADOOP_CLIENT_OPTS > ~/my_hadoop_client_opts.txt
Then run a search in Hunk, and see what that file contains.
--Depending on your Hadoop distribution, you may need to set $HADOOP_OPTS instead.
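As a sketch of the provider-side approach from the first point (the provider stanza name and the value are examples, not your actual configuration):

[provider:my-hadoop-provider]
vix.env.HADOOP_CLIENT_OPTS = -Xmx2048m

Hunk exports any vix.env.* setting into the environment of the Hadoop client process it launches, so this takes precedence over whatever HADOOP_CLIENT_OPTS is exported in the shell that starts splunkd.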
Hope that helps.