
How do we diagnose a problem within a Hunk MR job?

Path Finder

I have connected my Hunk instance to my Hadoop cluster (HDP 2.6, Kerberized). I can see files in HDFS, and I seem to be able to launch MapReduce jobs through YARN. I can run some searches and get back some records - but not all - and it is really, really slow. I have found that although HDFS access seems OK, MapReduce is failing. Here are some logs.

Is it failing when trying to ungzip the data, or after that? Any tips?

Error: java.lang.NullPointerException
    at com.splunk.mr.SplunkSearchMapper.runImpl(SplunkSearchMapper.java:354)
    at com.splunk.mr.SplunkBaseMapper.run(SplunkBaseMapper.java:169)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The bigger log is:

2015-02-25 17:46:31,714 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2015-02-25 17:46:31,871 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-02-25 17:46:31,871 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-02-25 17:46:31,906 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-02-25 17:46:31,906 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1424874708069_0010, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@34d18542)
2015-02-25 17:46:31,992 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.34.37.2:8188, Ident: (owner=splunk, renewer=yarn, realUser=, issueDate=1424886364096, maxDate=1425491164096, sequenceNumber=47, masterKeyId=10)
2015-02-25 17:46:32,002 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:uatcluster, Ident: (HDFS_DELEGATION_TOKEN token 143 for splunk)
2015-02-25 17:46:32,211 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-02-25 17:46:32,560 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /var/hadoop/yarn/local/usercache/splunk/appcache/application_1424874708069_0010
2015-02-25 17:46:33,286 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-02-25 17:46:34,095 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2015-02-25 17:46:34,474 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: /data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz:0+63108525
2015-02-25 17:46:34,732 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=, _tz.id="GB-Eire", name="Greenwich Mean Time" for regex=/data/kafka/DAILY/.*/hourly/(\d+)/(\d+)/(\d+)/(\d+), format=yyyyMMddHH
2015-02-25 17:46:34,732 INFO [main] com.splunk.mr.input.VixTimeSpecifier: using timezone=, _tz.id="GB-Eire", name="Greenwich Mean Time" for regex=/data/kafka/DAILY/.*/hourly/(\d+)/(\d+)/(\d+)/(\d+), format=yyyyMMddHH
2015-02-25 17:46:34,756 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SplunkJournalRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SplunkJournalRecordReader, path=hdfs://uatcluster:8020/data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz, regex=/journal\.gz$.
2015-02-25 17:46:34,768 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.ValueAvroRecordReader, path=hdfs://uatcluster:8020/data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz, regex=\.avro$.
2015-02-25 17:46:34,777 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SimpleCSVRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SimpleCSVRecordReader, path=hdfs://uatcluster:8020/data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz, regex=\.([tc]sv)(?:\.(?:gz|bz2|snappy))?$.
2015-02-25 17:46:34,783 WARN [main] com.splunk.mr.SplunkBaseMapper: Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.SequenceFileRecordReader, message=File path does not match regex to use this record reader, name=com.splunk.mr.input.SequenceFileRecordReader, path=hdfs://uatcluster:8020/data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz, regex=\.seq$.
2015-02-25 17:46:34,783 INFO [main] com.splunk.mr.JobSubmitterInputFormat: using class=com.splunk.mr.input.SplunkLineRecordReader to process split=/data/kafka/DAILY/REDACTED/hourly/2014/12/18/00/REDACTED.1.0.544436.3235301.1418889600000.gz:0+63108525
2015-02-25 17:46:34,810 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2015-02-25 17:46:34,812 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2015-02-25 17:46:34,887 INFO [main] com.splunk.mr.SplunkSearchMapper: CONF_DN_HOME is set to /tmp/splunk/vruatsph001.domain.local/
2015-02-25 17:46:34,908 INFO [main] com.splunk.mr.SplunkSearchMapper: Ensuring Hunk is setup correctly took elapsed_ms=21
2015-02-25 17:46:35,122 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
    at com.splunk.mr.SplunkSearchMapper.runImpl(SplunkSearchMapper.java:354)
    at com.splunk.mr.SplunkBaseMapper.run(SplunkBaseMapper.java:169)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2015-02-25 17:46:35,124 INFO [main] org.apache.hadoop.mapred.Task: Running cleanup for the task
1 Solution

Splunk Employee

That's an unfortunate place to get an error. The real exception is not logged, and the NullPointerException you're seeing is masking the original exception that was thrown.

From what I can see in the source code, the real cause is likely one of the following:
1. You cannot write the MR results file to HDFS. (Unlikely, but it could be a permissions issue.)
2. The Hunk package name you're using is wrong. (Unlikely, unless you've changed the Hunk directory name or supplied your own Hunk package via "vix.splunk.setup.package".)
3. Your mapper node (tasktracker) is not allowed to write a file to a child of "vix.splunk.home.datanode", which defaults to "/tmp/splunk/". (More likely?)
4. Your mapper is not allowed to execute Hunk under "vix.splunk.home.datanode" (again, under /tmp/splunk/...; see the sketch after this list).
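
A minimal sketch for checking 3 and 4 on a datanode - the "yarn" user and the default /tmp/splunk path are assumptions here; substitute whatever user your NodeManagers run tasks as and whatever vix.splunk.home.datanode points to:

    # Hypothetical check: can the task user create and then execute a file under /tmp/splunk?
    sudo -u yarn sh -c 'mkdir -p /tmp/splunk && touch /tmp/splunk/exec_test \
        && chmod +x /tmp/splunk/exec_test && /tmp/splunk/exec_test; echo "exec exit=$?"'
    # exit=0 means write and execute both work; exit=126 usually means
    # a noexec mount or a missing execute permission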

My recommendation is to make sure your mappers have permissions on the "/tmp/splunk" directory on all mapper nodes, or to change the "vix.splunk.home.datanode" setting in your provider to a path where your mappers/tasktrackers can read, write, and execute. Note that this path is on the local filesystem of each node, not a path on HDFS.
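
For reference, the setting lives in the provider stanza in indexes.conf on the Hunk search head. A minimal sketch - the provider name and path are made-up examples:

    [provider:my-hadoop-provider]
    # Local (non-HDFS) directory where Hunk is unpacked and run on each datanode;
    # must be readable, writable, and executable by the user running the map tasks
    vix.splunk.home.datanode = /opt/hunk-workdir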

Let me know if it doesn't help.
- Petter

PS: I'll make sure the real exception is logged in the next version of Hunk.


Splunk Employee

You can also check the Hadoop MR job task log from its web console and see if this log statement exists:
"_splunkProcess.getStdin()="

If it does not, the remote splunkd was not started, likely due to reason 3 in Petter's list.
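
If YARN log aggregation is enabled, you can also grep for it from the command line - the application ID below is the one from the log excerpt above:

    yarn logs -applicationId application_1424874708069_0010 | grep 'splunkProcess.getStdin'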


Path Finder

There is a remote splunkd? That is the first I have heard of it. 😞
I assumed the Hunk software (aka Splunk) was started by the Hunk request.


Splunk Employee

Your assumption is right - what Allan meant was the splunk process started by the Hunk request (the binary happens to be called splunkd - it is NOT run as a daemon).


Path Finder

Oh - thanks! So the software effectively runs as a daemon on the search head, but not on the Hadoop cluster. That makes sense.


Splunk Employee

Sorry, I meant a remote splunk process, not a daemon. It is started and stopped by the Hunk search request.


Splunk Employee

What is your Hunk version? If you turn on debug logging (vix.splunk.search.debug=1), you can find the remote search.log under /hunk-working-dir-in-hdfs/your-search-sid/dispatch_dirs and check whether there are any errors.
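
A sketch of both steps - the provider name, the /user/splunk working directory, and the sid are placeholders; use your own provider stanza and whatever vix.splunk.home.hdfs points to:

    # In the provider stanza in indexes.conf on the search head:
    [provider:my-hadoop-provider]
    vix.splunk.search.debug = 1

    # After re-running the search, pull the remote search.log from HDFS:
    hdfs dfs -cat /user/splunk/<your-search-sid>/dispatch_dirs/*/search.log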


Path Finder

I am using the latest Hunk available AFAIK. Thanks for the debug tip - I have switched it on.



Path Finder

A big thank you, Petter. I am thinking that the problem is number 4.

I can see the Hunk software being unpacked under /tmp on all the Hadoop compute nodes, but /tmp is mounted with noexec, so it probably cannot be executed there. (I think that is a fairly common security policy.)
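
For anyone checking the same thing, a quick way to confirm the mount flags (assumes findmnt is available on the node):

    findmnt -no OPTIONS /tmp | tr ',' '\n' | grep -x noexec   # prints "noexec" if the flag is set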

I have tried to change vix.splunk.home.datanode, but it does not seem to have worked, for reasons I cannot determine. But that is a different problem.


Splunk Employee

What problems did you run into when you changed vix.splunk.home.datanode to a directory where you can read, write, and execute? I'd be glad to help out some more!

Are there any files in your vix.splunk.home.hdfs/dispatch directory now (as Ledion has suggested in this thread)?
What do your logs say?


Path Finder

Well, after trying to get it to use somewhere other than /tmp/splunk, I have now broken the config, so it does not appear to be launching YARN MapReduce jobs at all. I can't see any being created, whether succeeding or failing 😞


Splunk Employee

Can you look for any err-m-..tmp files left over in the HDFS dispatch directory (in the working dir on HDFS)? They might contain info about any issues we might be running into when starting up a search process on the mappers.
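
Something like this should surface them - the /user/splunk working directory is an assumption based on the paths mentioned elsewhere in this thread:

    hdfs dfs -ls -R /user/splunk/dispatch | grep 'err-m-'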


Path Finder

There do not appear to be any dispatch directories on HDFS. I am guessing it is failing before that. Thanks.


Path Finder

Sometime yesterday it managed to create a bunch of directories and some files under /user/splunk on HDFS, including /user/splunk/dispatch, but that dir is empty 😞
