
Null pointer with Hunk and EMR

ravenbyron
Engager

Hello,

I am trying to get Hunk for AWS ELB up and running, and every search fails with:

[elb_log_provider] Error while running external process, return_code=255. See search.log for more info
[elb_log_provider] NullPointerException - null

04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - SplunkMR -
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - java.lang.NullPointerException
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:300)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCompletion(JobSubmitter.java:117)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1198)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1152)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1075)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1370)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1212)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1382)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at java.lang.reflect.Method.invoke(Method.java:606)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Not sure what I am doing wrong. It seems to die at a different point each time.

Byron


ravenbyron
Engager

I changed the EMR Cluster to
AMI version:2.4.2
Hadoop distribution:Amazon 1.0.3
Applications:Hive 0.11.0.1, Pig 0.11.1.1

I also updated the time extraction path (it needed a * at the end).

Now it just seems really slow. The data isn't very big (a few GB), but it is spread across thousands of files. Any suggestions on making it faster?

Ledion_Bitincka
Splunk Employee

There are some known issues with Hunk and Amazon's Hadoop 2.2.0 that prevent Hunk from spawning MapReduce jobs. If you're using the EMR cluster exclusively for Hunk, I would recommend trying Amazon's Hadoop 1.0.3 instead. Also, search.log suggests that time extraction from the paths may be misconfigured; it contains a ton of warning lines like this one:

04-28-2014 19:29:22.819 WARN  ERP.elb_log_provider -  VixTimeSpecifier - Could not match time regex="/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/" against path="/logs/AWSLogs/<redacted>"

Replacing '*' with '.*?' in that regex should fix this issue.
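Why the warning fires: in a regular expression, '*' is a quantifier on the preceding character, not a shell-style wildcard, so '/AWSLogs/*/' means "/AWSLogs followed by zero or more slashes" and can never skip over the account-ID or region directories. The small Python sketch below (Python's re module treats these particular patterns the same way Java's regex engine does) tests both versions against a hypothetical path in the standard AWS ELB log layout. The real path in the warning is redacted, so the account ID, region, date, and file name here are made up for illustration, and extract_date is a hypothetical helper, not part of Hunk.

```python
import re

# Hypothetical ELB log path, assuming the standard AWS layout:
# AWSLogs/<account-id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/<file>
path = "/logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2014/04/28/example.log"

# Pattern from the warning: '/*' means "zero or more slashes" in a regex,
# so the account-id and region directories cannot match.
glob_style = r"/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/"

# Suggested fix: '.*?' lazily matches any characters in those positions.
fixed = r"/AWSLogs/.*?/elasticloadbalancing/.*?/(\d+)/(\d+)/(\d+)/"


def extract_date(pattern, p):
    """Hypothetical helper: return the captured (year, month, day) or None."""
    m = re.search(pattern, p)
    return m.groups() if m else None


print(extract_date(glob_style, path))  # None
print(extract_date(fixed, path))       # ('2014', '04', '28')
```

Whether Hunk applies the regex as a substring search (as re.search does here) or requires it to match the whole path is not stated in this thread, so treat the sketch as an illustration of the '*' versus '.*?' difference only.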

Ledion_Bitincka
Splunk Employee

Can you please share the full contents of search.log with us? Also, which version of EMR are you using?


Ledion_Bitincka
Splunk Employee

There are definitely a few optimization knobs that can be turned on the Hunk side. Can you please share search.log with us again so we can see what's taking up the time?

ravenbyron
Engager

I would be happy to share the full log, but it contains my AWS keys. Is there some way I can send it to support without posting it publicly?

It's a default EMR cluster:
AMI version:3.0.4
Hadoop distribution:Amazon 2.2.0
Applications:Hive 0.11.0.2, Pig 0.11.1.1
Master:Running-1-m3.xlarge
Core:Running-3-m3.xlarge