Re: Null pointer with Hunk and EMR

ravenbyron · ‎04-22-2014

Hello,

I am trying to get Hunk for AWS ELB up and running, and every search fails with:

[elb_log_provider] Error while running external process, return_code=255. See search.log for more info
[elb_log_provider] NullPointerException - null

04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - SplunkMR -
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - java.lang.NullPointerException
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:300)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCompletion(JobSubmitter.java:117)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1198)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1152)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1075)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1370)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1212)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1382)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at java.lang.reflect.Method.invoke(Method.java:606)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Not sure what I am doing wrong. It seems to die at a different point each time.

Byron

ravenbyron · ‎04-28-2014

I changed the EMR cluster to:
AMI version: 2.4.2
Hadoop distribution: Amazon 1.0.3
Applications: Hive 0.11.0.1, Pig 0.11.1.1

I also updated the time extraction path (it needed a * at the end of it).

Now it just seems really slow. The data isn't very big (a few GB), but it is spread across thousands of files. Any suggestions on making it faster?

Ledion_Bitincka · ‎04-28-2014

There are some known issues with Hunk and Amazon's Hadoop 2.2.0 that prevent Hunk from spawning MapReduce jobs. If you're using the EMR cluster exclusively for Hunk, I would recommend trying Amazon's 1.0.3 version. Also, search.log points to a possible misconfiguration of the time extraction from paths; there are a ton of these warning lines:

04-28-2014 19:29:22.819 WARN  ERP.elb_log_provider -  VixTimeSpecifier - Could not match time regex="/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/" against path="/logs/AWSLogs/<redacted>"

Replacing '*' with '.*?' should fix this issue.
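To see why the original pattern fails, here is a minimal sketch using Python's `re` module as a stand-in for the regex engine Hunk uses (an assumption, as is the search-style matching). The path below is hypothetical, since the real one is redacted in the warning. In a regex, `/*` means "zero or more slashes" rather than "any directory name", so the glob-style pattern cannot get past the account ID directory.

```python
import re

# Hypothetical ELB log path; the real path is redacted in the thread.
path = "/logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2014/04/28/example.log"

# Pattern from the warning: "*" here repeats the preceding "/",
# it does not act as a filesystem wildcard.
glob_style = r"/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/"

# Suggested fix: ".*?" lazily matches any run of characters.
regex_style = r"/AWSLogs/.*?/elasticloadbalancing/.*?/(\d+)/(\d+)/(\d+)/"


def extract_date(pattern, path):
    """Return the captured (year, month, day) groups, or None if no match."""
    m = re.search(pattern, path)
    return m.groups() if m else None


print(extract_date(glob_style, path))   # None
print(extract_date(regex_style, path))  # ('2014', '04', '28')
```

Testing the pattern against a few real paths this way before changing the provider configuration can save a round trip through search.log.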

Ledion_Bitincka · ‎04-25-2014

Can you please share the full contents of search.log with us? Also, which version of EMR are you trying to use?

Ledion_Bitincka · ‎04-29-2014

There are definitely a few optimization knobs that can be turned from the Hunk side - can you please share search.log with us again so we can see what's taking up the time?

ravenbyron · ‎04-28-2014

I would be happy to share the full log, but it contains my AWS keys. Is there some way I can send it to support without posting it publicly?

And it's a default EMR cluster:
AMI version: 3.0.4
Hadoop distribution: Amazon 2.2.0
Applications: Hive 0.11.0.2, Pig 0.11.1.1
Master: Running-1-m3.xlarge
Core: Running-3-m3.xlarge
