Null pointer with Hunk and EMR

ravenbyron · 04-22-2014

Hello,

I am trying to get Hunk for AWS ELB up and running, and every search is failing with:

[elb_log_provider] Error while running external process, return_code=255. See search.log for more info
[elb_log_provider] NullPointerException - null

04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - SplunkMR -
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - java.lang.NullPointerException
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:300)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCompletion(JobSubmitter.java:117)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1198)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1152)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1075)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1370)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1212)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1382)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at java.lang.reflect.Method.invoke(Method.java:606)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Not sure what I am doing wrong. It seems to die at a different point each time.

Byron

ravenbyron · 04-28-2014

I changed the EMR cluster to:
AMI version: 2.4.2
Hadoop distribution: Amazon 1.0.3
Applications: Hive 0.11.0.1, Pig 0.11.1.1

I also updated the time extraction path (it needed a * at the end of it).

Now it just seems really slow. The data isn't very big (a few GB), but it is spread across thousands of files. Any suggestions on making it faster?

Ledion_Bitincka · 04-28-2014

There are some known issues with Hunk and Amazon's Hadoop 2.2.0 that prevent Hunk from spawning MapReduce jobs. If you're using the EMR cluster exclusively for Hunk, I would recommend trying Amazon's 1.0.3 version. Also, search.log suggests that the time extraction from the paths is misconfigured; there are a ton of warning lines like this:

04-28-2014 19:29:22.819 WARN  ERP.elb_log_provider -  VixTimeSpecifier - Could not match time regex="/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/" against path="/logs/AWSLogs/<redacted>"

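The time extraction setting is a regular expression, where '*' is a quantifier rather than a wildcard, so replacing each '*' with '.*?' should fix this issue.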
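To illustrate why the original pattern fails, here is a minimal sketch in Python. It is only an approximation: Hunk evaluates the pattern on the Java side, and Python's re module stands in for it here, although '*' behaves as a quantifier in both. The real path was redacted in the log, so the account ID, region, and file name below are made up for illustration.

```python
import re

# Hypothetical ELB log path. The real one was redacted in search.log, so the
# account ID, region, and file name here are invented for this example.
path = "/logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2014/04/28/example.log"

# Pattern from the warning line. Here '/*' means "zero or more '/' characters",
# not "any directory name", so it cannot skip over the account ID or region.
glob_style = r"/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/"

# Suggested fix: '.*?' lazily matches any run of characters.
regex_style = r"/AWSLogs/.*?/elasticloadbalancing/.*?/(\d+)/(\d+)/(\d+)/"

glob_match = re.search(glob_style, path)
regex_match = re.search(regex_style, path)

print("glob-style pattern matched:", glob_match is not None)
if regex_match:
    year, month, day = regex_match.groups()
    print("extracted date parts:", year, month, day)
```

With the corrected pattern, the three capture groups pick up the year, month, and day directory names, which is what the time extraction needs.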

Ledion_Bitincka · 04-25-2014

Can you please share the full contents of search.log with us? Also, what version of EMR are you trying to use?

Ledion_Bitincka · 04-29-2014

There are definitely a few optimization knobs that can be turned from the Hunk side - can you please share search.log with us again so we can see what's taking up the time?

ravenbyron · 04-28-2014

I would be happy to share the full log, but it contains my AWS keys. Is there some way I can send it to support without posting it publicly?

And it's a default EMR cluster:
AMI version: 3.0.4
Hadoop distribution: Amazon 2.2.0
Applications: Hive 0.11.0.2, Pig 0.11.1.1
Master: Running-1-m3.xlarge
Core: Running-3-m3.xlarge
