
Null pointer with Hunk and EMR

ravenbyron
Engager

Hello,

I am trying to get Hunk for AWS ELB up and running, and every search is failing with:

[elb_log_provider] Error while running external process, return_code=255. See search.log for more info
[elb_log_provider] NullPointerException - null

04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - SplunkMR -
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - java.lang.NullPointerException
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCurrentJobToComplete(JobSubmitter.java:300)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.JobSubmitter.waitForCompletion(JobSubmitter.java:117)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1198)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1152)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1075)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1370)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1212)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1382)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at java.lang.reflect.Method.invoke(Method.java:606)
04-22-2014 14:48:04.885 ERROR ERP.elb_log_provider - at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Not sure what I am doing wrong. It seems to die at a different point each time.

Byron

ravenbyron
Engager

I changed the EMR cluster to:
AMI version:2.4.2
Hadoop distribution:Amazon 1.0.3
Applications:Hive 0.11.0.1, Pig 0.11.1.1

And updated the time extract path (it needed a * at the end of it).

Now it just seems really slow. The data isn't very big (a few GB), but it is spread across thousands of files. Any suggestions on making it faster?

Ledion_Bitincka
Splunk Employee

There are some known issues with Hunk and Amazon's Hadoop 2.2.0 that prevent Hunk from spawning MapReduce jobs. If you're using the EMR cluster exclusively for Hunk, I would recommend trying Amazon's Hadoop 1.0.3 instead. Also, search.log suggests a possible misconfiguration of the time extraction from the paths; it contains a large number of warning lines like this one:

04-28-2014 19:29:22.819 WARN  ERP.elb_log_provider -  VixTimeSpecifier - Could not match time regex="/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/" against path="/logs/AWSLogs/<redacted>"

Replacing '*' with '.*?' in that regex should fix this issue.
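To illustrate why the warning fires, here is a minimal Python sketch. It is not Hunk's own matcher (Hunk evaluates the regex in Java, whose syntax is the same for these constructs), and it assumes a "search anywhere in the path" match. In a regex, `*` repeats the preceding character, so `/AWSLogs/*/` only matches "AWSLogs" followed by slashes, not an account-ID directory. The sample path is hypothetical, following the usual ELB layout of AWSLogs/account-id/elasticloadbalancing/region/yyyy/mm/dd/.

```python
import re
from typing import Optional, Tuple

# Time regex copied from the VixTimeSpecifier warning. In regex syntax '*'
# repeats the preceding '/', so "/AWSLogs/*/" means "/AWSLogs" followed by one
# or more slashes, never an account-ID directory.
ORIGINAL = r"/AWSLogs/*/elasticloadbalancing/*/(\d+)/(\d+)/(\d+)/"

# Suggested fix: '.*?' lazily matches the account-ID and region segments.
FIXED = r"/AWSLogs/.*?/elasticloadbalancing/.*?/(\d+)/(\d+)/(\d+)/"

# Hypothetical path in the standard ELB layout; the account ID, region and
# file name are invented for illustration.
PATH = "/logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2014/04/22/example.log"


def extract_date(pattern: str, path: str) -> Optional[Tuple[str, ...]]:
    """Return the (year, month, day) capture groups, or None if no match."""
    match = re.search(pattern, path)
    return match.groups() if match else None


print(extract_date(ORIGINAL, PATH))  # None
print(extract_date(FIXED, PATH))     # ('2014', '04', '22')
```

The lazy `.*?` is used rather than a greedy `.*` so each wildcard stops at the first segment boundary that lets the rest of the pattern match.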

Ledion_Bitincka
Splunk Employee

Can you please share the full contents of search.log with us, as well as the version of EMR you are trying to use?

Ledion_Bitincka
Splunk Employee

There are definitely a few optimization knobs that can be turned on the Hunk side. Can you please share search.log with us again so we can see what's taking up the time?

ravenbyron
Engager

I would be happy to share the full log, but it contains my AWS keys. Is there some way I can send it to support without posting it publicly?

And it's a default EMR cluster:
AMI version:3.0.4
Hadoop distribution:Amazon 2.2.0
Applications:Hive 0.11.0.2, Pig 0.11.1.1
Master:Running-1-m3.xlarge
Core:Running-3-m3.xlarge
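Regarding the AWS keys in search.log: one option is to mask them before sharing the log. The Python sketch below is hypothetical and rests on two assumptions: that the keys appear as properties named like the Hadoop s3n settings (`...awsAccessKeyId=` / `...awsSecretAccessKey=`), and that bare access key IDs follow the "AKIA" plus 16 characters form. Other credential formats would slip through, so check the output by hand before posting.

```python
import re

# Assumed property names (Hadoop s3n style, e.g. fs.s3n.awsAccessKeyId);
# adjust if your provider configuration uses different keys.
SENSITIVE_KEYS = ("awsAccessKeyId", "awsSecretAccessKey")

# "<key>=<value>" or "<key>: <value>"; the value ends at whitespace, a comma,
# a semicolon or a quote character.
_PROPERTY = re.compile(
    r"((?:" + "|".join(map(re.escape, SENSITIVE_KEYS)) + r")\s*[=:]\s*)"
    r"([^\s,;'\"]+)"
)

# Bare long-term access key IDs: "AKIA" plus 16 uppercase letters or digits.
_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(line: str) -> str:
    """Mask credential values in one log line, keeping the property names."""
    line = _PROPERTY.sub(r"\1<REDACTED>", line)
    return _KEY_ID.sub("<REDACTED>", line)


def redact_file(src: str, dst: str) -> None:
    """Write a redacted copy of the log at src to dst, line by line."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(redact(line))
```

Usage, with hypothetical file names: `redact_file("search.log", "search.redacted.log")`, then attach the redacted copy.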
