
Hunk Reduce Jobs Limit

Path Finder

We have sent a very large data set of Windows Security logs directly into Hadoop and are using Hunk to view the data. A search over 60 minutes on the virtual index is a little slow. The search is running in "Smart Mode", since switching to "Verbose Mode" would use streaming to retrieve the data.

The search is simply index=vi-winsec, with the time range picker set to 60 minutes. The search log for the job shows that 1 reducer is in use.

6-19-2015 15:49:11.379 INFO ERP.vsa-prod - VixTimeSpecifier - using timezone=null, _tz.id="GMT", name="Greenwich Mean Time" for regex=.?/windows/staging/(\d+)-(\d+)-(\d+)/(\d+)/., format=yyyyMMddHHmm
06-19-2015 15:49:11.408 INFO ERP.vsa-prod - SplunkMR$SearchHandler - Reduce search: null
06-19-2015 15:49:11.408 INFO ERP.vsa-prod - SplunkMR$SearchHandler - Search mode: stream
06-19-2015 15:49:11.409 INFO ERP.vsa-prod - SplunkMR$SearchHandler - setting requiredFields=*
06-19-2015 15:49:12.106 INFO ERP.vsa-prod - SplunkMR$SearchHandler - Created filesystem object, elapsed_ms=697
06-19-2015 15:49:12.537 INFO ERP.vsa-prod - ClusterInfoLogger - Hadoop cluster spec: provider=vsa-prod, tasktrackers=39, map_inuse=1, map_slots=390, reduce_inuse=1, reduce_slots=78

How can the number of reducers be increased? The Hunk host is RHEL 6.5.
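
For anyone who wants to track these slot counts across many searches, the ClusterInfoLogger line in the log above can be pulled apart with a few lines of Python. This is only a minimal sketch: the function name `parse_cluster_spec` is made up for illustration, and it assumes the line keeps the comma-separated `key=value` layout shown in this excerpt.

```python
import re

def parse_cluster_spec(line):
    """Extract key=value pairs from a 'Hadoop cluster spec:' log line.

    Hypothetical helper; assumes the layout seen in the search.log excerpt.
    """
    marker = "Hadoop cluster spec:"
    if marker not in line:
        return {}
    spec = line.split(marker, 1)[1]
    fields = {}
    for key, value in re.findall(r"(\w+)=([^,\s]+)", spec):
        # Keep numeric values as integers so they can be compared.
        fields[key] = int(value) if value.isdigit() else value
    return fields

line = ("06-19-2015 15:49:12.537 INFO ERP.vsa-prod - ClusterInfoLogger - "
        "Hadoop cluster spec: provider=vsa-prod, tasktrackers=39, map_inuse=1, "
        "map_slots=390, reduce_inuse=1, reduce_slots=78")
spec = parse_cluster_spec(line)
print(spec["map_inuse"], spec["map_slots"], spec["reduce_inuse"], spec["reduce_slots"])
```

Comparing `map_inuse` against `map_slots` over time shows how much of the cluster a search is actually using.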


Splunk Employee

Good, that is more like the expected behavior.

To improve performance, you may want to cache the results or try some of the options described here: http://blogs.splunk.com/2015/05/05/caching-hadoop-data-with-splunk-and-hunk/


Splunk Employee

1) Normally, to make sure you run MR jobs, you need index=vi-winsec | and to be in Smart Mode. In your case, therefore, I believe you are not running any MR jobs. From a performance point of view, that makes a big difference.
2) Hunk does not use the Hadoop Reduce phase. It performs the reduce on the client node (i.e. the Hunk / Hadoop client node) and uses only the Hadoop Map phase. So if you go to your Hadoop monitoring (for example, the YARN ResourceManager UI, :8088/cluster), you will see exactly what is being used for any Hunk jobs.
3) You can identify Hunk jobs by looking for jobs whose names start with SPLK.
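
If you would rather check this from a script than from the UI, the same information is available from the YARN ResourceManager REST API (`/ws/v1/cluster/apps` on the same port). The sketch below only does the filtering step on an already-fetched response. The function name `hunk_apps` and the sample application ids and names are invented for illustration, and it assumes the usual `{"apps": {"app": [...]}}` response shape.

```python
import json

def hunk_apps(apps_response, prefix="SPLK"):
    """Return the applications whose name starts with the given prefix.

    Hypothetical helper; apps_response is the parsed JSON from
    /ws/v1/cluster/apps. The RM returns {"apps": null} when nothing
    is running, so both levels are guarded.
    """
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [a for a in apps if a.get("name", "").startswith(prefix)]

# Made-up sample response, for illustration only.
sample = json.loads("""
{"apps": {"app": [
  {"id": "application_0000000000000_0001", "name": "SPLK_example_search", "state": "RUNNING"},
  {"id": "application_0000000000000_0002", "name": "nightly-etl", "state": "RUNNING"}
]}}
""")

for app in hunk_apps(sample):
    print(app["id"], app["name"], app["state"])
```

An empty result while a search is running would suggest that the search is streaming rather than launching an MR job.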


Path Finder

I see a much faster response when using index=vi-winsec minutesago=60 | stats count by EventCode. Will I ever see reduce_inuse greater than 1 in the search log?

index=vi-winsec minutesago=60 | stats count by EventCode (in smart mode = 88.26 seconds)
index=vi-winsec minutesago=60 | stats count by EventCode (in verbose mode = I stopped the search after 14 minutes)

I am trying to find ways to improve performance for searches where the user doesn't use any of the SPL reporting commands.


Splunk Employee

I would recommend:
a) leave the search mode set to "Smart" unless you're troubleshooting something
b) use the time picker (on the right-hand side of the search bar) instead of specifying the time range in the search string
c) try to be as specific as possible in the search and narrow down the time as much as possible


Splunk Employee

Oops, it looks like some of my message was cut off:
to run MR jobs with Hunk, you need index=vi-winsec | <Splunk reporting command> (for example, stats count) and to be in Smart Mode
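
As a rough way to apply that rule of thumb to a list of saved searches, something like the following could work. It is a sketch only: `likely_runs_mapreduce` is a made-up name, the command set is a small assumed sample of reporting commands rather than a complete list, and the check encodes nothing more than the rule stated in this thread, not how Hunk actually decides.

```python
# Assumed, incomplete sample of SPL reporting commands.
REPORTING_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}

def likely_runs_mapreduce(search, mode):
    """Apply the thread's rule: a reporting command after a pipe, in Smart Mode."""
    if mode.lower() != "smart":
        return False
    # Naive split: does not handle pipes inside quoted strings.
    stages = [s.strip() for s in search.split("|")[1:]]
    return any(s.split()[0].lower() in REPORTING_COMMANDS for s in stages if s)

print(likely_runs_mapreduce("index=vi-winsec", "smart"))
print(likely_runs_mapreduce("index=vi-winsec | stats count by EventCode", "smart"))
print(likely_runs_mapreduce("index=vi-winsec | stats count by EventCode", "verbose"))
```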
