Are there any docs or other helpful resources to help explain the Execution costs chart on the Job Inspector page?
Here is an example search that I'm trying to optimize:
search source=WMI:* | stats count as events, avg(linecount) as lpe, dc(Name) as names, by sourcetype, host | rename sourcetype as st, host as h | eval lpe=round(lpe,1)
Execution costs:
Duration  Component                  Invocations  Inputs   Outputs
0.001     command.eval               1            60       60
0.018     command.fields             16           81,988   81,988
2.706     command.prestats           17           81,988   755
0.001     command.rename             1            60       60
124.468   command.search             16           -        81,988
92.625    command.search.kv          18
22.515    command.search.typer       16           81,988   81,988
4.790     command.search.rawdata     18
1.329     command.search.tags        16           81,988   81,988
0.355     command.search.fieldalias  18           81,988   81,988
0.346     command.search.lookups     18           81,988   81,988
0.157     command.search.filter      18
0.100     command.search.index       23
0.032     command.stats              1            200,360  60
124.499   dispatch.fetch             17
0.457     dispatch.preview           16
0.059     dispatch.reduce            1
13.887    dispatch.timeline          17
I can guess at most of this stuff, but there are a number of components here that I'm not sure about. Here are some of my questions:
What does command.search.index include? Is that raw index lookup time?
What about command.search.rawdata?
dispatch.fetch seems really close to the overall command.search time, so is that time inherited? (Since the search took around 2 minutes, not 4 minutes, it would seem that there's some overlap there.)
Does command.search.kv include all search-time field extractions?
Is command.search.filter what takes care of removing events that match indexed terms but have to be removed because the raw event doesn't match? (For example, a search for the phrase "error in file" would search for the terms "error", "in" and "file", but not all events that contain those three terms would have them in the correct order in the raw event, so they don't match the search expression. Is that what this component is?)
BTW, I know there are a few things I can do to optimize this search, such as using the fields command to explicitly list what fields should be extracted, and disabling the typer and lookups components by using dispatch.* settings in savedsearches.conf, but I'm trying to keep this question about understanding the execution cost info, and less about this exact search.
fyi, there are now docs about the job inspector: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/ViewsearchjobpropertieswiththeJobInspec...
I did a little testing with command.search.typer.
The bit of knowledge I gained was that it didn't seem to be related only to event types. I also saw a performance improvement when I limited the scope of tags.conf entries. It could have been a fluke, though.
Most of my testing was with Windows security logs. As a side note, I saw an awesome reduction in command.search.typer time by limiting the sharing scope of the eventtypes.conf and tags.conf that come in the Windows apps.
My hunch is that the Windows app (and some others) share these types globally for use by other higher-level, summary-based apps. For me, the event types aren't super important. Field extraction seems to happen much more quickly and is more useful for me.
I edited metadata/local.meta to set export=none for those parts of the Windows app.
Yes, I've disabled the export for both the Windows and Unix apps. The quality of these apps leaves a lot to be desired, IMHO. Some of the eventtypes in the Unix app aren't even valid, and some of them simply mirror sourcetypes, and I'm not sure what value that adds. BTW, adding | fields - eventtype should disable the typer or reduce its time to nearly 0. (I found that this doesn't work if you have field extractions based on eventtype, which isn't recommended anymore and has caused me a bunch of pain, so I suggest staying clear of it.)
command.search.index is the time spent identifying (from tokens in the base search) which events must be retrieved.
command.search.rawdata is the time spent getting the raw text of the identified events.
dispatch.fetch is time spent by the search head, while command.search includes time spent by all indexers (and so can be more than the actual elapsed time of the search). If you have only a single node, then they will be similar, but depending on the specific search in a distributed environment, they can be very different.
command.search.filter, I believe, also removes events where field values do not match what is requested in the base query (i.e., extracted fields where the field value fails to match the event), in addition to events with misordered search tokens.
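As a rough sanity check on the numbers from the question, the command.search.* rows can be added up and compared against the command.search total. This is only a sketch in Python: the durations are copied from the Execution costs table above, and the idea that the sub-components are nested inside command.search is an assumption based on the naming, not something the Job Inspector output states.

```python
# Durations (seconds) copied from the Execution costs table in the question.
search_total = 124.468  # command.search
sub_components = {
    "command.search.kv": 92.625,
    "command.search.typer": 22.515,
    "command.search.rawdata": 4.790,
    "command.search.tags": 1.329,
    "command.search.fieldalias": 0.355,
    "command.search.lookups": 0.346,
    "command.search.filter": 0.157,
    "command.search.index": 0.100,
}

# Assumption: the command.search.* rows are nested inside command.search,
# so their sum should come close to the parent duration.
sub_total = sum(sub_components.values())
unaccounted = search_total - sub_total

print(f"sum of sub-components: {sub_total:.3f}s of {search_total:.3f}s")
print(f"not attributed to a sub-component: {unaccounted:.3f}s")

# Share of command.search taken by each sub-component, largest first.
shares = sorted(
    ((name, secs / search_total) for name, secs in sub_components.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, share in shares:
    print(f"{name:28s} {share:6.1%}")
```

On these numbers the sub-components account for about 122.2 of the 124.5 seconds, with command.search.kv taking roughly three quarters of command.search and command.search.typer roughly 18%, which fits the advice above to limit extracted fields and the scope of event types.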