Which search below is better or optimal from a performance perspective and why?
sourcetype="mysoucetype" AND field1="none" AND field2="123" AND field3!="-" | stats count by field3 | eval cnt="Event" | eval iterator="field2"
sourcetype="mysoucetype" | search field1="none" | search field2="123" | search field3!="-" | stats count by field3 | eval cnt="Event" | eval iterator="field2"
I would expect that Splunk can optimize the first search more than the second, because in the first case Splunk can search the index for the terms "none" and "123", whereas in the second it has to filter in a post-search step (like a "grep" type of command).
Of course, this all depends on your data set. If the field values you are looking for are very common, then both searches may perform similarly. I would still pick search 1. I've seen Splunk's search assistant suggest moving piped search expressions (like those in search 2) into the base search (like in search 1). While this isn't always possible in more complex searches, it certainly is here, so I would recommend it.
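To make the reasoning above concrete, here is a toy Python model of the difference. It is only a sketch: the events, the term index, and the function names (build_index, indexed_search, piped_search) are made up for illustration and are not how Splunk is actually implemented. It just counts how many events each approach has to read.

```python
from collections import defaultdict

# Toy events; the field names and values are invented for illustration.
EVENTS = [
    {"field1": "none", "field2": "123", "field3": "a"},
    {"field1": "none", "field2": "123", "field3": "-"},
    {"field1": "none", "field2": "999", "field3": "b"},
    {"field1": "some", "field2": "123", "field3": "c"},
    {"field1": "some", "field2": "456", "field3": "d"},
]


def build_index(events):
    """Map each value (term) to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for value in event.values():
            index[value].add(event_id)
    return index


def indexed_search(events, index, wanted):
    """Narrow the candidates with the term index, then verify the fields."""
    candidates = set(range(len(events)))
    for value in wanted.values():
        candidates &= index.get(value, set())
    hits = [events[i] for i in sorted(candidates)
            if all(events[i].get(k) == v for k, v in wanted.items())]
    return hits, len(candidates)  # second value: events actually read


def piped_search(events, wanted):
    """Read every event, then filter it grep-style."""
    hits = [e for e in events
            if all(e.get(k) == v for k, v in wanted.items())]
    return hits, len(events)  # second value: events actually read


wanted = {"field1": "none", "field2": "123"}
index = build_index(EVENTS)
hits_indexed, read_indexed = indexed_search(EVENTS, index, wanted)
hits_piped, read_piped = piped_search(EVENTS, wanted)
print(read_indexed, read_piped)
```

Both functions return the same matching events; the only difference is how many events had to be read to get there. With very common values the candidate set approaches the full event list, which is why the two searches can end up performing similarly.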
I'm not sure I understand the purpose of your evals at the end. You have quotes around "field2", which makes the field iterator contain a constant value (the literal string) rather than the value of field2. I know this is just a contrived example for the purpose of the question, but I don't get what it is trying to accomplish or simulate in a real search.
BTW, I would expect that
sourcetype="mysoucetype" field1="none" field2="123" field3!="-" | ...
to perform about the same as:
sourcetype="mysoucetype" field1="none" field2="123" | search field3!="-" | ...
since the search engine can't use indexed terms to filter events for not-equals operations.
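The same point can be sketched in a few lines of Python. This is a hypothetical model, not Splunk's real query planner: split_conditions is an invented helper that separates the conditions an index lookup could narrow (equality) from those that have to be checked event by event (not-equals).

```python
def split_conditions(conditions):
    """Split (field, op, value) conditions into index terms and per-event checks."""
    index_terms = [(f, v) for f, op, v in conditions if op == "=="]
    per_event = [(f, v) for f, op, v in conditions if op == "!="]
    return index_terms, per_event


# Base search with the not-equals condition included.
with_ne = [
    ("field1", "==", "none"),
    ("field2", "==", "123"),
    ("field3", "!=", "-"),
]
# Base search with the not-equals condition moved behind a pipe.
without_ne = with_ne[:2]

terms_a, checks_a = split_conditions(with_ne)
terms_b, checks_b = split_conditions(without_ne)
print(terms_a == terms_b)
```

In this model the index terms are identical in both cases, so the same events come off the disk either way; the not-equals test is applied afterwards regardless of which side of the pipe it is written on.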
In most cases, the performance of your search is governed mostly by the number of events that match the initial index query (i.e., the part before the first | in the query) and are returned by it. With a few exceptions, post-processing commands (i.e., those after the first |) contribute a very small proportion of the overall execution time. (The exceptions include commands that use a subsearch, e.g., set, and so need to execute another search.)
This is because, for the most part, the time a search takes is dominated by the time it takes to find, read, and retrieve events from the physical disk. Compared to this, most other processing takes very little time. So search optimization should first focus on getting only the required data from the initial query (i.e., off the disk) in the least amount of time.
This is a generalization. There certainly are ways to cause the post-processing to be significantly faster or slower, and there are also considerations that can make a big difference in a distributed search cluster vs. a single-node environment, but this is the first, biggest, and most common optimization to consider. Additionally, there are ways to run searches that make them faster or slower that aren't really an aspect of search optimization per se (e.g., removing fields to speed up the flashtimeline view).