To give further examples, a distributable streaming command that can run on an indexer can also run on the search head. Take this example:

index=_audit
``` This eval runs on the indexer ```
| eval isAdmin=if(user="admin", 1, 0)
``` This lookup runs on the indexer ```
| lookup actions.csv action OUTPUT action_name
``` This stats runs on both indexer and search head, i.e. the indexer
will generate stats and then pass its set of stats to the search
head, along with all other stats from other indexers and then
the final counters are merged on the search head ```
| stats count by user action_name isAdmin
``` This lookup runs on the search head, as the data now exists on the SH.
Once the data is on the SH, it will not go back to the indexer. ```
| lookup users.csv user OUTPUT user_name
``` So now this eval runs on the search head ```
| eval do_alert=if(isAdmin=1, 1, 0)

As you can see, it contains some eval, lookup and stats commands. This search will be sent from the SH to the "search peers", which are the indexers it can search against. Each indexer runs this same search on the set of data it owns. The key point here is that the stats command is the trigger for the indexers to return their dataset to the search head.

If you look at the job properties of any search that does a stats command, you will see something like the following in the phase0 detail, here for a simple "index=_audit | stats count by user":

litsearch index=_audit | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "user" | prestats count by user

This shows that the indexer will return some "prestats", which is its own reduced data set that it sends to the search head.

In the example above, the first lookup runs on the indexer and the second on the SH. So when it talks about 'invoking' the command, it's really about where the data happens to be at that point in the execution of the entire SPL.

As soon as you use a dataset processing command or a transforming command, the data is shifted from the indexers to the search head, so you immediately lose parallelism. It is therefore best to put those types of commands as far down the SPL pipeline as possible.

If you look at the command types table, you can see that some commands work differently depending on how they are called, e.g. fillnull is a dataset processing command when used with no parameters, but distributable streaming when used with a field name. Be aware of these subtle distinctions when considering search performance.
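To picture what "prestats" on the indexers followed by a merge on the search head means, here is a rough Python sketch of the idea. It is only an analogy, not how Splunk implements it: the event data, the two "indexers" and both function names are made up for illustration.

```python
from collections import Counter

# Hypothetical events held by two indexers (illustrative data only).
indexer_1 = [{"user": "admin"}, {"user": "alice"}, {"user": "admin"}]
indexer_2 = [{"user": "alice"}, {"user": "bob"}]


def prestats_count_by_user(events):
    """Partial 'count by user' computed locally on one indexer."""
    return Counter(event["user"] for event in events)


def merge_on_search_head(partials):
    """The search head sums the partial counts it receives from every indexer."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts per user
    return total


# Each indexer reduces its own events in parallel...
partials = [prestats_count_by_user(indexer_1), prestats_count_by_user(indexer_2)]
# ...and only the small partial results travel to the search head.
final = merge_on_search_head(partials)
print(dict(final))
```

The point of the sketch is that each indexer ships a handful of counters rather than its raw events, which is why keeping work on the indexers until the stats command helps performance.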