Clarification on indexer and search head

yoshileigh66
Explorer

I am aware of forwarder -> indexer -> search head. However, when reading about streaming commands, Splunk states "A distributable streaming command runs on the indexer or the search head, depending on where in the search the command is invoked."

I am very confused, as I read this as saying that there are searches on the indexer, and then there are searches on the search head. But my understanding is that the search head is used to search events on the indexer, and that there is no searching the indexer without the search head.

What is the difference between a search on the indexer and a search on the search head? 

https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/Commandsbytype


bowesmana
SplunkTrust

To give a further example: a distributable streaming command that can run on an indexer can also run on the search head. Take this search:

index=_audit
``` This eval runs on the indexer ```
| eval isAdmin=if(user="admin", 1, 0)
``` This lookup runs on the indexer ```
| lookup actions.csv action OUTPUT action_name
``` This stats runs on both indexer and search head, i.e. the indexer
    will generate stats and then pass its set of stats to the search
    head, along with all other stats from other indexers and then 
    the final counters are merged on the search head ```
| stats count by user action_name isAdmin
``` This lookup runs on the search head, as the data now exists on the SH.
    Once the data is on the SH, it will not go back to the indexer. ```
| lookup users.csv user OUTPUT user_name
``` So now this eval runs on the search head ```
| eval do_alert=if(isAdmin, 1, 0)

As you can see it contains some eval, lookup and stats commands.

This search will be sent from the SH to the "search peers", which are the indexers it can use to search against. Each indexer will run this same search on the set of data it owns.

The key point here is that once it hits the stats command, that is the trigger for the indexers to return their dataset to the search head. If you look at the job properties of any search that does a stats command, you will see in the phase0 detail something like the following for a simple "index=_audit | stats count by user"

litsearch index=_audit | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "user" | prestats count by user

This shows that the indexer will return "prestats": its own reduced data set, which it sends to the search head.
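
The prestats idea can be sketched in a few lines of Python. This is a toy model, not Splunk's implementation: the function names and the sample events are made up for illustration, and it only mimics "stats count by user", where each indexer counts its own events and the search head adds the partial counts together.

```python
from collections import Counter

def indexer_prestats(events):
    """Map step: one indexer counts its own events per user (a partial result)."""
    return Counter(event["user"] for event in events)

def search_head_merge(partials):
    """Reduce step: the search head adds the partial counts from all indexers."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not overwrite
    return total

# Hypothetical events held by two indexers (illustrative data only).
indexer_a = [{"user": "admin"}, {"user": "alice"}, {"user": "admin"}]
indexer_b = [{"user": "alice"}, {"user": "bob"}]

merged = search_head_merge([indexer_prestats(indexer_a), indexer_prestats(indexer_b)])
print(dict(merged))
```

Each indexer only ever sees its own events, and the search head only ever sees the much smaller partial counts, which is the point of the prestats phase.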

In the example search above, the first lookup runs on the indexer and the second on the SH. So when the docs talk about where a command is 'invoked', it's really about where the data happens to be at that point in the execution of the entire SPL.

As you can see, as soon as you use a dataset processing command or a transforming command, the data moves from the indexers to the search head and you immediately lose parallelism. It is therefore best to put those types of commands as far down the SPL pipeline as possible.

If you look at the command types table, you can see that some commands work differently depending on how they are called. For example, fillnull is a dataset processing command when used with no parameters, but distributable streaming when used with a field name. Be aware of these subtle distinctions when considering search performance.

isoutamo
SplunkTrust
I sent feedback to the docs team, and they promised to add this to their backlog and clarify it in the docs.

richgalloway
SplunkTrust

Splunk uses the map/reduce model for searching data.  The search head is the map/reduce coordinator.  The SH sends the query to each indexer for execution against the data stored on that indexer.  Each indexer sends its results back to the SH where they are combined and presented.

Some SPL commands, however, must be executed against the full result set and so must be performed by the SH.  When such a command is reached, the indexer stops its portion of the search and returns the current results to the SH.  The SH then completes the query.

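For example, the final merge for the stats command must be performed by the SH, although each indexer can first compute partial statistics for its own data.  A distributable streaming command that follows stats will be performed by the SH; one that comes before it will be performed by the indexers.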


isoutamo
SplunkTrust

Hi

Some additional comments.

“depending on where in the search the command is invoked”: I understand this to mean that the position of the command in the SPL query determines whether it runs in parallel on the indexers, as it normally would, or on the SH. The key question is whether any transforming commands appear in the SPL query before the streaming command. Once the search process has returned to the SH, it never goes back to parallel mode on the indexers. For example:

....
| fields a b
| stats values(a) by b

vs

....
| table a b
| stats values(a) by b

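In the first example, stats runs on the indexers, because fields is a distributable streaming command. In the second, stats runs on the SH, because table is a transforming command and has already moved the data to the SH.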
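
The difference between the two searches can be illustrated with a small Python sketch. It is a toy model of the rule described in this thread, not Splunk's planner: the function name split_pipeline and the command set are made up for illustration, and only eval, fields and lookup (in its default mode) are treated as distributable streaming. Everything before the first other command stays on the indexers; that command and everything after it is handed to the search head.

```python
# Assumption for this sketch: only these commands count as distributable streaming.
DISTRIBUTABLE_STREAMING = {"eval", "fields", "lookup"}

def split_pipeline(commands):
    """Split a list of command names into (indexer_part, search_head_part).

    The hand-off happens at the first command that is not distributable
    streaming. Once the search is on the search head it never goes back.
    """
    for position, name in enumerate(commands):
        if name not in DISTRIBUTABLE_STREAMING:
            return commands[:position], commands[position:]
    # No hand-off command found: the whole pipeline can stay on the indexers.
    return list(commands), []

print(split_pipeline(["fields", "stats"]))  # fields stays on the indexers
print(split_pipeline(["table", "stats"]))   # table forces the hand-off first
```

In the real product, a stats that sits at the hand-off point still has an indexer-side prestats part, which is why stats in the first search is said to run on the indexers. The model only marks where the hand-off happens.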

r. Ismo
