From my understanding, | rex ... does the search on the client side. Is there a way to specify a regex search string on the search head instead, to improve performance?
There is no documentation on where each part of a search runs; it depends on the order of the commands.
Statistical commands can run on the search-peers, with their results combined on the search-head. An initial rex filter can run on the search-peers, but a rex applied to the output of an earlier aggregation will of course run on the search-head.
Example:
mysearch terms -> on search-peers
| rex field=_raw to populate fieldA -> on search-peers
| stats count latest(_raw) AS fieldB by fieldA -> applied on search-peers and consolidated on search-head
| rex field=fieldB to populate fieldC -> search-head
| stats sum(fieldC) by fieldA -> search-head
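To make the peer/head split in the example above more concrete, here is a rough Python sketch of the idea. It is not Splunk code and says nothing about Splunk's internals; the event data, the peer names and the user=... regex are all made up for illustration. Each "peer" extracts fieldA from its own raw events and computes a partial count, and the "head" only merges the partial counts.

```python
import re
from collections import Counter

# Hypothetical raw events held by two search peers (made-up data).
PEER_EVENTS = {
    "peer1": ["user=alice action=login", "user=bob action=login"],
    "peer2": ["user=alice action=logout", "user=alice action=login"],
}

# Stands in for a rex on _raw that populates fieldA (assumed pattern).
USER_RE = re.compile(r"user=(\w+)")


def map_on_peer(events):
    """Peer side: extract fieldA per event, then build a partial count."""
    partial = Counter()
    for raw in events:
        match = USER_RE.search(raw)
        if match:
            partial[match.group(1)] += 1
    return partial


def reduce_on_head(partials):
    """Head side: merge the partial counts received from all peers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


partials = [map_on_peer(events) for events in PEER_EVENTS.values()]
result = reduce_on_head(partials)
print(dict(result))
```

The regex work happens once per event on whichever peer holds that event, so it is spread across the peers; only the small per-peer counts travel to the head.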
The search inspector can show you how long the search took per indexer and the overall cost per search component, but not the details you are asking for.
What do you mean by "client"? In my world "client" would refer to your own machine that you're using to access Splunk. Nothing in Splunk's searching is done client-side in that sense.
If you mean "client" = "search peer", i.e. an indexer that the search head issues searches to, then generally you'd want to look at the map/reduce model used by Splunk and which commands are considered to be of "map" type and which ones are considered to be of "reduce" type. Sadly this is not documented anywhere (that I know of), but you can get pretty far by using common sense. The thing is, when a search head issues a search to its search peers, all parts of the search up until the first command of "reduce" type will run on the search peers. A "reduce" operation is one that requires data from the search peers to be combined in some way and hence cannot be parallelized anymore, so the search head has to gather all data from its peers and do the rest of the search itself. So say you have something like
search ... | rex ... | lookup ... | stats ... | eval ... | ...
All commands up until stats are "map" type commands, or perhaps it's easier here to say that they're at least not "reduce". So the first three (search, rex and lookup) will run on each search peer before stats causes the search head to gather the data from its peers. eval and the rest of the search will be run on the search head.
This is important to keep track of in situations where you want a search to scale as well as possible, like your rex example, though I think you would need a very resource-intensive regex for it to really make a difference where it runs. It is also important when you have things like dynamic lookups that will yield different results based on where they run.
Again, there's no documentation on this but you can guess which commands would force the search head to gather data. All types of commands that aggregate data in one way or another need a complete set of data to work on, and all types of commands that just do some sort of event-by-event mapping or transformation typically do not need this. I hope you get the idea.
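As a rough illustration of that rule of thumb, here is a small Python sketch that splits a search string at the first aggregating command. The REDUCE_COMMANDS set and the split_pipeline helper are my own assumptions for the example, not an official Splunk classification or API, and the naive split on "|" would break on pipes inside quoted regexes.

```python
# Assumed, incomplete set of aggregating ("reduce"-style) commands.
# This is a guess for illustration, not an official Splunk list.
REDUCE_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}


def split_pipeline(search):
    """Split a search at the first command assumed to be of 'reduce' type.

    Returns (peer_side, head_side). The reduce command itself is placed at
    the start of head_side, although part of its work may still be done on
    the peers before the results are combined.
    """
    # Naive split: does not handle "|" inside quoted strings.
    commands = [part.strip() for part in search.split("|")]
    for i, command in enumerate(commands):
        words = command.split()
        name = words[0] if words else ""
        if name in REDUCE_COMMANDS:
            return commands[:i], commands[i:]
    return commands, []


peers, head = split_pipeline(
    "search error | rex field=_raw x | lookup users | stats count by user | eval y=1"
)
print("peers:", peers)
print("head: ", head)
```

With the example search, everything before stats ends up in the peer-side list, and stats plus the trailing eval end up in the head-side list.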
...that is, if I'm interpreting your question correctly. I might be answering a completely different question than the one you're asking 🙂
The | rex ... command is actually executed at search time.
This means that using the rex command on the search head should do the parsing on the search head. If you need to do it on the indexer (not recommended), you can use the props.conf and transforms.conf files to set up regex extractions at index time.
Take a look at the following documentation:
http://docs.splunk.com/Documentation/Splunk/5.0.2/Indexer/Indextimeversussearchtime
http://docs.splunk.com/Documentation/Splunk/5.0.2/Search/Extractfieldswithsearchcommands
Thanks. Is there a list of which piped commands are executed on the search head (presumably only if piped right after the first search string) and which on the client? Better yet, is there a way to get transparency into which part of my query is being executed where?