topic Re: Regex search on server side in Splunk Search

Regex search on server side

leecaf — Wed, 05 Jun 2013 20:53:07 GMT

From my understanding, | rex ... does the search on the client side. Is there a way to specify a regex search string on the search head instead, to improve performance?

Re: Regex search on server side

Rob — Wed, 05 Jun 2013 21:01:13 GMT

The |rex... command will actually be executed at search time.

This means that using the rex command on the search head should do the parsing on the search head. If you need to do it on the indexer (not recommended) then you can use the props.conf and transforms.conf files to set up regex extractions at index time.

Take a look at the following documentation:

http://docs.splunk.com/Documentation/Splunk/5.0.2/Indexer/Indextimeversussearchtime

http://docs.splunk.com/Documentation/Splunk/5.0.2/Search/Extractfieldswithsearchcommands
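For illustration only, the index-time route mentioned above might look roughly like the fragment below. The sourcetype name (my_sourcetype), the transform name (extract_user_indexed), the field name (user) and the regex are all made-up placeholders, not something from this thread; treat it as a sketch to check against the documentation rather than a drop-in config.

```conf
# props.conf (on the indexer) -- "my_sourcetype" is a hypothetical sourcetype
[my_sourcetype]
TRANSFORMS-extract_user = extract_user_indexed

# transforms.conf (on the indexer) -- stanza name and regex are placeholders
[extract_user_indexed]
REGEX = user=(\w+)
FORMAT = user::$1
WRITE_META = true

# fields.conf (on the search head) -- tells Splunk the field is indexed
[user]
INDEXED = true
```

Index-time fields are only written for data indexed after the change, and they grow the index, which is part of why this route is usually not recommended.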

Re: Regex search on server side

leecaf — Wed, 05 Jun 2013 21:11:49 GMT

Thanks. Is there a list of which piped commands are executed on the search head (presumably only if piped right after the first search string) and which on the client? Better yet, is there a way to get transparency into which part of my query is being executed where?

Re: Regex search on server side

Ayn — Wed, 05 Jun 2013 22:00:03 GMT

What do you mean by "client"? In my world "client" would refer to your own machine that you're using to access Splunk. Nothing in Splunk's searching is done client-side in that sense.

If you mean "client" = "search peer", i.e. an indexer that the search head issues searches to, then generally you'd want to look at the map/reduce model used by Splunk and which commands are considered to be of "map" type and which ones are considered to be of "reduce" type. Sadly this is not documented anywhere (that I know of), but you can get pretty far by using common sense. The thing is, when a search head issues a search to its search peers, all parts of the search up until the first command of "reduce" type will run on the search peers. A "reduce" operation is one that requires data from the search peers to be combined in some way and hence cannot be parallelized anymore, so the search head has to gather all data from its peers and do the rest of the search itself. So say you have something like

search ... | rex ... | lookup ... | stats ... | eval ... | ...

All commands up until stats are "map" type commands, or perhaps it's easier here to say that they're at least not "reduce". So, the first 3 (search, rex and lookup) will run on each search peer before stats causes the search head to gather the data from its peers. eval and the rest of the search will be run on the search head.

This is important to keep track of in situations where you want a search to scale as well as possible, like your rex example, though I think you would need a very resource-intensive regex for where it runs to really make a difference. It is also important when you have things like dynamic lookups that will yield different results based on where they run.

Again, there's no documentation on this but you can guess which commands would force the search head to gather data. All types of commands that aggregate data in one way or another need a complete set of data to work on, and all types of commands that just do some sort of event-by-event mapping or transformation typically do not need this. I hope you get the idea.
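To make the rule of thumb concrete, here is a small Python sketch (not anything Splunk provides) that splits a search string at the first "reduce" type command. The REDUCE_COMMANDS set is an assumed, incomplete list picked for illustration, and the split on "|" is naive: it ignores pipes inside quotes or subsearches.

```python
# Assumed, incomplete list of aggregating ("reduce" type) commands.
# This is a guess for illustration, not an official Splunk classification.
REDUCE_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}


def split_pipeline(search):
    """Return (peer_stages, head_stages) for a pipe-separated search string.

    Everything before the first command in REDUCE_COMMANDS is treated as
    running on the search peers; that command and everything after it is
    treated as running on the search head.
    """
    stages = [stage.strip() for stage in search.split("|")]
    for i, stage in enumerate(stages):
        words = stage.split()
        command = words[0] if words else ""
        if command in REDUCE_COMMANDS:
            return stages[:i], stages[i:]
    # No aggregating command found: the whole pipeline can stay on the peers.
    return stages, []


peers, head = split_pipeline(
    "search error | rex field=_raw | lookup users | stats count by user | eval x=1"
)
print("search peers:", peers)
print("search head: ", head)
```

With the example pipeline, search, rex and lookup land in the first list and stats, eval in the second, which matches the description above.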

...that is, if I'm interpreting your question correctly. I might be answering a completely different question than the one you're asking 🙂

Re: Regex search on server side

yannK — Wed, 05 Jun 2013 23:42:17 GMT

There is no documentation on where each part of a search runs; it depends on the order of the commands.

Statistical commands can run on the search peers and be combined on the search head. Initial rex filters can be applied on the search peers, but a rex applied to the output of an earlier statistical command will of course be applied on the search head.

example :
mysearch terms -> on search-peers
| rex field=_raw to populate fieldA -> on search-peers
| stats count latest(_raw) AS fieldB by fieldA -> applied on search-peers and consolidated on search-head
| rex field=fieldB to populate fieldC -> search-head
| stats sum(fieldC) by fieldA -> search-head
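The "applied on search-peers and consolidated on search-head" step for a stats count can be pictured with the Python sketch below. It is only a toy model of partial aggregation, not Splunk's actual implementation; the function names and the sample events are invented for the illustration.

```python
from collections import Counter


def peer_partial_count(events, field):
    """Runs on one search peer: count its local events per value of `field`."""
    return Counter(event[field] for event in events if field in event)


def head_merge(partials):
    """Runs on the search head: add up the partial counts from all peers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up events held by two hypothetical search peers.
peer_a = [{"fieldA": "x"}, {"fieldA": "y"}, {"fieldA": "x"}]
peer_b = [{"fieldA": "x"}, {"fieldA": "z"}]

merged = head_merge(peer_partial_count(events, "fieldA") for events in (peer_a, peer_b))
print(dict(merged))
```

Each peer only ships its small per-value counts to the search head instead of the raw events, which is why running the first stats partly on the peers scales better.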

The search inspector can show you how long the search took per indexer and the overall cost per search component, but not the level of detail you are asking for.