After reading some answers, I understand that using the regex command to find events matching a pattern takes a lot of time, because Splunk has to read all events from disk.
For example,
index=X email="test@*"
is much faster than
index=X | regex email="test@.*"
So my question is: besides the * wildcard, can I use any other regex construct in the base search, without the regex command, and still get the same performance as the original search? For example:
index=X email="test@[a-z]+.com" ?
index=X email="test@[0-9]*.com" ?
That does not really answer my question: if the field has to match a regular expression that goes beyond the "*" wildcard, you have to use the regex command, and then Splunk reads all events for the comparison. I think a temporary solution may be the hybrid approach from @somesoni2's answer.
From quickly scanning through some documentation, it seems that "regex" is actually a "distributable streaming" command, which means it can run on the indexer itself, so you don't have to worry about inefficiencies with map-reduce.
However, to better structure your search you can provide all the "known search tokens" to your search and you could do something like this:
index=x test @ | regex email="test@.*"
What this does is pass the known terms "test" and "@" as search tokens to the indexer, which allows the indexer to pull out only events with those two tokens anywhere in the event. THEN the "regex" command does the specific pattern match. I don't think running regex on its own lets the indexer narrow the search to events where "test" and "@" are present. It will have to search ALL events first.
So the search above reduces inefficiencies because it puts everything that you know you need before the first pipe and then lets regex do the pattern matching afterwards.
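Applied to the second pattern from the question, the same idea might look like the sketch below. It assumes the email field holds values such as test@123.com and that "test" is indexed as its own token. The anchors and the escaped dot are additions that the pattern in the question did not have.

```
index=X test | regex email="^test@[0-9]*\.com$"
```

The bare term test narrows the set of events the indexer returns, and regex then checks only those events against the full pattern.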
Well, that is only true if "test" and "hello" are not individual tokens.
For example, if I search as follows:
index=X test hello @ | regex email="test.*hello@.*"
This will NOT return any results IF the data you are looking for is something like an email address that starts with "testworldhello@".
This is because you cannot search for "test" or "hello" on their own if they are just a part of a larger token (testworldhello).
The search above WILL return results if the data looks like:
The main point I am trying to make is that to create better search efficiency you can provide as many actual tokens as you can, up front. Tokens are separated by things like dots, dashes, slashes, etc.
To see how tokens are identified and separated in Splunk you can research segmenters.conf which shows you how Splunk breaks out tokens in any event.
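When the known text is only the beginning of a larger token, as in the testworldhello case, a trailing wildcard on the search term is one way to still give the indexer something to filter on. A minimal sketch, assuming the token starts with "test":

```
index=X test* | regex email="test.*hello@.*"
```

Here test* is meant to match any indexed token that begins with "test", so events containing testworldhello remain candidates before regex runs. How much this narrows the search depends on how many tokens in your data start with "test", so it is worth checking against your own events.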
Regular expressions are not supported in the base search (only the * wildcard is). I would suggest adding some filters to the base search using the wildcard and then using regex for the precise filter (a hybrid of both types of filter).
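For the first pattern in the question, such a hybrid search could be sketched as follows. The wildcard filter is deliberately looser than the regular expression, and the anchors and escaped dot are assumptions about what the pattern is meant to match.

```
index=X email="test@*" | regex email="^test@[a-z]+\.com$"
```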
Thanks for your reply. So only * is supported in the base search (I cannot use ?, [0-9], or [a-z]), is that right?
I ask because I could not find where the Splunk docs list which regular expression constructs can be used in the base search.
The base search provides all the options that a "| search" command provides (they are actually the same; the command name is just implicit in the base search). It uses logical expressions, not regular expressions.
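As an illustration of what a logical expression in the base search can look like, here is a sketch that combines field comparisons with OR and NOT and uses only the * wildcard. The values admin and test@internal are hypothetical.

```
index=X (email="test@*" OR email="admin@*") NOT email="test@internal*"
```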