Splunk Search

Is there an alternative to using regex in my search for better performance?

Contributor

hello,

After reading some answers, I see that if I use regex for searching events corresponding to a pattern, it will take a lot of time as Splunk reads all events from disk.

For example, index=X email="test@*" is much faster than index=X | regex email="test@.*".

So my question is: besides the * wildcard, can I use other regex terms in the base search, without the regex command, and still get the same performance as the original search?

For example:
index=X email="test@[a-z]+.com" ?
index=X email="test@[0-9]*.com" ?

0 Karma

Builder

Hi,

Have you tried extracting the parts of the email address as separate fields, like field1@field2?

A search that uses these fields should then be faster, for example:

index=aaaa field1=test field2=google.com

Hope this helps.

0 Karma

Contributor

That does not really answer my question: if field1 has to match a regular expression that goes beyond the "*" wildcard, you still have to use the regex command, and Splunk reads all events for the comparison. I think the temporary solution may be the hybrid approach from @somesoni2's answer.

0 Karma

Builder

Ok regards 🙂

0 Karma

Path Finder

From a quick scan of the documentation, it seems that "regex" is a "distributable streaming" command, which means it can run on the indexers themselves, so you don't have to worry about inefficiencies with map-reduce.

However, to better structure your search you can provide all the "known search tokens" to your search and you could do something like this:

index=x test @ | regex email="test@.*"

This passes the known search tokens "test" and "@" to the indexer, which lets the indexer pull out only events that contain those two tokens anywhere in the event. THEN the "regex" does the specific pattern match. I don't think running regex on its own lets the indexer narrow the search to events that contain "test" and "@". It will have to scan ALL events first.

So the search above reduces inefficiencies because it puts everything you know you need before the first pipe and leaves the pattern matching to regex afterwards.

0 Karma

Contributor

Your answer only works for some specific cases. If I search for email="test.*hello@.*", a search with tokens like "test hello @" will return nothing.

0 Karma

Path Finder

Well, that is only true if "test" and "hello" are not individual tokens.

For example, if I search as follows:

index=X test hello @ | regex email="test.*hello@.*"

This will NOT return any results IF the data you are looking for is something like

"testworldhello@something.com"

This is because you cannot search for "test" or "hello" on their own if they are just a part of a larger token (testworldhello).

The search above WILL return results if the data looks like:

"test.world-hello@something.com"

The main point I am trying to make is that to create better search efficiency you can provide as many actual tokens as you can, up front. Tokens are separated by things like dots, dashes, slashes, etc.

To see how tokens are identified and separated in Splunk you can research segmenters.conf which shows you how Splunk breaks out tokens in any event.
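
For the "testworldhello" case, one possible workaround is to keep the base-search wildcard on the part you know starts the token. This is only a sketch under the assumption that the address begins with "test"; it has not been run against your data:

```spl
index=X test* | regex email="test.*hello@.*"
```

The trailing wildcard lets the base search match longer tokens such as "testworldhello", and regex then does the exact check.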

0 Karma

Contributor

I get your point, but that is why I asked this question: I want to know whether Splunk supports anything more than the asterisk wildcard in the base search. Thank you anyway.

0 Karma

Path Finder

Ah I see. Looks like you got your answer above! Good luck! 🙂

0 Karma

Revered Legend

Regular expressions are not supported in the base search (only the asterisk wildcard, *). I would suggest adding some filters to the base search using wildcards and then using regex for the precise filter (a hybrid of both types of filter).
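
A minimal sketch of that hybrid filter, using the patterns from the question. The anchors (^ and $) and the escaped dot are assumptions about what the patterns are meant to match, so adjust them to your data:

```spl
index=X email="test@*.com" | regex email="^test@[a-z]+\.com$"
index=X email="test@*.com" | regex email="^test@[0-9]*\.com$"
```

The wildcard term before the first pipe should narrow down the events that are retrieved, so regex only has to check what is left.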

0 Karma

Contributor

Thanks for your reply. So only * is supported in the base search (I cannot use ?, [0-9], or [a-z]), is that right?
I ask because I could not find where the Splunk docs list the regular expressions that can be used in the base search.

0 Karma

Revered Legend

The base search provides all the options that a "| search" command provides (they are actually the same; the command is just implicit in the base search). It basically uses logical expressions (not regular expressions). See more info here:
http://docs.splunk.com/Documentation/Splunk/6.4.1/SearchReference/Search#Usage
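
As a rough illustration of what such logical expressions look like (the field values here are made up for the example), boolean operators, grouping and the * wildcard can be combined before the first pipe:

```spl
index=X (email="test@*" OR email="admin@*") NOT email="*@example.com"
```

Keep in mind that a leading wildcard such as "*@example.com" is generally slower than a trailing one.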

0 Karma

Contributor

Thank you for the information. Now I know that only * is supported. I hope Splunk will support more wildcards in a future version 🙂, for example "?" or "|".

0 Karma