Is there an alternative to using regex in my search for better performance?

sieutruc
Contributor

Hello,

After reading some answers, I understand that filtering events with the regex command is slow, because Splunk has to read all the events from disk before it can apply the pattern.

For example, index=X email="test@*" is much faster than index=X | regex email="test@.*".

So my question is: besides *, is there any other regex-style term I can use directly in the base search, without the regex command, and still get the same performance as the wildcard search?

For example:
index=X email="test@[a-z]+.com" ?
index=X email="test@[0-9]*.com" ?

0 Karma

jmallorquin
Builder

Hi,

Have you tried extracting the parts of the email address into separate fields, such as <field1>@<field2>?

Then a search on those fields should be faster, for example:

index=aaaa field1=test field2=google.com

Hope this helps.

0 Karma

sieutruc
Contributor

That does not really answer my question: if field1 has to match a regular expression that goes beyond the "*" wildcard, you still have to use the regex command, and Splunk reads all events for the comparison. I think a workable solution for now is the hybrid approach from @somesoni2's answer.

0 Karma

jmallorquin
Builder

Ok regards 🙂

0 Karma

jdunlea
Contributor

From a quick scan of the documentation, it seems that "regex" is a "distributable streaming" command, which means it can run on the indexers themselves, so you don't have to worry about inefficiencies with map-reduce.

However, to better structure your search you can provide all the "known search tokens" to your search and you could do something like this:

index=x test @ | regex email="test@.*"

What this does is pass the known tokens "test" and "@" to the indexer, which allows the indexer to pull out only the events that contain those two tokens anywhere in the event. THEN the "regex" command does the specific pattern match. I don't think running regex on its own lets the indexer narrow the search to events that contain "test" and "@". It will have to scan ALL events first.

So the search above reduces inefficiencies because it puts everything you know you need before the first pipe and then lets regex do the pattern matching afterwards.

0 Karma

sieutruc
Contributor

Your answer only works in some specific cases. If I search for email="test.*hello@.*", a base search with tokens like "test hello @" will return nothing.

0 Karma

jdunlea
Contributor

Well, that is only true if "test" and "hello" are not individual tokens.

For example, if I search as follows:

index=X test hello @ | regex email="test.*hello@.*"

This will NOT return any results IF the data you are looking for is something like

"testworldhello@something.com"

This is because you cannot search for "test" or "hello" on their own if they are just a part of a larger token (testworldhello).

The search above WILL return results if the data looks like:

"test.world-hello@something.com"

The main point I am trying to make is that to create better search efficiency you can provide as many actual tokens as you can, up front. Tokens are separated by things like dots, dashes, slashes, etc.

To see how tokens are identified and separated in Splunk you can research segmenters.conf which shows you how Splunk breaks out tokens in any event.

0 Karma

sieutruc
Contributor

I get your point, but that is exactly why I asked this question: I want to know whether Splunk supports anything more than the asterisk wildcard in the base search. Thank you anyway.

0 Karma

jdunlea
Contributor

Ah I see. Looks like you got your answer above! Good luck! 🙂

0 Karma

somesoni2
Revered Legend

Regular expressions are not supported in the base search (only the asterisk wildcard, *). I would suggest adding as many filters as you can to the base search using wildcards, and then using regex for the precise filtering (a hybrid of both types of filter).
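
As a rough sketch of that hybrid approach, applied to the two patterns from the question: the wildcard in the base search narrows the events first, and regex then applies the exact pattern to what is left. The ^ and $ anchors and the escaped dot are assumptions about what the patterns are meant to match, not something stated in the question.

```
index=X email="test@*" | regex email="^test@[a-z]+\.com$"
index=X email="test@*" | regex email="^test@[0-9]*\.com$"
```

The part before the pipe stays a plain wildcard search, so only the events that survive it are handed to regex.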

0 Karma

sieutruc
Contributor

Thanks for your reply. So only * is supported in the base search (I cannot use ?, [0-9], or [a-z]), is that right?
I ask because I could not find where the Splunk docs list which regular expressions, if any, can be used in the base search.

0 Karma

somesoni2
Revered Legend

The base search provides all the options that the "| search" command provides (they are actually the same command; it is just implicit in the base search). It uses logical expressions, not regular expressions. See more info here:
http://docs.splunk.com/Documentation/Splunk/6.4.1/SearchReference/Search#Usage
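
To give a rough idea of what "logical expressions" means here, field comparisons and wildcards can be combined with OR, NOT and parentheses. The "admin@" prefix and the excluded address below are made up for the example.

```
index=X (email="test@*" OR email="admin@*") NOT email="test@example.com"
```

Written with the search command spelled out, the same filter should return the same events:

```
index=X | search (email="test@*" OR email="admin@*") NOT email="test@example.com"
```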

0 Karma

sieutruc
Contributor

Thank you for the information, now I know that only * is supported. I hope Splunk will support more wildcards in a future version 🙂 , for example "?" or "|".

0 Karma