Splunk Search

What is an efficient way to exclude multiple string criteria from a field in search?

JaoelNameiol
Explorer

Need to exclude field results based on multiple string-matching criteria (OR):

-Does not equal any one of several names
-Does not end with "$"
-Contains only A-Z, a-z, "-", ".", "_"
-Does not contain any one of several names

Here's my inefficient solution. AdminAccount is the field to query.

| where not (AdminAccount = "Joe" or AdminAccount = "Mike" or AdminAccount = "David" or AdminAccount = "Max" or AdminAccount = "Abe" or AdminAccount = "Peter")
| regex AdminAccount != "\$$"
| where NOT match(AdminAccount,"\d+$")
| where NOT match(AdminAccount,"sql|ssoadmin|local service|internal|snapshots|sharepoint")

Any way to do this better? bonus points if you explain why.

worshamn
Contributor

Technically, the whole thing could be one big regex for a single filter, like so:

| regex AdminAccount != "^Joe$|^Mike$|^David$|^Max$|^Abe$|^Peter$|\$$|\d+$|sql|ssoadmin|local service|internal|snapshots|sharepoint"

But if readability counts, you could switch the first where statement to a search (the IN operator is handy there, though where has something similar) and combine the remaining regex expressions:

| search NOT AdminAccount IN (Joe, Mike, David, Max, Abe, Peter)
| regex AdminAccount != "\$$|\d+$|sql|ssoadmin|local service|internal|snapshots|sharepoint"
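
For reference, here is a rough sketch of the where-based equivalent mentioned above. It assumes the in() eval function is available in your Splunk version, so treat it as a starting point rather than a tested answer:

```
| where NOT in(AdminAccount, "Joe", "Mike", "David", "Max", "Abe", "Peter")
| regex AdminAccount != "\$$|\d+$|sql|ssoadmin|local service|internal|snapshots|sharepoint"
```

One difference to keep in mind: as far as I know, comparisons in where are case-sensitive, while field values in search are matched case-insensitively, so the two variants may not return identical results if the account names vary in case.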

JaoelNameiol
Explorer

Is one regex faster or more efficient than multiple regexes, assuming readability doesn't matter?

worshamn
Contributor

Well, I'm not certain how regex is handled "under the hood", so to speak. I think nickhillscpl's suggestion of using the job inspector is a good way to test it. Logically, though, a single operation should be more efficient than multiple ones (unless Splunk is already combining them), since you are likely passing the whole load to the regex engine/module/whatever all at once.

nickhills
Ultra Champion

Where you have a long list of things to exclude, you may consider using a lookup.

Create a CSV with something like:

AdminAccount,exclude
Joe,1
Mike,1
David,1
Max,1
*$,1
*sql*,1

etc, etc

Create a lookup definition for your CSV lookup and set the match type to WILDCARD for the AdminAccount field.
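
If you prefer editing configuration files over the UI, the lookup definition might look roughly like this in transforms.conf. The stanza name matches the lookup used in this thread; the CSV filename is an assumption, so adjust it to whatever you uploaded:

```
[exclude_accounts]
# hypothetical filename; use the name of your own CSV
filename = exclude_accounts.csv
# treat * in the AdminAccount column of the CSV as a wildcard
match_type = WILDCARD(AdminAccount)
```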

Then run your search, and perform the lookup:

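[my search] | lookup exclude_accounts AdminAccount OUTPUT exclude | where isnull(exclude)

Events that match nothing in the lookup have no exclude field, so isnull(exclude) keeps exactly the accounts that are not on the list.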

https://docs.splunk.com/Documentation/Splunk/7.2.4/Knowledge/ConfigureCSVlookups
https://docs.splunk.com/Documentation/Splunk/7.2.4/Knowledge/Addfieldmatchingrulestoyourlookupconfig...

If my comment helps, please give it a thumbs up!

JaoelNameiol
Explorer

Is a lookup more efficient than the in-search where clause?

nickhills
Ultra Champion

That's an excellent question, and not one I have ever seen performance comparisons on. However, small lookups (<10 MB) anecdotally perform very well.
The reason is that the data is loaded into memory once and events are simply matched on the field value as they are returned. The single where to exclude them is probably as efficient as it gets.

I would suggest testing both approaches in your environment and using the job inspector to see which one works best for your data.

JaoelNameiol
Explorer

Will do! thanks
