What is an efficient way to exclude multiple string criteria from a field in search?

JaoelNameiol · 03-20-2019

Need to exclude field values that fail any of several string-matching criteria (OR). A value should be kept only if it:

- is not equal to any one of several names
- does not end with "$"
- contains only A-Z, a-z, "-", ".", "_"
- does not contain any one of several names

Here's my inefficient solution. AdminAccount is the field to query.

| where not (AdminAccount = "Joe" or AdminAccount = "Mike" or AdminAccount = "David" or AdminAccount = "Max" or AdminAccount = "Abe" or AdminAccount = "Peter")
| regex AdminAccount != "\$$"
| where NOT match(AdminAccount,"\d+$")
| where NOT match(AdminAccount,"sql|ssoadmin|local service|internal|snapshots|sharepoint")

Any way to do this better? Bonus points if you explain why.
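
To see what each stage of that search removes, here is a rough offline model in Python. It is a sketch under assumptions: Python's `re` stands in for Splunk's PCRE-based `regex` command and `match()` function (they agree on these simple patterns), `re.search` models their unanchored matching, and the sample account names and the `keep()` helper are hypothetical.

```python
import re

# Hypothetical AdminAccount values, made up for illustration.
ACCOUNTS = ["Joe", "jsmith", "svc-backup$", "user42", "sqlagent",
            "ssoadmin", "Local Service", "anna.lee", "Peter"]

EXCLUDED_NAMES = {"Joe", "Mike", "David", "Max", "Abe", "Peter"}
EXCLUDED_SUBSTRINGS = r"sql|ssoadmin|local service|internal|snapshots|sharepoint"


def keep(value):
    """Return True if value survives all four stages of the original search."""
    if value in EXCLUDED_NAMES:                # where not (AdminAccount = "Joe" or ...)
        return False
    if re.search(r"\$$", value):               # regex AdminAccount != "\$$"
        return False
    if re.search(r"\d+$", value):              # where NOT match(AdminAccount, "\d+$")
        return False
    if re.search(EXCLUDED_SUBSTRINGS, value):  # where NOT match(AdminAccount, "sql|...")
        return False
    return True


survivors = [a for a in ACCOUNTS if keep(a)]
print(survivors)
```

With these samples it prints `['jsmith', 'Local Service', 'anna.lee']`. "Local Service" survives because regex matching is case-sensitive, in Splunk as in this model, so "local service" does not catch it unless the pattern is made case-insensitive, for example with a `(?i)` prefix.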

worshamn · 03-20-2019

Technically, the whole thing could be one big regex in a single filter, like so:

| regex AdminAccount != "^Joe$|^Mike$|^David$|^Max$|^Abe$|^Peter$|\$$|\d+$|sql|ssoadmin|local service|internal|snapshots|sharepoint"
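
Folding everything into one alternation should not change which values are excluded, only how many regex passes there are. The Python sketch below checks that on a few hypothetical values, again using `re` as a stand-in for Splunk's regex engine; it says nothing about Splunk's actual speed. It also shows that splitting `ssoadmin` into `sso|admin` would broaden the filter, since `admin` on its own matches any account containing it.

```python
import re

NAMES = {"Joe", "Mike", "David", "Max", "Abe", "Peter"}
SUBSTRINGS = r"sql|ssoadmin|local service|internal|snapshots|sharepoint"

# One alternation covering every criterion; a match anywhere means "exclude".
COMBINED = re.compile(r"^Joe$|^Mike$|^David$|^Max$|^Abe$|^Peter$|\$$|\d+$|" + SUBSTRINGS)

# A variant that splits "ssoadmin" into "sso|admin".
SPLIT = re.compile(COMBINED.pattern.replace("ssoadmin", "sso|admin"))


def keep_chained(value):
    """Four separate filters, applied one after another."""
    return (value not in NAMES
            and not re.search(r"\$$", value)
            and not re.search(r"\d+$", value)
            and not re.search(SUBSTRINGS, value))


def keep_combined(value, pattern=COMBINED):
    """A single regex filter: keep the value only if nothing in the pattern matches."""
    return pattern.search(value) is None


samples = ["Joe", "Joey", "svc$", "user42", "sqlagent", "ssoadmin",
           "webadmin", "anna.lee", "internal-audit"]
for s in samples:
    assert keep_chained(s) == keep_combined(s)
print([s for s in samples if keep_combined(s) != keep_combined(s, SPLIT)])
```

The only sample on which the split variant disagrees is "webadmin", which it excludes and the original criteria keep.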

But if readability counts, you could switch the first where to a search (the IN operator is handy there, though where has something similar in the in() function) and combine the regex expressions:

| search NOT AdminAccount IN (Joe Mike David Max Abe Peter)
| regex AdminAccount != "\$$|\d+$|sql|ssoadmin|local service|internal|snapshots|sharepoint"
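
One side effect of moving the name check from where to search: search matches field values case-insensitively, while where (like eval) compares strings case-sensitively, so the two versions can disagree on a value like "joe". A minimal Python model of just that difference; the function names are hypothetical and nothing else about either command is modeled.

```python
NAMES = ["Joe", "Mike", "David", "Max", "Abe", "Peter"]


def excluded_by_where(value):
    """where not (AdminAccount = "Joe" or ...): eval-style, case-sensitive equality."""
    return value in NAMES


def excluded_by_search_in(value):
    """search NOT AdminAccount IN (...): search-style, case-insensitive value match."""
    return value.casefold() in {name.casefold() for name in NAMES}


for value in ["Joe", "joe", "MAX", "Maxine"]:
    print(value, excluded_by_where(value), excluded_by_search_in(value))
```

Only exact-case "Joe" is excluded by both; "joe" and "MAX" are excluded only by the search version, and "Maxine" by neither.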

JaoelNameiol · 03-20-2019

Is one regex faster/more efficient than multiple regexes, assuming readability doesn't matter?

worshamn · 03-20-2019

Well, I'm not certain how regex is handled "under the hood", so to speak. I think nickhillscpl's suggestion of using the Job Inspector is a good way to test it, but logically a single operation has got to be more efficient than multiple ones (unless Splunk combines them anyway), since you are likely handing the whole load to the regex engine in one pass.

nickhills · 03-20-2019

Where you have a long list of things to exclude, you may consider using a lookup.

Create a CSV with something like:

AdminAccount,exclude
Joe,1
Mike,1
David,1
Max,1
*$,1
*sql*,1

etc, etc

Create a lookup definition for your CSV lookup and set the match type to WILDCARD for the AdminAccount field (under the definition's Advanced options, enter WILDCARD(AdminAccount) as the Match type).
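
How the WILDCARD match type treats those rows can be approximated in Python: `*` matches any run of characters, everything else is literal, and a row has to match the whole field value. That is why a plain `sql` row would only match an account named exactly "sql", while `*sql*` catches any account containing it. A sketch under assumptions: the helper names are hypothetical, this model treats matching as case-sensitive, and returning the first matching row is a simplification (every row here outputs 1, so order does not matter).

```python
import csv
import io
import re

# The CSV from the post, with "*sql*" so the row matches any value containing "sql".
LOOKUP_CSV = """AdminAccount,exclude
Joe,1
Mike,1
David,1
Max,1
*$,1
*sql*,1
"""


def wildcard_to_regex(pattern):
    """Treat "*" as any run of characters and everything else literally."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


# Build the matchers once, then reuse them for every value.
ROWS = [(wildcard_to_regex(row["AdminAccount"]), row["exclude"])
        for row in csv.DictReader(io.StringIO(LOOKUP_CSV))]


def lookup_exclude(value):
    """Return the exclude value of the first row matching the whole value, else None."""
    for regex, exclude in ROWS:
        if regex.fullmatch(value):
            return exclude
    return None


for value in ["Joe", "svc-backup$", "mysqlagent", "sql", "jsmith"]:
    print(value, lookup_exclude(value))
```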

Then run your search, and perform the lookup:

[my search]|lookup exclude_accounts AdminAccount OUTPUT exclude|where isnull(exclude)
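
The final filter needs care because events that match no lookup row get no exclude value at all. In where, comparing a missing (null) field with != does not evaluate to true, so a filter like `where exclude!=1` also drops every event the lookup did not match, the opposite of the intent; `where isnull(exclude)` (or `search NOT exclude=1`) keeps them. A small Python model of that behavior, with hypothetical event names:

```python
# Hypothetical lookup results: "1" where a row matched, None where nothing matched.
events = {"Joe": "1", "jsmith": None, "anna.lee": None}


def where_exclude_ne_1(exclude):
    """Models `where exclude!=1`: a comparison against a null field is not true."""
    return exclude is not None and exclude != "1"


def where_isnull_exclude(exclude):
    """Models `where isnull(exclude)`: keep events the lookup did not match."""
    return exclude is None


kept_ne = [name for name, ex in events.items() if where_exclude_ne_1(ex)]
kept_isnull = [name for name, ex in events.items() if where_isnull_exclude(ex)]
print(kept_ne, kept_isnull)
```

Here the != version keeps nothing, while the isnull version keeps "jsmith" and "anna.lee".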

https://docs.splunk.com/Documentation/Splunk/7.2.4/Knowledge/ConfigureCSVlookups
https://docs.splunk.com/Documentation/Splunk/7.2.4/Knowledge/Addfieldmatchingrulestoyourlookupconfig...

If my comment helps, please give it a thumbs up!

JaoelNameiol · 03-20-2019

Is a lookup more efficient than the in-search where clause?

nickhills · 03-20-2019

That's an excellent question, and not one I have ever seen performance comparisons on; however, small lookups (<10 MB) anecdotally perform very well.
The reason is that the lookup data is loaded into memory once and each event is simply matched on its field value as it is returned, so a single where to exclude the matches is probably about as efficient as it gets.

I would suggest testing both approaches in your environment and using the Job Inspector to see which one works best for your data and environment.

JaoelNameiol · 03-20-2019

Will do! thanks