Solved: Most efficient way to filter results matching a li...

hackwerks · ‎02-25-2022

Hello everyone. I'm trying to find the most efficient way to filter results for a list of values that may have a match within two (or more) distinct fields. Say, a list of IP addresses that can match either the source or destination fields. I'm almost certain this question has been answered in the past but I couldn't find the correct wording to find the answer (am braindead atm), so here I am.

To help filter some potential answers to my request, I do know that I can do something like:
index=my_index ...etc... (field1 IN (value1, value2, value3, value4, value5, ...) OR field2 IN (value1, value2, value3, value4, value5, ...))

However, what I am attempting to do is make this query more "efficient". Or, perhaps just less of an eye-sore. Provided a list of 10+ values to filter for, it's easy to see how this query can get out of hand at least visually. Here is an example query I intuitively tried that should help illustrate what I'm looking for:
index=my_index ...etc... (field1 OR field2 IN (value1, value2, value3, value4, value5, ...))

Splunk Pros, please help: What am I overlooking/overthinking? Or is my first example the best (or most "efficient") way to go about this?

Thanks so much!

yuanliu · ‎02-26-2022

Thanks for posing this question. Boy, do I have eyesore problem of my own:-) One workaround is to use macro. You can put any literal in a macro, such as "(value1, value2, value3, value4, value5, ...)" so your search becomes

index=my_index ...etc... (field1 IN `mymacro` OR field2 IN `mymacro`)

You can even make "IN (value1, value2, value3, value4, value5, ...)" your macro so the search becomes

index=my_index ...etc... (field1 `myinmacro` OR field2 `myinmacro`)

With this method, you cannot parameterize or the eyesore returns. (Suffice to say, macro is not a cure-all. Hence my own eyesores:-)

OT: A classic solution to this type of eyesore is iteration. In non-search languages, iteration doesn't compromise efficiency much. The following SPL iteration will give you the same result, but at terrible cost.

index=my_index ...etc...
| foreach field1 field2
  [eval good=mvmap(<<FIELD>>, if(<<FIELD>> IN (value1, value2, value3, value4, value5, ...), "good", good))]
| where isnotnull(good)

View solution in original post

hackwerks · ‎03-01-2022

@gcusello & @yuanliu - Excellent ideas. I'm clearly too new to the platform to be aware of those options. Thank you both very, very much for your prompt responses!

yuanliu · ‎02-26-2022

Thanks for posing this question. Boy, do I have eyesore problem of my own:-) One workaround is to use macro. You can put any literal in a macro, such as "(value1, value2, value3, value4, value5, ...)" so your search becomes

index=my_index ...etc... (field1 IN `mymacro` OR field2 IN `mymacro`)

You can even make "IN (value1, value2, value3, value4, value5, ...)" your macro so the search becomes

index=my_index ...etc... (field1 `myinmacro` OR field2 `myinmacro`)

With this method, you cannot parameterize or the eyesore returns. (Suffice to say, macro is not a cure-all. Hence my own eyesores:-)

OT: A classic solution to this type of eyesore is iteration. In non-search languages, iteration doesn't compromise efficiency much. The following SPL iteration will give you the same result, but at terrible cost.

index=my_index ...etc...
| foreach field1 field2
  [eval good=mvmap(<<FIELD>>, if(<<FIELD>> IN (value1, value2, value3, value4, value5, ...), "good", good))]
| where isnotnull(good)

gcusello · ‎02-25-2022

Hi @hackwerks,

maybe this could be a good workaround:

put the values in a lookup (called e.g. patterns.csv, with one column called e.g. "pattern") and run something like this:

index=my_index [ | inputlookup patterns.csv | rename pattern AS query | fields query ]
| ...

in this way, you run a full text search, using the values in the lookup, on you index.

Ciao.

Giuseppe

Most efficient way to filter results matching a list of values in either of two(+) fields?

field extraction

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

Most efficient way to filter results matching a list of values in either of two(+) fields?

field extraction

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits