Solved: Filter by subsearch result

Ste · ‎11-28-2024

Dear experts

Basic idea of what I try to do: the results of a search should be filtered in a way, that only data points are displayed which are not part of a "Blacklist" maintained as lookup table.

The challenging thing is, there are 3 columns at the same time to be taken into account for filtering.

After a lot of trials, I ended up in creating a key from the 3 columns (which is unique) and then filter on the key. It is working, I just don't understand why :-(.

Question: Has anybody an idea why the Version 1 filter works, and why Version 2 filter fails?

Question: What needs to be changed to get Version 2 also to work?

index="pm-azlm_internal_prod_events" sourcetype="azlmj" 
| strcat ocp "_" fr "_" el unique_id 
| table _time ocp fr el unique_id d_1
| search d_1="DEF ges AZ*" 

``` VERSION 1: the working one ```
``` As long the subsearch returns a table with the column unique_id ```
``` which is exactly the name of the column I want to filter on, all works great.```
| search NOT [| inputlookup pm-azlm-aufschneidmelder-j
                          | strcat ocp "_" fr "_" sec unique_id
                          | table unique_id]

``` VERSION 2: NOT working ``` 
``` As soon I change the name of the column in the subsearch, the filte won't work anymore```
| search NOT [| inputlookup pm-azlm-aufschneidmelder-j
                          | strcat ocp "_" fr "_" sec ignore
                          | table ignore]```

| timechart span=1d limit=0 count by unique_id

And the final question: is there a way for such filtering without going through the key creation?

Thank you in advance.

PickleRick · ‎11-28-2024

What do you mean by "doesn't work"? It doesn't filter out the values? Because there is a mismatch between field names. A subsearch (unless its results consist (solely?) of fields named "search" or "query" or you used the format command explicitly) is rendered as set of conditions based on the names of the resulting fields.

So your subsearch in example 2 is rendered as

((unique_id="some_value") OR (unique_id="another+value") OR ... )

whereas your subsearch in example 3 is rendered smilarily but the field is called "ignore". You're not creating a field called "ignore" anywhere earlier in the search so you have nothing to filter on.

BTW, you are aware that this is a relatively ineffective way to search? (inclusion is better than exclusion!)

View solution in original post

ITWhisperer · ‎11-28-2024

Try something like this

index="pm-azlm_internal_prod_events" sourcetype="azlmj" NOT [| inputlookup pm-azlm-aufschneidmelder-j | table ocp fr sec | format]
| table _time ocp fr el d_1
| search d_1="DEF ges AZ*"

Ste · ‎11-28-2024

@ITWhisperer : Thank you for this, based on your input I was able to find a working answer for my 3rd question.

PickleRick · ‎11-28-2024

What do you mean by "doesn't work"? It doesn't filter out the values? Because there is a mismatch between field names. A subsearch (unless its results consist (solely?) of fields named "search" or "query" or you used the format command explicitly) is rendered as set of conditions based on the names of the resulting fields.

So your subsearch in example 2 is rendered as

((unique_id="some_value") OR (unique_id="another+value") OR ... )

whereas your subsearch in example 3 is rendered smilarily but the field is called "ignore". You're not creating a field called "ignore" anywhere earlier in the search so you have nothing to filter on.

BTW, you are aware that this is a relatively ineffective way to search? (inclusion is better than exclusion!)

Ste · ‎11-28-2024

@PickleRick : Thank you for the explanations, now I understand what is going on.
About inclusion/ exclusion and search efficiency: I was not aware about this; this is something I would need to take care of as well....

PickleRick · ‎11-29-2024

OK. Three more cents on that.

When you're searching for a condition like this

field="some value"

unless it's some special case which we're not gonna be bothered with at this time Splunk firstly searches for occurrences of terms "some" and "value" in its indexes, searches which events contain those two words (hopefully there will not be many events matching such criteria) and only those events will get parsed and Splunk will check if the "some value" string parses out in a proper spot within the event. That's quite effective for typical relatively sparse search. It might get less effective in some border cases so then you might help yourself with other means but that's a relatively advanced topic so let's leave it at that.

If you're searching for either

field!="some value"

or

NOT field="some value"

(you are aware those are not equivalent, right?) Splunk might be able to relatively quickly find all events when neither "some" nor "value" exists because if the search terms don't show up in the event at all they will obviously not match the field extraction but

1) This will only account for the second case

NOT field="some value"

If we're talking about the

field!="some value"

condition Splunk still has to parse the event and check if there is any value for the field. And if we have a multivalued field... Here's where it gets even more confusing - your second condition might still match even if one of the values in multivalued field does equal "some value" but there is another one which doesn't.

2) Even if you have both "some" and "value in the event, they still might be in different places within the event so the event as a whole might stil not match our initial condition.

So it's way way better to specify searches by means of multiple inclusion conditions which by intersecting narrow our event set which Splunk will finally have to reach raw data for and parse all the fields from than to use general exclusion on a very "wide" basis.

Another thing worth knowing which might not be important in your particular case since you're simply using inputlookup within your subsearch which is a very quick command is that subsearches have limits. If your subsearch hits time execution limit (by default it's 60 seconds IIRC) or exceeds the limit for returned results (10k rows; 50k in some specific use cases like join) it is _silently_ finalized and only results obtained so far are returned to the outer search. What is most tricky here is that the subsearch will get finalized _silently_ so you won't be aware that the subsearch didn't get a full result set and you won't be aware that your search including a subsearch might as a whole return incomplete or plain wrong results.

So you must be very very careful with subsearches and always make sure that you're not gonna hit those subsearch limits.

Ste · ‎12-02-2024

@PickleRick Thanks for highlighting the limitation of the amount of rows be returned by a sub search. This explains why one of my other Dashboards won't provide trustful values at the moment. Looks like I need to review and update some of my searches.....

richgalloway · ‎11-28-2024

The field (not "column") returned by the subsearch ("unique_id" in Version 1 and "ignore" in Version 2) must exist in the main search.

---
If this reply helps you, Karma would be appreciated.

Filter by subsearch result

subsearch

Can’t make it to .conf25? Join us online!

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Are you a member of the Splunk Community?