Splunk Search

How to Exclude some events based on a different dataset

yuanliu
SplunkTrust
SplunkTrust

The problem is a simple one: I have a base search from which I want to exclude a subset based on a criteria determined in a different dataset.  But I cannot find an efficient way to do this.

So far, what I am doing is

 

 

basesearch
| join joinkey
  [set diff
    [ basesearch
      | stats count by joinkey
      | fields - count ]
    [ criteria
      | stats count by joinkey
      | fields - count ]
  ]

 

 

While the logic works, it feels immensely inefficient.  Without even considering that set operations is itself expensive, but basesearch is performed two times with no change.

What is the proper way of doing this simple exclusion?

Labels (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Are you looking for this type of exclusion

| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1

If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND  somefield="your criteria"

Is that what you're trying to do?

View solution in original post

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

Are you looking for this type of exclusion

| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1

If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND  somefield="your criteria"

Is that what you're trying to do?

Tags (1)

yuanliu
SplunkTrust
SplunkTrust

Thanks, @bowesmana!  I always forget the great append.  I was experimenting with 

(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1

which is impossibly slower than my original logic.

I usually use stats to limit number of fields as early as possible after search.  But for this test, I also tried eventstats in order to preserve useful fields.  What I discover is that if I do not limit number of fields in `criteria` search, eventstats-where makes the alternative slightly slower than join-set-diff.

But because for my purposes, `criteria` search is only used for that one key, I can limit append to joinkey + DataSetType.  The net result is slightly faster join-set-diff even with eventstats-where. (My test dataset is not huge.  The advantage will be bigger with bigger dataset.)

0 Karma

yuanliu
SplunkTrust
SplunkTrust
... was experimenting with 
(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1

which is impossibly slower than my original logic.


Since this topic is really about avoiding join to improve performance, it is worth an update: I have since tested more with "OR" in raw search and noticed that the above conclusion was false; I was testing the two methods in different conditions.  Using "OR" is more advantageous.

What motivated this update, though, was Nick Mealy's great 2020 talk Master Joining Datasets Without Using Join which I watched two years too late😶.  The first topic? "What’s Wrong With the Join and Append Commands?" (In a nutshell, indexer search and search head search make a big difference.)  So, if anyone has the same needs in the future, favor the "OR" method to append. (But restrict the search as much as practical.)  I sure will return to this talk next time I have an urge to "join".

Get Updates on the Splunk Community!

New Splunk Observability innovations: Deeper visibility and smarter alerting to ...

You asked, we delivered. Splunk Observability Cloud has several new innovations giving you deeper visibility ...

Synthetic Monitoring: Not your Grandma’s Polyester! Tech Talk: DevOps Edition

Register today and join TekStream on Tuesday, February 28 at 11am PT/2pm ET for a demonstration of Splunk ...

Instrumenting Java Websocket Messaging

Instrumenting Java Websocket MessagingThis article is a code-based discussion of passing OpenTelemetry trace ...