Splunk Search

How to Exclude some events based on a different dataset

yuanliu
SplunkTrust
SplunkTrust

The problem is a simple one: I have a base search from which I want to exclude a subset based on a criteria determined in a different dataset.  But I cannot find an efficient way to do this.

So far, what I am doing is

 

 

basesearch
| join joinkey
  [set diff
    [ basesearch
      | stats count by joinkey
      | fields - count ]
    [ criteria
      | stats count by joinkey
      | fields - count ]
  ]

 

 

While the logic works, it feels immensely inefficient.  Without even considering that set operations is itself expensive, but basesearch is performed two times with no change.

What is the proper way of doing this simple exclusion?

Labels (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Are you looking for this type of exclusion

| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1

If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND  somefield="your criteria"

Is that what you're trying to do?

View solution in original post

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

Are you looking for this type of exclusion

| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1

If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND  somefield="your criteria"

Is that what you're trying to do?

Tags (1)

yuanliu
SplunkTrust
SplunkTrust

Thanks, @bowesmana!  I always forget the great append.  I was experimenting with 

(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1

which is impossibly slower than my original logic.

I usually use stats to limit number of fields as early as possible after search.  But for this test, I also tried eventstats in order to preserve useful fields.  What I discover is that if I do not limit number of fields in `criteria` search, eventstats-where makes the alternative slightly slower than join-set-diff.

But because for my purposes, `criteria` search is only used for that one key, I can limit append to joinkey + DataSetType.  The net result is slightly faster join-set-diff even with eventstats-where. (My test dataset is not huge.  The advantage will be bigger with bigger dataset.)

0 Karma

yuanliu
SplunkTrust
SplunkTrust
... was experimenting with 
(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1

which is impossibly slower than my original logic.


Since this topic is really about avoiding join to improve performance, it is worth an update: I have since tested more with "OR" in raw search and noticed that the above conclusion was false; I was testing the two methods in different conditions.  Using "OR" is more advantageous.

What motivated this update, though, was Nick Mealy's great 2020 talk Master Joining Datasets Without Using Join which I watched two years too late😶.  The first topic? "What’s Wrong With the Join and Append Commands?" (In a nutshell, indexer search and search head search make a big difference.)  So, if anyone has the same needs in the future, favor the "OR" method to append. (But restrict the search as much as practical.)  I sure will return to this talk next time I have an urge to "join".

Get Updates on the Splunk Community!

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...