 
		
		
		
		
		
	
			
		
		
			
					
		The problem is a simple one: I have a base search from which I want to exclude a subset based on a criteria determined in a different dataset. But I cannot find an efficient way to do this.
So far, what I am doing is
basesearch
| join joinkey
  [set diff
    [ basesearch
      | stats count by joinkey
      | fields - count ]
    [ criteria
      | stats count by joinkey
      | fields - count ]
  ]
While the logic works, it feels immensely inefficient. Without even considering that set operations is itself expensive, but basesearch is performed two times with no change.
What is the proper way of doing this simple exclusion?
 
		
		
		
		
		
	
			
		
		
			
					
		Are you looking for this type of exclusion
| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND somefield="your criteria"
Is that what you're trying to do?
 
		
		
		
		
		
	
			
		
		
			
					
		Are you looking for this type of exclusion
| makeresults count=10000
| eval key=random() % 4000
| eval DataSetType="Type 1"
| append [
  | makeresults count=10000
  | eval key=random() % 4000
  | eval DataSetType="Type 2"
]
| stats count values(DataSetType) as Types by key
| where mvcount(Types)=1 OR count=1If you search both data sets in your outer search, then use stats to aggregate all values based on your joinkey, then you can usually test for a) count=1 or b) mvcount(somefield)=1 AND somefield="your criteria"
Is that what you're trying to do?
 
		
		
		
		
		
	
			
		
		
			
					
		Thanks, @bowesmana! I always forget the great append. I was experimenting with
(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1which is impossibly slower than my original logic.
I usually use stats to limit number of fields as early as possible after search. But for this test, I also tried eventstats in order to preserve useful fields. What I discover is that if I do not limit number of fields in `criteria` search, eventstats-where makes the alternative slightly slower than join-set-diff.
But because for my purposes, `criteria` search is only used for that one key, I can limit append to joinkey + DataSetType. The net result is slightly faster join-set-diff even with eventstats-where. (My test dataset is not huge. The advantage will be bigger with bigger dataset.)
 
		
		
		
		
		
	
			
		
		
			
					
		... was experimenting with(basesearch) OR (criteria)
| stats values(DataSetType) as DataSetType by joinkey
| where mvcount(DataSetType) == 1which is impossibly slower than my original logic.
Since this topic is really about avoiding join to improve performance, it is worth an update: I have since tested more with "OR" in raw search and noticed that the above conclusion was false; I was testing the two methods in different conditions. Using "OR" is more advantageous.
What motivated this update, though, was Nick Mealy's great 2020 talk Master Joining Datasets Without Using Join which I watched two years too late😶. The first topic? "What’s Wrong With the Join and Append Commands?" (In a nutshell, indexer search and search head search make a big difference.) So, if anyone has the same needs in the future, favor the "OR" method to append. (But restrict the search as much as practical.) I sure will return to this talk next time I have an urge to "join".
