I want to run a query where:
1. Query1 returns resultset1containing myEvent1.uid
2. Query2 returns resultset2 containing myEvent2.uid which is a subset of the myEvent1uid values.
3. Filter myEvent1 events and discard any that don't have a matching myEvent2.uid.
This can be done easily with an inner join but the result2 dataset is larger than 50k so I cannot use a join.
What I want is to do an inner join without using join! 😀 (I'm also practicing not using join, in general, but I really can't use join in this case.)
Saw some other posts that use join and other tricks and tried different solutions with coalesce() and also creating a new fields but haven't figured out a way that worked.
Thanks in advance!
The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.
The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.
The easiest way is to do
search1
| append [ search2 ]
| stats values(*) as * by common_field
But this is ugly since it uses a subsearch and append.
More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.
(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field
The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.
The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.
The easiest way is to do
search1
| append [ search2 ]
| stats values(*) as * by common_field
But this is ugly since it uses a subsearch and append.
More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.
(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field
Thank you, you gave me a great brainstorm!
Using your advice I set out to create some new fields and to manipulate them.
It made me notice that for Query1 I was inadvertently fetching multiple occurrences of events matching myEvent1.common_uid_field value rather than one representative event that is sufficient. I needed to constrain Query1 further.
By limiting Query1 to return only one event per common_uid_field I could now use one query for both event1 or event2 and count by common_uid_field. And only when count=2 does it mean the value is in both events.
My query is now of the form:
index=myindex (event1) OR (event2)
| stats values(field_in_event1) AS ImportantField count by common_uid_field
| where count=2
| stats count BY ImportantField