Solved: How to achieve a query for two events based on int...

lmonahan · ‎06-30-2022

I want to run a query where:

1. Query1 returns resultset1containing myEvent1.uid

2. Query2 returns resultset2 containing myEvent2.uid which is a subset of the myEvent1uid values.

3. Filter myEvent1 events and discard any that don't have a matching myEvent2.uid.

This can be done easily with an inner join but the result2 dataset is larger than 50k so I cannot use a join.

What I want is to do an inner join without using join! 😀 (I'm also practicing not using join, in general, but I really can't use join in this case.)

Saw some other posts that use join and other tricks and tried different solutions with coalesce() and also creating a new fields but haven't figured out a way that worked.

Thanks in advance!

PickleRick · ‎06-30-2022

The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.

The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.

The easiest way is to do

search1
| append [ search2 ]
| stats values(*) as * by common_field

But this is ugly since it uses a subsearch and append.

More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.

(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field

View solution in original post

PickleRick · ‎06-30-2022

The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.

The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.

The easiest way is to do

search1
| append [ search2 ]
| stats values(*) as * by common_field

But this is ugly since it uses a subsearch and append.

More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.

(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field

lmonahan · ‎06-30-2022

Thank you, you gave me a great brainstorm!

Using your advice I set out to create some new fields and to manipulate them.

It made me notice that for Query1 I was inadvertently fetching multiple occurrences of events matching myEvent1.common_uid_field value rather than one representative event that is sufficient. I needed to constrain Query1 further.

By limiting Query1 to return only one event per common_uid_field I could now use one query for both event1 or event2 and count by common_uid_field. And only when count=2 does it mean the value is in both events.

My query is now of the form:

index=myindex (event1) OR (event2)
| stats values(field_in_event1) AS ImportantField count by common_uid_field
| where count=2
| stats count BY ImportantField

How to achieve a query for two events based on intersection of common value - without using join?

join

Splunk Answers Content Calendar, July Edition I

Secure Your Future: Mastering Upgrade Readiness for Splunk 10

Observability Unlocked: Kubernetes & Cloud Monitoring with Splunk IM

Are you a member of the Splunk Community?

How to achieve a query for two events based on intersection of common value - without using join?

join

Splunk Answers Content Calendar, July Edition I

Secure Your Future: Mastering Upgrade Readiness for Splunk 10

Observability Unlocked: Kubernetes & Cloud Monitoring with Splunk IM