Splunk Search

How to achieve a query for two events based on intersection of common value - without using join?

lmonahan
Path Finder

I want to run a query where:

1. Query1 returns resultset1containing myEvent1.uid

2. Query2 returns resultset2 containing myEvent2.uid which is a subset of the myEvent1uid values.

3. Filter myEvent1 events and discard any that don't have a matching myEvent2.uid.

This can be done easily with an inner join but the result2 dataset is larger than 50k so I cannot use a join.

What I want is to do an inner join without using join😀  (I'm also practicing not using join, in general, but I really can't use join in this case.)

Saw some other posts that use join and other tricks and tried different solutions with coalesce() and also creating a new fields but haven't figured out a way that worked.

Thanks in advance!

 

Labels (1)
0 Karma
1 Solution

PickleRick
Ultra Champion

The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.

The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.

The easiest way is to do

search1
| append [ search2 ]
| stats values(*) as * by common_field

But this is ugly since it uses a subsearch and append.

More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.

(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field

 

View solution in original post

PickleRick
Ultra Champion

The typical approach would be to get your events to contain the common field - let's call it "uid" and do stats values(*) as * by uid.

The trick is to perform field shuffling (rename, conditional evals) to make fields from different types of events not overlap each other.

The easiest way is to do

search1
| append [ search2 ]
| stats values(*) as * by common_field

But this is ugly since it uses a subsearch and append.

More sophisticated (but tricky solution) is to use a single search to find both result sets and conditionally "remodel" the fields.

(Conditions1) OR (conditions2)
| eval field1=if(...)
| ...
| stats values(*) as * by common_field

 

lmonahan
Path Finder

Thank you, you gave me a great brainstorm!

Using your advice I set out to create some new fields and to manipulate them. 

It made me notice that for Query1 I was inadvertently fetching multiple occurrences of events matching myEvent1.common_uid_field value rather than one representative event that is sufficient. I needed to constrain Query1 further.

By limiting Query1 to return only one event per common_uid_field I could now use one query for both event1 or event2 and count by common_uid_field. And only when count=2 does it mean the value is in both events.

My query is now of the form:

index=myindex (event1) OR (event2)
| stats values(field_in_event1) AS ImportantField count by common_uid_field
| where count=2
| stats count BY ImportantField
0 Karma
Get Updates on the Splunk Community!

Splunk Forwarders and Forced Time Based Load Balancing

Splunk customers use universal forwarders to collect and send data to Splunk. A universal forwarder can send ...

NEW! Log Views in Splunk Observability Dashboards Gives Context From a Single Page

Today, Splunk Observability releases log views, a new feature for users to add their logs data from Splunk Log ...

Last Chance to Submit Your Paper For BSides Splunk - Deadline is August 12th!

Hello everyone! Don't wait to submit - The deadline is August 12th! We have truly missed the community so ...