Splunk Search

Find common entries in two different sources

dbcase
Motivator

Hi,

I have data that looks like this

Source1
PREMISE,CREATION_DATE,RESULT_TIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
555555,20160522111111,20160622111211
666666,20160522111111,20160622111211

Source2
PREMISE,CREATION_DATE,QUEUED_TIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
122222,20160621111111,20160621111211
233333,20160622111111,20160622111211
344444,20160622111111,20160622111211

I need to select the common values by PREMISE (meaning PREMISE needs to exist in BOTH files) so ideally I would end up with a table like this:
PREMISE,CREATION_DATE
111111,20160621111111
222222,20160622111111
333333,20160622111111

Then I need to select the values by PREMISE that exist in Source1 but NOT in source two, so ideally I'd end up with this:

PREMISE,CREATION_DATE
555555,20160522111111
666666,20160522111111

I've tried several queries but no luck and now I'm going crosseyed.....

0 Karma
1 Solution

maciep
Champion

You might be able to do this join...

This will do an inner join by default, so it should give you results where premise exists in both sources.

source=source1  | join premise [source=source2] | table premise creation_date

And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match

source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date

And if you wanted to see both results in one search, then maybe just modify the second search a bit

source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched

View solution in original post

0 Karma

sundareshr
Legend

Try this (add as many fields as you like and adjust where command as needed)

sourcetype=source1 OR sourcetype=source2 | chart dc over premise by sourcetype | where source1=0 AND source2>0
0 Karma

dbcase
Motivator

I think (stress THINK) I may have found it by using a combination of everyone's answers. This query seems to do the trick, thoughts?

index=top10 source=/home/oracle/workdir/reboots_requests_summary.csv|dedup PREMISE|where RESULT_TIME!="" | join type=left PREMISE NOT [ search source=/home/oracle/workdir/reboots_summary.csv]|transaction PREMISE maxspan=1h|table PREMISE, RESULT_TIME,CREATION_DATE,_time|timechart count

0 Karma

maciep
Champion

You might be able to do this join...

This will do an inner join by default, so it should give you results where premise exists in both sources.

source=source1  | join premise [source=source2] | table premise creation_date

And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match

source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date

And if you wanted to see both results in one search, then maybe just modify the second search a bit

source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched
0 Karma

dbcase
Motivator

Hmmmmm

When I use this query that models after your first query, I get no results

index=top10 source=/home/oracle/workdir/reboots_requests_summary.csv | join PREMISE [search source=/home/oracle/workdir/reboots_summary.csv] |table PREMISE CREATION_DATE

I had to add the [search in front of source to get it to not have an error. Without adding [search splunk reported an unknown command.

0 Karma

Richfez
SplunkTrust
SplunkTrust

I'm assuming at this point you have a field called "Premise" in your events. If you do not we can fix that, but it's not worth fixing unless it's a problem so just let us know. 🙂 Also assuming that "Premise" will be unique. E.g. that there is and will be only one Premise 555555 in either source. Also, are date/time stamps working out OK on that data?

Anyway, with those questions and caveats, the base search you can use for both answers is...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise

Then to see just the ones where it's in both tables, add to the end of that ...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount>=2

Or for the ones in only one table...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1

There are more efficient ways to "put those together" than transaction (stats), but I think transaction is easier and really not that bad in non-distributed environments. You'll use maxspan=X where X is as reasonably short of a time period as you can use that will still let it connect (keeps it efficient), so if source2's entry will always be within 5 minutes of source1's entry, then set a maxspan=5m

0 Karma

dbcase
Motivator

Thanks Rich!!!

0 Karma

dbcase
Motivator

I like the eventcount trick! That helped! Is there anyway to see entries that are ONLY in source1 but not in Source2? When I use the first query I end up with zero results, even though I know there are "matches" (match= entry is in source1 but not in source 2)

0 Karma

Richfez
SplunkTrust
SplunkTrust

Yes.

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1 sourcetype=source1

You start out with either sourcetype, create the transaction on "Premise", then search for where there's only one event and it's with sourcetype source1.

Let me know if that works.

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...