Splunk Search
Highlighted

Find common entries in two different sources

Motivator

Hi,

I have data that looks like this

Source1
PREMISE,CREATIONDATE,RESULTTIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
555555,20160522111111,20160622111211
666666,20160522111111,20160622111211

Source2
PREMISE,CREATIONDATE,QUEUEDTIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
122222,20160621111111,20160621111211
233333,20160622111111,20160622111211
344444,20160622111111,20160622111211

I need to select the common values by PREMISE (meaning PREMISE needs to exist in BOTH files) so ideally I would end up with a table like this:
PREMISE,CREATION_DATE
111111,20160621111111
222222,20160622111111
333333,20160622111111

Then I need to select the values by PREMISE that exist in Source1 but NOT in source two, so ideally I'd end up with this:

PREMISE,CREATION_DATE
555555,20160522111111
666666,20160522111111

I've tried several queries but no luck and now I'm going crosseyed.....

0 Karma
Highlighted

Re: Find common entries in two different sources

SplunkTrust
SplunkTrust

I'm assuming at this point you have a field called "Premise" in your events. If you do not we can fix that, but it's not worth fixing unless it's a problem so just let us know. 🙂 Also assuming that "Premise" will be unique. E.g. that there is and will be only one Premise 555555 in either source. Also, are date/time stamps working out OK on that data?

Anyway, with those questions and caveats, the base search you can use for both answers is...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise

Then to see just the ones where it's in both tables, add to the end of that ...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount>=2

Or for the ones in only one table...

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1

There are more efficient ways to "put those together" than transaction (stats), but I think transaction is easier and really not that bad in non-distributed environments. You'll use maxspan=X where X is as reasonably short of a time period as you can use that will still let it connect (keeps it efficient), so if source2's entry will always be within 5 minutes of source1's entry, then set a maxspan=5m

0 Karma
Highlighted

Re: Find common entries in two different sources

Motivator

I like the eventcount trick! That helped! Is there anyway to see entries that are ONLY in source1 but not in Source2? When I use the first query I end up with zero results, even though I know there are "matches" (match= entry is in source1 but not in source 2)

0 Karma
Highlighted

Re: Find common entries in two different sources

SplunkTrust
SplunkTrust

Yes.

sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1 sourcetype=source1

You start out with either sourcetype, create the transaction on "Premise", then search for where there's only one event and it's with sourcetype source1.

Let me know if that works.

0 Karma
Highlighted

Re: Find common entries in two different sources

Motivator

Thanks Rich!!!

0 Karma
Highlighted

Re: Find common entries in two different sources

Champion

You might be able to do this join...

This will do an inner join by default, so it should give you results where premise exists in both sources.

source=source1  | join premise [source=source2] | table premise creation_date

And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match

source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date

And if you wanted to see both results in one search, then maybe just modify the second search a bit

source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched

View solution in original post

0 Karma
Highlighted

Re: Find common entries in two different sources

Motivator

Hmmmmm

When I use this query that models after your first query, I get no results

index=top10 source=/home/oracle/workdir/rebootsrequestssummary.csv | join PREMISE [search source=/home/oracle/workdir/rebootssummary.csv] |table PREMISE CREATIONDATE

I had to add the [search in front of source to get it to not have an error. Without adding [search splunk reported an unknown command.

0 Karma
Highlighted

Re: Find common entries in two different sources

Legend

Try this (add as many fields as you like and adjust where command as needed)

sourcetype=source1 OR sourcetype=source2 | chart dc over premise by sourcetype | where source1=0 AND source2>0
0 Karma
Highlighted

Re: Find common entries in two different sources

Motivator

I think (stress THINK) I may have found it by using a combination of everyone's answers. This query seems to do the trick, thoughts?

index=top10 source=/home/oracle/workdir/rebootsrequestssummary.csv|dedup PREMISE|where RESULTTIME!="" | join type=left PREMISE NOT [ search source=/home/oracle/workdir/rebootssummary.csv]|transaction PREMISE maxspan=1h|table PREMISE, RESULTTIME,CREATIONDATE,_time|timechart count

0 Karma