Hi,
I have data that looks like this
Source1
PREMISE,CREATION_DATE,RESULT_TIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
555555,20160522111111,20160622111211
666666,20160522111111,20160622111211
Source2
PREMISE,CREATION_DATE,QUEUED_TIME
111111,20160621111111,20160621111211
222222,20160622111111,20160622111211
333333,20160622111111,20160622111211
122222,20160621111111,20160621111211
233333,20160622111111,20160622111211
344444,20160622111111,20160622111211
I need to select the common values by PREMISE (meaning PREMISE needs to exist in BOTH files) so ideally I would end up with a table like this:
PREMISE,CREATION_DATE
111111,20160621111111
222222,20160622111111
333333,20160622111111
Then I need to select the values by PREMISE that exist in Source1 but NOT in source two, so ideally I'd end up with this:
PREMISE,CREATION_DATE
555555,20160522111111
666666,20160522111111
I've tried several queries but no luck and now I'm going crosseyed.....
You might be able to do this join...
This will do an inner join by default, so it should give you results where premise exists in both sources.
source=source1 | join premise [source=source2] | table premise creation_date
And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match
source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date
And if you wanted to see both results in one search, then maybe just modify the second search a bit
source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched
Try this (add as many fields as you like and adjust where command as needed)
sourcetype=source1 OR sourcetype=source2 | chart dc over premise by sourcetype | where source1=0 AND source2>0
I think (stress THINK) I may have found it by using a combination of everyone's answers. This query seems to do the trick, thoughts?
index=top10 source=/home/oracle/workdir/reboots_requests_summary.csv|dedup PREMISE|where RESULT_TIME!="" | join type=left PREMISE NOT [ search source=/home/oracle/workdir/reboots_summary.csv]|transaction PREMISE maxspan=1h|table PREMISE, RESULT_TIME,CREATION_DATE,_time|timechart count
You might be able to do this join...
This will do an inner join by default, so it should give you results where premise exists in both sources.
source=source1 | join premise [source=source2] | table premise creation_date
And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match
source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date
And if you wanted to see both results in one search, then maybe just modify the second search a bit
source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched
Hmmmmm
When I use this query that models after your first query, I get no results
index=top10 source=/home/oracle/workdir/reboots_requests_summary.csv | join PREMISE [search source=/home/oracle/workdir/reboots_summary.csv] |table PREMISE CREATION_DATE
I had to add the [search in front of source to get it to not have an error. Without adding [search splunk reported an unknown command.
I'm assuming at this point you have a field called "Premise" in your events. If you do not we can fix that, but it's not worth fixing unless it's a problem so just let us know. 🙂 Also assuming that "Premise" will be unique. E.g. that there is and will be only one Premise 555555 in either source. Also, are date/time stamps working out OK on that data?
Anyway, with those questions and caveats, the base search you can use for both answers is...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise
Then to see just the ones where it's in both tables, add to the end of that ...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount>=2
Or for the ones in only one table...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1
There are more efficient ways to "put those together" than transaction (stats), but I think transaction is easier and really not that bad in non-distributed environments. You'll use maxspan=X
where X is as reasonably short of a time period as you can use that will still let it connect (keeps it efficient), so if source2's entry will always be within 5 minutes of source1's entry, then set a maxspan=5m
Thanks Rich!!!
I like the eventcount trick! That helped! Is there anyway to see entries that are ONLY in source1 but not in Source2? When I use the first query I end up with zero results, even though I know there are "matches" (match= entry is in source1 but not in source 2)
Yes.
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1 sourcetype=source1
You start out with either sourcetype, create the transaction on "Premise", then search for where there's only one event and it's with sourcetype source1.
Let me know if that works.