I have data that looks like this
I need to select the common values by PREMISE (meaning PREMISE needs to exist in BOTH files) so ideally I would end up with a table like this:
Then I need to select the values by PREMISE that exist in Source1 but NOT in source two, so ideally I'd end up with this:
I've tried several queries but no luck and now I'm going crosseyed.....
I'm assuming at this point you have a field called "Premise" in your events. If you do not we can fix that, but it's not worth fixing unless it's a problem so just let us know. 🙂 Also assuming that "Premise" will be unique. E.g. that there is and will be only one Premise 555555 in either source. Also, are date/time stamps working out OK on that data?
Anyway, with those questions and caveats, the base search you can use for both answers is...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise
Then to see just the ones where it's in both tables, add to the end of that ...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount>=2
Or for the ones in only one table...
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1
There are more efficient ways to "put those together" than transaction (stats), but I think transaction is easier and really not that bad in non-distributed environments. You'll use
maxspan=X where X is as reasonably short of a time period as you can use that will still let it connect (keeps it efficient), so if source2's entry will always be within 5 minutes of source1's entry, then set a
I like the eventcount trick! That helped! Is there anyway to see entries that are ONLY in source1 but not in Source2? When I use the first query I end up with zero results, even though I know there are "matches" (match= entry is in source1 but not in source 2)
sourcetype=source1 OR sourcetype=source2 | transaction maxspan=1h Premise | search eventcount=1 sourcetype=source1
You start out with either sourcetype, create the transaction on "Premise", then search for where there's only one event and it's with sourcetype source1.
Let me know if that works.
You might be able to do this join...
This will do an inner join by default, so it should give you results where premise exists in both sources.
source=source1 | join premise [source=source2] | table premise creation_date
And then this will do a left join. So it will keep everything from source1 and only those that match from source2. So then you can keep only those that didn't match
source=source1 | join type=left premise [source=source2] | where isnull(queued_time) | table premise creation_date
And if you wanted to see both results in one search, then maybe just modify the second search a bit
source=source1 | join type=left premise [source=source2] | eval matched = if(isnull(queued_time),"no","yes") | table premise creation_date matched
When I use this query that models after your first query, I get no results
index=top10 source=/home/oracle/workdir/rebootsrequestssummary.csv | join PREMISE [search source=/home/oracle/workdir/rebootssummary.csv] |table PREMISE CREATIONDATE
I had to add the [search in front of source to get it to not have an error. Without adding [search splunk reported an unknown command.
Try this (add as many fields as you like and adjust where command as needed)
sourcetype=source1 OR sourcetype=source2 | chart dc over premise by sourcetype | where source1=0 AND source2>0
I think (stress THINK) I may have found it by using a combination of everyone's answers. This query seems to do the trick, thoughts?
index=top10 source=/home/oracle/workdir/rebootsrequestssummary.csv|dedup PREMISE|where RESULTTIME!="" | join type=left PREMISE NOT [ search source=/home/oracle/workdir/rebootssummary.csv]|transaction PREMISE maxspan=1h|table PREMISE, RESULTTIME,CREATIONDATE,_time|timechart count