Splunk Search

Compare 2 datasets

ppanchal
Path Finder
I have 2 data sets
 
index=support source=sites earliest=-1d@d latest=-0d@d
index=support source=sites earliest=-0d@d latest=now
 
I want to pull out that data which is changed in data set 2 as compared to data set 1
Labels (3)
0 Karma

sanjeev543
Communicator

@ppanchal See if the below logic helps!!

| makeresults
| eval site="1,2,3"
| makemv site delim=","
| mvexpand site
| appendcols
[| makeresults
| eval site1="4,3,5"
| makemv site1 delim=","
| mvexpand site1]
| eventstats values(site1) as site2
| eval val=if(in(site,site2),"YES","NO")

============================

index=support source=sites earliest=-1d@d latest=-0d@d | rename site as site1 | table site1 | appendcols [search index=support source=sites earliest=-0d@d latest=now | rename site as site2 | table site2]
|eventstats values(site2) as site2_values | eval present_in_both=if(in(site1,site2_values),"YES","NO")
0 Karma

bowesmana
SplunkTrust
SplunkTrust

You will need to provide a little more information on the fields in your data you are looking to compare and what output you will expect to see at the end.

Can you provide an example?

0 Karma

ppanchal
Path Finder

We have a field called Site_id which is a string.

Example:

Set 1 - Site id = a1, a2, a3

Set 2 - Site id = a2, a3, a4

My result should be Site id = a1, a4

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Try this

(index=support source=sites earliest=-1d@d latest=-0d@d) OR 
(index=support source=sites earliest=-0d@d latest=now)
| bin _time span=1d@d
| stats count by _time Site_id
| stats values(_time) as _time by Site_id
| where mvcount(_time) = 1

This groups the sites by date then counts the dates that site has been found and only looks for results where there is only one time value.

Hope this helps

 

0 Karma

ppanchal
Path Finder

This did not work, I tried below,

(index=support source=sites SITE_ID=S028 AND SITE_ID=S056 earliest=-1d@d latest=-0d@d) OR (index=support source=sites SITE_ID=S028 AND SITE_ID=S056 AND SITE_ID=S10 earliest=-0d@d latest=now)
| bin _time span=1d@d | stats count by _time SITE_ID | stats values(_time) as _time by SITE_ID | where mvcount(_time) = 1

It gives me 0 events/results. Can you please help?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Can you take out the 'where' clause, so you can see what results you get back. If you replace the where clause with 

| eval date=strftime(_time, "%F")

then you will get a date column, where it shows you the dates of data it has seen for those sites.

 

0 Karma

ppanchal
Path Finder

No this will not help me. I have like 500-600 site IDs. I will have to go through the entire list to see the IDs are present for both the dates.

Any other solution?

I only want to find the sites ids that  has only 1 occurrence. 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

It shouldn't matter how many you have - the point is to see what your search is returning so the first line would be enough to investigate - it might be that your time constraints in the query are not working in your environment in that 'now' is not giving you any window between -0d@d

you could change the time constraints to

(index=support source=sites earliest=-2d@d latest=-1d@d) OR 
(index=support source=sites earliest=-1d@d latest=@d)

which will compare two days ago to yesterday

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...