Splunk Search

Compare 2 datasets

ppanchal
Path Finder
I have 2 data sets
 
index=support source=sites earliest=-1d@d latest=-0d@d
index=support source=sites earliest=-0d@d latest=now
 
I want to pull out that data which is changed in data set 2 as compared to data set 1
Labels (3)
0 Karma

sanjeev543
Communicator

@ppanchal See if the below logic helps!!

| makeresults
| eval site="1,2,3"
| makemv site delim=","
| mvexpand site
| appendcols
[| makeresults
| eval site1="4,3,5"
| makemv site1 delim=","
| mvexpand site1]
| eventstats values(site1) as site2
| eval val=if(in(site,site2),"YES","NO")

============================

index=support source=sites earliest=-1d@d latest=-0d@d | rename site as site1 | table site1 | appendcols [search index=support source=sites earliest=-0d@d latest=now | rename site as site2 | table site2]
|eventstats values(site2) as site2_values | eval present_in_both=if(in(site1,site2_values),"YES","NO")
0 Karma

bowesmana
SplunkTrust
SplunkTrust

You will need to provide a little more information on the fields in your data you are looking to compare and what output you will expect to see at the end.

Can you provide an example?

0 Karma

ppanchal
Path Finder

We have a field called Site_id which is a string.

Example:

Set 1 - Site id = a1, a2, a3

Set 2 - Site id = a2, a3, a4

My result should be Site id = a1, a4

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Try this

(index=support source=sites earliest=-1d@d latest=-0d@d) OR 
(index=support source=sites earliest=-0d@d latest=now)
| bin _time span=1d@d
| stats count by _time Site_id
| stats values(_time) as _time by Site_id
| where mvcount(_time) = 1

This groups the sites by date then counts the dates that site has been found and only looks for results where there is only one time value.

Hope this helps

 

0 Karma

ppanchal
Path Finder

This did not work, I tried below,

(index=support source=sites SITE_ID=S028 AND SITE_ID=S056 earliest=-1d@d latest=-0d@d) OR (index=support source=sites SITE_ID=S028 AND SITE_ID=S056 AND SITE_ID=S10 earliest=-0d@d latest=now)
| bin _time span=1d@d | stats count by _time SITE_ID | stats values(_time) as _time by SITE_ID | where mvcount(_time) = 1

It gives me 0 events/results. Can you please help?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Can you take out the 'where' clause, so you can see what results you get back. If you replace the where clause with 

| eval date=strftime(_time, "%F")

then you will get a date column, where it shows you the dates of data it has seen for those sites.

 

0 Karma

ppanchal
Path Finder

No this will not help me. I have like 500-600 site IDs. I will have to go through the entire list to see the IDs are present for both the dates.

Any other solution?

I only want to find the sites ids that  has only 1 occurrence. 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

It shouldn't matter how many you have - the point is to see what your search is returning so the first line would be enough to investigate - it might be that your time constraints in the query are not working in your environment in that 'now' is not giving you any window between -0d@d

you could change the time constraints to

(index=support source=sites earliest=-2d@d latest=-1d@d) OR 
(index=support source=sites earliest=-1d@d latest=@d)

which will compare two days ago to yesterday

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...