Splunk Search

Compare 2 datasets

ppanchal
Path Finder
I have 2 data sets
 
index=support source=sites earliest=-1d@d latest=-0d@d
index=support source=sites earliest=-0d@d latest=now
 
I want to pull out that data which is changed in data set 2 as compared to data set 1
Labels (3)
0 Karma

sanjeev543
Communicator

@ppanchal See if the below logic helps!!

| makeresults
| eval site="1,2,3"
| makemv site delim=","
| mvexpand site
| appendcols
[| makeresults
| eval site1="4,3,5"
| makemv site1 delim=","
| mvexpand site1]
| eventstats values(site1) as site2
| eval val=if(in(site,site2),"YES","NO")

============================

index=support source=sites earliest=-1d@d latest=-0d@d | rename site as site1 | table site1 | appendcols [search index=support source=sites earliest=-0d@d latest=now | rename site as site2 | table site2]
|eventstats values(site2) as site2_values | eval present_in_both=if(in(site1,site2_values),"YES","NO")
0 Karma

bowesmana
SplunkTrust
SplunkTrust

You will need to provide a little more information on the fields in your data you are looking to compare and what output you will expect to see at the end.

Can you provide an example?

0 Karma

ppanchal
Path Finder

We have a field called Site_id which is a string.

Example:

Set 1 - Site id = a1, a2, a3

Set 2 - Site id = a2, a3, a4

My result should be Site id = a1, a4

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Try this

(index=support source=sites earliest=-1d@d latest=-0d@d) OR 
(index=support source=sites earliest=-0d@d latest=now)
| bin _time span=1d@d
| stats count by _time Site_id
| stats values(_time) as _time by Site_id
| where mvcount(_time) = 1

This groups the sites by date then counts the dates that site has been found and only looks for results where there is only one time value.

Hope this helps

 

0 Karma

ppanchal
Path Finder

This did not work, I tried below,

(index=support source=sites SITE_ID=S028 AND SITE_ID=S056 earliest=-1d@d latest=-0d@d) OR (index=support source=sites SITE_ID=S028 AND SITE_ID=S056 AND SITE_ID=S10 earliest=-0d@d latest=now)
| bin _time span=1d@d | stats count by _time SITE_ID | stats values(_time) as _time by SITE_ID | where mvcount(_time) = 1

It gives me 0 events/results. Can you please help?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Can you take out the 'where' clause, so you can see what results you get back. If you replace the where clause with 

| eval date=strftime(_time, "%F")

then you will get a date column, where it shows you the dates of data it has seen for those sites.

 

0 Karma

ppanchal
Path Finder

No this will not help me. I have like 500-600 site IDs. I will have to go through the entire list to see the IDs are present for both the dates.

Any other solution?

I only want to find the sites ids that  has only 1 occurrence. 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

It shouldn't matter how many you have - the point is to see what your search is returning so the first line would be enough to investigate - it might be that your time constraints in the query are not working in your environment in that 'now' is not giving you any window between -0d@d

you could change the time constraints to

(index=support source=sites earliest=-2d@d latest=-1d@d) OR 
(index=support source=sites earliest=-1d@d latest=@d)

which will compare two days ago to yesterday

0 Karma
Get Updates on the Splunk Community!

Splunk Observability as Code: From Zero to Dashboard

For the details on what Self-Service Observability and Observability as Code is, we have some awesome content ...

[Puzzles] Solve, Learn, Repeat: Character substitutions with Regular Expressions

This challenge was first posted on Slack #puzzles channelFor BORE at .conf23, we had a puzzle question which ...

Shape the Future of Splunk: Join the Product Research Lab!

Join the Splunk Product Research Lab and connect with us in the Slack channel #product-research-lab to get ...