Splunk Search

How to compare large amounts of data in the same index across different field values

KongJian
Engager

Scenario

Example index:

index=os, ingested data:

_time,type,id
08:00,A,1
08:10,A,2
08:11,A,3
08:12,A,4
08:13,A,5
09:00,B,1
09:10,B,2
09:11,B,3
09:12,B,4
10:00,C,1
10:10,C,2
10:11,C,3

We want to calculate the number of IDs in type B that also exist in type A.

For example, type B has (1,2,3,4) and type A has (1,2,3,4,5), so the result should be 4/5 = 80%.

Since we have a huge amount of data, is there any solution to handle this with one SPL search?


ITWhisperer
SplunkTrust
| makeresults
| eval _raw="data _time, type, id
08:00,A,1
08:10,A,2
08:11,A,3
08:12,A,4
08:13,A,5
09:00,B,1
09:10,B,2
09:11,B,3
09:12,B,4
10:00,C,1
10:10,C,2
10:11,C,3"
| multikv forceheader=1
| fields - _* linecount


| where type IN ("A", "B")
| dedup type id
| eventstats count by id
| where type="A"
| stats sum(count) as total count as ids
| eval percent=(total-ids)/ids
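
For the sample data, this works out as follows: after the dedup, eventstats count by id gives each id present in both A and B a count of 2, and an A-only id a count of 1. Keeping only the type A rows, total = 2+2+2+2+1 = 9 and ids = 5, so percent = (9-5)/5 = 0.8. If a rounded percentage is preferred, the final eval could instead be written as (a minor optional tweak, not part of the answer above):

| eval percent=round(100*(total-ids)/ids, 2)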

KongJian
Engager

@ITWhisperer  

Appreciate your solution

It works great!

We are running this over around 200,000 events and it takes about 30s. Is there any way to accelerate the SPL?

ITWhisperer
SplunkTrust

You could look at the job inspector to see where the job is spending its time. You could also try switching the where type IN and the dedup commands to see if that makes a difference to the run time.
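
A minimal sketch of that reordering (the dedup moved ahead of the where filter; the rest of the search is unchanged, and the job inspector will show whether it actually helps on your data):

| dedup type id
| where type IN ("A", "B")
| eventstats count by id
| where type="A"
| stats sum(count) as total count as ids
| eval percent=(total-ids)/ids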
