Splunk Search

Splunk calculate sequential sum for every timestamp

ruhtraeel
Path Finder

Hello,
I would like to create fields (or a field with multiple values) which represents the sum for each timestamp.

For example, if I have data like this:

10/23/19
10:37:01.000 AM 
urlupdateid=6, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=1, total_dates_correct=1, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=5, urlid=1, payer=Aetna, RetiredDate_datetype_correct=false, RetiredDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=4, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=true, EffectiveDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=3, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=false, EffectiveDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=2, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=1, urlid=1, payer=Aetna, PublicationDate_datetype_correct=true, PublicationDate_date_correct=true, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=2, total_dates_correct=2, total_datetypes=2

The result would be two data points from sum(total_dates_correct) / sum(total_datetypes):
one where TOTAL_ACCURACY = 2/6 (from all the events with the timestamp 10/17/19 3:18:48.000 PM) and one another where TOTAL_ACCURACY = 3/7 (from all events including both timestamp 10/17/19 3:18:48.000 PM and 10/23/19 10:37:01.000 AM)

How would I do this? This query does not work because TOTAL_ACCURACY only reflects the accuracy up to the current timestamp:

| eventstats sum(total_dates_correct) as TOTAL_CORRECT | eventstats sum(total_datetypes) as TOTAL | eval TOTAL_ACCURACY = (TOTAL_CORRECT / TOTAL) * 100

Ideally, I would like to create a dashboard that shows the accuracy over the time range that is changeable in the search.

Thanks

Tags (1)
0 Karma

woodcock
Esteemed Legend

Why did you abandon this Q&A without even commenting on the efficacy of the answers?
https://answers.splunk.com/answers/778366/find-interval-sums-of-a-field-and-display-it-in-a.html#ans...

0 Karma

to4kawa
Ultra Champion
| stats count
| eval raw="10/23/19 10:37:01.000 AM urlupdateid=6, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=1, total_dates_correct=1, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=5, urlid=1, payer=Aetna, RetiredDate_datetype_correct=false, RetiredDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=4, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=true, EffectiveDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=3, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=false, EffectiveDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=2, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=1, urlid=1, payer=Aetna, PublicationDate_datetype_correct=true, PublicationDate_date_correct=true, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=2, total_dates_correct=2, total_datetypes=2"
| makemv delim="
" raw
| mvexpand raw
| rex field=raw "(?<time>^.+(AM|PM))"
| rex field=raw "total_dates_correct=(?<total_dates_correct>\d), total_datetypes=(?<total_datetypes>\d)"
| eval _time=strptime(time,"%m/%d/%y %H:%M:%S.%3Q")
| table _time total_dates_correct total_datetypes
`comment("this is sample data")`
| timechart span=1d sum(total_dates_correct) as TOTAL_CORRECT sum(total_datetypes) as TOTAL 
| eval TOTAL_ACCURACY = (TOTAL_CORRECT / TOTAL) * 100

Hi, this is sample query.

For example, search period is one month,
If you change span = to 2week, you will get the expected results.
Wouldn't it be nice to make it a dashboard and give it as a token?

0 Karma

aberkow
Builder

Have you tried the streamstats command? This is a rolling computation, so you can sum that value for each row, and evaluate the total accuracy for each row (not just the 6th and 7th), and use that as the value for the dashboard over time.

The docs can be found here: https://docs.splunk.com/Documentation/Splunk/7.3.2/SearchReference/Streamstats#Basic_examples but the syntax is quite similar to eventstats (i.e. | streamstats sum(x)). There are some other cool optional arguments around when you want to stop rolling that sum, so you should definitely check out the docs.

Also as a small syntax note: you could have done both of your eventstats computations in one command call, similar to a stats (i.e.: | eventstats sum(total_dates_correct) as TOTAL_CORRECT, sum(total_datetypes) as TOTAL will also work, because you aren't doing any additional split by clauses that might have different buckets between the two).

Let me know if this helps/you have any other questions!

0 Karma

ruhtraeel
Path Finder

The only problem with this is that it wouldn't take weighting into account. If I calculate the total accuracy for each row using streamstats and then average that out, I'd run into this problem:

Event 1:
Accuracy: 100%, total dates = 1, dates correct = 1

Event 2:
Accuracy: 0%, total dates = 10, dates correct = 0

Accuracy:
50% (even though it should be 1/11 = 9.1%)

0 Karma

aberkow
Builder

With streamstats? It should weight, because the sum of total dates would be 11 and the sum of total dates correct would be 1.

What I was thinking:
base search...
| streamstats sum(total_dates_correct) as totalDatesCorrectRunning, sum(total_date_types) as totalDatesRunning
| eval runningAverage=totalDatesCorrectRunning/totalDatesRunning

This generates an average which should be correctly weighted by day, is this still not the case?

0 Karma

ruhtraeel
Path Finder

I think you're right; I tested out that query, and it seems to keep track of the running average correctly. I've encountered another problem; how would I plot this as a chart, with X being the time and Y being the runningAverage? I tried doing this:
| table _time runningAverage
But this messes up the _time column, as this makes the earliest date (10/17/19 3:18:48.000 PM) have the latest running average (42%), and the most recent event have a running average of 100% (which is actually the running average after the first event)

Ideally, there would be two points in the graph, one representing the final running average on 10/17/19
3:18:48.000

and another point with the running average on
10/23/19
10:37:01.000

and another with the running average on

0 Karma

aberkow
Builder

Can you add a sort on the _time field before you run streamstats? This should persist through:

| sort _time asc

The params you can play around with for sort are +/-, asc/desc, and the limit on how many fields to sort: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/sort#Syntax

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to July and August Tech Talks, Office Hours, and Webinars!

Dive into our sizzling summer lineup for July and August Community Office Hours and Tech Talks. Scroll down to ...

Edge Processor Scaling, Energy & Manufacturing Use Cases, and More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Get More Out of Your Security Practice With a SIEM

Get More Out of Your Security Practice With a SIEMWednesday, July 31, 2024  |  11AM PT / 2PM ETREGISTER ...