Splunk calculate sequential sum for every timestam...

ruhtraeel · ‎10-23-2019

Hello,
I would like to create fields (or a field with multiple values) which represents the sum for each timestamp.

For example, if I have data like this:

10/23/19
10:37:01.000 AM 
urlupdateid=6, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=1, total_dates_correct=1, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=5, urlid=1, payer=Aetna, RetiredDate_datetype_correct=false, RetiredDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=4, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=true, EffectiveDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=3, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=false, EffectiveDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=2, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1

10/17/19
3:18:48.000 PM  
urlupdateid=1, urlid=1, payer=Aetna, PublicationDate_datetype_correct=true, PublicationDate_date_correct=true, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=2, total_dates_correct=2, total_datetypes=2

The result would be two data points from sum(total_dates_correct) / sum(total_datetypes):
one where TOTAL_ACCURACY = 2/6 (from all the events with the timestamp 10/17/19 3:18:48.000 PM) and one another where TOTAL_ACCURACY = 3/7 (from all events including both timestamp 10/17/19 3:18:48.000 PM and 10/23/19 10:37:01.000 AM)

How would I do this? This query does not work because TOTAL_ACCURACY only reflects the accuracy up to the current timestamp:

| eventstats sum(total_dates_correct) as TOTAL_CORRECT | eventstats sum(total_datetypes) as TOTAL | eval TOTAL_ACCURACY = (TOTAL_CORRECT / TOTAL) * 100

Ideally, I would like to create a dashboard that shows the accuracy over the time range that is changeable in the search.

Thanks

woodcock · ‎10-28-2019

Why did you abandon this Q&A without even commenting on the efficacy of the answers?
https://answers.splunk.com/answers/778366/find-interval-sums-of-a-field-and-display-it-in-a.html#ans...

to4kawa · ‎10-24-2019

| stats count
| eval raw="10/23/19 10:37:01.000 AM urlupdateid=6, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=1, total_dates_correct=1, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=5, urlid=1, payer=Aetna, RetiredDate_datetype_correct=false, RetiredDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=4, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=true, EffectiveDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=3, urlid=1, payer=Aetna, EffectiveDate_datetype_correct=false, EffectiveDate_date_correct=false, total_datetypes_correct=0, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=2, urlid=1, payer=Aetna, ReviewDate_datetype_correct=true, ReviewDate_date_correct=false, total_datetypes_correct=1, total_dates_correct=0, total_datetypes=1
10/17/19 3:18:48.000 PM urlupdateid=1, urlid=1, payer=Aetna, PublicationDate_datetype_correct=true, PublicationDate_date_correct=true, ReviewDate_datetype_correct=true, ReviewDate_date_correct=true, total_datetypes_correct=2, total_dates_correct=2, total_datetypes=2"
| makemv delim="
" raw
| mvexpand raw
| rex field=raw "(?<time>^.+(AM|PM))"
| rex field=raw "total_dates_correct=(?<total_dates_correct>\d), total_datetypes=(?<total_datetypes>\d)"
| eval _time=strptime(time,"%m/%d/%y %H:%M:%S.%3Q")
| table _time total_dates_correct total_datetypes
`comment("this is sample data")`
| timechart span=1d sum(total_dates_correct) as TOTAL_CORRECT sum(total_datetypes) as TOTAL 
| eval TOTAL_ACCURACY = (TOTAL_CORRECT / TOTAL) * 100

Hi, this is sample query.

For example, search period is one month,
If you change span = to 2week, you will get the expected results.
Wouldn't it be nice to make it a dashboard and give it as a token?

aberkow · ‎10-23-2019

Have you tried the streamstats command? This is a rolling computation, so you can sum that value for each row, and evaluate the total accuracy for each row (not just the 6th and 7th), and use that as the value for the dashboard over time.

The docs can be found here: https://docs.splunk.com/Documentation/Splunk/7.3.2/SearchReference/Streamstats#Basic_examples but the syntax is quite similar to eventstats (i.e. | streamstats sum(x)). There are some other cool optional arguments around when you want to stop rolling that sum, so you should definitely check out the docs.

Also as a small syntax note: you could have done both of your eventstats computations in one command call, similar to a stats (i.e.: | eventstats sum(total_dates_correct) as TOTAL_CORRECT, sum(total_datetypes) as TOTAL will also work, because you aren't doing any additional split by clauses that might have different buckets between the two).

Let me know if this helps/you have any other questions!

ruhtraeel · ‎10-23-2019

The only problem with this is that it wouldn't take weighting into account. If I calculate the total accuracy for each row using streamstats and then average that out, I'd run into this problem:

Event 1:
Accuracy: 100%, total dates = 1, dates correct = 1

Event 2:
Accuracy: 0%, total dates = 10, dates correct = 0

Accuracy:
50% (even though it should be 1/11 = 9.1%)

aberkow · ‎10-23-2019

With streamstats? It should weight, because the sum of total dates would be 11 and the sum of total dates correct would be 1.

What I was thinking:
base search... | streamstats sum(total_dates_correct) as totalDatesCorrectRunning, sum(total_date_types) as totalDatesRunning | eval runningAverage=totalDatesCorrectRunning/totalDatesRunning
This generates an average which should be correctly weighted by day, is this still not the case?

ruhtraeel · ‎10-28-2019

I think you're right; I tested out that query, and it seems to keep track of the running average correctly. I've encountered another problem; how would I plot this as a chart, with X being the time and Y being the runningAverage? I tried doing this:
| table _time runningAverage
But this messes up the _time column, as this makes the earliest date (10/17/19 3:18:48.000 PM) have the latest running average (42%), and the most recent event have a running average of 100% (which is actually the running average after the first event)

Ideally, there would be two points in the graph, one representing the final running average on 10/17/19
3:18:48.000

and another point with the running average on
10/23/19
10:37:01.000

and another with the running average on

aberkow · ‎10-28-2019

Can you add a sort on the _time field before you run streamstats? This should persist through:

| sort _time asc

The params you can play around with for sort are +/-, asc/desc, and the limit on how many fields to sort: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/sort#Syntax

Splunk calculate sequential sum for every timestamp

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Value Insights: Now Generally Available in the CMC

What’s New in Splunk AI: Volume 02

Splunk App Dev Quarterly Roundup: AI, Agents, and Innovation!

Join the Conversation