Splunk Enterprise

KV store scalability

yoho
Contributor

I'm building an app where Splunk is receiving a large number of IDs and a property that I will need to sum over time.  Let's take for instance a simple example:

_time   id   amount
00:00   A    1
00:01   C    8
01:01   B    2
01:02   A    4

At 01:03, a user asks: "What is the sum of amount per id, for each id you've seen in the last 15 minutes?" Keep in mind that this is a deliberately small example; in practice there will be millions of unique ids and a high event rate (hundreds of events per second). The expected answer is:

id   sum(amount)
A    5
B    2

My first reflex was something like this: "stats sum(amount) earliest=0 by id [ |search id ]" over the last hour. But it's not really scalable since, over time, the first part of the query will need to sum more and more events.

My second thought was to add an intermediate summary index that computes sum(amount) over a short period (15 min) and keeps the result. Yes, that speeds things up, but after a year or so I will still have performance / scalability issues.

In the end, the only thing I need is to keep the latest value of "sum(amount)" per id and keep adding to it every time a new event arrives. That's why I'm wondering whether we could use the KV store to keep counting: every time we see an event, we simply update the record with:

  • key = id
  • value = value(id) + amount

Does anyone have similar experience with the KV store, or a similar issue?


yoho
Contributor

I made a mistake but can't edit my message: please read "over the last 15 minutes" instead of "over the last hour" in the sentence below the second table.
