Splunk Enterprise

KV store scalability

yoho
Contributor

I'm building an app where Splunk receives a large number of IDs, each with an amount that I need to sum over time. Let's take a simple example:

_time    id    amount
00:00    A     1
00:01    C     8
01:01    B     2
01:02    A     4

At 01:03, a user asks: "What is the sum of amount per id, for each id you've seen in the last 15 minutes?" Keep in mind this is a very limited example; in reality there will be millions of unique ids and a high event rate (hundreds of events per second). The expected answer is:

id    amount
A     5
B     2

The first reflex is something like "earliest=0 [ search <ids seen in the last 15 minutes> ] | stats sum(amount) by id". But it's not really scalable since, over time, the first part of the query has to sum over an ever-growing number of events.
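
Spelled out as a runnable search, that first reflex would be something like this (the index name main and the field names id and amount are assumptions for the example):

    index=main earliest=0
        [ search index=main earliest=-15m | stats count BY id | fields id ]
    | stats sum(amount) AS amount BY id

The subsearch returns the ids seen in the last 15 minutes and is expanded into an OR of id filters on the outer search, so only the recently seen ids are summed, but the outer search still has to scan events from the beginning of time.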

Then the second thought was to add an intermediary summary index which computes sum(amount) over a small period of time (15 min) and keeps the result. Yes, it accelerates things, but after a year or so I will still have performance / scalability issues.
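
Concretely, I imagine a scheduled search running every 15 minutes that writes the partial sums into a summary index (summary_amounts is a made-up name):

    index=main earliest=-15m@m latest=@m
    | stats sum(amount) AS amount BY id
    | collect index=summary_amounts

The user-facing query then sums the pre-aggregated summary rows instead of the raw events:

    index=summary_amounts earliest=0
        [ search index=main earliest=-15m | stats count BY id | fields id ]
    | stats sum(amount) AS amount BY id

This is much cheaper, but it still has to read every summary row ever written for the matching ids, which is why the cost keeps growing over time.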

In the end, the only thing I need is to keep the last value of sum(amount) per id and continue counting from that value every time a new event arrives. That's why I'm wondering if we could use the KV store to keep counting: every time we see an event, we simply update the record with (see the sketch after this list):

  • key = id
  • value = value(id) + amount
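
A minimal sketch of that idea, assuming a KV store collection and a lookup definition both named running_totals with fields id and amount (all names here are assumptions, not an existing app):

    # collections.conf
    [running_totals]
    field.id = string
    field.amount = number

    # transforms.conf
    [running_totals]
    external_type = kvstore
    collection = running_totals
    fields_list = _key, id, amount

A scheduled search running every 15 minutes could then merge the new amounts into the stored totals:

    index=main earliest=-15m@m latest=@m
    | stats sum(amount) AS amount BY id
    | inputlookup append=true running_totals
    | stats sum(amount) AS amount BY id
    | outputlookup running_totals

And the user-facing question becomes a cheap lookup against only the ids seen recently:

    index=main earliest=-15m
    | stats count BY id
    | fields id
    | lookup running_totals id OUTPUT amount

One caveat I can already see: outputlookup rewrites the collection on every run, so with millions of ids the write itself may become a bottleneck, and the totals are only as fresh as the last scheduled run.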

Has anyone had a similar experience with KV Store, or run into a similar issue?

