Is there a better way to improve performance using...

ranurag · 09-30-2019

We have a data model with the following fields:

Source  IpAddress  FileName  FileVersion   Flag   _time
S1      IP1        File1     FileVersion1  Flag1  _time1
S1      IP1        File1     FileVersion1  Flag2  _time2
S1      IP1        File1     FileVersion1  Flag3  _time3
S1      IP1        File1     FileVersion1  Flag4  _time4

There are more than 10 million FileVersion values in the data; assuming 2 Flag values for each, that gives ~20 million events in the data model.

The requirement is to get the latest Flag for each FileVersion and then show a count of FileVersion(s) by Flag. So the output is something like this:

Flag   Count   Other columns
Flag1  11,232  ...
Flag2  67,764  ...
...
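
To make the requirement concrete, here is a minimal Python sketch of the two-step aggregation on a few made-up events. It is not Splunk code and says nothing about performance. The sample rows, the integer timestamps, and the function names are all hypothetical and only illustrate the logic: first keep the latest Flag per (Source, IpAddress, FileName, FileVersion), then count FileVersions by that Flag.

```python
from collections import Counter

# Hypothetical sample events (not real data):
# (Source, IpAddress, FileName, FileVersion, Flag, _time)
events = [
    ("S1", "IP1", "File1", "FileVersion1", "Flag1", 1),
    ("S1", "IP1", "File1", "FileVersion1", "Flag2", 2),
    ("S1", "IP1", "File1", "FileVersion2", "Flag1", 1),
    ("S1", "IP1", "File2", "FileVersion3", "Flag2", 3),
    ("S1", "IP1", "File2", "FileVersion3", "Flag1", 5),
]


def latest_flag_per_version(rows):
    """Step 1: keep only the most recent Flag for each FileVersion key."""
    latest = {}
    for source, ip, name, version, flag, t in rows:
        key = (source, ip, name, version)
        # Replace the stored flag only if this event is strictly newer.
        if key not in latest or t > latest[key][0]:
            latest[key] = (t, flag)
    return {key: flag for key, (_, flag) in latest.items()}


def count_versions_by_flag(latest):
    """Step 2: count how many FileVersions currently carry each Flag."""
    return Counter(latest.values())


latest = latest_flag_per_version(events)
counts = count_versions_by_flag(latest)
print(dict(counts))
```

In this sample, FileVersion1 ends up counted under Flag2 and the other two versions under Flag1, because only the newest event per FileVersion contributes to the final count.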

We are using a query similar to this (execution time ~600 sec):
| tstats latest(Flag) as Flag from datamodel=xxx by Source, IpAddress, FileName, FileVersion
| stats count by Flag, Source, IpAddress, FileName

The problem is that tstats takes a long time because of the high data cardinality. We even tried prestats="t", but it does not help much (~10% performance increase).

Another caveat is that a new Flag for a FileVersion can arrive at any time, and we need to show the counts based on the latest Flag, so creating a summary index is not feasible (we would have to run the summary-index-generating search very frequently and scan the full index).

Is there any way to improve the performance of the query, or a better way to meet the requirement?

Is there a better way to improve performance using tstats when I need two aggregations and the first aggregation returns ~10 million events at high cardinality?

search performance
