Splunk Enterprise

Low Raw To Index Ratio (_audit)

PT_crusher
Explorer

We were investigating some indexes with a low raw-to-index ratio and came across _audit, whose raw-to-index ratio is 0.81:1.

[Screenshot: 2021-09-23 at 12.30.02]
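To make the 0.81:1 figure concrete, here is a minimal Python sketch. It assumes the ratio is the raw data volume divided by the on-disk size of the index; the helper names and the sample sizes are hypothetical and are not taken from Splunk.

```python
def raw_to_index_ratio(raw_bytes: int, index_bytes: int) -> float:
    """Raw data volume divided by on-disk index size (assumed definition)."""
    if index_bytes <= 0:
        raise ValueError("index_bytes must be positive")
    return raw_bytes / index_bytes


def format_ratio(ratio: float) -> str:
    """Render a ratio in the 'x.xx:1' style used in this post."""
    return f"{ratio:.2f}:1"


# Hypothetical sizes: 81 GiB of raw data stored in 100 GiB of buckets.
ratio = raw_to_index_ratio(81 * 1024**3, 100 * 1024**3)
print(format_ratio(ratio))  # 0.81:1
print(ratio < 1)            # True: the index takes more disk than the raw data
```

Under this definition, any ratio below 1:1 means the buckets occupy more disk than the raw events they hold.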

At first glance, _audit seemed a good candidate for learning how to tell whether an index has high cardinality and what can be done about it (for example, tsidx reduction combined with bloom filters). First, it is not searched frequently, so it is a safe place to test tsidx reduction or bloom filters. Moreover, every Splunk installation has it, so everyone could benefit from the shared knowledge.

We arrived at the following cardinality figures by taking a sample of the data and running tstats and walklex:

earliest | latest | number of events | keywords in lexicon | min number of events per keyword | keywords with min number of events | percentage of keywords with min number of events
18/09/2021 00:00 | 22/09/2021 24:00 | 5721945 | 6764698 | 10 | 176669 | 2.61%
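The last four columns of the table can be derived from a term-to-count listing. The Python sketch below assumes the walklex results have already been reduced to a mapping of term to number of events; that shape, and the toy terms, are assumptions for illustration. It also cross-checks the percentage column from the table's own figures.

```python
from typing import Mapping, Tuple


def lexicon_stats(term_counts: Mapping[str, int]) -> Tuple[int, int, int, float]:
    """Summarise a term -> event-count mapping (assumed reduced walklex output).

    Returns (keywords in lexicon, min events per keyword,
             keywords with that min, percentage of keywords with that min).
    """
    if not term_counts:
        raise ValueError("empty lexicon")
    lexicon_size = len(term_counts)
    min_count = min(term_counts.values())
    at_min = sum(1 for c in term_counts.values() if c == min_count)
    return lexicon_size, min_count, at_min, 100.0 * at_min / lexicon_size


# Toy lexicon (hypothetical terms): two of the three terms share the minimum count.
print(lexicon_stats({"admin": 1, "search": 5, "granted": 1}))

# Cross-check the percentage column: 176669 of 6764698 keywords.
print(round(100.0 * 176669 / 6764698, 2))  # 2.61
```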


Just by looking at the table above, it is hard to tell whether this index has high cardinality (that is, whether its data varies a lot). What is considered a high-cardinality index? It would be great to have some reference numbers, but I was not able to find them anywhere.

Q1: Are there any reference numbers that, once compared against, would unequivocally tell us whether or not a bucket is a high-cardinality one? Also, should we ever expect the raw-to-index ratio to drop below 1:1?

Next, we compared the size of the tsidx files against the total size of the buckets on each indexer:

indexer VM | bucketsize_bytes | tsidxsize_bytes | bucketsize_megabytes | tsidxsize_megabytes
A | 1317675655 | 933241652 | 1257 | 890
B | 1321231309 | 892793498 | 1260 | 851
C | 1464189620 | 1003103122 | 1396 | 957
D | 1538519922 | 1045951037 | 1467 | 997
E | 901792050 | 609003289 | 860 | 581
F | 1417458990 | 929185810 | 1352 | 886
G | 1591446741 | 1167724482 | 1518 | 1114
H | 1497574135 | 1009380670 | 1428 | 963
totals | 11049888422 | 7590383560 | |


The results show that the tsidx files take up about 69% of the total disk space used to store the _audit index on the indexers (7590383560 of 11049888422 bytes).
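The 69% figure can be reproduced from the table with a short Python sketch. The rows are copied from the table; the function name is hypothetical. It also prints the tsidx share per indexer, which shows whether any single indexer deviates from the overall figure.

```python
# (indexer, bucket size in bytes, tsidx size in bytes), copied from the table.
ROWS = [
    ("A", 1317675655, 933241652),
    ("B", 1321231309, 892793498),
    ("C", 1464189620, 1003103122),
    ("D", 1538519922, 1045951037),
    ("E", 901792050, 609003289),
    ("F", 1417458990, 929185810),
    ("G", 1591446741, 1167724482),
    ("H", 1497574135, 1009380670),
]


def tsidx_share(rows):
    """Return per-indexer tsidx/bucket fractions, both totals, and the overall fraction."""
    per_indexer = {name: tsidx / bucket for name, bucket, tsidx in rows}
    total_bucket = sum(bucket for _, bucket, _ in rows)
    total_tsidx = sum(tsidx for _, _, tsidx in rows)
    return per_indexer, total_bucket, total_tsidx, total_tsidx / total_bucket


per_indexer, total_bucket, total_tsidx, overall = tsidx_share(ROWS)
for name, share in per_indexer.items():
    print(f"{name}: {share:.1%}")
print(total_bucket, total_tsidx)  # 11049888422 7590383560
print(f"overall: {overall:.1%}")  # overall: 68.7%
```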

Q2: Once again, is this a sign of high cardinality?

Q3: Lastly, is there any SEGMENTATION config that is commonly applied to the _audit index?
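As background for Q3, a toy Python model of segmentation shows two things: splitting on a second set of breakers changes the number of distinct terms, and unique values such as IDs add terms that appear in only one event. The breaker sets and sample events are simplified, hypothetical assumptions and are not Splunk's segmenters.conf defaults.

```python
import re

# Simplified breaker sets: an illustrative subset only, NOT Splunk's actual
# segmenters.conf defaults.
MAJOR_BREAKERS = re.compile(r"[\s\[\](){}<>|!;,'\"]+")
MINOR_BREAKERS = re.compile(r"[/:=@.\-$#%\\_]+")


def segment(event: str, use_minor: bool) -> set:
    """Split an event into lowercase terms; optionally also split on minor breakers."""
    terms = set()
    for major in MAJOR_BREAKERS.split(event):
        if not major:
            continue
        terms.add(major.lower())
        if use_minor:
            terms.update(t.lower() for t in MINOR_BREAKERS.split(major) if t)
    return terms


def lexicon_size(events, use_minor: bool) -> int:
    """Number of distinct terms across all events."""
    lexicon = set()
    for event in events:
        lexicon |= segment(event, use_minor)
    return len(lexicon)


# Hypothetical audit-like events that differ only in a unique ID.
events = [
    "user=admin action=search search_id=1632300000.101",
    "user=admin action=search search_id=1632300001.102",
]
print(lexicon_size(events, use_minor=False))  # 4
print(lexicon_size(events, use_minor=True))   # 13
```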
