Low Raw To Index Ratio (_audit)

PT_crusher · 09-23-2021

We were investigating some indexes that have a low Raw to Index Ratio and came across _audit, whose Raw to Index Ratio is 0.81:1.

At first glance, _audit seemed a good candidate for learning how to find out whether an index has high cardinality and what we can do about it (like tsidx reduction along with bloom filters). First, it is not frequently searched, which makes it a reasonable place to test tsidx reduction or bloom filters. Moreover, it is an index that everyone has in their Splunk installation, so we could benefit from common knowledge.

We came across the following numbers about cardinality by taking a sample of the data and using tstats and walklex:

earliest	latest	number of events	keywords in lexicon	min number of events per keyword	keywords with min number of events	percentage of keywords with min number of events
18/09/2021 00:00	22/09/2021 24:00	5721945	6764698	10	176669	2,61%
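
For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the percentage column from the two keyword counts. The variable names are mine, and the keywords-per-event figure is only an informal indicator I am assuming is useful, not an official Splunk metric.

```python
# Figures copied from the sample table above (_audit, 18/09/2021 to 22/09/2021).
events = 5_721_945            # "number of events"
lexicon_keywords = 6_764_698  # "keywords in lexicon"
keywords_at_min = 176_669     # "keywords with min number of events"

# Share of lexicon keywords that fall in the "min number of events" column.
pct_at_min = 100 * keywords_at_min / lexicon_keywords

# Assumption: lexicon keywords per event as a rough, informal cardinality hint.
keywords_per_event = lexicon_keywords / events

print(f"{pct_at_min:.2f}% of lexicon keywords have the min number of events")
print(f"{keywords_per_event:.2f} lexicon keywords per event")
```

With the sample numbers this gives 2.61% and roughly 1.18 lexicon keywords per event, i.e. the lexicon holds more distinct keywords than there are events in the sample.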

Just by looking at the above table it is hard to tell whether we are dealing with an index whose data changes a lot or not. What is considered a high cardinality index? It would be awesome to have some reference numbers, but I was not able to find them anywhere.

Q1: Are there any reference numbers that, once compared against, would unequivocally tell us whether or not a bucket is a high cardinality one? Also, should we expect the Raw to Index Ratio to drop below 1:1?

Then we inspected the size of the tsidx files against the size of the buckets:

indexer vm	bucketsize_bytes	tsidxsize_bytes	bucketsize_megabytes	tsidxsize_megabytes
A	1317675655	933241652	1257	890
B	1321231309	892793498	1260	851
C	1464189620	1003103122	1396	957
D	1538519922	1045951037	1467	997
E	901792050	609003289	860	581
F	1417458990	929185810	1352	886
G	1591446741	1167724482	1518	1114
H	1497574135	1009380670	1428	963
totals	11049888422	7590383560

Results show that the tsidx files take around 69% of the overall disk space needed to store the _audit index on the indexers.

Q2: Once again, is this a sign of high cardinality?

Q3: Lastly, is there any SEGMENTATION config that is commonly applied to the _audit index?
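
The tsidx share can be recomputed from the table above. This is a minimal Python sketch using the byte counts exactly as listed; the dictionary layout and variable names are mine and only serve the illustration.

```python
# indexer vm -> (bucketsize_bytes, tsidxsize_bytes), copied from the table above.
sizes = {
    "A": (1317675655, 933241652),
    "B": (1321231309, 892793498),
    "C": (1464189620, 1003103122),
    "D": (1538519922, 1045951037),
    "E": (901792050, 609003289),
    "F": (1417458990, 929185810),
    "G": (1591446741, 1167724482),
    "H": (1497574135, 1009380670),
}

# Per-indexer share of bucket disk space taken by tsidx files, in percent.
shares = {vm: 100 * tsidx / bucket for vm, (bucket, tsidx) in sizes.items()}

# Totals across all indexers; these should match the "totals" row.
total_bucket = sum(bucket for bucket, _ in sizes.values())
total_tsidx = sum(tsidx for _, tsidx in sizes.values())
total_share = 100 * total_tsidx / total_bucket

for vm, share in shares.items():
    print(f"{vm}: {share:.1f}%")
print(f"total: {total_share:.1f}%")
```

The totals match the last row of the table, the overall share comes out at about 68.7%, and the per-indexer share ranges from roughly 66% (F) to 73% (G).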
