We were investigating some indexes that have a low RAW to Index Ratio and came across _audit, whose RAW to Index Ratio is 0.81:1. At first glance, _audit seemed a good candidate for learning how to find out whether an index has high cardinality and what we can do about it (like tsidx reduction along with bloom filters). First, it is not frequently searched, so it is a low-risk index on which to test tsidx reduction or bloom filters; moreover, it is an index that everyone has in their Splunk installation, so we could benefit from common knowledge.

We came across the following numbers about cardinality by taking a sample of the data and using tstats and walklex:

| earliest | latest | number of events | keywords in lexicon | min number of events per keyword | keywords with min number of events | percentage of keywords with min number of events |
|---|---|---|---|---|---|---|
| 18/09/2021 00:00 | 22/09/2021 24:00 | 5721945 | 6764698 | 10 | 176669 | 2.61% |

Just by looking at the above table, it is hard to tell whether we are looking at an index whose data changes a lot or not. What is considered a high cardinality index? It would be awesome to have some reference numbers, but I was not able to find them anywhere.

Q1: Do we have any reference numbers that, once compared against, would unequivocally tell us whether or not a bucket is a high cardinality one? Nonetheless, should we expect the RAW to Index Ratio to drop below 1:1?

Then we inspected the size of the tsidx files against the size of the buckets:

| indexer vm | bucketsize_bytes | tsidxsize_bytes | bucketsize_megabytes | tsidxsize_megabytes |
|---|---|---|---|---|
| A | 1317675655 | 933241652 | 1257 | 890 |
| B | 1321231309 | 892793498 | 1260 | 851 |
| C | 1464189620 | 1003103122 | 1396 | 957 |
| D | 1538519922 | 1045951037 | 1467 | 997 |
| E | 901792050 | 609003289 | 860 | 581 |
| F | 1417458990 | 929185810 | 1352 | 886 |
| G | 1591446741 | 1167724482 | 1518 | 1114 |
| H | 1497574135 | 1009380670 | 1428 | 963 |
| totals | 11049888422 | 7590383560 | | |

The results show that the tsidx files take around 69% of the overall disk space needed to store the _audit index on the indexers.

Q2: Once again, is this a sign of high cardinality?
Q3: Lastly, is there any SEGMENTATION config that is commonly applied to the _audit index?
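
For anyone who wants to double-check the percentages quoted above, here is a minimal Python sketch that recomputes them from the figures in the two tables. It is only arithmetic on the numbers already posted; the names `buckets` and `tsidx_share` are made up for this example and are not Splunk settings or commands.

```python
# Figures copied from the tables in this post: (bucketsize_bytes, tsidxsize_bytes)
# per indexer VM. The variable and function names are hypothetical.
buckets = {
    "A": (1317675655, 933241652),
    "B": (1321231309, 892793498),
    "C": (1464189620, 1003103122),
    "D": (1538519922, 1045951037),
    "E": (901792050, 609003289),
    "F": (1417458990, 929185810),
    "G": (1591446741, 1167724482),
    "H": (1497574135, 1009380670),
}


def tsidx_share(bucket_bytes: int, tsidx_bytes: int) -> float:
    """Fraction of the bucket size that is taken by tsidx files."""
    return tsidx_bytes / bucket_bytes


total_bucket = sum(b for b, _ in buckets.values())
total_tsidx = sum(t for _, t in buckets.values())

# Per-indexer share, then the overall share across all indexers.
for name, (bucket_bytes, tsidx_bytes) in sorted(buckets.items()):
    print(f"{name}: {tsidx_share(bucket_bytes, tsidx_bytes):.1%}")
print(f"overall: {tsidx_share(total_bucket, total_tsidx):.1%}")

# Share of lexicon keywords that sit at the minimum event count (first table).
lexicon_keywords = 6764698
min_count_keywords = 176669
keyword_pct = min_count_keywords / lexicon_keywords
print(f"keywords at min count: {keyword_pct:.2%}")
```

Printing the per-indexer share next to the overall one makes it easy to see whether any single indexer differs noticeably from the rest.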