We are using a metrics index to store metric events. The metric events are linked to a separate parent dataset through a unique ID dimension. This ID dimension can have tens of thousands of unique values, and the parent dataset consists mostly of string values.
Given the cardinality issues associated with metrics indexes (where it is best to avoid dimensions with a large number of unique values), what would be the best practice in this scenario?
https://docs.splunk.com/Documentation/Splunk/latest/Metrics/BestPractices#Cardinality_issues
Would it be a good idea to put the parent data in a KV Store collection and perform lookups against it from the metric data at search time? How would this approach affect performance?
Every bucket has to store each distinct dimension value once in its tsidx files, so if you are using a million unique surrogate IDs to reference combinations of fewer than a million unique dimension strings, you are making the situation worse: the ID itself becomes your highest-cardinality dimension.
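If you want to gauge how bad the cardinality already is, you can count the distinct values of the ID dimension straight from the metrics index. A minimal sketch, assuming `my_metrics` is your metrics index and `parent_id` is the ID dimension (both placeholder names):

```
| mstats count(_value) WHERE index=my_metrics AND metric_name=* BY parent_id
| stats dc(parent_id) AS unique_parent_ids
```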
Using KV Store is a great idea for repetitive asset information, like adding context to a hostname, but in that case you should still store the meaningful unique identifier (the hostname) as a dimension and keep the descriptive context in the lookup.
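As a rough sketch of what that can look like (the collection, lookup, and field names below are invented for illustration), you would define a KV Store collection and a lookup over it, keep host as the dimension in the metrics index, and enrich at search time:

```
# collections.conf -- the KV Store collection holding the asset context
[asset_info]
field.host  = string
field.owner = string
field.site  = string

# transforms.conf -- exposes the collection as a lookup
[asset_lookup]
external_type = kvstore
collection    = asset_info
fields_list   = _key, host, owner, site
```

```
| mstats avg(_value) AS avg_cpu WHERE index=my_metrics AND metric_name="cpu.usage" BY host span=5m
| lookup asset_lookup host OUTPUT owner site
```

Because the lookup runs against the aggregated search results rather than every raw metric data point, the performance cost is usually modest as long as the collection stays a reasonable size.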
I believe your best solution will be some combination of dimensions plus KV Store lookups to enrich them. Don't go 100% in either direction, and if you find yourself creating new surrogate keys just to make it work, you have gone too far.
The only other suggestion I have: if you have large logical groups of systems without overlapping dimensions, you could put them into separate indexes and use wildcards in your index filter to access them all. That keeps each tsidx smaller and performance higher.
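For example (the index naming here is hypothetical), a single mstats search can still span all of the split indexes with a wildcarded index filter, something like:

```
| mstats avg(_value) WHERE index=metrics_app_* AND metric_name="cpu.usage" BY host span=5m
```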
@Brett, do you have any answers to this?