Getting Data In

Best Practices for Handling High-Cardinality Dimensions in Metric Indices?

grunt
New Member

We are using a metrics index to store metric events. These metric events are linked to a different parent dataset through a unique ID dimension. This ID dimension can have tens of thousands of unique values, and the parent dataset primarily consists of string values.

Given the cardinality issues associated with metric indices (where it's best to avoid dimensions with a large range of unique values), what would be the best practice in this scenario?
https://docs.splunk.com/Documentation/Splunk/latest/Metrics/BestPractices#Cardinality_issues 

Would it be a good idea to use a key-value store (kvstore) for the parent data and perform lookups from the metric data? How would this approach impact performance?

Labels (1)
0 Karma

Brett
SplunkTrust
SplunkTrust

Every bucket has to store every dimension value once, so if you are using a million unique IDs to reference combinations of less than a million unique dimension strings, you are making the situation worse.

Using KV Store is a great idea for repetitive asset information, like adding context to a hostname, but in this situation you should still store the meaningful unique identifier (hostname) as a dimension.

I believe your best solution will be some combination of dimensions and KV Store to enrich them, but don't go 100% in either direction, and if you start creating new unique keys to make it work I think it's going too far.

The only other suggestion I have is if you have large logic groups of systems without overlapping dimensions, you could put them into separate indexes and use wildcards in your index filter to access them all. Will keep the TSIDX smaller and performance higher.

isoutamo
SplunkTrust
SplunkTrust

@Brett have you any answers to this?

0 Karma
Get Updates on the Splunk Community!

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

The Big One: Splunk 10 is Here!  The moment many of you have been waiting for has arrived! We are thrilled to ...

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Today, we’re excited to announce the release of a brand new AI assistant usage dashboard in Cloud Monitoring ...

Stay Connected: Your Guide to October Tech Talks, Office Hours, and Webinars!

What are Community Office Hours? Community Office Hours is an interactive 60-minute Zoom series where ...