
Performance / Design recommendations for dimensions in Metrics Index

rjthibod
Champion

Does Splunk have any guidelines or limitations on the number of dimensions (i.e., cardinality) that the new Metrics Index supports?

Are there specific limitations in terms of the number of dimensions or unique values of a single dimension or unique combinations of dimensions for a single measurement?

I understand that Splunk's searching and indexing performance is always contingent on the hardware / platform. I just want to know whether there are any hard limits built into the design of the Metrics Index, or a configuration threshold. Even better, can Splunk provide some benchmarks for the data sets they have tested?

I have seen other metric stores / time-series databases enforce these kinds of limits (in configuration settings), hence the question.


hsesterhenn_spl
Splunk Employee

Hi,

Unfortunately, the recordings of the .conf 2017 keynote sessions are not available yet (and neither are the metrics sessions).

It looks like you can do 50k+ events per second (eps) per indexer, and this number scales well with the number of indexers (10 indexers give approx. 500k eps)!
https://docs.splunk.com/Documentation/Splunk/7.0.0/Metrics/Performance

A single metric (measurement) needs two fields: _value and metric_name. But without dimensions (every other field!) you can't filter or aggregate it for statistics (and MSTATS uses dimensions heavily). Host and source are added automatically and are available as dimensions.

Regarding data on disk: you always trade speed against other things. Metrics are stored in TSIDX files (take a look at "splunk cmd walklex").
It's a little bit like using INDEXED_EXTRACTIONS: you create index-time fields which can be used in a TSTATS query (which is up to 1000x faster than keeping RAW events and doing extractions at search time).

Caveat: you need more storage.

HTH,

Holger


richgalloway
SplunkTrust

Recordings for the metrics sessions are available at http://conf.splunk.com/sessions/2017-sessions.html#search=metrics&

---
If this reply helps you, an upvote would be appreciated.

hsesterhenn_spl
Splunk Employee

Unfortunately most of the interesting ones are currently not available.


richgalloway
SplunkTrust

I was able to download 226 recordings. That's pretty much all of them. For an easy way to download all of the sessions, try this.

curl --silent http://conf.splunk.com/sessions/2017-sessions.html 2>&1 | egrep -i speaker-file | wget -B http://conf.splunk.com -F -i - --continue

---
If this reply helps you, an upvote would be appreciated.

rjthibod
Champion

I was at the keynote, so no need to worry about that - there wasn't anything useful about metrics.

I understand the basics of indexing, tsidx files, and what fields metrics indexes require - I have already built a custom sourcetype for metrics indexes.

I am specifically asking about dimensions. Is there a limit or a recommendation from Splunk on how many dimensions (the cardinality of the index), or how many unique sequences/combinations of dimensions, the Metrics index supports? This is a concern in other time-series databases; in fact, some of them add configuration parameters to limit exactly these things.

Does Splunk think that more than 5 dimensions in a measurement is going to cause problems with scaling? If I have over 1 million unique values across those 5 dimensions, is that going to cause a significant problem unless I am running on a 16 GB machine? etc...


hsesterhenn_spl
Splunk Employee

Expect more official details in the docs in the next couple of months...

Let me show you an example I did on my local instance:

curl -k https://localhost:8088/services/collector -H "Authorization: Splunk token-XXXX" -d '{"time": 1503209999.111,"event":"metric","source":"disk","host":"host_99","fields":{"region":"us-west-1","datacenter":"us-west-1a","rack":"63","os":"Ubuntu16.10","arch":"x64","team":"LON","service":"6","service_version":"0","service_environment":"test","path":"/dev/sda1","fstype":"ext3","_value":1099511627776,"metric_name":"total"}}'

This single measurement results in 11 dimensions.
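
To make the count concrete, here is a minimal Python sketch that rebuilds the same payload and lists its dimensions. It assumes, as described earlier in this thread, that every entry under "fields" other than _value and metric_name is a dimension. The helper name dimension_names is made up for illustration and is not part of any Splunk API; host and source sit outside "fields", so they are not part of the 11.

```python
import json

# Same payload as in the curl example above.
payload = {
    "time": 1503209999.111,
    "event": "metric",
    "source": "disk",
    "host": "host_99",
    "fields": {
        "region": "us-west-1",
        "datacenter": "us-west-1a",
        "rack": "63",
        "os": "Ubuntu16.10",
        "arch": "x64",
        "team": "LON",
        "service": "6",
        "service_version": "0",
        "service_environment": "test",
        "path": "/dev/sda1",
        "fstype": "ext3",
        "_value": 1099511627776,
        "metric_name": "total",
    },
}

# Assumption from this thread: these two fields define the measurement itself,
# everything else under "fields" is treated as a dimension.
RESERVED = {"_value", "metric_name"}


def dimension_names(event):
    """Return the sorted dimension names found under event["fields"]."""
    return sorted(name for name in event["fields"] if name not in RESERVED)


dims = dimension_names(payload)
body = json.dumps(payload)  # the JSON string you would POST to the collector

print(len(dims))
print(dims)
```

Running the same helper over a sample of your own events before you send them is a cheap way to see how many dimensions each measurement carries.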

See: splunk cmd walklex ./var/lib/splunk//db/yyyy.tsidx "" | less

my needle:
0 1 arch::x64
1 1 datacenter::us-west-1a
2 1 fstype::ext3
3 1 host::host_99
4 1 metric_name::total
5 1 os::Ubuntu16.10
6 1 path::/dev/sda1
7 1 rack::63
8 1 region::us-west-1
9 1 service::6
10 1 service_environment::test
11 1 service_version::0
12 1 source::disk
13 1 sourcetype::metrics_hse
14 1 team::LON
15 1 _catalog::total|arch|datacenter|fstype|os|path|rack|region|service|service_environment|service_version|team
16 1 _dims::arch
17 1 _dims::datacenter
18 1 _dims::fstype
19 1 _dims::os
20 1 _dims::path
21 1 _dims::rack
22 1 _dims::region
23 1 _dims::service
24 1 _dims::service_environment
25 1 _dims::service_version
26 1 _dims::team
27 1 _subsecond::.111
28 1 arch::x64
29 1 datacenter::us-west-1a
30 1 fstype::ext3
31 1 host::host_99
32 1 metric_name::total
33 1 os::ubuntu16.10
34 1 path::/dev/sda1
35 1 rack::63
36 1 region::us-west-1
37 1 service::6
38 1 service_environment::test
39 1 service_version::0
40 1 source::disk
41 1 sourcetype::metrics_hse
42 1 team::lon

The "_dims" terms are used by "| mstats". I did this with 1 million data points; search time was approx. 1 sec on a MacBook.
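
If you want to turn a listing like the one above into per-field counts, the following Python sketch may help. It assumes the three-column layout shown here (row number, count, field::value); the real walklex output may differ between versions, so treat the parsing as an assumption. Since the listing shows terms both in their original case and lowercased (os::Ubuntu16.10 and os::ubuntu16.10), the sketch folds values to lowercase to avoid counting them twice. The function name terms_by_field is hypothetical.

```python
from collections import defaultdict

# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
0 1 arch::x64
5 1 os::Ubuntu16.10
6 1 path::/dev/sda1
16 1 _dims::arch
17 1 _dims::datacenter
27 1 _subsecond::.111
33 1 os::ubuntu16.10
"""


def terms_by_field(walklex_text):
    """Map each field name to the set of distinct (lowercased) values seen."""
    values = defaultdict(set)
    for line in walklex_text.splitlines():
        parts = line.split(None, 2)  # row number, count, term
        if len(parts) != 3 or "::" not in parts[2]:
            continue  # skip anything that does not look like a field::value row
        field, value = parts[2].split("::", 1)
        values[field].add(value.lower())
    return values


fields = terms_by_field(SAMPLE)
for name in sorted(fields):
    print(name, len(fields[name]))
```

Fed with the full lexicon of a bucket, the size of each set is the number of distinct values that bucket holds for the field, which is a rough view of per-dimension cardinality.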

Does that make more sense to you?

Holger


rjthibod
Champion

Yes, this makes sense; these are things I have already done with my own custom sourcetypes for metrics.

The question is what Splunk is going to say about best practices, recommendations, and limitations for dimensions and data sets.


richgalloway
SplunkTrust

A little over 5 dimensions probably won't matter much. You can probably go a lot over 5. As in all things Splunk, however, you should test it on your dev system first.

---
If this reply helps you, an upvote would be appreciated.

richgalloway
SplunkTrust

There are no published limits, but they've been tested with some pretty high numbers. Higher numbers reduce performance so try to keep the number of dimensions as low as you can.

---
If this reply helps you, an upvote would be appreciated.

rjthibod
Champion

When you say higher numbers, are you referring to the number of dimensions (columns) or to the number of unique values per dimension?
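
To spell out the distinction, here is a small Python sketch that reports three separate quantities: the number of dimensions, the unique values per dimension, and the unique combinations actually observed. The sample rows and the function name cardinality_report are hypothetical, and nothing here reflects a documented Splunk limit; it is only a way to measure your own data before testing it.

```python
from math import prod

# Hypothetical sample measurements; only the dimension values matter here.
samples = [
    {"host": "host_1", "region": "us-west-1", "path": "/dev/sda1"},
    {"host": "host_1", "region": "us-west-1", "path": "/dev/sdb1"},
    {"host": "host_2", "region": "us-west-1", "path": "/dev/sda1"},
    {"host": "host_2", "region": "us-west-1", "path": "/dev/sda1"},
]


def cardinality_report(rows):
    """Summarise dimension count, per-dimension values, and combinations."""
    names = sorted({name for row in rows for name in row})
    per_dim = {n: len({row.get(n) for row in rows}) for n in names}
    combos = len({tuple(row.get(n) for n in names) for row in rows})
    return {
        "dimension_count": len(names),         # number of "columns"
        "values_per_dimension": per_dim,       # unique values of each one
        "unique_combinations": combos,         # distinct series actually seen
        "upper_bound": prod(per_dim.values()), # worst case if all values mixed
    }


report = cardinality_report(samples)
print(report)
```

In this sample there are 3 dimensions and 3 observed combinations, against a worst case of 4, so the three numbers can diverge even on tiny data sets.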


richgalloway
SplunkTrust

My information does not specify.

---
If this reply helps you, an upvote would be appreciated.