Solved: Should Metrics Indexes support overwriting events instead of duplicating events

rjthibod · ‎09-29-2017

In Splunk 7.0.0, when sending data to a metrics index, it looks like one can send duplicate metric measurement events (e.g., the same tuple of time, metric name, and dimensions) and the metric index will store all duplicates, thereby affecting the statistics that come out.

Is that the intended behavior for the metric index?

Other time-series metric stores/indexes/dbs I have played with use overwrite/last-in logic that only preserves the most-recently indexed value for a given metric tuple. Using similar logic here would seem to make more sense for the use cases I would see for the metric store, but I freely admit to making assumptions.

Please clarify how allowing duplicate metric events is intended to be used / handled.

Note, my understanding of a distinct metric tuple is the timestamp (to milliseconds), metric name, and dimension fields. So, assuming you see the following two metric tuples that arrive at the indexer at different times (the first column), only the later one (the top row) would be saved in the index. Right now (as of Splunk 7.0.0), both are saved in the metrics index/store.

| indexing timestamp | metric timestamp | metric name     | metric value | server      |
| 1506708015.390     | 1506708000.000   | server.power.kw | 126.06       | na-server-1 |
| 1506708010.242     | 1506708000.000   | server.power.kw | 104.56       | na-server-1 |
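To make the requested behavior concrete, here is a minimal Python sketch of the last-in logic described above. It runs outside Splunk and is not how the metrics index works; the `Measurement` type and the `last_in_wins` function are hypothetical names invented for this illustration, and the rows are the made-up values from the table.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    index_time: float   # when the indexer received the event
    metric_time: float  # timestamp carried by the measurement itself
    metric_name: str
    value: float
    dimensions: tuple   # sorted (key, value) pairs


def last_in_wins(measurements):
    """Keep only the most recently indexed value per distinct metric tuple."""
    kept = {}
    # Process in arrival order so a later arrival replaces an earlier one.
    for m in sorted(measurements, key=lambda m: m.index_time):
        # Distinct tuple: timestamp (to milliseconds), metric name, dimensions.
        key = (round(m.metric_time * 1000), m.metric_name, m.dimensions)
        kept[key] = m
    return list(kept.values())


rows = [
    Measurement(1506708015.390, 1506708000.000, "server.power.kw", 126.06,
                (("server", "na-server-1"),)),
    Measurement(1506708010.242, 1506708000.000, "server.power.kw", 104.56,
                (("server", "na-server-1"),)),
]

deduped = last_in_wins(rows)
```

With both rows stored, an average over that timestamp comes out at about 115.31; with last-in logic only the 126.06 measurement remains.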

Additional Comments after posting

The example data I provided above is simply made-up in order to simplify the discussion. Don't interpret it as relevant to the question - it's just an example.

Some points to consider:

- At least two other time-series databases for metrics, InfluxDB and OpenTSDB, don't allow duplicate events. I haven't fully evaluated others (e.g., DataDog); these are just the examples I know of.
- Splunk's documentation openly says individual events are not really relevant in metric indexes:
  - you cannot filter or search on the metric value field in mstats
  - the delete command doesn't work for metrics (i.e., you can't delete duplicates if they happen)
- By allowing duplicate tuples {timestamp, metric name, dimensions} in metrics indexes, backfilling via saved searches or resending of metrics becomes very, very difficult. Backfilling using metrics distilled from event indexes is very easy if you use write-last / last-in / overwriting logic.
- Running aggregate metric sources (like my example above, total power consumed in an hour) become very challenging with the current logic, which keeps duplicates.
- Clustered environments raise the risk of getting duplicate events in the face of delays / blocked queues and resent events.
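The backfilling point can be illustrated with a small Python model. This is only a sketch under stated assumptions: `AppendStore` and `OverwriteStore` are hypothetical stand-ins for the two write policies (neither is a Splunk API), and the hourly values are invented for the example.

```python
from collections import defaultdict


class AppendStore:
    """Keeps every write, as the thread reports for metrics indexes in 7.0.0."""

    def __init__(self):
        self.points = defaultdict(list)

    def write(self, key, value):
        self.points[key].append(value)

    def total(self):
        return sum(v for values in self.points.values() for v in values)


class OverwriteStore:
    """Write-last policy: one value per (timestamp, metric name, dimensions)."""

    def __init__(self):
        self.points = {}

    def write(self, key, value):
        self.points[key] = value

    def total(self):
        return sum(self.points.values())


def backfill(store):
    # Hypothetical hourly measurements distilled from an event index.
    hourly = {
        (1506704400, "server.power.kw", "na-server-1"): 100,
        (1506708000, "server.power.kw", "na-server-1"): 125,
    }
    for key, value in hourly.items():
        store.write(key, value)


append_store = AppendStore()
overwrite_store = OverwriteStore()

# Run the same backfill twice, e.g. after a retry or an accidental re-run.
for _ in range(2):
    backfill(append_store)
    backfill(overwrite_store)

append_total = append_store.total()        # counts each point twice
overwrite_total = overwrite_store.total()  # unchanged by the re-run
```

Under the overwrite policy the backfill is idempotent, so re-running it is harmless; under the append policy every re-run changes sums and counts.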

So, maybe best to sum up as this question: is the Splunk metrics index feature intended to work like other time-series databases with strict write logic limitations, or is it an optimized / pared-down version of the standard Splunk event index?

mattymo · ‎10-11-2017

Hey RT, sorry for the delay.

The metrics indexes DO NOT have logic for guarding against duplicate events (beyond the use of SPL), but I have circulated this post and its thoughts and am always happy to relay any issues or concerns seen in the community and with customers to the DEV/ENG teams!

You know where to find me 😉

- MattyMo

jeffland · ‎11-18-2023

For anyone interested, there is an idea for this (current status is "Future Prospect"): https://ideas.splunk.com/ideas/EID-I-486.

ScottABachmann · ‎03-05-2020

This is my problem as well. I run a daily report to generate metrics from a larger index. (I need the raw information for some analytics, but only the metrics for others, and the metrics queries are faster this way.) If the report is ever run twice (by error, or restart, or an external reason), that day's metrics are forever invalid, and there is no obvious way to filter out the duplicates.

Because the metric is generated from 24 hours of input raw data, it can't be generated at intake time. As far as I know.

And to counter the answers above, if these metrics were external, and were input more than once, it would be the same problem.

In a perfect world, the duplicates should never occur, but it would be safer and superior if there was an option to update a metric instead of duplicating it.

rjthibod · ‎10-12-2017

Matt, thanks for sharing. The concerns/issues at this point are the ones I have detailed in the original post.

mattymo · ‎10-12-2017

Thanks as always for sharing RT! Keep the feedback coming as it will help shape the best features possible!

- MattyMo

rjthibod · ‎10-11-2017

Any updates or input Splunk ppl?

mattymo · ‎09-29-2017

hey RT,

was this an outlier/mistake in the data collection? what condition would cause a poller to generate 2 values for the same metric?

It brings up a great item for us to ensure expected behaviour is well known. Based on your test I would suggest we made the decision to retain all events we see.

I would think SPL provides a pretty robust protection mechanism against this with something like:

| mstats prestats=t latest(_value) AS _value WHERE metric_name=server.power.kw span=60s | timechart span=1m avg(_value)

will get back with the official answer when I get it!

- MattyMo

rjthibod · ‎09-30-2017

You are correct that latest would work for the example, but that side-steps the bigger question about the feature's intent. See the updated question for more context.

DalJeanis · ‎09-29-2017

Given the above data, I would expect that the two events both contain valid information. If you want to know the average metric value for that metric timestamp, you would need them both.

rjthibod · ‎09-30-2017

The specifics of the example are not really the point, and I am not sure that duplicates do make sense in the example (running aggregation of total power consumed). See my updated comments in the original post for more context about what I am really trying to get to.
