In Splunk 7.0.0, when sending data to a metrics index, it looks like one can send duplicate metric measurement events (i.e., the same tuple of time, metric name, and dimensions) and the metrics index will store all duplicates, thereby skewing the statistics computed from them.
Is that the intended behavior for the metric index?
Other time-series metric stores/indexes/DBs I have played with use overwrite/last-in logic that only preserves the most recently indexed value for a given metric tuple. Using similar logic here would seem to make more sense for the use cases I would see for the metric store, but I freely admit to making assumptions.
Please clarify how allowing duplicate metric events is intended to be used / handled.
Note: my understanding of a distinct metric tuple is the timestamp (to milliseconds), metric name, and dimension fields. So, given the following two metric tuples that arrive at the indexer at different times (the first column), last-in logic would save only the later one (the top row) in the index. Right now (as of Splunk 7.0.0), both are saved in the metrics index/store.
| indexing timestamp | metric timestamp | metric name     | metric value | server      |
| 1506708015.390     | 1506708000.000   | server.power.kw | 126.06       | na-server-1 |
| 1506708010.242     | 1506708000.000   | server.power.kw | 104.56       | na-server-1 |
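To make the difference concrete, here is a minimal Python sketch of the two write policies applied to the sample rows above. This is not Splunk code and says nothing about how the metrics index is implemented; the function names `retain_all` and `last_in_wins` are made up for illustration, and the tuple key (metric timestamp, metric name, dimension) is my assumption about what counts as "the same" measurement.

```python
from statistics import mean

# Sample rows from the table above:
# (indexing timestamp, metric timestamp, metric name, metric value, server)
rows = [
    (1506708015.390, 1506708000.000, "server.power.kw", 126.06, "na-server-1"),
    (1506708010.242, 1506708000.000, "server.power.kw", 104.56, "na-server-1"),
]


def retain_all(rows):
    """Keep every measurement, duplicates included (what 7.0.0 appears to do)."""
    return [value for _, _, _, value, _ in rows]


def last_in_wins(rows):
    """Keep one value per (metric timestamp, metric name, dimension) tuple."""
    store = {}
    # Apply writes in arrival order so the most recently indexed value survives.
    for _indexed_at, metric_ts, name, value, server in sorted(rows, key=lambda r: r[0]):
        store[(metric_ts, name, server)] = value  # hypothetical tuple key
    return list(store.values())


print(round(mean(retain_all(rows)), 2))    # average over both stored duplicates
print(round(mean(last_in_wins(rows)), 2))  # only the later write remains
```

With the sample rows, the retain-all policy averages both values (115.31), while the last-in policy reports only the later write (126.06), which is the discrepancy in statistics I am asking about.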
Additional Comments after posting
The example data I provided above is simply made-up in order to simplify the discussion. Don't interpret it as relevant to the question - it's just an example.
Some points to consider:
mstats
The delete command doesn't work for metrics (i.e., you can't delete duplicates if they happen).
So, maybe best to sum up as this question: is the Splunk metrics index feature intended to work like other time-series databases with strict write logic limitations, or is it an optimized / pared-down version of the standard Splunk event index?
Hey RT, sorry for the delay.
The metrics indexes DO NOT have logic for guarding against duplicate events (beyond the use of SPL), but I have circulated this post and its thoughts and am always happy to relay any issues or concerns seen in the community and with customers to the DEV/ENG teams!
You know where to find me 😉
For anyone interested, there is an idea for this (current status is "Future Prospect"): https://ideas.splunk.com/ideas/EID-I-486.
This is my problem as well. I run a daily report to generate metrics from a larger index. (I need the raw information for some analytics, but only the metrics for others, and the metrics queries are faster this way.) If the report is ever run twice (by error, a restart, or some external reason), that day's metrics are permanently invalid, with no obvious way to filter out the duplicates.
As far as I know, because the metric is generated from 24 hours of raw input data, it can't be generated at intake time.
And to counter the answers above, if these metrics were external, and were input more than once, it would be the same problem.
In a perfect world, the duplicates would never occur, but it would be safer and better if there were an option to update a metric instead of duplicating it.
Matt, thanks for sharing. The concerns/issues at this point are the ones I have detailed in the original post.
Thanks as always for sharing RT! Keep the feedback coming as it will help shape the best features possible!
Any updates or input Splunk ppl?
hey RT,
was this an outlier/mistake in the data collection? what condition would cause a poller to generate 2 values for the same metric?
It brings up a great item for us to ensure expected behaviour is well known. Based on your test I would suggest we made the decision to retain all events we see.
I would think SPL provides a pretty robust protection mechanism against this with something like:
| mstats prestats=t latest(_value) AS _value WHERE metric_name=server.power.kw span=60s
| timechart span=1m avg(_value)
will get back with the official answer when I get it!
You are correct that latest would work for the example, but that side-steps the bigger question about the feature's intent. See the updated question for more context.
Given the above data, I would expect that the two events both contain valid information. If you want to know the average metric value for that metric timestamp, you would need them both.
The specifics of the example are not really the point, and I am not sure that duplicates do make sense in the example (running aggregation of total power consumed). See my updated comments in the original post for more context about what I am really trying to get to.