Data cleaning for better performance and usage

guillain — Sat, 06 Jun 2020 02:07:21 GMT

Hello people,

I am trying to work out a design for metric indexing with the following constraints:
- keep the original raw data
- availability of the metrics (a delay of 15/30 min is OK)
- a high number of indexes and TB per day
- a lot of data manipulation to align metric names and formats (a factor in the volume)
- high search complexity (across many indexes...)

What would you suggest in this case? None of the options I have in mind looks really good:
- lookup: given the volume, the data manipulation, and the search complexity, good performance is not guaranteed
- Kafka: adds design complexity (ms, infra...) and means rewriting the current transformation rules
- transformation at index time: not recommended, and it conflicts with the need to keep the original raw data
- reindexing: copying the data into new indexes duplicates the cost (infrastructure, but the Splunk license too?) and increases the delay before the metrics are available

Thanks in advance for your help and enjoy your weekend 🙂

Re: Data cleaning for better performance and usage

skalliger — Fri, 24 May 2019 21:00:39 GMT

Create a saved search that runs mcollect (see the mcollect docs) and writes its results into a metrics index. Give this metrics index the desired retention time.
Also see the docs: Create metrics indexes
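
A minimal sketch of such a saved search, under stated assumptions: the source index `app_raw`, the sourcetype `app:perf`, the numeric field `cpu_pct`, the metric name `app.cpu.usage`, and the metrics index `app_metrics` are all hypothetical names, not taken from this thread. With its default settings, mcollect expects each result to carry `metric_name` and `_value`, and stores the remaining fields as dimensions. The raw events stay untouched in their original index.

```
index=app_raw sourcetype=app:perf earliest=-20m@m latest=-5m@m
| bin _time span=5m
| stats avg(cpu_pct) AS value BY _time host
| eval metric_name="app.cpu.usage", _value=value
| fields - value
| mcollect index=app_metrics
```

Scheduled every 15 minutes, something like this should stay within a 15/30 min availability window. The collected metric could then be read back with mstats, for example:

```
| mstats avg(_value) WHERE index=app_metrics AND metric_name="app.cpu.usage" span=15m BY host
```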

Skalli

Re: Data cleaning for better performance and usage

guillain — Mon, 27 May 2019 07:22:26 GMT

OK, and thanks for the advice.
Does it increase the cost, since the mcollect command will "convert events into metric data to be stored in a metric index on the search head"?

Someone has suggested that I use ES + data models / SIEM to do the job, but I am not sure it will meet my expectations. From my understanding, it is meant more for metric analytics than for cleaning and formatting metrics. What do you think?

Re: Data cleaning for better performance and usage

skalliger — Tue, 28 May 2019 16:33:19 GMT

ES doesn't really use metrics. You could build a data model and use your own accelerated one for custom dashboards, but ES requires an additional license. If you don't already have ES running, you don't need it just for some metrics.
Splunk Enterprise itself also has the option to build data models and accelerate them. Docs: About data models
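
As a rough illustration of what querying an accelerated data model could look like, here is a tstats sketch. It assumes a hypothetical data model `App_Perf` with a root dataset `Perf` that exposes the fields `cpu_pct` and `host`; none of these names come from this thread, so adjust them to your own model.

```
| tstats summariesonly=true avg(Perf.cpu_pct) AS avg_cpu FROM datamodel=App_Perf.Perf BY Perf.host _time span=15m
```

With `summariesonly=true` the search only reads the acceleration summaries, so results appear only for time ranges that have already been summarized.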