Data cleaning for better performance and usage

guillain · ‎05-24-2019

Hello people,

I'm trying to work out a design for metric indexing under the following constraints:
- keep the original raw data
- metrics availability (a delay of 15/30 min is acceptable)
- a large number of indexes and TBs of data per day
- a lot of data manipulation to align metric names and formats (factor in the volume)
- high search complexity (across many indexes...)

What would you suggest in this case? None of the options I have in mind is really good:
- lookup: given the volume, the data manipulation, and the search complexity, good performance is not guaranteed
- Kafka: adds design complexity (ms, infra...) and means rewriting the current transformation rules
- transformation at index time: not recommended, and it conflicts with the need to keep the original raw data
- reindexing: writing the data to new indexes duplicates cost (infra, but also Splunk license?) and increases the delay before the metrics are available

Thanks in advance for your help and enjoy your weekend 🙂

skalliger · ‎05-24-2019

Create a saved search that runs mcollect (see the docs) to write the results into a metrics index. Give this metrics index the desired retention time.
Also see the docs: Create metrics indexes
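
As a rough sketch of such a saved search, assuming a hypothetical raw index app_raw with fields named metric and value, and a hypothetical metrics index app_metrics (none of these names come from this thread): the search cleans the metric name, keeps only numeric values, and hands the result to mcollect. Scheduled every 15 minutes, it would fit the 15/30 min delay mentioned above, and the raw events stay untouched in their original index.

```spl
index=app_raw sourcetype=app:perf earliest=-15m@m latest=@m
| eval metric_name="app." . lower(replace(metric, "[^A-Za-z0-9]+", "_"))
| eval _value=tonumber(value)
| where isnotnull(_value)
| fields _time, host, metric_name, _value
| mcollect index=app_metrics
```

The cleaned metrics could then be read back with mstats instead of searching across the raw indexes, for example:

```spl
| mstats avg(_value) WHERE index=app_metrics AND metric_name="app.*" span=5m BY metric_name
```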

Skalli

guillain · ‎05-27-2019

OK, and thanks for the advice.
Does this increase the cost, given that the mcollect command will "convert events into metric data to be stored in a metric index on the search head"?

Someone has suggested that I use ES + Data Model / SIEM to do the job, but I'm not sure it will meet my expectations. From my understanding, it's meant more for metric analytics than for cleaning and formatting metrics. What do you think?

skalliger · ‎05-28-2019

ES doesn't really use metrics. You could build a data model and use your own accelerated one for custom dashboards, but ES requires an additional license. If you don't already have ES running, you don't need it just for some metrics.
Splunk Enterprise also lets you build data models and accelerate them. Docs: About data models
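
For illustration only, a dashboard search against an accelerated data model could look like the sketch below. The data model App_Perf, its dataset Perf, and the fields value and metric_name are hypothetical placeholders, not names from this thread.

```spl
| tstats summariesonly=true avg(Perf.value) AS avg_value
    from datamodel=App_Perf.Perf
    by _time span=5m, Perf.metric_name
```

With summariesonly=true the search only reads the acceleration summaries, so results are limited to the time range that has already been summarized.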
