Reporting

What are some of the Best Practices Recommendation for Data Model Acceleration vs Metrics?

pramaswamy
Path Finder

I have about 1TB of machine data as CSV files primarily feeding health/metric data about the different-sub systems in machines ( cpu stats, io stats, san stats etc..)

I wanted to build some summaries ( daily metric aggregations using stat commands, across about 100 servers ) for about 6 months, that perform really well, to get some top-level insights across the fleet ( raw data size ~ 1TB for all servers ). I plan to maintain data for rolling 6 months, where historic data beyond 6 months would be deleted periodically.

This link (http://docs.splunk.com/Documentation/Splunk/7.0.2/Knowledge/Aboutsummaryindexing) has some great suggestions on when to use report acceleration vs data model acceleration vs summary indexing. But is there a documentation that adds the new metrics store to the performance/aggregation comparison or a similar guide with points to consider?

The CSV I have is formatted in the following fashion

Timestamp | ServerName | DBStatValue | AppStatValue | IOStatValue | CacheStatValue ...( few other metrics )

Now, to comply with the "Metrics" store standard, I would have messaged the CSV's I receive from my data stores to the following format, if I am not mistaken, which adds a static "metric name" column, for each metric being collected.
Timestamp | ServerName | DBMetricName | DBMetricValue | AppMetricName | AppMetricValue | IOMetricName | IOMetricValue |CacheMetricName| CacheStatValue ...( few other metrics )

Few questions in my mind, where I need help.

  1. Should I be worried about "potentially" increasing the size of the file, by adding metric name components, that are going to add more system workload on Splunk to process historic dataset aggregation?
  2. For these type of historic metric aggregation use cases, when the Metrics data are fed from CSV, are there points to consider on when "Metrics" perform better vs when "data model acceleration/report acceleration summary" perform better?
  3. Is the anticipated performance gain from the "Metrics" store for this use case, justify the additional effort to transform every single CSV file ( historic and going forward...)?

I am just getting started with the implementation and have the luxury and flexibility to choose the path that would best support the described use case. Any tips, best practices or suggestions from community experts are greatly appreciated.

Get Updates on the Splunk Community!

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...

.conf24 | Personalize your .conf experience with Learning Paths!

Personalize your .conf24 Experience Learning paths allow you to level up your skill sets and dive deeper ...

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...

WATCH NOWAs AI starts tackling low level alerts, it's more critical than ever to uplevel your threat hunting ...