Multiple stat intervals over a large dataset/summary index

Communicator

We log user data for the requests each user makes. We have built some averages and dashboards from this data, but the search is very slow. We want to look at trends, so the time range is 30+ days, and it could go out longer if the summary index stays small. I currently have a lookup table holding a weekly mean, standard deviation, and a few other calculations, but we also want the option to look at a single day, so we are looking to implement a summary index.

At this time I would not have any true stat calculations in my summary index. I feel they would come after the fact, since I can't do a 7-, 30-, and 90-day average in one stats command; if that is possible, I assume it would be best.

| table client,host,_time, component, operation, user, response_time

My thought was to base the summary index on this table, which gives us less data (but no calculations), and then run the necessary stats commands on top of it. Is there a better way to go about this? Ideally we want some stat calculations, as I mentioned (mean, standard deviation), but trended over time, so a few time intervals would be good. Is this the best way to handle it, or is there a better approach? I have a request that rendering be as fast as possible, since the dashboards are interactive.
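
On the question of whether 7-, 30-, and 90-day averages fit in one stats command: one possible approach is to search the widest window once and null out response_time for events that fall outside each narrower window, since avg and stdev skip null values. The following is only a sketch. The index name example_app_logs is hypothetical, and it assumes response_time is already extracted as a numeric field.

```spl
index=example_app_logs earliest=-90d@d latest=now
| eval rt_7d  = if(_time >= relative_time(now(), "-7d@d"),  response_time, null())
| eval rt_30d = if(_time >= relative_time(now(), "-30d@d"), response_time, null())
| stats avg(rt_7d) AS avg_7d stdev(rt_7d) AS stdev_7d
        avg(rt_30d) AS avg_30d stdev(rt_30d) AS stdev_30d
        avg(response_time) AS avg_90d stdev(response_time) AS stdev_90d
        BY component operation
```

This still scans 90 days of raw events, so it addresses the "one stats command" part of the question but not the speed problem by itself.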

Re: Multiple stat intervals over a large dataset/summary index

Esteemed Legend

Yes, I would use a summary index, but also make it a metrics index so that it is even faster. The whole point of a summary index is to incorporate some degree of aggregation, so figure out your minimum required granularity (often hourly), run an hourly stats search every hour, and write the results into the summary index.
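
A minimal sketch of what such an hourly scheduled search might look like, assuming a source index called example_app_logs and a summary index called example_app_summary (both names are hypothetical, as are n, sum_rt, and sumsq_rt). It stores the count, sum, and sum of squares per hour rather than the average, because those three values can be added together later to rebuild a mean and sample standard deviation over any longer interval. The user field is left out of the BY clause here to keep the row count down; it could be added if per-user detail is needed.

```spl
index=example_app_logs earliest=-1h@h latest=@h
| bin _time span=1h
| stats count AS n sum(response_time) AS sum_rt sumsq(response_time) AS sumsq_rt
        BY _time client host component operation
| collect index=example_app_summary
```

A dashboard search could then roll the hourly rows up to whatever interval it needs, for example daily over the last 30 days:

```spl
index=example_app_summary earliest=-30d@d latest=@d
| bin _time span=1d
| stats sum(n) AS n sum(sum_rt) AS sum_rt sum(sumsq_rt) AS sumsq_rt
        BY _time component operation
| eval avg_rt = sum_rt / n
| eval stdev_rt = sqrt((sumsq_rt - n * pow(avg_rt, 2)) / (n - 1))
```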

Re: Multiple stat intervals over a large dataset/summary index

Communicator

With my initial idea there would be no aggregation, just a table. There is a lot of noise in this logging, so I wanted to keep only the 7 or so data points I need. I planned to do the aggregation in a lookup, since it would be smaller and could run on the weekend. We want the stripped-down data in close to real time to monitor the responsiveness of our application, and then use the lookups I planned to create as a kind of baseline. Would this actually be a good use of a summary index, or does it not make sense? We do not use the averaged data much, so a few data points summarized in a lookup would fit my need fine.
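
The weekend baseline described here could be sketched roughly as follows. The index name example_app_logs and the lookup file example_response_baseline.csv are hypothetical, and the field names baseline_avg and baseline_stdev are chosen only for illustration.

```spl
index=example_app_logs earliest=-7d@d latest=@d
| stats count AS n avg(response_time) AS baseline_avg stdev(response_time) AS baseline_stdev
        BY component operation
| outputlookup example_response_baseline.csv
```

A near-real-time panel might then compare recent response times against that baseline:

```spl
index=example_app_logs earliest=-15m
| stats avg(response_time) AS current_avg BY component operation
| lookup example_response_baseline.csv component operation OUTPUT baseline_avg baseline_stdev
| eval deviation_in_stdevs = (current_avg - baseline_avg) / baseline_stdev
```

The 15-minute window is an arbitrary choice for the example; the comparison is a rough indicator rather than a formal statistical test.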

Re: Multiple stat intervals over a large dataset/summary index

Esteemed Legend

If that's the case, I would eliminate the noise on the way in, before it ever gets indexed. Another option is to create an Accelerated Data Model that includes only the table fields that you need.
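
As a rough illustration of the data model route, assuming an accelerated data model named Example_App_Requests with a root dataset Example_Requests that exposes component, operation, and response_time (all of these names are hypothetical), a trend search could use tstats instead of reading raw events:

```spl
| tstats summariesonly=true count AS n
         avg(Example_Requests.response_time) AS avg_rt
         stdev(Example_Requests.response_time) AS stdev_rt
  FROM datamodel=Example_App_Requests.Example_Requests
  BY _time span=1d Example_Requests.component Example_Requests.operation
| rename Example_Requests.* AS *
```

Setting summariesonly=true restricts the search to the accelerated summaries, which is usually what makes it fast, at the cost of skipping events that have not been summarized yet.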

Re: Multiple stat intervals over a large dataset/summary index

Communicator

@woodcock I am really the only end user, and I would not have access to change anything on the way in. I shouldn't really call it noise; it is other lines of data that I do not care about for this effort, which is about application response times. I will look at the accelerated data model. My thought was that the stripped-down table as its own index would hold only the data I need, making searches faster for ranges of less than 7 days. For the longer term, I would average everything daily or weekly for trending.
