Reporting

Running report from large amount of data

yuwtennis
Communicator

Hi!

When you are creating a report from millions of data, I believe using summary indexing is a good solution.

However , if you have a requirement as on demand, would this still be a solution? In my case, I need to create a report that is mixture of average,.sum ,.most frequent value ,.etc, this makea complicated.

I appreciate if someone can give me an advice.

Tags (2)

ShaneNewman
Motivator

I have several summary indexes that do this. The most important thing, should be fairly easy, is to figure out a time span. This is a saved search template I use to populate summary indexes capturing data you described above:

[savedsearchname]
    enableSched = 1
    cron_schedule = */5 * * * *
    dispatch.earliest_time = -8m@m
    dispatch.latest_time = -3m@m
    action.summary_index = 1
    action.summary_index._name = sum_index
    action.summary_index.stat_tag = statistics
    search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | sistats\stats avg(your_field) AS your_field_avg, median(your_field) AS your_field_median, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count, dc(your_field) AS your_field_dc, max(your_field) AS your_field_max, min(your_field) AS your_field_min, stdev(your_field) AS your_field_stdev, var(your_field) as your_field_var by _time

You can use a macros.conf to make the search look cleaner, as I do. I just wrote it all out to show how to set up the values you need using sistats. As you can see above, this data is on (up to) an 8 minute delay from real-time. You can adjust the delay by changing the earliest_time and latest_time parameters.

Also, when getting data back out after using sistats, you will need to rerun the stats command to "reheat" the data for use.

0 Karma

ShaneNewman
Motivator

Easy enough, just use sub-searches in your search string. There is no real reason to create a temp index, you are just adding another failure point.

0 Karma

yuwtennis
Communicator

Hello ShaneNewman.

Thank you for the reply.
I did not mention but we have 3 indexes to summarize and
each has approximately

index A : 50,000,000 events (150,000 events indexed per day)
index B : 10,000,000 events (50,000 events indexed per day)
index C : 50,000 events (few events indexed per day)

and 31 summary items to calculate.
Some summary items needs to be calculated in different dimension thus we need to create search separately.

I believe I would need to create temporary summary index and then concatenate it to single daily summary index where user can use time modifier .

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...