Reporting

Running a report on a large amount of data

yuwtennis
Communicator

Hi!

When you are creating a report from millions of events, I believe summary indexing is a good solution.

However, if the report has to run on demand, is it still a good solution? In my case, I need to create a report that mixes averages, sums, most frequent values, and so on, which makes it complicated.

I would appreciate it if someone could give me some advice.

ShaneNewman
Motivator

I have several summary indexes that do this. The most important thing, which should be fairly easy, is to figure out a time span. This is a saved search template I use to populate summary indexes with the kind of data you described above:

[savedsearchname]
    enableSched = 1
    cron_schedule = */5 * * * *
    dispatch.earliest_time = -8m@m
    dispatch.latest_time = -3m@m
    action.summary_index = 1
    action.summary_index._name = sum_index
    action.summary_index.stat_tag = statistics
    search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | sistats avg(your_field) AS your_field_avg, median(your_field) AS your_field_median, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count, dc(your_field) AS your_field_dc, max(your_field) AS your_field_max, min(your_field) AS your_field_min, stdev(your_field) AS your_field_stdev, var(your_field) AS your_field_var by _time

You can use macros.conf to make the search look cleaner, as I do. I wrote it all out here to show how to set up the values you need using sistats. As you can see above, the search runs every 5 minutes over the window from 8 minutes ago to 3 minutes ago, so this data is on (up to) an 8-minute delay from real time. You can adjust the delay by changing the dispatch.earliest_time and dispatch.latest_time parameters.
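As a rough sketch of the macros.conf approach, the sistats clause could be wrapped in a one-argument macro. The macro name summary_sistats and the argument name field are made up for illustration, and only a few of the functions are shown:

```
# macros.conf (hypothetical macro name and argument name)
[summary_sistats(1)]
args = field
definition = sistats avg($field$) AS $field$_avg, median($field$) AS $field$_median, mode($field$) AS $field$_mode, count($field$) AS $field$_count by _time

# savedsearches.conf: the search line then shrinks to
search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | `summary_sistats(your_field)`
```

The macro only shortens the search string; the scheduling and summary index settings in the saved search stay the same.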

Also, when getting data back out after using sistats, you will need to rerun the matching stats command against the summary index to "reheat" the data for use.
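A minimal sketch of that "reheat" step, assuming the summary events can be selected by the saved search name in the search_name field and that hourly figures are wanted instead of per-minute ones. The span and the subset of functions are illustrative; the functions and field should match what sistats stored:

```
index=sum_index search_name="savedsearchname"
| bucket _time span=1h
| stats avg(your_field) AS your_field_avg, median(your_field) AS your_field_median, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count by _time
```

Because this search only reads the pre-aggregated summary events, it should stay fast enough to run on demand over long time ranges.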

ShaneNewman
Motivator

Easy enough, just use sub-searches in your search string. There is no real reason to create a temp index; you are just adding another failure point.
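One way to read this suggestion, sketched with hypothetical index and field names (index_a, field_a, and so on): compute each summary in its own subsearch and combine the results into a single row with appendcols, instead of writing them to an intermediate index first.

```
index=index_a | stats avg(field_a) AS field_a_avg
| appendcols [search index=index_b | stats sum(field_b) AS field_b_sum]
| appendcols [search index=index_c | stats mode(field_c) AS field_c_mode]
```

Subsearches are subject to result and runtime limits, so this suits summaries that reduce to a small result set.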

yuwtennis
Communicator

Hello ShaneNewman.

Thank you for the reply.
I did not mention it, but we have 3 indexes to summarize, and each has approximately:

index A : 50,000,000 events (150,000 events indexed per day)
index B : 10,000,000 events (50,000 events indexed per day)
index C : 50,000 events (few events indexed per day)

and 31 summary items to calculate.
Some summary items need to be calculated along different dimensions, so we need to create separate searches.

I believe I would need to create temporary summary indexes and then concatenate them into a single daily summary index that users can query with time modifiers.
