
Running report from large amount of data

Communicator

Hi!

When you are creating a report from millions of events, I believe summary indexing is a good solution.

However, if the report has to be available on demand, is summary indexing still a workable solution? In my case, I need to create a report that mixes averages, sums, most frequent values, and so on, which makes it complicated.

I would appreciate any advice.


Re: Running report from large amount of data

Motivator

I have several summary indexes that do this. The most important step, which should be fairly easy, is to decide on a time span. This is the saved search template I use to populate summary indexes with the kinds of statistics you described above:

[savedsearchname]
    enableSched = 1
    cron_schedule = */5 * * * *
    dispatch.earliest_time = -8m@m
    dispatch.latest_time = -3m@m
    action.summary_index = 1
    action.summary_index._name = sum_index
    action.summary_index.stat_tag = statistics
    search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | sistats avg(your_field) AS your_field_avg, median(your_field) AS your_field_median, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count, dc(your_field) AS your_field_dc, max(your_field) AS your_field_max, min(your_field) AS your_field_min, stdev(your_field) AS your_field_stdev, var(your_field) AS your_field_var by _time

You can use macros.conf to make the search look cleaner, as I do. I wrote it all out here to show how to set up the values you need using sistats. With the schedule above, the search runs every 5 minutes over a 5-minute window that ends 3 minutes before run time, so the summarized data lags real time by up to 8 minutes. You can adjust the delay by changing the dispatch.earliest_time and dispatch.latest_time parameters.
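As a rough sketch of the macros.conf approach, the stanza below wraps part of the sistats clause in a macro that takes the field name as an argument. The macro name summary_stats is hypothetical, and the list of functions is shortened for readability; extend it to match your own search.

```
# macros.conf
# "summary_stats" is a hypothetical macro name, not part of the original template.
[summary_stats(1)]
args = field
definition = sistats avg($field$) AS $field$_avg, median($field$) AS $field$_median, mode($field$) AS $field$_mode, count($field$) AS $field$_count by _time
```

Under that assumption, the saved search string could then be shortened to:

```
index=your_index sourcetype=your_sourcetype | bucket _time span=1m | `summary_stats(your_field)`
```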

Also, when getting data back out after using sistats, you will need to rerun the stats command, with the same functions and fields, over the summary index to "reheat" the data for use.
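For example, a reporting search over the summary index might look something like the sketch below. It assumes the summary events can be selected with the stat_tag=statistics key set in the template above (filtering on the saved search name is another option), and it re-buckets to one hour purely as an illustration; use whatever span your report needs.

```
index=sum_index stat_tag=statistics
| bucket _time span=1h
| stats avg(your_field) AS your_field_avg, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count, max(your_field) AS your_field_max by _time
```

Because this search reads the small pre-summarized events rather than the raw index, it can be run on demand with any time range the summary covers.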


Re: Running report from large amount of data

Communicator

Hello ShaneNewman.

Thank you for the reply.
I did not mention it, but we have 3 indexes to summarize,
each with approximately:

index A : 50,000,000 events (150,000 events indexed per day)
index B : 10,000,000 events (50,000 events indexed per day)
index C : 50,000 events (a few events indexed per day)

and 31 summary items to calculate.
Some summary items need to be calculated along different dimensions, so we need to create separate searches for them.

I believe I would need to create temporary summary indexes and then combine them into a single daily summary index that users can query with time modifiers.


Re: Running report from large amount of data

Motivator

Easy enough: just use subsearches in your search string. There is no real reason to create a temporary index; you would just be adding another point of failure.
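A minimal sketch of what that could look like, with statistics from two indexes combined in one search. The index names index_a and index_b and the fields field_a and field_b are placeholders, not names from your environment:

```
index=index_a sourcetype=your_sourcetype
| stats avg(field_a) AS field_a_avg, sum(field_a) AS field_a_sum
| appendcols [search index=index_b sourcetype=your_sourcetype | stats mode(field_b) AS field_b_mode]
```

Keep in mind that subsearches are subject to result and runtime limits, so it is worth testing this pattern against your actual data volumes before relying on it.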
