Dashboards & Visualizations

Should I use report acceleration or the loadjob command for dashboard performance optimization?

Motivator

Hi,

Let's say we have a dashboard with the following panels:

  • daily active users
  • daily active users by server
  • weekly active users
  • weekly active users by server
  • monthly active users
  • monthly active users by server

And the panels are all saved searches like this:

...
| timechart span=1d/1w/1mon dc(user_id) AS users

or

...
| timechart span=1d/1w/1mon dc(user_id) AS users BY server_id

I'm thinking about how to optimize the performance of this dashboard.

1. Report Acceleration

Accelerate this base search (the output will be more than 3,000,000 rows):

 ...
| bucket span=1d _time
| stats count by _time, user_id, server_id

Create saved searches using this acceleration summary for the dashboard panels, e.g.

...
| bucket span=1d _time
| stats count by _time, user_id, server_id
| timechart span=1d dc(user_id) AS active_users

2. Loadjob

Schedule this saved search (again, the output will be more than 3,000,000 rows):

...
| bucket span=1d _time
| stats count by _time, user_id, server_id

and use the results afterwards, e.g.

...
| loadjob savedsearch="user:app:mysearch"
| timechart span=1d dc(user_id) AS users

I haven't used loadjob before and am not sure whether this is its intended usage. First test runs show that loadjob is about 3x faster than report acceleration.
Are there possible problems I should keep in mind with acceleration or loadjob here? What would be your preferred approach?

Thanks in advance

Heinz


Legend

Here are my thoughts:

Does the data have to be "up to the minute"? In other words, do the statistics include data from the last hour?

If yes, then IMO report acceleration is a very good idea. You might also consider using post-process searches to make the dashboard panels faster. See here for info about post-process searches - you will need to scroll down to the "Post-process searches" heading.
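To illustrate, a post-process setup in Simple XML might look roughly like this (a minimal sketch; the index name, panel title, and time range are illustrative, and the shared base search must be a transforming search so its cached results can be post-processed):

```xml
<dashboard>
  <!-- one transforming base search, run once and shared by all panels -->
  <search id="base">
    <query>index=myindex | bucket span=1d _time | stats count by _time, user_id, server_id</query>
    <earliest>0</earliest>
    <latest>now</latest>
  </search>
  <row>
    <panel>
      <title>Daily active users</title>
      <chart>
        <!-- post-process search: runs over the base results, not the raw data -->
        <search base="base">
          <query>timechart span=1d dc(user_id) AS users</query>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>
```

Each additional panel would reference the same base with its own short post-process query (e.g. `timechart span=1w ... BY server_id`), so the raw data is scanned only once per dashboard load.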

If no, then you could use scheduled searches to power your dashboard panels. If you reference a scheduled search in a dashboard, Splunk will automatically populate the dashboard with the most recent cached results. If all of the panels can be powered from scheduled searches, then no searches will be run when the dashboard is loaded. This is very fast and efficient. This info is on the same page as the first link, but under the heading "Reference a search from a report" (earlier on the page).
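As a sketch, referencing a saved report from Simple XML looks roughly like this (the report name is hypothetical; if the report is scheduled, the panel is populated from its most recent cached results and no search runs at load time):

```xml
<dashboard>
  <row>
    <panel>
      <chart>
        <!-- points at an existing saved report by name;
             a scheduled report serves its cached results here -->
        <search ref="My Daily Users Report"></search>
      </chart>
    </panel>
  </row>
</dashboard>
```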

You could use loadjob in a similar way to the scheduled searches, but it is a more complex way of doing the work.

Report acceleration should also make scheduled searches run faster. But the ongoing overhead of report acceleration might not be worth it, especially if you can run your scheduled searches at 4:00 am.

Do not bother with summary indexing, unless you are willing to spend the ongoing maintenance effort. Report acceleration summaries are maintained automatically. In your case, I think report acceleration is a better choice.


Motivator

Hi,

the data does not have to be up to date; latest would be @d, and the searches will run overnight.
Currently I'm using scheduled searches for the panels, but I've been thinking about how to reduce the system load by using one of these options.

I haven't used the loadjob command before. I thought it could be a good solution to populate the base search once and then reshape the results into the different formats I need. Timecharting a result table shouldn't run into the maxtime limit of a subsearch, should it? And Splunk would have to analyse the raw data only once, instead of six times (once per panel).

The maintenance effort of summary indexing is the reason why I always try to avoid it 🙂


Legend

So you could have just one scheduled search (the base search) and then use post-process searches for the panels... Post-process searches are presumably doing a loadjob behind the scenes.

Also, you could accelerate the base search as well.


Motivator

I think I will use the plain loadjob command on a base search. Post-process searches have a timeout limit that I want to avoid as a possible source of errors in the dashboards.
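For completeness, that approach could be sketched like this, reusing the saved-search name from the original post: each panel's search loads the cached results of the scheduled base job and only runs the cheap reshaping step.

```
| loadjob savedsearch="user:app:mysearch"
| timechart span=1d dc(user_id) AS users

| loadjob savedsearch="user:app:mysearch"
| timechart span=1w dc(user_id) AS users BY server_id
```

The base search (`... | bucket span=1d _time | stats count by _time, user_id, server_id`) runs overnight once, and all six panels share its output.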

Thanks a lot for your input


SplunkTrust

Do you get 3 million rows per day, even after the timechart? What time range does your dashboard show data for (day/week/month)?
Have you explored the option of summary indexing?


Motivator

Hey,
the time range would be "all time", because I want to cover the whole lifecycle; the earliest start would be in 2013.
After the timechart there would be one row per day/week/month and one column for the KPI.

I try to avoid summary indexing because, from what I've read so far, it needs care & attention to keep the results correct (gaps, overlaps, etc.).
