I have a fairly complex dashboard with 6 graphs on it. The dashboard runs from a search head against multiple indexers. The page originally started off with 5 real-time searches, but I've been able to compress it down to 3 real-time searches, applying post-processing to their results to drive the 6 graphs.
I have noticed, however, that the dashboard takes much longer to load this way. Is it more efficient to have cleaner, more targeted real-time searches, or to take the approach I've described above?
It depends on the characteristics of the search.
When you fold two or more searches into a single base search plus N postprocess searches, the new base search naturally has more dimensions, or more statistics, or both. Depending on how many rows the rolled-up base search produces, it might perform much better as 1 search plus N postprocesses, or it might perform better split back apart.
To illustrate how, let's start with the fact that the postprocess searches all have to be run against the base-search results on every update request. In other words, each time all 4 charts update, you're asking Splunk to run 4 postprocess searches against the base results.
So if the base search only has a couple hundred rows, the postprocesses will all run very fast, and running them every 2 seconds against real-time search results is no big deal. And it's much better than pulling the same events off disk four times to run four slightly different simple reports.
However, let's say your new base search result has 10,000 rows, or 100,000, or a million. Running 4 postprocess searches against those rows every 2 seconds might start to take significant resources. Of course, the resources you're hitting are quite different from those used by 4 parallel searches, but you can still eventually run up against limits.
Say that you have usernames and clientips, and we're running a search over a couple hours. We want two charts - one showing the top users, and one showing distinct count of users by clientip or something. We're getting the same events off disk so it's a clear candidate for using postprocess.
So our base search would be:

    stats count by username clientip

and our postprocess searches would be:

    stats sum(count) as count by username

and:

    stats dc(username) by clientip
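In dashboard Simple XML, this base-plus-postprocess pattern is wired up by giving the base search an id and pointing each panel's search at it with the base attribute. Here's a minimal sketch for the example above - the index, sourcetype, time window, and panel titles are made up for illustration:

```xml
<dashboard>
  <!-- Base search: runs once, its results shared by all panels below -->
  <search id="baseSearch">
    <query>index=web sourcetype=access_combined | stats count by username clientip</query>
    <earliest>rt-2h</earliest>
    <latest>rt</latest>
  </search>

  <row>
    <panel>
      <title>Top users</title>
      <chart>
        <!-- Postprocess: re-run against the base results on every update -->
        <search base="baseSearch">
          <query>stats sum(count) as count by username | sort - count</query>
        </search>
      </chart>
    </panel>
    <panel>
      <title>Distinct users per client IP</title>
      <chart>
        <search base="baseSearch">
          <query>stats dc(username) as users by clientip</query>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>
```

Note that the postprocess queries start directly with a transforming command; they operate on the rows the base search already produced, not on raw events.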
Now let's say that generally each clientip is associated with only one user, or a few users, and vice versa: each user will generally be on only one clientip. And there are a few hundred users and a few hundred clientips. This means the "rolled-up" search will have a number of rows only a couple of times larger than either the distinct number of users or the distinct number of clientips considered alone. This will be fine; the postprocesses will run fast.
Let's change it, though, and say we're correlating usernames with another field that has a very high distinct count. Say it's a request id, and every single event has its own request id. If we do:

    stats count by username, request_id

the result set will be just as long as the underlying set of events, possibly in the hundreds of thousands or millions of rows. Postprocess searches running against this result will take a lot longer, so 4 of them running every 2 seconds might start to tax CPU or RAM, which comes at the expense of something else. When the search is being distributed it gets a little more complicated and technical, but the same guidelines apply.
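Before committing to a rolled-up base search, you can estimate how many rows it will produce by appending a row count in the search bar - the index and field names here are hypothetical:

    index=web | stats count by username, request_id | stats count as base_rows

The second stats simply counts the rows the first one produced. If base_rows comes out near the raw event count, the roll-up isn't buying you anything, and the charts are probably better served by separate, leaner searches.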
I hope this explanation makes sense. As a further point, this is also why it's so important to use the bin command to bucket the _time values whenever _time is one of your "by" fields in a base search. The _time values almost always have a very high distinct count, so:

    stats count by username clientip _time

will be very large without binning - basically as large as the underlying set of events you're trying to aggregate.
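A sketch of the binned version - the 5-minute span is just an illustrative choice, and the index is hypothetical; pick a span coarse enough for your charts:

    index=web
    | bin _time span=5m
    | stats count by username clientip _time

Each event's _time is snapped to the start of its 5-minute bucket before the stats runs, so the distinct count of _time values drops from roughly one per event to one per bucket, and the base search result stays small.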