Dashboards & Visualizations

Why is the dashboard taking hours to complete or failing?

angersleek
Path Finder

The system I am working with receives about 500k log events per hour. I have a dashboard with multiple queries on these logs, and I am trying to get a report out for the last 1 year. I do expect it to take some time, but the dashboard not completing after 5 hours, or just failing, seems plain ridiculous.

I believe I am doing some really inefficient work on the dashboard. I am new to this and looking for advice on how I could make my queries and charts deliver results faster without failing.

I have 5 of the following searches on the dashboard and each search is presented as a single value chart:

service="this changes for the 5 different searches" | chart avg(REQUEST_DURATION) as "Service (ms)"

I have 5 of the following searches on the dashboard and each search is presented as a single value chart:

market="this changes for the 5 different searches" | timechart span=5m avg(REQUEST_DURATION) as average | fillnull | sort average

I have 5 of the following searches on the dashboard and each search is presented as a line chart:

locale="this changes for the 5 different searches" | fields REQUEST_DURATION | eventstats avg(REQUEST_DURATION) as average | timechart span=5m avg(REQUEST_DURATION) as actual, first(average) as average | eval max = 500 | filldown
0 Karma

woodcock
Esteemed Legend

Perhaps you mean span=5mon for months, not span=5m for minutes?

0 Karma

FrankVl
Ultra Champion

A 1-year search window seems awfully long for a single value display, or for timecharts with a 5-minute span. I'd strongly consider reducing that time window.

If you really need to show summaries over such long periods, consider using a summary index to track these statistics, so that the dashboard can load from the summary instead of searching all the raw data.

Also: if you're calculating multiple statistics / visualizations from the same overall dataset, it is recommended to configure your dashboard with a common base search that retrieves the data, plus panel-specific post-process searches that run on the base search's results.
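As a rough sketch of the base search idea: the first query below would be the shared base search and the second a post-process search for one single value panel. The index name web, the sourcetype app_logs and the service value checkout are hypothetical placeholders, not taken from the original post. The base search keeps a count and a sum per day and per service rather than an average, so that each panel can recompute a correct overall average.

```spl
index=web sourcetype=app_logs
| bin _time span=1d
| stats count(REQUEST_DURATION) as n, sum(REQUEST_DURATION) as total_ms by _time, service

search service="checkout"
| stats sum(total_ms) as total_ms, sum(n) as n
| eval avg_ms = round(total_ms / n, 2)
| fields avg_ms
| rename avg_ms as "Service (ms)"
```

In Simple XML the first query would sit in a search element with an id, and each panel's search would point at it with the base attribute. The base search still has to read the raw events once, so this mainly helps by not repeating that scan for every panel.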

"I am not entirely sure what you meant by filtering for specific index and / or sourcetype."

Usually people structure their data in Splunk by putting different types of data into different indexes (and assigning different sourcetypes). For example: your Windows logs go into a windows index, Linux logs into a linux index, webserver logs into a webserver index, etc. That way, if you have a dashboard that needs the webserver logs, you can let it search only the webserver index, so it doesn't have to work through all the Windows and Linux logs, which are irrelevant.
Now, I don't know what data you are collecting into Splunk. Perhaps it really is only the data you are using for this dashboard, but searches that do not specify an index and sourcetype are rather rare, which is why I commented on that.
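To illustrate, here is a minimal sketch. The first query lists which indexes and sourcetypes actually contain data, which may help find where these logs live. The second shows the first dashboard search restricted to one index and sourcetype. The names webserver and access_combined are only examples and would need to be replaced with whatever the first query reports for this data.

```spl
| tstats count where index=* by index, sourcetype

index=webserver sourcetype=access_combined service="this changes for the 5 different searches"
| stats avg(REQUEST_DURATION) as "Service (ms)"
```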

niketn
Legend

@angersleek, as @FrankVl has suggested, go for summary indexing. A 5-minute window is quite small, so you should ensure that the summary is collected only once all the events for a specific time window have arrived; for example, an event that is indexed late may arrive after its time window has already been summarized. So the summary collection would be: if the minute of the hour is 15, collect the 0-5 min data (assuming the 5-10 min data would still be coming in). This also means that if the system feeding data is down, or the Splunk Forwarder goes down or is congested, you might have to backfill the data from time to time (unless your collection window caters for such downtime by adding a buffer). Refer to this documentation: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing

In an hour there will be 12 buckets, i.e. 0-5, 5-10, 10-15, 15-20 ... up to 55-60. So theoretically, your search will have 12 times fewer events to go through each time it runs. However, as has been mentioned already, a 5-minute bucket over one year of data is huge. I am not sure how you would show this through timechart in a Chart or Single Value, as the visualizations by default can show only up to 1000 data points and your query gives 365*24*12=105120 data points. This implies your results in the Single Value would be truncated. Refer to this question: https://answers.splunk.com/answers/658129/single-value-viz-these-results-may-be-truncated-th.html
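One way to stay under that limit over a one-year range is a coarser span. As a sketch, assuming a daily resolution is acceptable for the report, the second search from the question could use span=1d, which gives roughly 365 data points for one year instead of 105120:

```spl
market="this changes for the 5 different searches"
| timechart span=1d avg(REQUEST_DURATION) as average
```

This only addresses the number of data points. The search still has to scan a full year of raw events unless it is pointed at a summary index.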

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

angersleek
Path Finder

@niketnilay "Ensure summary is collected only once all the events for specific time window are already collected."

I was under the impression I was already doing this. Clearly I'm doing something wrong then. I can change the 5 min interval; I am not hard pressed on that figure, it can be anything. Is there a way it can be set to vary automatically based on my time scale? For example, 5 mins if the time scale is 60 mins, and maybe 1 day if the time scale is 1 week.

0 Karma

niketn
Legend

@angersleek, try going through this Splunk video (very old, but a good explanation): https://www.splunk.com/view/SP-CAAACZW

Also check out David Veuve's blog: https://www.davidveuve.com/tech/how-i-use-summary-indexes-in-splunk/

0 Karma

FrankVl
Ultra Champion

Varying the span automatically with the time range is what timechart will do when you don't specify a span explicitly. But that doesn't solve the problem that each time your dashboard loads, Splunk needs to retrieve 1 year of data and calculate statistics over it.

By using a summary index, you periodically calculate statistics over a short timespan and store those statistics in the summary index. Your dashboard then only needs to present the statistics from the summary index, rather than having to calculate it over a whole year from scratch each time.
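A minimal sketch of that approach, under several assumptions: the index web, the sourcetype app_logs, the summary index summary_request_duration (which would have to be created first) and the service value checkout are all hypothetical names. The first query would be saved as a scheduled search running every 5 minutes; it summarizes a 5-minute window that ended 10 minutes ago, to leave room for late-arriving events. It stores a count and a sum rather than an average, because averaging 5-minute averages would weight quiet and busy periods equally. The second query is what a dashboard panel could run against the summary.

```spl
index=web sourcetype=app_logs earliest=-15m@m latest=-10m@m
| bin _time span=5m
| stats count(REQUEST_DURATION) as n, sum(REQUEST_DURATION) as total_ms by _time, service
| collect index=summary_request_duration

index=summary_request_duration service="checkout"
| timechart span=1d sum(total_ms) as total_ms, sum(n) as n
| eval average = round(total_ms / n, 2)
| fields _time, average
```

With this layout the dashboard reads at most 12 summary rows per service per hour instead of the raw events. Historical data from before the scheduled search was enabled would still need to be backfilled.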

0 Karma

FrankVl
Ultra Champion

Over what time span do the dashboard searches run?

Is there a reason you're not filtering for specific index and / or sourcetype to restrict the scope of the searches?

How do these searches behave when you run each of them individually manually?

0 Karma

angersleek
Path Finder

@FrankVl Even when searched individually, for example one search of each of the three chart types, they are slow or fail. I am searching over a period of 1 year. I am not entirely sure what you meant by filtering for specific index and / or sourcetype.

0 Karma

Sukisen1981
Champion

Hmm, can you run this query individually and check if there is any improvement on an individual basis? I am using your first search query, so compare it with an individual run of this:

service="this changes for the 5 different searches" | stats avg(REQUEST_DURATION) as "Service (ms)"

The main change is using stats instead of chart. Also, can you search in fast mode, in case you have verbose mode enabled by default?

0 Karma

Sukisen1981
Champion

As @FrankVl mentions, this is a huge search, even taking into account that Splunk is a big data solution. I will wager that each of these searches takes an inordinate amount of time even when run individually, and in the dashboard you are probably running 10-15 of them concurrently. In a sense, this search combines a high volume of data logged per unit time with a long time range.

Now, is it possible for you to try putting some drop-down filters in the dashboard, so that users select, for example, a particular service (from your first search example) and the search then runs only for the selected service?
Even this might be too huge to handle, but you do need to think of some other options in a functional sense.

0 Karma

angersleek
Path Finder

@Sukisen1981 I am trying to show all the different services at the same time, which is why I didn't use drop-downs. Plus, as you mentioned, even that is still slow. I do need to display this information, and I don't see a way around it.

0 Karma