How do I improve my dashboard load times for a large amount of data?

Builder

I have a form that uses a searchTemplate:

index=java earliest=$timerange.earliest$ latest=$timerange.latest$ app_name=API (location=bunchOfLocations) (operation="bunchOfOperations" ) NOT acceptLanguage="*q=*" | table _time, location, priority, status, respTime, operation, NumFound, applicationName, geoExpansion, q

I have a dashboard that uses a search as a base, and then specific panels on the dashboard use different variations on the base search, for example:

search NumFound=0 geoExpansion=true q=* | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q

There are about 10 panels on this dashboard. The searchTemplate search returns about 12 million results over a 24-hour span, so it takes a very long time to run.
I'd like to avoid acceleration because it uses resources I'm trying to conserve.
How can I speed up the dashboard while being as minimally impactful as possible?

Splunk Employee

Since almost all your examples use

search NumFound=0 q=*

You could use post-processing to lump that data into one base search and have the other panels feed off it. Right now, you're retrieving very similar information multiple times; this way there would be one main search. I'm not sure how much it would speed up the whole dashboard, but your first search (in the comments) already pulls back all the data the other ones need, so there is at least some speedup from that. There are a few limitations, though:

  • If the base search is a non-transforming search, Splunk retains only the first 500,000 events returned. In this case, events in excess of this 500,000 limit are not processed by the post process search, resulting in incomplete data. Splunk recommends that you use a transforming search for the base search to avoid this problem.
  • If the post-processing operation takes too long, it can exceed Splunk Web client’s non-configurable timeout value of 30 seconds. This can result in a timeout due to an unresponsive splunkd daemon/service. This scenario typically happens when you use a non-transforming search as the base search. Splunk recommends that you use a transforming search for the base search to avoid this problem.

You could probably move those rex commands into the base search before the timechart, and then handle the applicationName filters inside one main, giant timechart, since the first search pulls that data in anyway.

Here is what I would do:

search NumFound=0 q=*
| rex "q=(?<q>.+),radius"
| eval q=lower(q)
| timechart
count
count(eval(geoExpansion="true")) as geoexpansion
count(eval(geoExpansion="true" AND match(applicationName, "(?i)ios"))) as ios
count(eval(geoExpansion="true" AND match(applicationName, "(?i)device"))) as device
count(eval(geoExpansion="true" AND match(applicationName, "(?i)android"))) as android
by q

timechart is a transforming command, so using it in the base search should get us around the limitations above (hopefully). If the search above is your main global search, your post-process searches can be really simple.

The first one,

search NumFound=0 q=* | eval q=lower(q) | timechart count as "Count of Null Results" by q

would simply be

| fields _time count*

The second one,

search NumFound=0 geoExpansion=true q=* applicationName="*ios*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q

would simply be

| fields _time ios*

and so on. @mention me (@aljohnson_splunk) here in the comments if you need more help.
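A minimal Simple XML sketch of how a base search and post-process panels like these might be wired together. It assumes the `<search id>` / `<search base>` post-process syntax rather than the `searchTemplate` element used in the question; the id `base`, the panel titles, and the `-24h@h` default are hypothetical, and `bunchOfLocations` / `bunchOfOperations` are the question's own placeholders. Only the iOS series is shown; device and android would follow the same pattern.

```xml
<form>
  <label>API null results (sketch)</label>
  <fieldset submitButton="false">
    <!-- Provides $timerange.earliest$ / $timerange.latest$, as in the question -->
    <input type="time" token="timerange">
      <label>Time range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <!-- Global base search: runs once; timechart makes it a transforming search -->
  <search id="base">
    <query><![CDATA[
index=java app_name=API (location=bunchOfLocations) (operation="bunchOfOperations") NOT acceptLanguage="*q=*" NumFound=0 q=*
| rex "q=(?<q>.+),radius"
| eval q=lower(q)
| timechart count
    count(eval(geoExpansion="true")) as geoexpansion
    count(eval(geoExpansion="true" AND match(applicationName, "(?i)ios"))) as ios
    by q
]]></query>
    <earliest>$timerange.earliest$</earliest>
    <latest>$timerange.latest$</latest>
  </search>
  <row>
    <panel>
      <chart>
        <title>Count of Null Results</title>
        <!-- Post-process: keep only the unconditional "count: <q>" columns -->
        <search base="base">
          <query>fields _time count*</query>
        </search>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Count of Null Results (iOS)</title>
        <!-- Post-process: keep the "ios: <q>" columns and strip the prefix -->
        <search base="base">
          <query>fields _time ios* | rename "ios: *" as *</query>
        </search>
      </chart>
    </panel>
  </row>
</form>
```

The `rename` in the iOS panel removes the `ios: ` prefix that timechart puts on each column when it combines several aggregations with a `by` clause, so the legend shows just the `q` values.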

SplunkTrust

It depends on the metrics you're showing in the panels. Please provide the queries for at least 4-5 panels so we can see whether some pre-aggregation can be done in the search template itself.
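The pre-aggregation idea could look roughly like this: the base search collapses the raw events to one row per hour, `q`, `geoExpansion`, and `applicationName` with `stats`, and each panel's post-process re-expands that with `timechart sum(count)`. This is a sketch under assumptions: the `span=1h` bucket, the `preagg` id, the `"none"` fill value, and the hardcoded 24-hour range are illustrative, and it assumes every panel filters on `NumFound=0`, as the posted ones do. `fillnull` is there because `stats ... by` drops events that lack any of the split-by fields.

```xml
<dashboard>
  <label>API null results, pre-aggregated (sketch)</label>
  <!-- Base search: one row per hour / q / geoExpansion / applicationName -->
  <search id="preagg">
    <query><![CDATA[
index=java app_name=API (location=bunchOfLocations) (operation="bunchOfOperations") NOT acceptLanguage="*q=*" NumFound=0 q=*
| rex "q=(?<q>.+),radius"
| eval q=lower(q)
| fillnull value="none" geoExpansion applicationName
| bin _time span=1h
| stats count by _time, q, geoExpansion, applicationName
]]></query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </search>
  <row>
    <panel>
      <chart>
        <title>Count of Null Results (iOS)</title>
        <!-- Post-process: filter the small table, then sum the hourly counts -->
        <search base="preagg">
          <query>search geoExpansion=true applicationName="*ios*" | timechart span=1h sum(count) as "Count of Null Results" by q</query>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>
```

Whether this helps depends on cardinality: if `q` and `applicationName` have many distinct values, the pre-aggregated table can still be large.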

Builder

Here are some searches for some of the other panels:

 1. search NumFound=0 q=* | eval q=lower(q) | timechart count as "Count of Null Results" by q
 2. search NumFound=0 geoExpansion=true q=* applicationName="*ios*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q
 3. search NumFound=0 geoExpansion=true q=* applicationName="*device*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q
 4. search NumFound=0 geoExpansion=true q=* applicationName="*android*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q