Why are advanced XML dashboards with saved searches showing incomplete or missing data? Is this a performance problem?

Motivator

I'm running Splunk 6.2.2. I have a number of saved searches which work properly and always produce correct results when run from the search bar. However, when I try to run them in a dashboard, the results that are shown in the charts are sometimes incomplete or missing.

These dashboards have worked before, so I'm not quite sure what's going on. They are written in advanced XML, and each has a number of underlying searches.

For instance, I have a saved search that produces a timechart. When I run it over 7 days, I get all of the results I would expect to get, with the proper values. However, when this same search is the basis for a chart on a dashboard, I might only get the first bar drawn, and the others are missing. Or I might get a "No results found. Inspect..." bar instead of the chart I want, but when I inspect it and run the search that is highlighted as not having returned any data over the time range from the search bar, I get the numbers I expect.

Restarting Splunk doesn't make a difference, except that sometimes it results in different charts failing on a particular dashboard.

There doesn't seem to be a relationship between charts that use HiddenPostProcess and ones that are failing: sometimes it's those, sometimes it's not. There also doesn't seem to be a relationship to charts that use a particular kind of output: sometimes it's tabular output that doesn't show up, sometimes it's a time chart that's incomplete, sometimes it's a single value that isn't correct.

(Also, I'm not trying to do any realtime searches in this context. This is all with regular time boundaries.)

It seems like there might be some kind of performance issue, but I'm having a hard time diagnosing it. What might be going on here, and how might I fix it?

Thank you.

Splunk Employee

There could be a few things going on here, but at a high level here is how you should consider things:

1 - Do the searches run successfully on their own, individually? If so, then we can move on.

2 - How many searches are being spawned by the dashboard?

Remember, you can only run so many searches before you max out the available CPU cores. Each search will consume up to a single CPU core, although you can run quite a few per core. Also consider that your post-process searches might add significant load if there is no spare CPU to run them.

3 - Are you using any acceleration or data models?

These need to populate initially, and that can easily leave you baffled about why one system runs searches well and the other does not.
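
To put rough numbers on point 2: as far as I recall from the limits.conf documentation, the system-wide cap on concurrent historical searches is computed as max_searches_per_cpu x number_of_cpus + base_max_searches. The sketch below only does that arithmetic. The function name is made up for illustration, and the values 1 and 6 are what I believe the defaults to be, so check your own limits.conf before relying on them.

```shell
# max_hist_searches CPUS PER_CPU BASE
# Prints the concurrent historical search cap using the limits.conf formula:
#   max_searches_per_cpu x number_of_cpus + base_max_searches
# (hypothetical helper; it does not read any Splunk configuration)
max_hist_searches() {
    cpus=$1
    per_cpu=$2
    base=$3
    echo $(( per_cpu * cpus + base ))
}

# Example: a 4-core search head with the assumed values (1 per CPU, base 6).
max_hist_searches 4 1 6
```

With those assumed values a 4-core box works out to 10, so a dashboard that launches 9 searches alongside scheduled jobs would sit close to the cap.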

My suggestion: Slowly pull out/remove individual searches (especially if they are accelerated or have data models) and pinpoint where the source of the problem occurs. It should not be hard to remove/disable a search from the dashboard. Also, keep an eye on the cpu/search processes in the OS to see how many are being run in the background (ps -ef | grep splunk). You might find that something else is actually preventing the dashboard from completing, such as many realtime searches or background searches.
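
If you would rather not eyeball the ps output by hand, something like the following could take a few samples while the dashboard loads. It is only a sketch: count_procs and sample_procs are hypothetical helper names, and the pattern is whatever you pass in (the suggestion above uses "splunk"; a narrower pattern for search processes is up to you to work out on your system).

```shell
# count_procs PATTERN
# Reads ps-style text on stdin and prints how many lines contain PATTERN.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_procs() {
    grep -c -- "$1" || true
}

# sample_procs PATTERN SAMPLES INTERVAL
# Prints "HH:MM:SS count" once per sample, INTERVAL seconds apart.
sample_procs() {
    pattern=$1
    samples=$2
    interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Drop the grep process itself so it is not counted.
        n=$(ps -ef | grep -v grep | count_procs "$pattern")
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$n"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Usage (run it, then load the dashboard in the browser):
#   sample_procs splunk 10 2
```

A count that jumps when the dashboard loads and then falls back is what you would expect; a count that is already high before you load anything points at background or scheduled searches.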

Motivator

Thank you for your response.

1 - They run successfully on their own.

2 - Each dashboard runs between 4 and 9 searches, and has anywhere from 2 to 5 post-process searches.

3 - The vast majority of the underlying searches are accelerated saved searches.

I understand why these factors would contribute towards making a dashboard slow, but I don't quite get why they would lead to data being missing. These are dashboards that have worked in the past over similar data sets (allowing time for acceleration to kick in, etc), so this is effectively new behavior over the same data.

Are there any logs I can check that would indicate why a particular search in a dashboard didn't run completely? What sort of things in the logs should I be looking for? Initial indications (like inspecting the jobs) don't reveal much in that regard.

I will try removing searches from the dashboards to see if they perform any better.

Splunk Employee

Each search job has its own search.log. You can find the job's directory either through the job manager or by matching the directory's timestamp to when the search ran. All jobs get stored in the $SPLUNK_HOME/var/run/dispatch directory (IIRC), and the search.log file in each job directory contains timing information (IIRC). An easier alternative would be to open a terminal window and run "ps -ef | grep splunk" a few times before, during, and after you launch the dashboard. This will show you which searches are running at each point.
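
As a rough sketch of the filesystem route, the helpers below print WARN and ERROR lines from the search.log of the most recently modified job directories. The function names are hypothetical, the /opt/splunk fallback for SPLUNK_HOME is an assumption, and the grep assumes the log level appears as a space-delimited word on each line, so adjust it if your search.log looks different.

```shell
# Dispatch root as described above; the /opt/splunk fallback is an assumption.
DISPATCH_DIR="${SPLUNK_HOME:-/opt/splunk}/var/run/dispatch"

# newest_jobs DIR N
# Prints the names of the N most recently modified entries in DIR.
newest_jobs() {
    ls -1t "$1" 2>/dev/null | head -n "$2"
}

# log_problems FILE
# Prints lines from FILE whose level is WARN or ERROR (nothing if none).
log_problems() {
    grep -E ' (WARN|ERROR) ' "$1" || true
}

# scan_jobs DIR N
# For each of the N newest job directories that has a search.log,
# prints a "== <job>" header followed by its WARN/ERROR lines.
# Assumes job directory names contain no whitespace.
scan_jobs() {
    for job in $(newest_jobs "$1" "$2"); do
        log="$1/$job/search.log"
        [ -f "$log" ] || continue
        echo "== $job"
        log_problems "$log"
    done
}

# Usage, right after loading the dashboard:
#   scan_jobs "$DISPATCH_DIR" 15
```

Reading the dispatch directory usually requires the same OS user that runs Splunk, so you may need to switch users or use sudo.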

Splunk Employee

I don't know when we added it, but I very recently learned that the search log is available in the job inspector, right near the top. Saves some time finding the PID and looking at the filesystem.

SplunkTrust

Are there any JavaScript errors showing up in the browser's error console?
Also, are you using Sideview Utils? I see you mention the legacy Splunk module HiddenPostProcess, so I'm thinking maybe not. If you are using Sideview Utils, feel free to send the XML to me at nick [at] sideviewapps.com and I'll take a quick look. If you are not, send over the XML anyway, and if I can quickly convert it to Sideview XML for you I will.

Motivator

I sometimes (but not always) get the following JavaScript error:

Uncaught Error: Null response object. Connection may have been lost. common.min.js:24841
(Stack)
self.addEventListener common.min.js:24841 
Splunk.Module.FlashChart.$.klass.onConnect modules-10de2d15a6e88b36cd5d820163cf55ef7f8d8074.min.js:2704
$.extend.wrap common.min.js:6231
(anonymous function) modules-10de2d15a6e88b36cd5d820163cf55ef7f8d8074.min.js:1811
notifyConnect common.min.js:24599

I am not using Sideview Utils.

SplunkTrust

This is most likely a minor wrinkle or syntax error in the postprocess syntax: something that shouldn't cause a problem, but that trips this JavaScript error inside the FlashChart module. If you email me the XML ( nick [at] sideviewapps.com ) I'll take a look. A JS error happening during a render is consistent with what you're seeing, in that sometimes things render all or most of the way, and sometimes the charts/tables only refresh once or not at all and then stop. I may be able to construct a reproducible case locally using your XML, and then figure it out.
