We are experiencing some weird issues with a custom-developed dashboard application, and after a couple of days trying to debug the issue, I feel it is time I reached out to the community. If anyone can help, even if it's just extra debug steps, that would be great!
I developed a dashboard application with 2 views using a standalone search head to test and then deployed the custom app to the search head cluster in the standard way.
For certain users (as yet we cannot determine what the commonality is) the dashboard application does not work. We get a red triangle with the following text:
Search process did not exit cleanly, exit_code=-1, description="exited with code -1". Please look in search.log for this peer in the Job Inspector for more info.
Clicking the spyglass and using the integrated search within the app initially shows the errors (and allows us to look at the Job Inspector); however, if we retry the search with the same search parameters and time range, we get results as expected.
Within the failed search, the search.log shows some differences from a successful search: we see WARN messages from the AuthorizationManager about an unknown role - '' - which do not appear in the successful search. I have confirmed that our authorize.conf and authentication.conf are the same on all members of the search head cluster, and that there are no etc/system/local versions of these files either.
Lastly, the permissions on this application are pretty wide, as defined in the default.meta for the app:
access = read : [ * ], write : [ admin, power ]
export = system
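For context, those two lines sit under the app-level stanza in the app's metadata/default.meta - roughly like this (the empty `[]` stanza applies the settings app-wide; this is just how ours is laid out):

```ini
# metadata/default.meta (app-level defaults)
[]
access = read : [ * ], write : [ admin, power ]
export = system
```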
- Splunk 6.4.1 on Windows Server 2012
- We have a working Search Head Cluster that connects to 2 clustered indexers
- We use LDAP for authentication and deploy authorize.conf and authentication.conf as part of the Search Head Cluster bundle
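In case it helps, our LDAP role mapping in authentication.conf is laid out roughly like this (the strategy name and group DNs below are illustrative, not our actual values). Given the unknown-role '' WARN, one thing we are double-checking is that no roleMap entry maps a group to an empty or malformed role name:

```ini
# authentication.conf (illustrative sketch, names changed)
[authentication]
authType = LDAP
authSettings = corp_ldap

[roleMap_corp_ldap]
admin = CN=Splunk Admins,OU=Groups,DC=example,DC=com
user = CN=Splunk Users,OU=Groups,DC=example,DC=com
```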
I am really at a loss for what to do or look at next; any help is very much appreciated.
Update: We created two users with the same roles - one works, the other does not
So you have one app with two dashboards? If so, do both dashboards have the same problem? Anything consistent? Is it always the same users that don't work? Is it the same data on the same peer? Does the problem persist on each search head in the cluster?
If you were to create a dashboard from Splunk Web in your app and just copy the source over, does the issue surface for it as well?
No real ideas here, just some troubleshooting steps to narrow down some pattern.
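One more thing worth trying: comparing the flat files can miss a layered override, so you could compare the effective merged config with btool instead. A rough sketch, run on each member (paths assume a default Windows install; output paths are just examples):

```shell
cd "C:\Program Files\Splunk\bin"
splunk btool authorize list --debug > C:\temp\authorize_effective.txt
splunk btool authentication list --debug > C:\temp\authentication_effective.txt
```

Then diff the output files between members; `--debug` annotates each line with the file it came from, which helps spot where a stray setting is being picked up.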
Thanks for the response, it's good to have some external input to help solve this issue!
Yes we have one app, with two dashboards. Both dashboards exhibit the same behaviour, and the same errors. If a user has a problem with one dashboard, they have a problem with both. The problem is consistent on all search head members.
If I copy the app dashboard XML and create a new dashboard in the Search & Reporting app, everything is fine: all users can view the data without a problem.
What is weird is the following behaviour, which has me thinking it's some issue with permissions (somewhere):

1. The user sees the error within the app on the dashboard.
2. The user clicks the spyglass, which opens a search view within the app. From here the user can view the Job Inspector - the errors are real (but odd).
3. The user simply refreshes the search; no error is received and the results are returned.
I am wondering if there is a bug that needs to be looked at, but I can't figure out what it might be!