I have installed and configured DB Connect on my deployer and added identities and connections.
Then I copied etc/apps/splunk_app_db_connect to etc/shcluster/apps/ and pushed the bundle to the shcluster.
As per doc: https://help.splunk.com/en/splunk-cloud-platform/connect-relational-databases/deploy-and-use-splunk-...
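Roughly, this is what I ran on the deployer (the target member, credentials, and the /opt/splunk install path are placeholders for my environment):
cp -R /opt/splunk/etc/apps/splunk_app_db_connect /opt/splunk/etc/shcluster/apps/
/opt/splunk/bin/splunk apply shcluster-bundle -target https://<shc-member>:8089 -auth <admin>:<password>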
The app is deployed, but the identity.dat file gets regenerated every 30 seconds on the SH members, and it differs from the one on my deployer.
The DB Connect GUI on the SH members gives me an error: "Identity password is corrupted."
What did I miss?
This is how I got it fixed.
So, what I did instead was the following (full command sketch after the list):
- Stop Splunkd on deployer.
- Use curl to DELETE the secret from the KV store on every shcluster member:
curl -k -u username -X DELETE https://<host>:<management-port>/servicesNS/nobody/splunk_app_db_connect/storage/collections/data/secret
- Delete the full directory /opt/splunk/etc/apps/splunk_app_db_connect from the shcluster members.
- Restart Splunkd on all shcluster members.
- Start the deployer and push the bundle to deploy the app again.
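Put together, the sequence was roughly this (hostnames, credentials, and the /opt/splunk path are placeholders; the curl and rm steps go on every SHC member):
# on the deployer
/opt/splunk/bin/splunk stop
# on every SHC member (splunkd is still up there, so the REST call works)
curl -k -u <username> -X DELETE https://<member>:8089/servicesNS/nobody/splunk_app_db_connect/storage/collections/data/secret
rm -rf /opt/splunk/etc/apps/splunk_app_db_connect
/opt/splunk/bin/splunk restart
# back on the deployer
/opt/splunk/bin/splunk start
/opt/splunk/bin/splunk apply shcluster-bundle -target https://<member>:8089 -auth <admin>:<password>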
Now the identity.dat is the same on all sh1-3 and deployer.
But I think the issue comes from my deployment process due to this nasty bug:
https://splunk.my.site.com/customer/s/article/Pushing-App-config-changes-from-search-head-cluster-de...
As I cannot just change the .conf files under shcluster/apps/<app>/ and push the bundle, I have to do the following (rough commands after the list):
1. mv shcluster/apps/splunk_app_db_connect shcluster/apps/splunk_app_db_connect_temp
2. Push the bundle and wait for the rolling restart of shcluster members.
3. mv shcluster/apps/splunk_app_db_connect_temp shcluster/apps/splunk_app_db_connect
4. Push the bundle and wait for the rolling restart of shcluster members.
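On the deployer, that whole round trip looks roughly like this (target member, credentials, and the /opt/splunk path are placeholders):
cd /opt/splunk/etc/shcluster/apps
mv splunk_app_db_connect splunk_app_db_connect_temp
/opt/splunk/bin/splunk apply shcluster-bundle -target https://<shc-member>:8089 -auth <admin>:<password>
# wait for the rolling restart to finish, then rename the app back and push again
mv splunk_app_db_connect_temp splunk_app_db_connect
/opt/splunk/bin/splunk apply shcluster-bundle -target https://<shc-member>:8089 -auth <admin>:<password>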
And while the _temp app gets deployed and started, I assume that messes up the KV store secret (and/or identity.dat)?
Because after that deployment procedure, identity.dat on the shcluster members != the one on the deployer, and the app throws the error again.
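For what it's worth, this is how I compare the file across nodes (run it on the deployer and on each member, then compare the hashes; the /opt/splunk path is specific to my install):
find /opt/splunk/etc -path '*splunk_app_db_connect*' -name identity.dat -exec md5sum {} \;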
Hello @JykkeDaMan, this can be addressed by following the steps below:
Stop the Deployer
First, stop Splunk on the deployer to begin the resolution process.
Stop the Splunk Service on Each Cluster Node
On each cluster node, stop the Splunk service before proceeding.
Remove Keystore and Password Files
On each cluster node, remove the following files:
keystore/default.jks
certs/keystore_password.dat
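On each node that would be something like the following, assuming the files sit under the DB Connect app directory (adjust the path if your install differs):
rm -f $SPLUNK_HOME/etc/apps/splunk_app_db_connect/keystore/default.jks
rm -f $SPLUNK_HOME/etc/apps/splunk_app_db_connect/certs/keystore_password.dat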
Delete Secret Data from Splunk Storage Collections
Run the following command on each cluster node to delete the secret data:
curl -k -u username -X DELETE https://<host>:<management-port>/servicesNS/nobody/splunk_app_db_connect/storage/collections/data/secret
Repeat the Process on All Cluster Nodes
Perform steps 2 through 4 on all nodes in the cluster to ensure consistency.
Start the Splunk Service on All Nodes
After completing the above steps on all nodes, start the Splunk service again.
Also, raising a support case can make working through issues like this easier.
Hmmm, how do I run the curl against port 8089 if Splunkd is stopped on all the SHC nodes?
OK, I'll try.
But was there something I did wrong when deploying the app? Why did I end up in this state, and how do I prevent it in the future?