After a long-overdue upgrade from 6.x to 7.1.3 -- this release is the latest one supported by my vendor, whose software interoperates with Splunk -- I have a problem. The search head no longer works with the indexers.
On the search head:
The full message in splunkd.log is:
"Global key files are invalid. This server cannot distribute searches to other servers."
In Settings » Distributed search » Search peers, we have error messages:
Error [00000100] Instance name "<deleted>" REST interface to peer is not responding. Check var/log/splunk/splunkd_access.log on the peer. Last Connect Time:2020-09-14T20:04:01.000+00:00; Failed 1 out of 1 times.
If I delete the distributed search head and attempt to re-validate it, I get the error:
I welcome any suggestions, including suggestions for the right questions to ask.
@isoutamo Fixed! The problem was a straightforward consequence of the way the vendor software interacted with Splunk.
The vendor software would overwrite $SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem - unless I specifically copied a new version in.
On startup of a new installation, the process in Splunk that generated files would look into $SPLUNK_HOME/etc/auth/distServerKeys/ and find a key there already, and therefore refuse to overwrite any keys there. (Deduction based on what happens when I try to generate keys that overwrite extant keys.)
Also, $SPLUNK_HOME/etc/auth/distServerKeys files trusted.pem and private.pem are, in fact, a set of public/private RSA keys.
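One way to check that the two files really are a matching RSA pair, and to see how long the key is, is to compare their moduli with openssl. This is only a sketch under a few assumptions: openssl is on the PATH, private.pem is not passphrase-protected, and trusted.pem is in the usual "BEGIN PUBLIC KEY" form. The function names keys_match and rsa_key_bits are made up for this example.

```shell
#!/bin/sh
# keys_match PRIVATE PUBLIC
# Succeeds (exit 0) when both files carry the same RSA modulus,
# i.e. the public key belongs to the private key.
keys_match() {
    priv_mod=$(openssl rsa -in "$1" -noout -modulus 2>/dev/null) || return 2
    pub_mod=$(openssl rsa -pubin -in "$2" -noout -modulus 2>/dev/null) || return 2
    [ -n "$priv_mod" ] && [ "$priv_mod" = "$pub_mod" ]
}

# rsa_key_bits PRIVATE
# Prints the key length in bits, taken from the "(NNNN bit" part of
# the first line of openssl's text dump of the key.
rsa_key_bits() {
    openssl rsa -in "$1" -noout -text 2>/dev/null |
        sed -n 's/.*(\([0-9][0-9]*\) bit.*/\1/p' | head -n 1
}

# Example use (paths as in the post):
# keys_match "$SPLUNK_HOME/etc/auth/distServerKeys/private.pem" \
#            "$SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem" && echo "pair matches"
# rsa_key_bits "$SPLUNK_HOME/etc/auth/distServerKeys/private.pem"
```

Running rsa_key_bits on both the search head and an indexer gives a more direct comparison than looking at file sizes.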
To fix this:
1) Run this command:
$SPLUNK_HOME/bin/splunk createssl audit-keys -d /tmp/testkeys -p /tmp/testkeys/private.pem -k /tmp/testkeys/trusted.pem -l 2048
Note that this is a 2048-bit key. I can't find the required length documented anywhere, and the default output of audit-keys is 1024 bits. Aside from trial and error, the clue was that the indexers had a trusted.pem that was twice as large.
2) Copy the keys into $SPLUNK_HOME/etc/auth/distServerKeys
3) Restart Splunk
4) Delete the indexers and re-add them -- they now can be added using the new keys.
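Steps 1 to 3 above can be sketched as a small shell function. The createssl command is the one from the post. The function name regenerate_dist_keys, the default work directory, and the backup of the old files into an "old" subdirectory are illustrative additions, not something Splunk requires. It assumes SPLUNK_HOME is set and that it runs on the search head as the user that owns the Splunk installation. Step 4 (deleting and re-adding the indexers) is left to the GUI.

```shell
#!/bin/sh
# regenerate_dist_keys [WORKDIR]
# Generates a 2048-bit key pair, copies it into distServerKeys,
# and restarts Splunk.
regenerate_dist_keys() {
    workdir="${1:-/tmp/testkeys}"
    keydir="$SPLUNK_HOME/etc/auth/distServerKeys"

    mkdir -p "$workdir/old" "$keydir" || return 1

    # Step 1: the createssl command from the post, with -l 2048.
    "$SPLUNK_HOME/bin/splunk" createssl audit-keys -d "$workdir" \
        -p "$workdir/private.pem" -k "$workdir/trusted.pem" -l 2048 || return 1

    # Not in the original steps: keep a copy of any existing key files.
    for f in private.pem trusted.pem; do
        if [ -f "$keydir/$f" ]; then
            cp -p "$keydir/$f" "$workdir/old/$f" || return 1
        fi
    done

    # Step 2: copy the new pair into place.
    cp "$workdir/private.pem" "$workdir/trusted.pem" "$keydir/" || return 1

    # Step 3: restart Splunk so it picks up the new keys.
    "$SPLUNK_HOME/bin/splunk" restart
}

# Example use:
# regenerate_dist_keys /tmp/testkeys
```

Keeping the old files outside distServerKeys means the directory only ever holds the pair that is actually in use.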
I'm going to keep this topic open for a few days in case anyone would like edits to this explanation; if there are no requests I'll close it.
I'd like to add a solution detail:
In my original notes on this problem, I stated that trying to add a peer gave me this error message:
Encountered the following error while trying to save: Invalid action for this internal handler (handler: distsearch-peer, supported: list|edit|remove|_reload|new|disable|enable|doc, wanted: create).
In addition, the GUI did not display the option to add a new peer when I was in the list of search peers. I had to be in the "distributed peers menu" to see the "+ search peer" option on that page, and it did not work.
As soon as I created a file called private.pem, the "add search peer" option appeared on the list of distributed peers, even though that file and its trusted.pem were not acceptable because the key length was too short. I am afraid that I can't recall the error message I received when I used "add search peer" with an invalid key pair.
Thank you for your response. Here are the answers:
Please feel free to make suggestions or ask further questions.
Hi
couple of questions:
r. Ismo
@isoutamo I'm re-sending this reply -- I suspect that my previous reply was not properly directed to you.
Thank you for your response. Here are the answers:
Please feel free to make suggestions or ask further questions.
@isoutamo Thanks for the ideas.
I have not managed to get the peers to work correctly. I tried several different variations, e.g., I turned off both the sh and peer simultaneously, and then started the sh and then the peer.
I also replaced 'trusted.pem' with a new copy that I created via
$splunk/bin/splunk createssl server-cert -d $splunk/etc/auth -n 'trusted.pem' -c <ip address>
which did not help, although I thought it might fix the issue. I have some ideas about why: perhaps the "-c" argument should be an FQDN and the name of the instance rather than an IP address, or, if possible, it should be left out entirely.
@isoutamo Thanks for your help. With any luck, later today I will have time to test a different version of a new trusted.pem, or try a labor-intensive re-configuration of the operating environment to see if I can persuade my original sh to work.
I am currently waiting on my vendor's Splunk expert to discuss next steps with me. I will update this page (and you directly) when I have further progress.
@isoutamo Here's an update on this problem. After intensive work with the vendor, I have no resolution, but I did finally find what I think is the problem in my current installation.
Before my upgrade to 7.1.3, I had two files in $SPLUNK_HOME/etc/auth/distServerKeys/. One was private.pem, an RSA private key, and another was trusted.pem, a public key. I have been unable to determine from the documentation (yet) exactly what these keys are supposed to secure, or how they are generated.
In my current installation, I have only a "trusted.pem." This file is something I created, IIRC, in order to see if I could get the installation working. This trusted.pem is a full server certificate and *not* a key. It's a multi-part file with a certificate, an encrypted private key, and another certificate.
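A quick way to see what a PEM file actually contains is to list its "BEGIN" lines. This sketch relies only on the standard PEM text layout, and the function name pem_block_types is made up for this example.

```shell
#!/bin/sh
# pem_block_types FILE
# Prints the type of every PEM block in FILE, in order of appearance.
# Each block starts with a line of the form "-----BEGIN <TYPE>-----".
pem_block_types() {
    grep -e '^-----BEGIN ' "$1" |
        sed -e 's/^-----BEGIN //' -e 's/-----$//'
}

# Example use (path as in the post):
# pem_block_types "$SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem"
```

A plain public key file would be expected to print a single line, while a multi-part file like the one described here would print CERTIFICATE twice with a private key block between them. The exact label of the key block depends on how the key was encoded.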
I can probably fix everything if I can find the exact procedure to generate trusted.pem and private.pem from the files I have in $SPLUNK_HOME/etc/auth: cacert.pem, ca.srl, ca.pem, and a server.pem. (There are a few others there as well, e.g., in the audit directory.)
I'm now looking for documentation on what does what, without much success.