Getting Data In

Unable to authenticate to search heads: "Global key files are invalid. This server cannot distribute searches..."

myudkowsky
Communicator

After a long-overdue upgrade from 6.x to 7.1.3 -- this release is the latest one supported by my vendor, whose product interoperates with Splunk -- I have a problem. The search head no longer works with the indexers.

 

On the search head:

The full message in splunkd.log is:

"Global key files are invalid. This server cannot distribute searches to other servers."

In Settings » Distributed search » Search peers, we have error messages:

Error [00000100] Instance name "<deleted>" REST interface to peer is not responding. Check var/log/splunk/splunkd_access.log on the peer. Last Connect Time:2020-09-14T20:04:01.000+00:00; Failed 1 out of 1 times.

If I delete the distributed search head and attempt to re-validate it, I get the error:

Encountered the following error while trying to save: Invalid action for this internal handler (handler: distsearch-peer, supported: list|edit|remove|_reload|new|disable|enable|doc, wanted: create).
 
The only way I've found to re-add the search peer is to restart Splunk on the search head.
 
Also of note on the search head: because of changes by my vendor -- as far as I can tell -- when I install the upgraded Splunk, the vendor automatically restores the old file $splunk/etc/auth/distServerKeys/trusted.pem. As a result, again as far as I can tell, when I start Splunk for the first time, the file $splunk/etc/auth/distServerKeys/private.pem is never generated on the search head. The search peers, on the other hand, do have both files.
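A quick way to confirm which of the two key files is actually present is sketched below. The function name check_dist_keys is hypothetical, and the directory in the example comment is simply the path mentioned in the post.

```shell
#!/bin/sh
# Report whether private.pem and trusted.pem exist (and are non-empty)
# in the given directory. check_dist_keys is an illustrative helper,
# not a Splunk command.
check_dist_keys() {
    dir=$1
    for f in private.pem trusted.pem; do
        if [ -s "$dir/$f" ]; then
            echo "$f: present"
        else
            echo "$f: MISSING"
        fi
    done
}

# Example (path taken from the post; adjust to your installation):
# check_dist_keys "$SPLUNK_HOME/etc/auth/distServerKeys"
```

On a search head in the state described above, this would be expected to report trusted.pem as present and private.pem as missing.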
 
Also in splunkd.log, I see messages such as:
 
DistributedPeer - Peer:https://x.x.x.x:y Key problems, see internal logs
 
with no indication of where these "internal logs" can be found.
 
I also see
 
Bundle Replication: Problem replicating config (bundle) to search peer ' x.x.x.x:y ', HTTP response code 401 (HTTP/1.1 401 Unauthorized). call not properly authenticated
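To collect all of these messages in one pass, something like the following sketch can help. The search strings are the ones quoted in this post; the function name key_errors is hypothetical, and the log path in the example comment assumes the usual location under $SPLUNK_HOME.

```shell
#!/bin/sh
# Print the lines of a splunkd.log that mention the key and
# authentication problems quoted in this thread.
# key_errors is an illustrative helper, not a Splunk command.
key_errors() {
    log=$1
    grep -E 'Global key files are invalid|Key problems|HTTP response code 401' "$log"
}

# Example (assumed default log location):
# key_errors "$SPLUNK_HOME/var/log/splunk/splunkd.log"
```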
 
On the search peers:
 
The search peer logs do not currently show any particular issue. The splunkd.log shows, on both indexers (search peers):
 
WARN HTTPAuthManager - Token not specified in Authorization: Splunk <token> header
 
and in splunkd_access.log,
 
POST /services/receivers/bundle/<search head address> HTTP/1.0" 401 148 - - - 0ms
 
which provides no useful information.
 
On the search peers, the directory $splunk/etc/auth/distServerKeys/<search head name> has an exact copy of the file $splunk/etc/auth/distServerKeys/trusted.pem.
 
Questions:
 
  1. Why does this fail? Is this due to $splunk/etc/auth/distServerKeys/trusted.pem being present on the search head, with some incorrect key information?
  2. What does "Global key files are invalid" mean, and where can I find further information about how to fix them?

 

I welcome other suggestions -- including suggestions for the right questions to ask.


myudkowsky
Communicator

@isoutamo Fixed! The problem was a straightforward consequence of the way the vendor software interacted with Splunk.

The vendor software would overwrite $SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem - unless I specifically copied a new version in.

On startup of a new installation, the process in Splunk that generated files would look into $SPLUNK_HOME/etc/auth/distServerKeys/ and find a key there already, and therefore refuse to overwrite any keys there. (Deduction based on what happens when I try to generate keys that overwrite extant keys.)

Also, $SPLUNK_HOME/etc/auth/distServerKeys files trusted.pem and private.pem are, in fact, a set of public/private RSA keys.
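Since the two files form an RSA key pair, one way to check that a given private.pem and trusted.pem belong together is to derive the public key from the private key and compare it with the public key file. This is only a sketch: it assumes the openssl command-line tool is installed, that private.pem is not passphrase-protected, and that trusted.pem is in the "BEGIN PUBLIC KEY" PEM form. The function name keys_match is hypothetical.

```shell
#!/bin/sh
# Compare the public key derived from a private key with a public key file.
# Prints "match" or "mismatch"; returns 2 if openssl cannot read a file.
# Assumes an unencrypted RSA private key and a "BEGIN PUBLIC KEY" file.
keys_match() {
    priv=$1
    pub=$2
    # -passin pass: avoids an interactive prompt if the key is encrypted.
    from_priv=$(openssl rsa -in "$priv" -passin pass: -pubout 2>/dev/null) || return 2
    from_pub=$(openssl rsa -pubin -in "$pub" -pubout 2>/dev/null) || return 2
    if [ "$from_priv" = "$from_pub" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Example:
# keys_match "$SPLUNK_HOME/etc/auth/distServerKeys/private.pem" \
#            "$SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem"
```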

To fix this:

1) Run this command:

$SPLUNK_HOME/bin/splunk createssl audit-keys -d /tmp/testkeys -p /tmp/testkeys/private.pem -k /tmp/testkeys/trusted.pem -l 2048

Note that this is a 2048-bit key. The required length is not documented anywhere that I can find, and the default output of audit-keys is 1024 bits. Aside from trial and error, I noticed that, e.g., the indexers had a trusted.pem that was twice as large.

2) Copy the keys into $SPLUNK_HOME/etc/auth/distServerKeys

3) Restart splunk

4) Delete the indexers and re-add them -- they now can be added using the new keys.
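Steps 2 and 3 above can be sketched as a small helper that backs up whatever is currently in place before copying the new pair in. The function name install_dist_keys and the backup directory are hypothetical; file ownership and permissions are left untouched here and may need adjusting for the account that runs Splunk.

```shell
#!/bin/sh
# Copy private.pem and trusted.pem from src into dest, saving any
# existing copies into a separate backup directory first.
# install_dist_keys is an illustrative helper, not a Splunk command.
install_dist_keys() {
    src=$1
    dest=$2
    backup=$3
    # Refuse to do anything unless both new files are available.
    for f in private.pem trusted.pem; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dest" "$backup" || return 1
    for f in private.pem trusted.pem; do
        if [ -f "$dest/$f" ]; then
            cp -p "$dest/$f" "$backup/$f" || return 1
        fi
        cp "$src/$f" "$dest/$f" || return 1
    done
}

# Example, using the paths from step 1 (backup path is an assumption):
# install_dist_keys /tmp/testkeys "$SPLUNK_HOME/etc/auth/distServerKeys" /tmp/distServerKeys.bak
# "$SPLUNK_HOME/bin/splunk" restart
```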

I'm going to keep this topic open for a few days in case anyone would like edits to this explanation; if there are no requests I'll close it.

myudkowsky
Communicator

I'd like to add a solution detail:

In my original notes on this problem, I stated that trying to add a peer gave me this error message:

Encountered the following error while trying to save: Invalid action for this internal handler (handler: distsearch-peer, supported: list|edit|remove|_reload|new|disable|enable|doc, wanted: create).

In addition, the GUI did not display the option to add a new peer when I was in the list of search peers. I had to be in the "distributed peers menu" to see the "+ search peer" option on that page; and it did not work.

As soon as I created a file called private.pem, even though it and its trusted.pem were not acceptable because the key length was too short, the option of "add search peer" appeared on the list of distributed peers. I am afraid that I can't recall the error message I received when I used "add search peer" with an invalid key pair.
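The key length of a public key file can be read directly rather than inferred from file size. A minimal sketch, assuming the openssl command-line tool is available and that the file is in the "BEGIN PUBLIC KEY" PEM form; the function name key_bits is hypothetical.

```shell
#!/bin/sh
# Print the modulus size, in bits, of an RSA public key file.
# Prints nothing if openssl cannot parse the file.
key_bits() {
    openssl rsa -pubin -in "$1" -noout -text 2>/dev/null |
        sed -n 's/.*(\([0-9][0-9]*\) bit).*/\1/p' |
        head -n 1
}

# Example: compare the search head's key with one copied from an indexer.
# key_bits "$SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem"
```

Running this against the trusted.pem on an indexer and on the search head should make a 1024 versus 2048 difference obvious.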


myudkowsky
Communicator

Thank you for your response. Here are the answers:

  1. Time is in sync on all nodes. They are synced via NTP and operate in a small cluster in the same datacenter.
  2. The splunk versions are the same on all nodes. The search head was the last to be converted.
  3. I apologize if I was unclear. If I remove a search peer and then try to re-add it, I cannot. I get an error when I try this via the GUI: Encountered the following error while trying to save: Invalid action for this internal handler (handler: distsearch-peer, supported: list|edit|remove|_reload|new|disable|enable|doc, wanted: create). I have not tried to add via the Splunk command line.

 

Please feel free to make suggestions or ask further questions.


isoutamo
SplunkTrust

Hi

A couple of questions:

  • Is the time in sync on all nodes?
  • Are the Splunk versions the same on all nodes?
  • Have you already tried removing the search peers and adding them again?

r. Ismo


myudkowsky
Communicator

@isoutamo I'm re-sending this reply -- I suspect that my previous reply was not properly directed to you.

Thank you for your response. Here are the answers:

  1. Time is in sync on all nodes. They are synced via NTP and operate in a small cluster in the same datacenter.
  2. The splunk versions are the same on all nodes. The search head was the last to be converted.
  3. I apologize if I was unclear. If I remove a search peer and then try to re-add it, I cannot. I get an error when I try this via the GUI: Encountered the following error while trying to save: Invalid action for this internal handler (handler: distsearch-peer, supported: list|edit|remove|_reload|new|disable|enable|doc, wanted: create). I have not tried to add via the Splunk command line.

 

Please feel free to make suggestions or ask further questions.
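For reference, the command-line route mentioned in item 3 uses the splunk add search-server command, run on the search head. The sketch below only prints the command so it can be reviewed first; the helper name peer_add_cmd and the angle-bracket placeholders are hypothetical, and the default of /opt/splunk for SPLUNK_HOME is an assumption.

```shell
#!/bin/sh
# Print (do not run) the CLI command for adding a search peer.
# The credentials are placeholders to be filled in by hand.
peer_add_cmd() {
    peer_uri=$1   # management URI of the peer, e.g. https://x.x.x.x:y
    printf '%s add search-server %s -auth %s -remoteUsername %s -remotePassword %s\n' \
        "${SPLUNK_HOME:-/opt/splunk}/bin/splunk" \
        "$peer_uri" \
        '<local_admin>:<local_password>' \
        '<remote_admin>' \
        '<remote_password>'
}

# Example:
# peer_add_cmd https://x.x.x.x:y
```

The CLI may surface a more specific error than the GUI's "Invalid action for this internal handler" message, though that is not confirmed anywhere in this thread.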


isoutamo
SplunkTrust

Can you remove the search peer from the search head (sh) and restart the search head? Then also remove trusted.pem from the peer, restart the peer, and add it again.
Clearly something odd is going on with this authentication.

One thing for the next upgrade: the sh should be upgraded before the peers. Here is a simple flow for the upgrade order: https://docs.splunk.com/images/d/d3/Splunk_upgrade_order_of_ops.pdf

myudkowsky
Communicator

@isoutamo Thanks for the ideas.

  1. I removed the search peers from the sh and restarted the sh.
  2. I then removed trusted.pem from the peer and restarted the peer
  3. I restarted the peer.

I have not managed to get the peers to work correctly. I tried several different variations, e.g., I turned off both the sh and peer simultaneously, and then started the sh and then the peer.

I also replaced 'trusted.pem' with a new copy that I created via

$splunk/bin/splunk createssl server-cert -d $splunk/etc/auth -n 'trusted.pem' -c <ip address>

which did not help -- I thought it might fix the issue. I may have some ideas about that -- perhaps the "-c" should be replaced with a FQDN and the name of the instance, or if possible left out entirely.


isoutamo
SplunkTrust

If you haven't done so already, it's time to open a support case with Splunk.

myudkowsky
Communicator

@isoutamo Thanks for your help. With any luck, later today I will have time to test a different version of a new trusted.pem - or try a labor-intensive re-configuration of the operating environment to see if I can persuade my original sh to work.

I am currently waiting on my vendor's Splunk expert to discuss next steps with me. I will update this page (and you directly) when I have further progress.

myudkowsky
Communicator

@isoutamo Here's an update on this problem. After intensive work with the vendor, I have no resolution, but I did finally find what I think is the problem in my current installation.

Before my upgrade to 7.1.3, I had two files in $SPLUNK_HOME/etc/auth/distServerKeys/. One was private.pem, an RSA private key, and the other was trusted.pem, a public key. I have been unable to determine from the documentation (yet) exactly what these keys are supposed to secure, or how they are generated.

In my current installation, I have only a "trusted.pem." This file is something I created, IIRC, in order to see if I could get the installation working. This trusted.pem is a full server certificate and *not* a key. It's a multi-part file with a certificate, an encrypted private key, and another certificate.
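The difference between a bare key and a multi-part certificate file is visible from the PEM header lines alone. A small sketch (the function name pem_blocks is hypothetical) that lists the type of every block in a file:

```shell
#!/bin/sh
# List the type of each PEM block in a file, one per line, in file order.
# A bare public key shows a single "PUBLIC KEY" line; a server certificate
# bundle shows several lines such as CERTIFICATE and a private key type.
pem_blocks() {
    sed -n 's/^-----BEGIN \(.*\)-----.*/\1/p' "$1"
}

# Example:
# pem_blocks "$SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem"
```

For the file described above, the output would be expected to contain two CERTIFICATE lines with a private key entry between them, rather than the single public key entry a working trusted.pem would show.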

I can probably fix everything if I can find the exact procedure to generate trusted.pem and private.pem from the files I have in $SPLUNK_HOME/etc/auth: cacert.pem, ca.srl, ca.pem, and server.pem. (There are a few others there as well, e.g., in the audit directory.)

I'm now looking for documentation on what does what, without much success.
