Hey guys,
Been troubleshooting my distributed search setup for the last two days. My environment:
Primary facility:
Cluster master
Deployment server
2 search heads
2 indexers

Secondary facility:
2 search heads (the captain is here for testing)
2 indexers
The 2 search heads in Primary can't reach the captain, and splunkd.log is showing the following:
WARN SSLCommon - Received fatal SSL3 alert. ssl_state='SSL negotiation finished successfully', alert_description='bad record mac'.
08-14-2019 18:04:48.986 +0000 ERROR HttpClientRequest - HTTP client error=error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac while accessing server=https://:8089 for request=https://:8089/services/shcluster/captain/members.
On the cluster master I'm seeing the following:
(obfuscating the IPs)
Bundle Replication: Problem replicating config (bundle) to search peer ' secondary indexer 1 ', Unknown write error
Bundle Replication: Problem replicating config (bundle) to search peer ' secondary indexer 2 ', Unknown write error
So it seems that whichever master (search head captain or cluster master) is in one site can't talk to the nodes in the other site.
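For reference, the failing call from that last error can be reproduced outside of Splunk with curl against the captain's management port; the hostname and credentials below are placeholders for your environment:

# Placeholder host/credentials; run from one of the primary search heads
curl -k -u admin:changeme https://captain.secondary.example:8089/services/shcluster/captain/members

If that call fails with the same SSL error, the problem is likely at the TLS/network-path level rather than in the SHC configuration itself.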
A few thoughts:
There is an F5 in between the sites.
The other is that the traffic goes over a WAN and latency is affecting it.
Anyone have ideas?
The 'bad record mac' alert is returned if a record is received with an incorrect MAC. Per the TLS spec (RFC 5246), it also MUST be returned if a TLSCiphertext decrypted in an invalid way: either it wasn't an even multiple of the block length, or its padding values, when checked, weren't correct.
Possible causes:
SNAT isn't enabled
The F5 is offloading SSL and re-encrypting with its own cert (a quick way to check this is sketched after this list)
Mismatched ciphers in use on the two ends
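One way to test the SSL-offload theory, assuming the management port is reachable from both sides (the hostname below is a placeholder): compare the certificate a primary search head sees with the one the captain actually serves.

# Run from a primary search head; captain.secondary.example is a placeholder
openssl s_client -connect captain.secondary.example:8089 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

# Run the same thing on the captain itself, pointed at localhost
openssl s_client -connect localhost:8089 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

If the subject/issuer differ between the two views (for example, the remote view shows an F5-issued cert), the F5 is terminating TLS in the path, which splunkd-to-splunkd traffic on 8089 generally won't tolerate.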
Check the server.conf [sslConfig] and [shclustering] stanzas on each search head; a sketch of what to compare is below.
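A minimal sketch of the stanzas worth diffing across members; every value here is a placeholder rather than a recommendation, and pass4SymmKey in particular must match on all members:

# server.conf (placeholder values; compare the effective settings on every search head)
[sslConfig]
sslVersions = tls1.2
cipherSuite = ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256
serverCert = $SPLUNK_HOME/etc/auth/server.pem
sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem

[shclustering]
pass4SymmKey = <same value on every member>
mgmt_uri = https://<this-member>:8089

# Show the effective (merged) settings on each box:
$SPLUNK_HOME/bin/splunk btool server list sslConfig --debug
$SPLUNK_HOME/bin/splunk btool server list shclustering --debug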
Also, you mentioned one search head is the captain. Are you running a static captain? If so, that's only supported for recovering a failed cluster.
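A quick way to confirm whether the captain is static or elected (paths assume a default install; the exact output field names can vary a bit by version):

# Run on any SHC member
$SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:changeme

# In the Captain section of the output:
#   dynamic_captain : 1   -> normally elected captain
#   dynamic_captain : 0   -> a static captain has been configured

If it turns out a static captain is set, the docs describe moving back to a dynamic captain by running splunk edit shcluster-config -election true -mgmt_uri https://<this-member>:8089 on each member.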
You have an F5 between sites? Why?
This is because the sites are on two separate subnets; we use the F5 as a gateway to route the traffic.