Unwanted self-deleted Peer

vinigreen · ‎07-23-2019

Lately I've been having many problems with the availability of my peers. They often stop working and cause me issues. I figured that a script would solve my problems. I use the Splunk Atom feed (the REST API on port 8089) for this; the script follows:

PowerShell

`param([string]$Peer="")
# Ignore invalid SSL certs - Do Not Change
try {
    add-type @"
    using System.Net;
    using System.Security.Cryptography.X509Certificates;
    public class TrustAllCertsPolicy : ICertificatePolicy {
        public bool CheckValidationResult(
            ServicePoint srvPoint, X509Certificate certificate,
            WebRequest request, int certificateProblem) {
            return true;
        }
    }
"@
}
catch { }
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

# Disable the peer, wait 5 seconds, then enable it again
function Set-PeerStats{
    Invoke-WebRequest -Method POST -Headers @{"Authorization"="Basic xxxxxxxxxbase64xxxxxxxxxx="} -Uri "hxxps://172.20.20.188:8089/servicesNS/-/search/search/distributed/peers/$Peer%3A8089/disable" >$null
    Start-Sleep 5;
    Invoke-WebRequest -Method POST -Headers @{"Authorization"="Basic xxxxxxxxxxxxxbase64xxxxxxxxxxxx="} -Uri "hxxps://172.20.20.188:8089/servicesNS/-/search/search/distributed/peers/$Peer%3A8089/enable" >$null
}
Set-PeerStats`

If my peer goes down several times, say 20 times in a row, the peer automatically disappears as if it had been deleted, but no logs are generated for it. I can also confirm that it isn't quarantined.
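The state of every configured peer, as the search head currently sees it, can be listed with the `rest` search command. This is only a sketch: it assumes the account running it may read the `search/distributed/peers` endpoint and that the `status` and `disabled` fields are returned for each entry, which may differ between Splunk versions.

```
| rest /services/search/distributed/peers splunk_server=local
| table title status disabled
```

A peer that is missing from this list is gone from the configuration, whereas one listed with a non-"Up" status is still configured but unreachable.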

Does anyone have any clue about what could be happening?

skalliger · ‎07-24-2019

Could you tell us what you're actually trying to do? So, what was the problem causing you to switch over to a script doing some REST calls?

Nothing in splunkd.log?

Skalli

vinigreen · ‎07-26-2019

I'm facing a daily problem with my customer's peers. Every single day the peer goes down about 20 times. We use this distributed search peer to get data from our customer's Splunk. Every time the peer goes down, we disable and enable it and it begins to work again.

I looked through the logs to see what I could find, but found nothing really useful.

=================================================

Audit log if we delete the peer manually:

{query}
index=_audit* user="vandrade" action=edit_dist_peer operation=remove

Result:

7/22/19
12:50:29.462 PM
Audit:[timestamp=07-22-2019 12:50:29.462, user=vandrade, action=edit_dist_peer, info=granted object="192.168.100.246:8089" operation=remove][n/a]
source = audittrail sourcetype = audittrail user = vandrade

===================================================

Search Audit log for the unwanted auto-deleted peer:

{query}
index=_audit* user="service_prtg" action=edit_dist_peer

Results:

(They're all the same)

Audit:[timestamp=07-22-2019 17:54:27.679, user=service_prtg, action=edit_dist_peer, info=granted object="10.1.1.90:8089" operation=list][n/a]
action = edit_dist_peer source = audittrail sourcetype = audittrail user = service_prtg

===========================================================

If I add operation="remove" to the second query I get no results. However, if the script runs several times, the peer gets deleted and no log of it is generated.
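To check whether any account at all recorded a removal, the second query can be widened to every user and grouped by operation. A minimal sketch, assuming the `user`, `operation`, and `object` fields are extracted as in the audit events above, and that a removal would be audited under the same `edit_dist_peer` action:

```
index=_audit* action=edit_dist_peer
| stats count, latest(_time) AS last_seen BY user, operation, object
| convert ctime(last_seen)
```

If no row with operation=remove shows up for the affected peer under any user, the deletion was probably not performed through an audited edit of the peer.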

skalliger · ‎07-27-2019

We need actual data of the peer that had problems.

There are so many open questions about this problem that I'm not sure we can help you here. You need to find out why the indexer is going down. Is it overloaded, crashing due to running out of memory, or is anything else happening?

Look for any errors around the time in splunkd.log.
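One way to pull those errors is to search the internal index, where splunkd.log is normally indexed. A sketch, assuming default internal logging and that the `log_level` and `component` fields are extracted for the `splunkd` sourcetype; the address is the peer from the audit events in this thread and should be replaced with the affected peer:

```
index=_internal sourcetype=splunkd (log_level=ERROR OR log_level=WARN) "10.1.1.90"
| stats count BY component, log_level
| sort - count
```

Narrowing the time range to a few minutes around one of the outages makes the components involved easier to spot.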

Skalli