Splunk Enterprise

Search Head Cluster - CPU spike and Risk datamodel changes?

a_kearney
Path Finder

Hi,

Over the last 4 days I have noticed an increased number of Search Bundle replication errors:

12-21-2023 09:50:12.604 +0000 WARN ConfReplicationThread [9209 ConfReplicationThread] - Error pushing configurations to captain=https://searchHeadCaptain:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=16ed9160640170315673324237791a4cfe256d59; current_baseline_op_id=cd93950208af34df00957e721b87128d3629d2d1"

These occur in groups every 4 hours. I have also seen CPU spikes on the Search Heads that started occurring at the same time and also recur every 4 hours.

Further investigation has shown that the following events from conf.log have also been occurring at the same time, every 4 hours:

{
   component: ConfOp
   data: {
     applied_at: 1703264397
     asset_id: 220d8bbce6d790850cda3980c5784c62b1a9f9ff
     asset_uri: [ (collapsed) ]
     from_repo: https://searchHeadCaptain:8089
     op_id: 102aa206f930da5eef0d47163b354c61254566c5
     optype: 2
     optype_desc: WRITE_STANZA
     payload: {
       alias: Risk
       metadata: {
         permissions: { }
       }
       value: ***TRANSIENT***://6613
     }
     payload_extra: ***ALLOW_SKIP_ON_WRITE***
     status: applied
     task: pullFrom
     to_repo: https://searchHeadPeer.com:8089
     to_repo_change_count: 20214
   }
   datetime: 12-22-2023 16:59:57.097 +0000
   log_level: INFO
}
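
As a side check, the applied_at value in the event above looks like a Unix epoch timestamp in seconds. That is an assumption, not something the log states. A minimal Python sketch to convert it, so it can be compared with the event's datetime field (the helper name epoch_to_utc is made up for illustration):

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# applied_at copied from the conf.log event; assumed to be epoch seconds.
applied_at = 1703264397
print(epoch_to_utc(applied_at).isoformat())  # 2023-12-22T16:59:57+00:00
```

Under that assumption, the result lines up with the event's datetime of 12-22-2023 16:59:57 +0000, so the change was applied at the moment the event was logged.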

Does anyone know what these events mean and how I can find out what is causing them?

Bundle replication errors: [screenshot: a_kearney_3-1703266594384.png]

conf.log events: [screenshot: a_kearney_1-1703266278620.png]

CPU spikes: [screenshot: a_kearney_2-1703266443624.png]


VatsalJagani
SplunkTrust

@a_kearney -

  • How many search heads do you have in the cluster?
  • Are any cluster members down?
  • Have there been any recent incidents of cluster members being down?
  • Are you also seeing "Consider a lower value of conf_replication_max_push_count" warning messages in your logs?


Usually a single "consecutiveErrors=1" isn't a problem, but in your situation it happens a lot, which is concerning.
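
To put a number on "a lot", one option is to count the ConfReplicationThread push warnings per hour and check whether they really cluster every 4 hours. Below is a minimal Python sketch, assuming the warnings have the same layout as the line quoted in the question. The function name and the regular expression are my own illustration, not anything shipped with Splunk.

```python
import re
from collections import Counter
from datetime import datetime

# Matches warnings shaped like the one quoted in the question, e.g.
# 12-21-2023 09:50:12.604 +0000 WARN ConfReplicationThread [...] ... consecutiveErrors=1 ...
WARN_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\.\d{3} [+-]\d{4} "
    r"WARN\s+ConfReplicationThread\b.*?consecutiveErrors=(?P<errors>\d+)"
)


def count_push_errors_by_hour(lines):
    """Return a Counter mapping each hour (as a datetime) to the number of push warnings."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match is None:
            continue  # not a ConfReplicationThread push warning
        ts = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S")
        counts[ts.replace(minute=0, second=0)] += 1
    return counts
```

Calling count_push_errors_by_hour(open(path)) on a copy of splunkd.log (the path is whatever applies on your search head) and printing the sorted items should show whether the warnings recur at a fixed 4-hour spacing, which you can then line up with the conf.log events and the CPU spikes.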

 

I hope this helps!!! Kindly upvote if it does!!!
