Splunk Enterprise

Search Head Cluster - CPU spike and Risk datamodel changes?

a_kearney
Path Finder

Hi,

Over the last 4 days I have noticed an increased number of Search Bundle replication errors:

12-21-2023 09:50:12.604 +0000 WARN ConfReplicationThread [9209 ConfReplicationThread] - Error pushing configurations to captain=https://searchHeadCaptain:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=16ed9160640170315673324237791a4cfe256d59; current_baseline_op_id=cd93950208af34df00957e721b87128d3629d2d1"

These errors occur in groups every 4 hours. I have also seen CPU spikes on the Search Heads that started occurring at the same time and also recur every 4 hours.
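For anyone who wants to confirm the 4-hour cadence from the raw log rather than from a chart, here is a minimal Python sketch. It assumes the WARN lines have the same layout as the one quoted above; the function names (`parse_push_error`, `burst_starts`, `gaps_between`) and the 30-minute "quiet" threshold used to separate one group of errors from the next are illustrative choices, not anything defined by Splunk.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the WARN line quoted above.
TS_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"

# Matches the "Error pushing configurations" warning and captures the
# timestamp, the consecutive error count and both baseline op ids.
PUSH_ERROR = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+WARN\s+"
    r"ConfReplicationThread\b.*?consecutiveErrors=(?P<errors>\d+)"
    r".*?outdated_baseline_op_id=(?P<outdated>[0-9a-f]+)"
    r";\s*current_baseline_op_id=(?P<current>[0-9a-f]+)"
)


def parse_push_error(line):
    """Return the fields of one push-error line, or None if it does not match."""
    m = PUSH_ERROR.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], TS_FORMAT),
        "consecutive_errors": int(m["errors"]),
        "outdated_op_id": m["outdated"],
        "current_op_id": m["current"],
    }


def burst_starts(times, quiet=timedelta(minutes=30)):
    """Return the first timestamp of each group of errors.

    A new group starts whenever the gap since the previous error
    exceeds `quiet` (an assumed threshold, adjust as needed).
    """
    starts = []
    previous = None
    for t in sorted(times):
        if previous is None or t - previous > quiet:
            starts.append(t)
        previous = t
    return starts


def gaps_between(starts):
    """Return the time differences between consecutive group starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Feeding the `time` values of all parsed lines into `burst_starts` and then `gaps_between` should give gaps close to 4 hours if the pattern is as regular as it looks.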

Further investigation has shown that the following events from conf.log have also been occurring at the same time, every 4 hours:

{
   component: ConfOp
   data: {
     applied_at: 1703264397
     asset_id: 220d8bbce6d790850cda3980c5784c62b1a9f9ff
     asset_uri: [ ... ]
     from_repo: https://searchHeadCaptain:8089
     op_id: 102aa206f930da5eef0d47163b354c61254566c5
     optype: 2
     optype_desc: WRITE_STANZA
     payload: {
       alias: Risk
       metadata: {
         permissions: { }
       }
       value: ***TRANSIENT***://6613
     }
     payload_extra: ***ALLOW_SKIP_ON_WRITE***
     status: applied
     task: pullFrom
     to_repo: https://searchHeadPeer.com:8089
     to_repo_change_count: 20214
   }
   datetime: 12-22-2023 16:59:57.097 +0000
   log_level: INFO
}
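As a quick sanity check when lining these events up against the CPU spikes, the `applied_at` value can be converted from epoch seconds and compared with the event's `datetime` field. The sketch below is illustrative only; the helper name `epoch_to_utc` is made up for this example, and it assumes `applied_at` is a Unix timestamp in seconds.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the conf.log event above.
applied_at = epoch_to_utc(1703264397)
logged_at = datetime.strptime("12-22-2023 16:59:57.097 +0000",
                              "%m-%d-%Y %H:%M:%S.%f %z")

print(applied_at.isoformat())  # 2023-12-22T16:59:57+00:00
# The two timestamps agree to within a second.
print(abs((logged_at - applied_at).total_seconds()) < 1)  # True
```

So the Risk stanza write was applied on this member at the same moment the event was logged, which makes `applied_at` usable as the time key when correlating with the spikes.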

Does anyone know what these events mean and how I can find out what is causing them?

Bundle replication errors: [screenshot: a_kearney_3-1703266594384.png]

conf.log events: [screenshot: a_kearney_1-1703266278620.png]

CPU spikes: [screenshot: a_kearney_2-1703266443624.png]


VatsalJagani
SplunkTrust

@a_kearney -

  • How many search heads do you have in the cluster?
  • Are any cluster members down?
  • Have there been any recent incidents of cluster members going down?
  • Are you also seeing "Consider a lower value of conf_replication_max_push_count" warning messages in your logs?
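If it is easier to check the last point outside of Splunk, something like the following rough Python sketch could count those warnings in a copy of splunkd.log. The function names and the file path in the usage note are assumptions for illustration; only the quoted message text comes from the question above.

```python
# Message text quoted in the question above.
MARKER = "Consider a lower value of conf_replication_max_push_count"


def count_push_count_warnings(lines):
    """Count log lines that contain the push-count warning text."""
    return sum(1 for line in lines if MARKER in line)


def count_in_file(path):
    """Count the warning in a log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_push_count_warnings(handle)
```

For example, `count_in_file("/opt/splunk/var/log/splunk/splunkd.log")` on each member, where the path is an assumed default install location and may differ on your hosts.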


A single "consecutiveErrors=1" usually isn't a problem, but in your situation it happens a lot, which is concerning.

 

I hope this helps!!! Kindly upvote if it does!!!
