Splunk Enterprise

Search Head Cluster - CPU spike and Risk datamodel changes?

a_kearney
Path Finder

Hi,

Over the last 4 days I have noticed an increased number of Search Bundle replication errors:

12-21-2023 09:50:12.604 +0000 WARN ConfReplicationThread [9209 ConfReplicationThread] - Error pushing configurations to captain=https://searchHeadCaptain:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=16ed9160640170315673324237791a4cfe256d59; current_baseline_op_id=cd93950208af34df00957e721b87128d3629d2d1"
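Because this warning always has the same shape, its fields can be pulled out programmatically when counting occurrences outside Splunk. The following is a minimal Python sketch. It assumes the lines come from splunkd.log and have been exported as plain text, one event per line. The function name `parse_push_error` is hypothetical, and the regular expression only targets this one message.

```python
import re
from datetime import datetime

# Matches the ConfReplicationThread push error quoted above; other lines return None.
PUSH_ERROR = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+ConfReplicationThread\b.*?"
    r"consecutiveErrors=(?P<consecutive>\d+).*?"
    r"outdated_baseline_op_id=(?P<outdated>[0-9a-f]+);\s*"
    r"current_baseline_op_id=(?P<current>[0-9a-f]+)"
)


def parse_push_error(line):
    """Return the interesting fields of one push error, or None if the line does not match."""
    m = PUSH_ERROR.search(line)
    if m is None:
        return None
    return {
        # Timestamp format as it appears in the quoted line, e.g. "12-21-2023 09:50:12.604 +0000"
        "time": datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S.%f %z"),
        "level": m["level"],
        "consecutive_errors": int(m["consecutive"]),
        "outdated_baseline_op_id": m["outdated"],
        "current_baseline_op_id": m["current"],
    }
```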

These errors occur in bursts every 4 hours. I have also seen CPU spikes on the search heads that started at the same time and also recur every 4 hours.
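To confirm that the errors really arrive in bursts at a fixed spacing, the warning timestamps (however they were extracted) can be grouped into bursts and the gaps between burst starts measured. This is a minimal sketch. The helper names and the 10-minute gap that separates one burst from the next are illustrative assumptions, not Splunk settings.

```python
from datetime import timedelta


def group_into_bursts(times, max_gap=timedelta(minutes=10)):
    """Group timestamps into bursts; a new burst starts when the gap to the
    previous event exceeds max_gap (the 10-minute default is an assumption)."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def burst_intervals(bursts):
    """Time elapsed between the first events of consecutive bursts."""
    starts = [b[0] for b in bursts]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

If the intervals cluster tightly around 4 hours, that is consistent with something running on a fixed schedule rather than with random load spikes.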

Further investigation has shown that the following conf.log events have also been occurring at the same times, every 4 hours:

{
   component: ConfOp
   data: {
     applied_at: 1703264397
     asset_id: 220d8bbce6d790850cda3980c5784c62b1a9f9ff
     asset_uri: [ ... ]
     from_repo: https://searchHeadCaptain:8089
     op_id: 102aa206f930da5eef0d47163b354c61254566c5
     optype: 2
     optype_desc: WRITE_STANZA
     payload: {
       alias: Risk
       metadata: {
         permissions: { }
       }
       value: ***TRANSIENT***://6613
     }
     payload_extra: ***ALLOW_SKIP_ON_WRITE***
     status: applied
     task: pullFrom
     to_repo: https://searchHeadPeer.com:8089
     to_repo_change_count: 20214
   }
   datetime: 12-22-2023 16:59:57.097 +0000
   log_level: INFO
}
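The applied_at field looks like a Unix epoch in seconds: 1703264397 converts to 2023-12-22 16:59:57 UTC, which matches the event's datetime field. Listing every such write to the Risk stanza, with its time and source, makes it easier to line the writes up against the error bursts and CPU spikes. This is a minimal sketch that assumes conf.log contains one JSON object per line with the field layout shown above; the Splunk event viewer suggests this layout, but it is an assumption. `risk_write_ops` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def risk_write_ops(lines, alias="Risk"):
    """Yield (applied_at as UTC datetime, op_id, from_repo) for every ConfOp
    WRITE_STANZA event whose payload alias matches. Assumes one JSON event per line."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict) or event.get("component") != "ConfOp":
            continue
        data = event.get("data", {})
        payload = data.get("payload", {})
        if data.get("optype_desc") == "WRITE_STANZA" and payload.get("alias") == alias:
            # applied_at is treated as epoch seconds (matches the datetime field above)
            applied = datetime.fromtimestamp(data["applied_at"], tz=timezone.utc)
            yield applied, data.get("op_id"), data.get("from_repo")
```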

Does anyone know what these events mean and how I can find out what is causing them?

[Screenshot: bundle replication errors]

[Screenshot: conf.log events]

[Screenshot: CPU spikes on the search heads]


VatsalJagani
SplunkTrust

@a_kearney -

  • How many search heads do you have in the cluster?
  • Are any cluster members down?
  • Have there been any recent incidents of cluster members going down?
  • Are you also seeing "Consider a lower value of conf_replication_max_push_count" warning messages in your logs?
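To answer the last question quickly across exported log files, the lines containing that warning can be counted per day. This is a minimal sketch. It only matches the text fragment quoted in the question, since the full message wording is not shown here, and it assumes each line starts with a MM-DD-YYYY timestamp as in the splunkd.log line quoted earlier. The helper name is hypothetical.

```python
from collections import Counter

# Only the fragment quoted in the question; the full message text is assumed to contain it.
WARNING_TEXT = "Consider a lower value of conf_replication_max_push_count"


def count_push_count_warnings(lines):
    """Count matching lines per day, keyed by the leading MM-DD-YYYY date
    (assumes splunkd.log-style lines that begin with the timestamp)."""
    per_day = Counter()
    for line in lines:
        if WARNING_TEXT in line:
            per_day[line[:10]] += 1
    return per_day
```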


On its own, "consecutiveErrors=1" usually isn't a problem, but in your case it is happening a lot, which is concerning.
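One way to separate "frequent but isolated" failures from pushes that keep failing back to back is to summarize the consecutiveErrors values in log order. This is a minimal sketch. It assumes, based only on the field's name, that the counter restarts at 1 after a successful push. The helper name is hypothetical.

```python
def summarize_streaks(consecutive_values):
    """Return (number of failure streaks, longest streak) from consecutiveErrors
    values in log order.

    Assumption: the counter restarts at 1 after a successful push, so a value of 1
    starts a new streak. The first value also starts a streak, in case the log
    window begins in the middle of one.
    """
    streaks = 0
    longest = 0
    for index, value in enumerate(consecutive_values):
        if value == 1 or index == 0:
            streaks += 1
        longest = max(longest, value)
    return streaks, longest
```

Many streaks with a longest value of 1 fit the situation described above, where each error is harmless by itself but errors recur often. A steadily growing value would instead mean that pushes to the captain keep failing without recovering.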

I hope this helps! Kindly upvote if it does.
