Search Head Cluster - CPU spike and Risk datamodel changes?

a_kearney · 12-22-2023

Hi,

Over the last 4 days I have noticed an increased number of configuration replication errors from ConfReplicationThread:

12-21-2023 09:50:12.604 +0000 WARN ConfReplicationThread [9209 ConfReplicationThread] - Error pushing configurations to captain=https://searchHeadCaptain:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=16ed9160640170315673324237791a4cfe256d59; current_baseline_op_id=cd93950208af34df00957e721b87128d3629d2d1"
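To pull the useful fields out of warnings like this one, a small parser helps. This is a minimal Python sketch, assuming every warning follows the format of this single example; the regular expression and the `parse_push_error` helper are hypothetical and not part of Splunk.

```python
import re
from typing import Optional

# Pattern built from the single ConfReplicationThread warning quoted above.
# Field names (captain, consecutiveErrors, outdated/current baseline op IDs)
# come from that line; other Splunk versions may word the message differently.
PUSH_ERROR = re.compile(
    r"^(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})"
    r".*ConfReplicationThread.*"
    r"captain=(?P<captain>\S+?),\s+"
    r"consecutiveErrors=(?P<consecutive_errors>\d+)"
    r".*outdated_baseline_op_id=(?P<outdated>[0-9a-f]+);\s*"
    r"current_baseline_op_id=(?P<current>[0-9a-f]+)"
)


def parse_push_error(line: str) -> Optional[dict]:
    """Return the key fields of one push-error line, or None if it does not match."""
    m = PUSH_ERROR.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["consecutive_errors"] = int(fields["consecutive_errors"])
    return fields
```

Applied to splunkd.log, it shows which captain each member was pushing to and the two baseline op IDs. As the message itself states, the pushing member's baseline (`outdated`) was behind the captain's (`current`) when the push was rejected.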

These errors occur in bursts every 4 hours. CPU spikes on the search heads started at the same time and also recur every 4 hours.
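To check the 4-hour cadence more precisely than by eye, the error timestamps can be grouped into bursts and the gaps between bursts measured. A minimal sketch, assuming timestamps in the `MM-DD-YYYY HH:MM:SS.mmm +ZZZZ` format of the log line above; `parse_splunk_time`, `burst_intervals` and the 10-minute gap threshold are illustrative choices, not Splunk features.

```python
from datetime import datetime, timedelta

# Timestamp format of the splunkd.log line quoted above, e.g.
# "12-21-2023 09:50:12.604 +0000" (month-day-year, milliseconds, UTC offset).
SPLUNK_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"


def parse_splunk_time(text: str) -> datetime:
    """Parse a splunkd.log-style timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, SPLUNK_TIME_FORMAT)


def burst_intervals(timestamps, gap=timedelta(minutes=10)):
    """Group timestamps into bursts and return (burst_starts, intervals).

    A new burst starts whenever more than `gap` has passed since the
    previous event; `intervals` holds the time between consecutive bursts.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return [], []
    burst_starts = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            burst_starts.append(cur)
    intervals = [later - earlier for earlier, later in zip(burst_starts, burst_starts[1:])]
    return burst_starts, intervals
```

If the cadence holds, every value in `intervals` should sit close to 4 hours. Running it separately on the error timestamps and on the CPU-spike timestamps shows whether the two sets of bursts line up.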

Further investigation showed that the following conf.log events have also been occurring at the same times, every 4 hours:

{
  component: ConfOp
  data: {
    applied_at: 1703264397
    asset_id: 220d8bbce6d790850cda3980c5784c62b1a9f9ff
    asset_uri: [ ... ]
    from_repo: https://searchHeadCaptain:8089
    op_id: 102aa206f930da5eef0d47163b354c61254566c5
    optype: 2
    optype_desc: WRITE_STANZA
    payload: {
      alias: Risk
      metadata: {
        permissions: { }
      }
      value: ***TRANSIENT***://6613
    }
    payload_extra: ***ALLOW_SKIP_ON_WRITE***
    status: applied
    task: pullFrom
    to_repo: https://searchHeadPeer.com:8089
    to_repo_change_count: 20214
  }
  datetime: 12-22-2023 16:59:57.097 +0000
  log_level: INFO
}
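The event above can also be decoded programmatically. A minimal sketch, assuming the raw conf.log event is a JSON object with the field names shown above (Splunk's event viewer renders it without quotes); `summarize_conf_op` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def summarize_conf_op(raw: str) -> dict:
    """Extract the fields that identify what a single conf.log ConfOp changed."""
    event = json.loads(raw)
    data = event["data"]
    payload = data.get("payload")
    return {
        # applied_at is a Unix epoch in seconds (int() also accepts a numeric
        # string); convert it to UTC to compare with the event's "datetime" field.
        "applied_at_utc": datetime.fromtimestamp(int(data["applied_at"]), tz=timezone.utc),
        "operation": data.get("optype_desc"),
        # The stanza alias sits inside the payload ("Risk" in the event above).
        "alias": payload.get("alias") if isinstance(payload, dict) else None,
        "task": data.get("task"),
        "from_repo": data.get("from_repo"),
        "to_repo": data.get("to_repo"),
        "status": data.get("status"),
    }
```

For the event in this post, `applied_at` 1703264397 converts to 2023-12-22 16:59:57 UTC, which matches the event's own `datetime` field.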

Does anyone know what these events mean and how I can find out what is causing them?
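One way to narrow down what is being rewritten in each burst is to count conf.log operations by type and stanza alias inside a single 4-hour window. A minimal sketch, assuming conf.log (typically under `$SPLUNK_HOME/var/log/splunk/`) holds one JSON event per line with the fields shown above; `count_ops` is a hypothetical helper and the `<none>` label is arbitrary.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def count_ops(lines, start=None, end=None):
    """Count conf.log operations by (optype_desc, alias) applied in [start, end).

    `lines` is any iterable of raw conf.log lines; `start` and `end` are
    timezone-aware datetimes, or None for an open-ended window.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON event
        if not isinstance(event, dict):
            continue
        data = event.get("data")
        if not isinstance(data, dict) or "applied_at" not in data:
            continue
        applied = datetime.fromtimestamp(int(data["applied_at"]), tz=timezone.utc)
        if start is not None and applied < start:
            continue
        if end is not None and applied >= end:
            continue
        payload = data.get("payload")
        alias = payload.get("alias") if isinstance(payload, dict) else None
        counts[(data.get("optype_desc", "?"), alias or "<none>")] += 1
    return counts
```

Passing an open file handle as `lines` and calling `.most_common(10)` on the result lists the busiest operations. If one alias such as `Risk` dominates every window, that narrows down which knowledge object is being rewritten on the 4-hour schedule.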

[Screenshots: replication errors, conf.log events, CPU spikes]

VatsalJagani · 12-24-2023

@a_kearney -

How many search heads do you have in the cluster?
Are any cluster members down?
Have any cluster members gone down recently?
Are you also seeing "Consider a lower value of conf_replication_max_push_count" warning messages in your logs?

An occasional "consecutiveErrors=1" is usually nothing to worry about, but in your case it happens a lot, which is concerning.
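The last two questions can be checked from the logs directly: count push errors per hour, track the highest `consecutiveErrors` value, and count lines mentioning `conf_replication_max_push_count`. A minimal sketch; `replication_health` is hypothetical, and it matches only the substrings quoted in this thread, not the full wording of every Splunk message.

```python
import re
from collections import Counter

# Substrings taken from the messages quoted in this thread.
PUSH_ERROR_TEXT = "Error pushing configurations to captain"
PUSH_HINT_TEXT = "conf_replication_max_push_count"
CONSECUTIVE = re.compile(r"consecutiveErrors=(\d+)")


def replication_health(lines):
    """Summarize conf replication push errors in splunkd.log-style lines."""
    errors_per_hour = Counter()
    max_consecutive = 0
    hint_lines = 0
    for line in lines:
        if PUSH_ERROR_TEXT in line:
            m = CONSECUTIVE.search(line)
            if m:
                max_consecutive = max(max_consecutive, int(m.group(1)))
            # The first 13 characters ("MM-DD-YYYY HH") identify the hour bucket.
            errors_per_hour[line[:13]] += 1
        if PUSH_HINT_TEXT in line:
            hint_lines += 1
    return {
        "errors_per_hour": errors_per_hour,
        "max_consecutive": max_consecutive,
        "hint_lines": hint_lines,
    }
```

A `max_consecutive` that stays at 1 with errors clustered in the same hour buckets would match the pattern described in the question: short, periodic bursts rather than a member that is persistently failing to push.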

I hope this helps! Kindly upvote if it does.
