
Frozen Buckets with Error: Cannot replicate as bucket hasn't rolled yet.

dkeck
Influencer

Hi,

I have buckets in my pending fixup tasks on my cluster master that have status: "Cannot replicate as bucket hasn't rolled yet".

What confuses me is that they are all frozen and have already been deleted:

Fixup Reason: bid=XYZ removed from peer=XYZ, frozen=1

So where does a frozen bucket need to be replicated to? Or is this error message misleading?

I am not able to roll the buckets from the Cluster Master UI.

I would be grateful for any hints.

Thank you

David


isoutamo
SplunkTrust

Hi

We hit this after upgrading to 8.1.4 (I have heard that the same issue also occurs on 8.1.3). As you said, it seems to be raised when the primary or secondary copy of some buckets has already been frozen and you then, for example, do a rolling restart. To me it looks like a bug. I will report it to Splunk Support so they can verify it and get it fixed in a future release.

You can temporarily clear this task list by restarting the CM, but that is only a temporary workaround; the tasks will come back sooner or later.

Here is a dashboard I use to verify this issue.

<form>
  <label>FS-SPL-Bucket-Info</label>
  <fieldset submitButton="false">
    <input type="time" token="timePicker">
      <label>Timeperiod</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="bucketGuid" searchWhenChanged="true">
      <label>Bucket GUID</label>
    </input>
    <input type="text" token="bucketIdx">
      <label>Index</label>
    </input>
    <input type="text" token="bucketNbr">
      <label>Bucket #</label>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Bucket replication status</title>
      <table>
        <search>
          <query>| rest splunk_server=<ADD YOUR CM HERE> /services/cluster/master/buckets 
| search title=$bucketIdx$~$bucketNbr$~$bucketGuid$* 
| rex field=title "^(?&lt;repl_index&gt;[^\~]+)" 
| search repl_index="*" standalone=0 frozen=*
| rename title AS bucketID 
| fields bucketID peers.*.search_state *site* 
| untable bucketID siteState value 
| rex field=siteState "peers\.(?&lt;search_state&gt;[^\.]*?)\.search_state" 
| rex field=siteState "\.(?&lt;primaries_by_site&gt;\S+)" 
| rex field=siteState "\.(?&lt;rep_count_by_site&gt;\S+)" 
| rex field=siteState "\.(?&lt;search_count_by_site&gt;\S+)" 
| eval peerGUID=if(siteState=="primaries_by_site", value, peerGUID) 
| eval site=if(siteState=="origin_site", value, site) 
| eval value=if(siteState=="search_count_by_site", site + ":" + value, value) 
| eval value=if(siteState=="rep_count_by_site", site + ":" + value, value) 
| join type=outer peerGUID 
    [ rest splunk_server=&lt;ADD YOUR CM HERE&gt; /services/cluster/master/peers 
    | fields active_* host* label title status site 
    | eval PeerName= site + ":" + label + ":" + host_port_pair 
    | rename title AS peerGUID 
    | rename site AS peerSite 
    | table peerGUID PeerName peerSite ] 
| eval site=if(siteState=="search_state", peerSite, site) 
| eval value=if(siteState=="primaries_by_site", PeerName + ":For_" + site, value) 
| eval value=if(siteState=="search_state", PeerName + ":" + value, value) 
| fields - PeerName peerGUID peerSite 
| chart limit=0 values(value) over bucketID by siteState</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Bucket lifetime</title>
      <table>
        <search>
          <query>index=_internal sourcetype="splunkd" source="*/splunkd.log" (*_$bucketNbr$_$bucketGuid$ OR hot_v1_$bucketNbr$ OR $bucketIdx$~$bucketNbr$~$bucketGuid$) $bucketIdx$
| eval reason = coalesce(reason,Reason)
| table _time host log_level peer peer_name component bucketType event event_message search_status reason from to status path  
| sort +_time</query>
          <earliest>$timePicker.earliest$</earliest>
          <latest>$timePicker.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>
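
The dashboard is driven by three text inputs. It joins them as `$bucketIdx$~$bucketNbr$~$bucketGuid$` to match the bucket ID (the `title` field) returned by the CM's buckets endpoint, pulls the index back out with `rex "^(?<repl_index>[^\~]+)"`, and in the "Bucket lifetime" panel ORs the directory-name patterns `*_$bucketNbr$_$bucketGuid$` and `hot_v1_$bucketNbr$` together with the bucket ID. Below is a minimal Python sketch of that mapping, assuming bucket IDs always have the form `index~number~GUID`; `parse_bucket_id` and `lifetime_search_terms` are hypothetical helpers, not part of any Splunk API.

```python
import re
from typing import NamedTuple

# Extends the dashboard's rex "^(?<repl_index>[^\~]+)" to all three parts.
# Assumption: clustered bucket IDs look like "<index>~<number>~<GUID>".
BUCKET_ID_RE = re.compile(r"^(?P<index>[^~]+)~(?P<number>\d+)~(?P<guid>[^~]+)$")


class BucketId(NamedTuple):
    index: str
    number: str
    guid: str


def parse_bucket_id(bucket_id: str) -> BucketId:
    """Split a clustered bucket ID into index, bucket number and origin GUID."""
    match = BUCKET_ID_RE.match(bucket_id)
    if match is None:
        raise ValueError(f"not a clustered bucket ID: {bucket_id!r}")
    return BucketId(match["index"], match["number"], match["guid"])


def lifetime_search_terms(bucket: BucketId) -> list[str]:
    """Terms the "Bucket lifetime" panel ORs together when searching splunkd.log."""
    return [
        # Assumed: warm/cold bucket directory names end in _<number>_<GUID>.
        f"*_{bucket.number}_{bucket.guid}",
        # Hot bucket directory name.
        f"hot_v1_{bucket.number}",
        # The bucket ID itself, as shown in the fixup task list.
        f"{bucket.index}~{bucket.number}~{bucket.guid}",
    ]


if __name__ == "__main__":
    b = parse_bucket_id("main~42~0A1B2C3D-0000-4000-8000-000000000000")
    print(b)
    print(" OR ".join(lifetime_search_terms(b)))
```

This can help when copying a bucket ID out of the fixup task list: split it once and paste the three parts into the Index, Bucket # and Bucket GUID inputs.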

msnhd3
Loves-to-Learn

I'm having the same problem. Did you find a solution?

isoutamo
SplunkTrust
Hi
I'm waiting for final confirmation from the developers that this is a bug. Splunk Support has already confirmed that it appears to be one.
r. Ismo

bkresoja
Engager
2021-06-01 | SPL-206510, SPL-213903, SPL-213901, SPL-213902 | CM issues fixup tasks for "frozen in cluster" clustered buckets

Workarounds:

Workaround 1: Restart the CM.

Workaround 2: Use a REST search on the CM to show fixup tasks only for non-frozen buckets. In versions earlier than 8.2.0, the endpoint name is master, not manager.

| rest splunk_server=local /services/cluster/manager/fixup level=replication_factor 
| table index title initial.reason latest.reason level 
| append 
    [| rest splunk_server=local /services/cluster/manager/fixup level=search_factor 
    | table index title initial.reason latest.reason level ] 
| join title 
    [| rest splunk_server=local /services/cluster/manager/buckets f=title f=frozen search="frozen=0"] 
| stats values(level) AS level values(frozen) AS frozen values(initial.reason) AS initial.reason values(latest.reason) AS latest.reason BY index title
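
Workaround 2 does three things: it collects fixup tasks at both the `replication_factor` and `search_factor` levels, keeps only those whose bucket the CM still reports with `frozen=0`, and collapses the result to one row per bucket. The same filtering can be sketched offline in Python, assuming you have already exported the `fixup` and `buckets` REST results as lists of dicts using the field names from the search above (`index`, `title`, `level`, `initial.reason`, `latest.reason`, `frozen`). The function name `non_frozen_fixups` and the export format are assumptions for illustration.

```python
from collections import defaultdict


def non_frozen_fixups(fixups, buckets):
    """Keep fixup tasks only for buckets the CM reports as not frozen.

    fixups:  dicts with "index", "title", "level", "initial.reason", "latest.reason"
    buckets: dicts with "title" and "frozen" (0/1), as exported from the
             cluster buckets endpoint (assumed export format)
    """
    # Equivalent of the subsearch: titles of buckets with frozen=0.
    live = {b["title"] for b in buckets if str(b.get("frozen")) == "0"}

    # Equivalent of the inner join on title followed by
    # stats values(...) BY index title.
    grouped = defaultdict(lambda: defaultdict(set))
    for task in fixups:
        if task["title"] not in live:
            continue  # bucket is frozen or unknown: drop the task
        entry = grouped[(task["index"], task["title"])]
        for field in ("level", "initial.reason", "latest.reason"):
            value = task.get(field)
            if value:
                entry[field].add(value)

    # values() in SPL returns distinct values in sorted order.
    return {
        key: {field: sorted(values) for field, values in fields.items()}
        for key, fields in grouped.items()
    }


if __name__ == "__main__":
    fixups = [
        {"index": "main", "title": "main~1~GUID-A", "level": "replication_factor",
         "initial.reason": "bid=main~1~GUID-A removed from peer=P1, frozen=1",
         "latest.reason": "Cannot replicate as bucket hasn't rolled yet."},
        {"index": "main", "title": "main~2~GUID-A", "level": "search_factor",
         "initial.reason": "example reason", "latest.reason": "example reason"},
    ]
    buckets = [
        {"title": "main~1~GUID-A", "frozen": "1"},
        {"title": "main~2~GUID-A", "frozen": "0"},
    ]
    for (index, title), fields in non_frozen_fixups(fixups, buckets).items():
        print(index, title, fields)
```

The SPL `join` is an inner join by default, which is what drops the tasks for frozen buckets; the sketch reproduces that by skipping any task whose `title` is not in the non-frozen set.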