Frozen Buckets with Error: Cannot replicate as bucket hasn't rolled yet.

dkeck
Influencer

Hi,

I have buckets in my pending fixup tasks on my cluster master that have status: "Cannot replicate as bucket hasn't rolled yet".

What confuses me is that they are all frozen and have already been deleted:

Fixup Reason
bid=XYZ removed from peer=XYZ, frozen=1

So where would a frozen bucket need to be replicated to? Or is this error misleading?

I am not able to roll the buckets from the Cluster Master UI.

I would be grateful for any hints.

Thank you

David


soutamo
SplunkTrust

Hi

We hit this after upgrading to 8.1.4 (I have heard that the same issue has also been found on 8.1.3). As you said, it seems to be raised when there are buckets whose primary or secondary copy has already been frozen and you then do, for example, a rolling restart. To me it looks like a bug. I will report it to Splunk Support so that it can be verified and fixed in a future release.

You can temporarily clear this task list by rebooting the CM, but that is only a temporary workaround. The tasks will reappear sooner or later.

Here is a dashboard that I use to verify this issue.

<form>
  <label>FS-SPL-Bucket-Info</label>
  <fieldset submitButton="false">
    <input type="time" token="timePicker">
      <label>Timeperiod</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="bucketGuid" searchWhenChanged="true">
      <label>Bucket GUID</label>
    </input>
    <input type="text" token="bucketIdx">
      <label>Index</label>
    </input>
    <input type="text" token="bucketNbr">
      <label>Bucket #</label>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Bucket replication status</title>
      <table>
        <search>
          <query>| rest splunk_server=&lt;ADD YOUR CM HERE&gt; /services/cluster/master/buckets 
| search title=$bucketIdx$~$bucketNbr$~$bucketGuid$* 
| rex field=title "^(?&lt;repl_index&gt;[^\~]+)" 
| search repl_index="*" standalone=0 frozen=*
| rename title AS bucketID 
| fields bucketID peers.*.search_state *site* 
| untable bucketID siteState value 
| rex field=siteState "peers\.(?&lt;search_state&gt;[^\.]*?)\.search_state" 
| rex field=siteState "\.(?&lt;primaries_by_site&gt;\S+)" 
| rex field=siteState "\.(?&lt;rep_count_by_site&gt;\S+)" 
| rex field=siteState "\.(?&lt;search_count_by_site&gt;\S+)" 
| eval peerGUID=if(siteState=="primaries_by_site", value, peerGUID) 
| eval site=if(siteState=="origin_site", value, site) 
| eval value=if(siteState=="search_count_by_site", site + ":" + value, value) 
| eval value=if(siteState=="rep_count_by_site", site + ":" + value, value) 
| join type=outer peerGUID 
    [ rest splunk_server=&lt;ADD YOUR CM HERE&gt; /services/cluster/master/peers 
    | fields active_* host* label title status site 
    | eval PeerName= site + ":" + label + ":" + host_port_pair 
    | rename title AS peerGUID 
    | rename site AS peerSite 
    | table peerGUID PeerName peerSite ] 
| eval site=if(siteState=="search_state", peerSite, site) 
| eval value=if(siteState=="primaries_by_site", PeerName + ":For_" + site, value) 
| eval value=if(siteState=="search_state", PeerName + ":" + value, value) 
| fields - PeerName peerGUID peerSite 
| chart limit=0 values(value) over bucketID by siteState</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Bucket lifetime</title>
      <table>
        <search>
          <query>index=_internal sourcetype="splunkd" source="*/splunkd.log" (*_$bucketNbr$_$bucketGuid$ OR hot_v1_$bucketNbr$ OR $bucketIdx$~$bucketNbr$~$bucketGuid$) $bucketIdx$
| eval reason = coalesce(reason,Reason)
| table _time host log_level peer peer_name component bucketType event event_message search_status reason from to status path  
| sort +_time</query>
          <earliest>$timePicker.earliest$</earliest>
          <latest>$timePicker.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>
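
The three text inputs of the dashboard above (Index, Bucket # and Bucket GUID) match the three parts of the bucket ID that the fixup reason reports as bid=..., in the same index~number~GUID layout that the first panel searches for in title. As a rough sketch only, a bid can be split into those three values with a throwaway search like the one below. The bid value is a made-up placeholder and not a real bucket, the field names are simply borrowed from the dashboard tokens, and the pattern assumes the index name itself contains no ~ character.

```spl
| makeresults
| eval bid="placeholder_index~42~00000000-0000-0000-0000-000000000000"
| rex field=bid "^(?<bucketIdx>[^~]+)~(?<bucketNbr>\d+)~(?<bucketGuid>.+)$"
| table bid bucketIdx bucketNbr bucketGuid
```

Replace the placeholder with a bid copied from your own fixup task list, then paste the three resulting values into the matching dashboard inputs.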


msnhd3
Loves-to-Learn

I'm having the same problem. Did you find a solution?

soutamo
SplunkTrust

Hi,

I'm waiting for final confirmation from the developers that this is a bug. Splunk Support has already confirmed that it appears to be one.

r. Ismo