When my Splunk multi-site indexer cluster comes up, some buckets belonging to _audit and _internal fail to replicate, so the Indexer Clustering dashboard on the Cluster Master shows "Replication Factor not met". I can see the bucket names on the dashboard page by clicking the bucket status button. When I then delete those buckets from the Cluster Master CLI, everything returns to normal and the dashboard says "Rep. factor met".
Is there a way to get the names of the buckets that have replication issues via the CLI or the REST API, instead of the Splunk dashboard UI?
I think there was a problem with copying the regex extraction. The original search should be
| rest /services/cluster/master/buckets splunk_server=*
| rex field=title "^(?<repl_index>[^\~]+)"
| search repl_index="*" standalone=0 frozen=0
| rename title AS bucketID
| fields bucketID peers.*.search_state *site*
| untable bucketID siteState value
| rex field=siteState "peers\.(?<peerGUID>[^\.]*?)\.(?<siteState>search_state)"
| rex field=siteState "(?<siteState>primaries_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>rep_count_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>search_count_by_site)\.(?<site>\S+)"
| eval peerGUID=if(siteState=="primaries_by_site", value, peerGUID)
| eval site=if(siteState=="origin_site", value, site)
| eval value=if(siteState=="search_count_by_site", site + ":" + value, value)
| eval value=if(siteState=="rep_count_by_site", site + ":" + value, value)
| join type=outer peerGUID [ rest /services/cluster/master/peers splunk_server=*
| fields active_* host* label title status site
| eval PeerName= site + ":" + label + ":" + host_port_pair
| rename title AS peerGUID
| rename site AS peerSite
| table peerGUID PeerName peerSite ]
| eval site=if(siteState=="search_state", peerSite, site)
| eval value=if(siteState=="primaries_by_site", PeerName + ":For_" + site, value)
| eval value=if(siteState=="search_state", PeerName + ":" + value, value)
| fields - PeerName peerGUID peerSite | chart values(value) over bucketID by siteState
This search comes courtesy of my co-worker @Masa
| rest /services/cluster/master/buckets
| rex field=title "^(?<repl_index>[^\~]+)"
| search repl_index="*" standalone=0 frozen=0
| rename title AS bucketID
| fields bucketID *origin_site* *_by_site*
| untable bucketID siteState value
| rex mode=sed field=siteState "s/\./__/"
| rex mode=sed field=siteState "s/_count_/_/"
| search NOT siteState=primaries_*
| xyseries bucketID siteState value
| fields - search_by_site
| fillnull
| eval rep_total= rep_by_site__site1 + rep_by_site__site2 + rep_by_site__site3
| eval srch_total = search_by_site__site1 + search_by_site__site2 + search_by_site__site3
| rename constrain_to_origin_site AS constrain
| rename origin_site AS origin
| rename rep_by_site__site1 AS rep_site1
| rename rep_by_site__site2 AS rep_site2
| rename rep_by_site__site3 AS rep_site3
| rename search_by_site__site1 AS srch_site1
| rename search_by_site__site2 AS srch_site2
| rename search_by_site__site3 AS srch_site3
bucketID                                         constrain  origin  rep_site1  rep_site2  rep_site3  rep_total  srch_site1  srch_site2  srch_site3  srch_total
_audit~118~FF782A13-8AFB-4617-BCB4-15ED11928DD7  0          site1   2          1          1          4          2           1           1           4
_audit~119~FF782A13-8AFB-4617-BCB4-15ED11928DD7  0          site1   2          1          2          5          2           1           1           4
You can further filter for buckets where the replication or search factor is not met (assuming your rep factor=4 and search factor=3) by appending this to the end of the search:
| search rep_total<4 OR srch_total<3
Note: remove references to site3 in the search if you only have 2 sites in the multi-site cluster
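If you would rather do the same check outside of the search bar, the filter above can be sketched client-side in Python. This is only a rough sketch under assumptions: it assumes you have already fetched /services/cluster/master/buckets with output_mode=json and parsed the "entry" list, that each entry carries the bucket ID in "name", and that "content" holds rep_count_by_site and search_count_by_site as per-site mappings plus the standalone and frozen flags. The function and helper names are made up for illustration.

```python
from typing import Any, Dict, List


def _flag(value: Any) -> bool:
    """Treat 1, "1", True and "true" as set; anything else as unset."""
    return str(value).strip().lower() in ("1", "true")


def _total(counts: Any) -> int:
    """Sum a per-site mapping such as {"site1": 2, "site2": 1}."""
    if not isinstance(counts, dict):
        return 0
    return sum(int(v) for v in counts.values())


def unmet_buckets(entries: List[Dict[str, Any]],
                  rep_factor: int = 4,
                  search_factor: int = 3) -> List[Dict[str, Any]]:
    """Return buckets whose summed per-site counts fall below the factors.

    `entries` is assumed to be the parsed "entry" list of the JSON response
    (an assumption about the payload layout, not a documented contract).
    """
    unmet = []
    for entry in entries:
        content = entry.get("content", {})
        # Mirror `standalone=0 frozen=0` from the search.
        if _flag(content.get("standalone")) or _flag(content.get("frozen")):
            continue
        rep_total = _total(content.get("rep_count_by_site"))
        srch_total = _total(content.get("search_count_by_site"))
        # Mirror `| search rep_total<4 OR srch_total<3`.
        if rep_total < rep_factor or srch_total < search_factor:
            unmet.append({"bucketID": entry.get("name"),
                          "rep_total": rep_total,
                          "srch_total": srch_total})
    return unmet


# Hand-built sample shaped like the two _audit buckets in the output above.
sample = [
    {"name": "_audit~118~FF782A13-8AFB-4617-BCB4-15ED11928DD7",
     "content": {"standalone": 0, "frozen": 0,
                 "rep_count_by_site": {"site1": 2, "site2": 1, "site3": 1},
                 "search_count_by_site": {"site1": 2, "site2": 1, "site3": 1}}},
    {"name": "_audit~119~FF782A13-8AFB-4617-BCB4-15ED11928DD7",
     "content": {"standalone": 0, "frozen": 0,
                 "rep_count_by_site": {"site1": 2, "site2": 1, "site3": 2},
                 "search_count_by_site": {"site1": 2, "site2": 1, "site3": 1}}},
]

for bucket in unmet_buckets(sample, rep_factor=5, search_factor=3):
    print(bucket["bucketID"], bucket["rep_total"], bucket["srch_total"])
```

Summing over whatever sites are present means nothing needs editing for a 2-site cluster, unlike the hard-coded site1..site3 columns in the search.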
| rest /services/cluster/master/buckets
| rex field=title "^(?<repl_index>[^\~]+)"
| search repl_index="*" standalone=0 frozen=0
| rename title AS bucketID
| fields bucketID peers.*.search_state *site*
| untable bucketID siteState value
| rex field=siteState "peers\.(?<peerGUID>[^\.]*?)\.(?<siteState>search_state)"
| rex field=siteState "(?<siteState>primaries_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>rep_count_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>search_count_by_site)\.(?<site>\S+)"
| eval peerGUID=if(siteState=="primaries_by_site", value, peerGUID)
| eval site=if(siteState=="origin_site", value, site)
| eval value=if(siteState=="search_count_by_site", site + ":" + value, value)
| eval value=if(siteState=="rep_count_by_site", site + ":" + value, value)
| join type=outer peerGUID [ rest /services/cluster/master/peers
| fields active_* host* label title status site
| eval PeerName= site + ":" + label + ":" + host_port_pair
| rename title AS peerGUID
| rename site AS peerSite
| table peerGUID PeerName peerSite ]
| eval site=if(siteState=="search_state", peerSite, site)
| eval value=if(siteState=="primaries_by_site", PeerName + ":For_" + site, value)
| eval value=if(siteState=="search_state", PeerName + ":" + value, value)
| fields - PeerName peerGUID peerSite | chart values(value) over bucketID by siteState
Columns: bucketID, constrain, origin, primaries_by_site, rep_by_site, srch_by_site, search_state

bucketID: _audit~118~FF782A13-8AFB-4617-BCB4-15ED11928DD7
constrain: 0
origin: site1
primaries_by_site:
  site1:centos58-64sup01-620CP:10.140.48.137:55591:For_site1
  site2:centos65-64sup14-620CP:10.140.48.150:55591:For_site2
  site3:centos62-64sup13-620CP:10.140.48.149:55591:For_site3
rep_by_site: site1:2 site2:1 site3:1
srch_by_site: site1:2 site2:1 site3:1
search_state:
  site1:centos58-64sup01-620CP:10.140.48.137:55591:Searchable
  site1:centos65-64sup06-620CP:10.140.48.142:55591:Searchable
  site2:centos65-64sup14-620CP:10.140.48.150:55591:Searchable
  site3:centos62-64sup13-620CP:10.140.48.149:55591:Searchable

bucketID: _audit~119~FF782A13-8AFB-4617-BCB4-15ED11928DD7
constrain: 0
origin: site1
primaries_by_site:
  site1:centos58-64sup01-620CP:10.140.48.137:55591:For_site1
  site2:centos65-64sup14-620CP:10.140.48.150:55591:For_site2
  site3:centos62-64sup13-620CP:10.140.48.149:55591:For_site3
rep_by_site: site1:2 site2:1 site3:2
srch_by_site: site1:2 site2:1 site3:1
search_state:
  site1:centos58-64sup01-620CP:10.140.48.137:55591:Searchable
  site1:centos65-64sup06-620CP:10.140.48.142:55591:Searchable
  site2:centos65-64sup14-620CP:10.140.48.150:55591:Searchable
  site3:centos62-64sup12-620CP:10.140.48.148:55591:Unsearchable
  site3:centos62-64sup13-620CP:10.140.48.149:55591:Searchable
It seems that this query was broken when the post was migrated to the new community platform. Here is a fixed version in case someone else needs it. I needed to check bucket status because Splunk 8.1.4 (and, from what I have heard, 8.1.3) seems to have broken replication for buckets where only some buckets are left after others have frozen.
| rest splunk_server=<ADD YOUR CM HERE> /services/cluster/master/buckets
``` if you know bucket id add it here ```
``` | search title=$bucketIdx$~$bucketNbr$~$bucketGuid$* ```
| rex field=title "^(?<repl_index>[^\~]+)"
| search repl_index="*" standalone=0 frozen=*
| rename title AS bucketID
| fields bucketID peers.*.search_state *site*
| untable bucketID siteState value
| rex field=siteState "peers\.(?<peerGUID>[^\.]*?)\.(?<siteState>search_state)"
| rex field=siteState "(?<siteState>primaries_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>rep_count_by_site)\.(?<site>\S+)"
| rex field=siteState "(?<siteState>search_count_by_site)\.(?<site>\S+)"
| eval peerGUID=if(siteState=="primaries_by_site", value, peerGUID)
| eval site=if(siteState=="origin_site", value, site)
| eval value=if(siteState=="search_count_by_site", site + ":" + value, value)
| eval value=if(siteState=="rep_count_by_site", site + ":" + value, value)
| join type=outer peerGUID
[ rest splunk_server=<ADD YOUR CM HERE> /services/cluster/master/peers
| fields active_* host* label title status site
| eval PeerName= site + ":" + label + ":" + host_port_pair
| rename title AS peerGUID
| rename site AS peerSite
| table peerGUID PeerName peerSite ]
| eval site=if(siteState=="search_state", peerSite, site)
| eval value=if(siteState=="primaries_by_site", PeerName + ":For_" + site, value)
| eval value=if(siteState=="search_state", PeerName + ":" + value, value)
| fields - PeerName peerGUID peerSite
| chart limit=0 values(value) over bucketID by siteState
You should replace <ADD YOUR CM HERE> with your Cluster Master name.
r. Ismo
I thought this was a great query to have. Unfortunately, it is dangerous on a cluster with 600 indexers: every time I ran it, Splunk was killed by the kernel due to "out of memory".