Delete corrupt bucket or down index in cluster

1StopBloke
Explorer

Hi,

Recently I've been getting a few errors like this one, reported by various nodes (they show up in the master server's messages):

Search peer s2splunk02 has the following message: Failed to make bucket = _internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886 searchable, retry count = 106.

I've also been failing to reach my search factor (2) for this index in our cluster; it always shows that there is 1 bucket that is not replicated.

I tried repairing the bucket while the node was offline (and the cluster was in maintenance mode), but I received this error:

[splunk@s2splunk02 ~/var/lib/splunk/_internaldb/db]$ splunk fsck repair --one-bucket --bucket-path=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886
Not loading indexes.conf; will proceed with all defaults
Operating on: idx= bucket='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886'
Error reading compressed journal while streaming: gzip data truncated, provider=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886/rawdata/journal.gz
Repair (entire bucket) idx= bucket='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886' failed: (entire bucket) Rebuild for bkt='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886' failed: Error reading compressed journal while streaming: gzip data truncated, provider=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886/rawdata/journal.gz

At this stage we're still developing and testing our cluster, so I'm not too worried about the data; however, I have no idea how to get rid of the bucket. I tried stopping Splunk on all our peers and deleting the bucket, but it just reappears. So how can I delete a bucket in a cluster?

In this thread they found a solution by taking the index offline and bringing it back online, but I can't work out how to do that for a full cluster. Doing it on an individual peer fails with a complaint that this is not a valid command for a cluster index node. Any help?

1 Solution

dxu_splunk
Splunk Employee

In 6.0.x:

You can remove the bucket by hitting the bucket's remove_all endpoint on the master.

curl -k -u USER:LOGIN https://master_uri:mgmt_port/services/cluster/master/buckets/BUCKET_ID/remove_all -X POST

Otherwise, if you're on 5.0.x, I'd freeze the bucket and it should leave you alone.

curl -k -u USER:LOGIN https://master_uri:mgmt_port/services/cluster/master/buckets/BUCKET_ID/freeze -X POST

UPDATE: there is now also an endpoint that removes a single copy of a bucket from one peer:

curl -k -u admin:changeme "https://MASTER:MGMT/services/cluster/master/buckets/main~1490~D4A07A5D-3C3C-4D36-BD70-D610B432466F/remove_from_peer" -d peer=BBBBBBBB-BBBB-BBBB-BBBB-BBBBBBBBBBBB
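The two master-side endpoints above differ only in the last path segment, so the URL can be assembled by a small helper. This is a minimal sketch, not an official tool: the function name cluster_bucket_url and the host cm.example.com are made up for illustration, and only the remove_all and remove_from_peer actions mentioned in this post are accepted.

```shell
# Hypothetical helper: print the cluster master URL for a bucket action.
# Arguments: master host:port, bucket ID (index~number~GUID), action.
# Runs in a subshell (parentheses) so its variables do not leak.
cluster_bucket_url() (
    master="$1"
    bucket_id="$2"
    action="$3"
    case "$action" in
        remove_all|remove_from_peer) ;;
        *) echo "unsupported action: $action" >&2; exit 1 ;;
    esac
    printf 'https://%s/services/cluster/master/buckets/%s/%s\n' \
        "$master" "$bucket_id" "$action"
)

# Example use (nothing is sent until you run curl yourself):
#   url="$(cluster_bucket_url cm.example.com:8089 'main~1490~D4A07A5D-3C3C-4D36-BD70-D610B432466F' remove_all)"
#   curl -k -u USER:LOGIN -X POST "$url"
```

Printing the URL first and pasting it into curl by hand makes it easy to double-check the bucket ID before anything is deleted.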


rbal_splunk
Splunk Employee

Just to sum up:

REMOVE FROM ONE PEER:
curl -k -u admin:changeme "https://CM:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/remove..." -d peer=GUID_FOR_PEER

REMOVE FROM ALL:
curl -k -u admin:changeme -X POST "https://CM:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/remove_all"


rbal_splunk
Splunk Employee

You can use a curl command to delete a bucket from a single peer.
Say the bucket REST endpoint shows bucket _audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D on two peers, and you need to delete it from the peer with GUID 9D08E9A4-25E1-41ED-876A-737F32840B83.

You can use a curl command like this:

curl -k -u admin: https://CM_URL:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/re... -X POST -d peer=9D08E9A4-25E1-41ED-876A-737F32840B83

Once the curl command has run, you will see output like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!--This is to override browser formatting; see server.conf[httpServer] to disable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .-->
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>clustermasterbuckets</title>
  <id>https://CM:8089/services/cluster/master/buckets</id>
  <updated>2017-10-18T22:41:03-07:00</updated>
  <generator build="e21ee54bc796" version="6.6.3"/>

uuppuluri_splun
Splunk Employee

Also, in some cases, logging in directly on the peer node itself and running this seems to work:

curl -k -u USER:LOGIN --request DELETE "https://localhost:mgmt_port/services/cluster/slave/buckets/BUCKET_ID" -d bucket_id='BUCKET_ID'


gfuente
Motivator

Thanks.

How do I delete just one bucket?

If I include only one ID, it complains:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">In handler 'indexes': bucket_ids parameter must be a comma-separated list of bucket ID's.</msg>
  </messages>
</response>

EDIT: I'm using:

curl -k -u admin:password https://ip:8089/services/data/indexes/indexname/freeze-buckets -d bucket_ids=155_D6589969-94EE-4826-A4E6-96DFF5D3F -X POST

EDIT 2: It seems that something is wrong with the bucket ID format this command expects.
Thanks a lot!

EDIT 3: Hello again. I found that you only need to pass the first part of the bucket ID (for example, 155), and then it works.

SOLVED! Thanks a lot!

gfuente
Motivator

Hello dxu_splunk,

The command for 5.0.x doesn't seem to work. I'm trying to use it on 5.0.2 and it fails with:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">In handler 'clustermasterbuckets': Invalid custom action for this internal handler (handler: clustermasterbuckets, custom action: freeze, eai action: edit).</msg>
  </messages>
</response>

Can you check it and let me know the correct endpoint? Thanks


dxu_splunk
Splunk Employee

Hmm, you're right.

Instead, we can hit the peers to freeze the bucket (which deletes it from the peer and freezes it on the master, so that other copies stay frozen). The command for that is:

curl -k -u USER:LOGIN https://peer_uri:mgmt/services/data/indexes/INDEX/freeze-buckets -d bucket_ids=46_11115C7A-E2F0-4225-A740-4ED6BD2D9CE5 -X POST

where bucket_ids is a comma-separated list of buckets in the index, each in the form BUCKET#_GUID. In the example I gave, I'm deleting bucket 46_11115C7A-E2F0-4225-A740-4ED6BD2D9CE5 from index INDEX.
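If you are starting from the bucket ID as the master reports it (index~number~GUID, e.g. _internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886), the BUCKET#_GUID form can be derived with plain shell parameter expansion. This is only a sketch; the function name bucket_id_to_peer_form is made up for illustration. Note that elsewhere in this thread gfuente reports that on 5.0.2 only the leading bucket number was accepted, so the exact form may depend on the version.

```shell
# Hypothetical helper: convert "index~number~GUID" into "number_GUID".
# Runs in a subshell (parentheses) so its variables do not leak.
bucket_id_to_peer_form() (
    bucket_id="$1"
    rest="${bucket_id#*~}"    # drop the leading "index~"
    number="${rest%%~*}"      # text before the next "~"
    guid="${rest#*~}"         # text after it
    printf '%s_%s\n' "$number" "$guid"
)

# Example use:
#   bucket_id_to_peer_form '_internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886'
```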

1StopBloke
Explorer

Great, that seems to have done the trick. Thanks.
It confused me for a little while, as the action takes some time to run after you issue the command. By the way, the nouns are switched around in your example; this is the path I ended up using: /services/cluster/master/buckets/_internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886/remove_all
Is there a command reference for that? The API guide only shows GET commands for this path: http://docs.splunk.com/Documentation/Splunk/6.0.3/RESTAPI/RESTcluster
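
For anyone who only has the bucket directory from an fsck error, the bucket ID used in that path can be assembled from the directory name. This is a minimal sketch that assumes the clustered naming seen in this thread (a prefix, two epoch timestamps, the bucket number, then the GUID, all separated by underscores); the function name bucket_dir_to_id is made up for illustration. The index name is passed explicitly because the directory (_internaldb) does not match the index name (_internal).

```shell
# Hypothetical helper: build "index~number~GUID" from an index name and a
# clustered bucket directory such as rb_<time>_<time>_<number>_<GUID>.
# Runs in a subshell (parentheses) so its variables do not leak.
bucket_dir_to_id() (
    index="$1"
    dir="$(basename "$2")"
    rest="${dir#*_}"      # drop the "rb_" / "db_" prefix
    rest="${rest#*_}"     # drop the first timestamp
    rest="${rest#*_}"     # drop the second timestamp
    number="${rest%%_*}"  # bucket number
    guid="${rest#*_}"     # GUID (contains hyphens, no underscores)
    printf '%s~%s~%s\n' "$index" "$number" "$guid"
)

# Example use:
#   bucket_dir_to_id _internal /opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886
```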
