Hi,
Recently, various nodes have been reporting errors like this (they show up in the master server messages):
Search peer s2splunk02 has the following message: Failed to make bucket = _internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886 searchable, retry count = 106.
I've also been failing to reach my search factor (2) for this index in our cluster. It always shows 1 bucket that is not replicated.
I tried repairing the bucket while the node was offline (and the cluster was in maintenance mode), but I received this error:
[splunk@s2splunk02 ~/var/lib/splunk/_internaldb/db]$ splunk fsck repair --one-bucket --bucket-path=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886
Not loading indexes.conf; will proceed with all defaults
Operating on: idx= bucket='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886'
Error reading compressed journal while streaming: gzip data truncated, provider=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886/rawdata/journal.gz
Repair (entire bucket) idx= bucket='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886' failed: (entire bucket) Rebuild for bkt='/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886' failed: Error reading compressed journal while streaming: gzip data truncated, provider=/opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886/rawdata/journal.gz
At this stage we're still developing and testing our cluster, so I'm not too worried about the data. However, I've no idea how to get rid of the bucket. I tried stopping Splunk on all our peers and deleting the bucket, but it just reappears. So how can I delete a bucket in a cluster?
In this thread they found a solution by taking the index offline and bringing it back online, but I can't work out how to do that for a full cluster. Doing it on an individual peer fails with a complaint that this is not a valid command for a cluster index node. Any help?
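For what it's worth, the "gzip data truncated" error can be double-checked outside of Splunk before attempting another repair. The following is only a sketch: it assumes journal.gz is readable by the standard gzip tool, and check_journal is a hypothetical helper name, not a Splunk command.

```shell
# Hypothetical helper: test a bucket's rawdata journal with gzip's
# built-in integrity check (gzip -t reads the file without extracting it).
# Argument: path to the bucket directory.
check_journal() {
    journal="$1/rawdata/journal.gz"
    if gzip -t "$journal" 2>/dev/null; then
        echo "ok: $journal"
    else
        echo "damaged or unreadable: $journal"
        return 1
    fi
}
```

Calling check_journal with the bucket path from the fsck output above would show whether gzip itself also considers the journal truncated.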
In 6.0.x:
You can remove the bucket by hitting the bucket's remove_all endpoint on the master.
curl -k -u USER:LOGIN https://master_uri:mgmt_port/services/cluster/master/buckets/BUCKET_ID/remove_all -X POST
Otherwise, if you're on 5.0.x, I'd freeze the bucket and it should leave you alone.
curl -k -u USER:LOGIN https://master_uri:mgmt_port/services/cluster/master/buckets/BUCKET_ID/freeze -X POST
UPDATE - there is now also an endpoint to remove a single copy of a bucket:
curl -k -u admin:changeme "https://MASTER:MGMT/services/cluster/master/buckets/main~1490~D4A07A5D-3C3C-4D36-BD70-D610B432466F/remove_from_peer" -d peer=BBBBBBBB-BBBB-BBBB-BBBB-BBBBBBBBBBB
Just to sum up:
REMOVE FROM ONE PEER:
curl -k -u admin:changeme "https://CM:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/remove..." -d peer=GUID_FOR_PEER
REMOVE FROM ALL:
curl -k -u admin:changeme -X POST "https://CM:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/remove_all"
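To avoid typos in these long URLs, the commands can be assembled from their parts. This is a minimal dry-run sketch that only prints the curl command instead of running it. The function names, the CM_URI variable, and the USER:PASSWORD placeholder are made up for illustration; the endpoint names remove_all and remove_from_peer are the ones given in the answers above.

```shell
# Sketch only: prints the curl command instead of running it (dry run).
# CM_URI is a placeholder for the cluster master's management URI.
CM_URI="${CM_URI:-https://CM:8089}"

# Build the cluster master endpoint URL for a bucket action.
# $1 = bucket ID (index~localid~guid), $2 = action name.
cm_bucket_url() {
    printf '%s/services/cluster/master/buckets/%s/%s\n' "$CM_URI" "$1" "$2"
}

# Print the command that removes every copy of a bucket.
print_remove_all() {
    printf 'curl -k -u USER:PASSWORD -X POST "%s"\n' \
        "$(cm_bucket_url "$1" remove_all)"
}

# Print the command that removes the copy held by one peer.
# $1 = bucket ID, $2 = GUID of the peer.
print_remove_from_peer() {
    printf 'curl -k -u USER:PASSWORD -X POST "%s" -d peer=%s\n' \
        "$(cm_bucket_url "$1" remove_from_peer)" "$2"
}

# Example: show the remove_all command for the bucket used above.
print_remove_all '_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D'
```

Printing first and pasting afterwards makes it easy to review the URL before anything is deleted.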
You can use a curl command to delete a bucket from a peer.
Say the bucket REST endpoint shows bucket _audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D on two peers, and you need to delete it from the peer with GUID 9D08E9A4-25E1-41ED-876A-737F32840B83.
You can use a curl command like:
curl -k -u admin: https://CM_URL:8089/services/cluster/master/buckets/_audit~5~A5F789C3-22C0-407C-9B6B-10C8705F1C3D/re... -X POST -d peer=9D08E9A4-25E1-41ED-876A-737F32840B83
Once the curl command has run, you will see output like this:
<?xml version="1.0" encoding="UTF-8"?>
<!--This is to override browser formatting; see server.conf[httpServer] to disable
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<title>clustermasterbuckets</title>
<id>https://CM:8089/services/cluster/master/buckets</id>
<updated>2017-10-18T22:41:03-07:00</updated>
<generator build="e21ee54bc796" version="6.6.3"/>
Also, in some cases, logging in directly to the node itself and running this seems to work:
curl -k -u USER:LOGIN --request DELETE "https://localhost:mgmt_port/services/cluster/slave/buckets/BUCKET_ID" -d bucket_id='BUCKET_ID'
Thanks
How do I delete just one bucket?
If I only include one ID it complains:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<messages>
<msg type="ERROR">In handler 'indexes': bucket_ids parameter must be a comma-separated list of bucket ID's.</msg>
</messages>
</response>
EDIT: I'm using:
curl -k -u admin:password https://ip:8089/services/data/indexes/indexname/freeze-buckets -d bucket_ids=155_D6589969-94EE-4826-A4E6-96DFF5D3F -X POST
EDIT 2: It seems that there is something wrong with the bucket id format needed by this command
Thanks a lot!
EDIT 3: Hello again. I found that you only need to pass the first part of the bucket ID (for example, 155); then it works.
SOLVED! Thanks a lot!!!
Hello dxu_splunk,
the command for 5.0.x doesn't seem to work. I'm trying to use it on 5.0.2 and it fails, saying:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<messages>
<msg type="ERROR">In handler 'clustermasterbuckets': Invalid custom action for this internal handler (handler: clustermasterbuckets, custom action: freeze, eai action: edit).</msg>
</messages>
</response>
Can you check it and let me know the correct endpoint? Thanks
Hmm, you're right.
We can hit the peers to freeze the bucket instead (which will delete it from the peer, and freeze it on the master so that other copies stay frozen). The command for that is:
curl -k -u USER:LOGIN https://peer_uri:mgmt/services/data/indexes/INDEX/freeze-buckets -d bucket_ids=46_11115C7A-E2F0-4225-A740-4ED6BD2D9CE5 -X POST
where bucket_ids is a comma-separated list of buckets in the index, each in the form BUCKET#_GUID. In the example I gave, I'm deleting bucket 46_11115... from index INDEX.
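Since bucket_ids has to be a comma-separated list, a small helper can build it from separate IDs. This is a rough dry-run sketch: the function names and the PEER_URI:MGMT_PORT and USER:PASSWORD placeholders are hypothetical, and it only prints the command. Note that one poster above reported that on 5.0.2 only the numeric part of the ID (for example, 155) was accepted, so the exact ID format may depend on the version.

```shell
# Join the given bucket IDs with commas, the form bucket_ids expects.
join_bucket_ids() {
    old_ifs=$IFS
    IFS=,
    joined="$*"          # "$*" joins arguments with the first char of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

# Print (not run) the peer-side freeze-buckets command.
# $1 = index name, remaining arguments = bucket IDs.
print_freeze_buckets() {
    index="$1"
    shift
    printf 'curl -k -u USER:PASSWORD -X POST "https://PEER_URI:MGMT_PORT/services/data/indexes/%s/freeze-buckets" -d bucket_ids=%s\n' \
        "$index" "$(join_bucket_ids "$@")"
}

# Example: one bucket from index INDEX, as in the command above.
print_freeze_buckets INDEX 46_11115C7A-E2F0-4225-A740-4ED6BD2D9CE5
```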
Great, that seems to have done the trick. Thanks.
It confused me for a bit, since the action takes some time to run after you issue the command. The nouns are switched around in your example, by the way; this is the path I ended up using: /services/cluster/master/buckets/_internal~148~307D1B57-3D07-45F3-A0FC-A6BB94644886/remove_all
Is there a command reference for that? The API guide only shows GET commands for this path: http://docs.splunk.com/Documentation/Splunk/6.0.3/RESTAPI/RESTcluster
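The bucket ID in that path can be worked out from the bucket's directory name. Going by the names in this thread, the directory rb_1397557401_1397547003_148_307D1B57-... corresponds to the ID _internal~148~307D1B57-..., so the sketch below assumes a PREFIX_TIME_TIME_LOCALID_GUID layout for the directory name. bucket_id_from_dir is a hypothetical helper, and the index name is passed in explicitly because it is not part of the directory name.

```shell
# Hypothetical helper: build the cluster bucket ID (index~localid~guid)
# from an index name and a bucket directory path.
# Assumes the directory is named PREFIX_TIME_TIME_LOCALID_GUID.
bucket_id_from_dir() {
    index="$1"
    dir_name=$(basename "$2")
    rest=${dir_name#*_}    # drop the rb/db prefix
    rest=${rest#*_}        # drop the first timestamp
    rest=${rest#*_}        # drop the second timestamp
    local_id=${rest%%_*}   # numeric bucket number
    guid=${rest#*_}        # GUID at the end of the name
    printf '%s~%s~%s\n' "$index" "$local_id" "$guid"
}

# Example with the bucket from the original question.
bucket_id_from_dir _internal \
    /opt/splunk/var/lib/splunk/_internaldb/db/rb_1397557401_1397547003_148_307D1B57-3D07-45F3-A0FC-A6BB94644886
```

The example prints the same ID that appears in the master's error message at the top of the thread.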