Solved: REST rebalance_primaries

splunk_zen · ‎01-26-2016

We're hitting max capacity on one of our indexers (the one with the smaller file system)
and attempted a bucket rebalance, with no luck:

curl -k -u admin --request POST https://localhost:8089/services/cluster/master/control/control/rebalance_primaries
Enter host password for user 'admin':
<?xml version="1.0" encoding="UTF-8"?>
<!--This is to override browser formatting; see server.conf[httpServer] to disable. . . .-->
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest"     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>clustermastercontrol</title>
  <id>https://localhost:8089/services/cluster/master/control</id>
  <updated>2016-01-26T13:30:23+00:00</updated>
  <generator build="f3e41e4b37b2" version="6.3.1"/>
  <author>
    <name>Splunk</name>
  </author>
  <link href="/services/cluster/master/control/_acl" rel="_acl"/>
  <opensearch:totalResults>0</opensearch:totalResults>
  <opensearch:itemsPerPage>30</opensearch:itemsPerPage>
  <opensearch:startIndex>0</opensearch:startIndex>
  <s:messages/>
</feed>

If I try to do it through the web interface
/services/cluster/master/control/control/rebalance_primaries

instead I get

In handler 'clustermastercontrol': Invalid action for this internal handler (handler: clustermastercontrol, supported: list, wanted: list).

Going back a bit on the URL
/services/cluster/master/
the endpoints are

control
list - _acl -

If I attempt
services/cluster/master/control/_acl/rebalance_primaries

<msg type="ERROR">
In handler 'clustermastercontrol': handler=clustermastercontrol method expected=POST does not match actual=GET     customaction=_acl
</msg>

How should I trigger the action?
Our indexer is still at max capacity, whereas the others have 20% free.

jchampagne_splu · ‎01-26-2016

@splunk_zen, is your end goal to even out the free disk space on the indexers? Unfortunately this command will not do what you're looking for. Rebalancing the primaries simply reassigns the primary (searchable) copies of buckets across your cluster.
Let's assume you've got a 2-node cluster with a replication and search factor of 2. Ideally, each peer node would have 50% of the primaries because they each have 100% of the data. There may be instances where the primary copies become imbalanced, resulting in poor search performance. In this situation, you'd use the rebalance primaries command to even out the search load. However, the actual data sitting on disk would be unaffected.
For more information, see this link: http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Rebalancethecluster
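
To make the distinction concrete, here is a toy Python model of the 2-node example above. It is only a sketch: the `Bucket` class, the peer names, the bucket sizes, and the round-robin reassignment are all made up for illustration and are not how the cluster master actually picks primaries. The point it shows is that moving the primary flag changes who answers searches, not how much data each peer stores.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    size_gb: int
    primary_on: str  # peer that currently holds the primary (searchable) copy


PEERS = ["peer_a", "peer_b"]  # hypothetical peer names


def disk_usage(buckets, peers):
    # Replication factor 2 on a 2-node cluster: every peer stores every bucket.
    return {peer: sum(b.size_gb for b in buckets) for peer in peers}


def primary_counts(buckets, peers):
    counts = {peer: 0 for peer in peers}
    for b in buckets:
        counts[b.primary_on] += 1
    return counts


def rebalance_primaries(buckets, peers):
    # Toy stand-in for the real action: hand out primaries round-robin.
    # Only the "primary" marker moves; no bucket data is copied or deleted.
    for i, b in enumerate(buckets):
        b.primary_on = peers[i % len(peers)]


# Six 10 GB buckets whose primaries all ended up on peer_a.
buckets = [Bucket(f"bucket_{i}", 10, "peer_a") for i in range(6)]

before_primaries = primary_counts(buckets, PEERS)
before_disk = disk_usage(buckets, PEERS)
rebalance_primaries(buckets, PEERS)
after_primaries = primary_counts(buckets, PEERS)
after_disk = disk_usage(buckets, PEERS)

print("primaries:", before_primaries, "->", after_primaries)
print("disk (GB):", before_disk, "->", after_disk)
```

In this model the primary counts even out while the per-peer disk figures are identical before and after, which is why the action cannot free space on a full indexer.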

To consume additional disk space on a subset of peer nodes, you have a couple of options:

1. Utilize the weighted load balancing feature of Indexer Discovery that was introduced in Splunk 6.3. This feature will take free disk space into account and direct your Universal Forwarders to prefer indexers with more free disk space: http://docs.splunk.com/Documentation/Splunk/6.3.1/Indexer/indexerdiscovery
2. Manually remove the indexer(s) with low disk space from your outputs.conf file for a temporary period. You can re-add the indexer when disk space evens out and data begins to roll off.

splunk_zen · ‎01-27-2016

Insightful information, though it doesn't explain the extreme imbalance we're seeing with the default round-robin forwarding the UFs use.
Some imbalance is to be expected, but even accounting for the difference in disk sizes, I'm clueless as to why we're seeing a 300GB difference in bucket usage between indexers.

jchampagne_splu · ‎01-27-2016

This really comes down to how you're ingesting data into Splunk. Are all of your data sources coming from Universal Forwarders or are you sending direct TCP/UDP streams to your indexers? Are you indexing any files locally on the indexers?

If all data is coming in via Universal Forwarders and not one of the other two methods I mentioned above, then the next step is to check the data that is becoming imbalanced. Is it from a particular host or set of hosts or a particular source? You'll want to check the outputs.conf file on the hosts that are becoming imbalanced to ensure that all of your indexers are listed in your target tcp out group.

Is your Universal Forwarder tailing extremely large log files or listening to a TCP/UDP stream? If so, this could also be creating a problem because by default the UF will not rotate to the next indexer until it reaches end of file or end of stream. In this case, there is a setting that you can enable in outputs.conf called forceTimebasedAutoLB. If you set this to True, it will force the UF to switch indexers every 30 seconds (default) or whatever time you specify with the autoLBFrequency parameter.

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Outputsconf
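
The effect of switching only at end of file versus switching on a timer can be sketched with a small Python simulation. This is a simplified model under stated assumptions, not the forwarder's real logic: the `distribute` function, the indexer names, and the file durations are hypothetical, data is assumed to flow at one unit per second, and indexers are visited in a fixed rotation to keep the result deterministic.

```python
def distribute(file_seconds, indexers, frequency=None):
    """Return how many seconds' worth of data each indexer receives.

    file_seconds: how long each file takes to send, in seconds.
    frequency=None: stay on one indexer until the current file ends.
    frequency=N: move to the next indexer every N seconds, even mid-file.
    """
    received = {name: 0 for name in indexers}
    if frequency is None:
        target = 0
        for seconds in file_seconds:
            received[indexers[target]] += seconds
            target = (target + 1) % len(indexers)  # switch only at end of file
    else:
        clock = 0
        for seconds in file_seconds:
            for _ in range(seconds):
                slot = clock // frequency  # which time window we are in
                received[indexers[slot % len(indexers)]] += 1
                clock += 1
    return received


indexers = ["idx1", "idx2", "idx3"]  # hypothetical indexer names
files = [3600, 30, 30, 30]           # one very large file, then three small ones

sticky = distribute(files, indexers)               # switch at end of file only
timed = distribute(files, indexers, frequency=30)  # switch every 30 seconds

print("end-of-file switching:", sticky)
print("time-based switching: ", timed)
```

With these made-up inputs, the large file lands entirely on one indexer in the first case, while the timed case spreads the same data evenly across all three.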

splunk_zen · ‎01-27-2016

We only send TCP/UDP streams to HFs, and only then to the Indexer cluster.
We wouldn't index any files locally other than the default Splunk sources,
we try to keep the Indexer cluster lean as we have around 2000 UFs.

Again, very useful info, jchampagne.
I'll take a look at forceTimebasedAutoLB too, though I'd expect that even with very verbose files, the indexers would eventually even out as days and months go by.

jchampagne_splu · ‎01-27-2016

Take a look at the forceTimebasedAutoLB setting and see how that helps you after a week or so.

In case you didn't know, the easiest way to check is to run a search of the last week and look at the splunk_server field to see what the distribution of events across indexers is.
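
Once you have per-indexer event counts (for example, exported from a search that ends in `stats count by splunk_server`), a few lines of Python can turn them into shares. This is just a sketch: the `indexer_shares` helper is hypothetical and the counts below are invented sample numbers, not real results.

```python
def indexer_shares(event_counts):
    """Map each indexer to (share of events in %, difference from an even split in %)."""
    total = sum(event_counts.values())
    if total == 0:
        raise ValueError("no events to compare")
    even_share = 100 / len(event_counts)
    report = {}
    for server, count in sorted(event_counts.items()):
        share = 100 * count / total
        report[server] = (round(share, 1), round(share - even_share, 1))
    return report


# Invented sample counts, keyed by the splunk_server field value.
counts = {"idx1": 500_000, "idx2": 300_000, "idx3": 200_000}

shares = indexer_shares(counts)
for server, (share, delta) in shares.items():
    print(f"{server}: {share}% of events ({delta:+}% vs. an even split)")
```

A large positive difference for one indexer over the time range is the kind of skew worth chasing back to specific hosts or sources.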

If you're still seeing distribution issues, you can adjust autoLBFrequency in outputs.conf down to something as low as 5 seconds. I generally only need to do this when dealing with very large, bulk-loaded files like proxy logs delivered via FTP.

If you've exhausted those options and are still seeing issues, let me know.

splunk_zen · ‎01-28-2016

Thanks. Considering we index around 2TB/day, running index=* across a whole week isn't an option 🙂
Nice to come across parameters I wasn't familiar with.

So you have bespoke outputs.conf files for sets of servers, if I got that right?
We currently push the same outputs.conf to every one of our UFs.

jchampagne_splu · ‎01-28-2016

@splunk_zen, you wouldn't have to search all data, you could search a subset of data that you suspect wasn't well balanced previously.

Regarding the outputs.conf settings, yes, there are all kinds of parameters that can be adjusted in outputs.conf. See the spec doc that I mentioned above.

splunk_zen · ‎01-29-2016

Yeah, got that, that's what I did: I picked time windows of a few minutes but couldn't detect any imbalance.

I know outputs.conf allows setting those parameters; it's just that we're using a generic approach for every UF.
Thanks again
