<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic indexing load balancing with [script] input in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83661#M17399</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;We have set up a small Splunk cluster with 3 indexers that receive data from a universal forwarder, whose output is configured as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tcpout:default-autolb-group]
autoLBFrequency=40
server = pc-tdq-bst-04:9995, pc-tdq-bst-05:9995, pc-tdq-sfo-06:9995
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The input is configured as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[script:///opt/splunkforwarder/bin/scripts/pbeast_injector.sh &amp;lt;parameters&amp;gt;]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The script never stops; it continuously reads data from an external online monitoring system.&lt;BR /&gt;
After indexing many events over a few days, we realized that the majority of them had been indexed by the first indexer in the list, pc-tdq-bst-04. For example, a typical query returns stats like these:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;dispatch.stream.remote.pc-tdq-bst-04.cern.ch    220 -   68,031,902
dispatch.stream.remote.pc-tdq-bst-05.cern.ch    2   -   4,584
dispatch.stream.remote.pc-tdq-sfo-06.cern.ch    1   -   2,386
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The indexers are almost identical and have sufficient disk space. Earlier, when they were used to index files, the load was distributed randomly among them, but the script input behaves quite differently.&lt;/P&gt;

&lt;P&gt;Is there a way to enforce some sort of round-robin balancing for the [script] input, given that the script runs permanently?&lt;/P&gt;

&lt;P&gt;Thanks&lt;BR /&gt;
Andrei&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jul 2013 13:39:19 GMT</pubDate>
    <dc:creator>akazarov</dc:creator>
    <dc:date>2013-07-03T13:39:19Z</dc:date>
    <item>
      <title>indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83661#M17399</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;We have set up a small Splunk cluster with 3 indexers that receive data from a universal forwarder, whose output is configured as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tcpout:default-autolb-group]
autoLBFrequency=40
server = pc-tdq-bst-04:9995, pc-tdq-bst-05:9995, pc-tdq-sfo-06:9995
&lt;/CODE&gt;&lt;/PRE&gt;
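
&lt;P&gt;As an illustration of why one endless stream can stay pinned to a single indexer, here is a toy model in Python. It is not Splunk's implementation. It assumes, as the replies below suggest, that the forwarder only moves an input to another indexer at a stream boundary such as EOF, even though its load-balancing timer fires every autoLBFrequency seconds. For simplicity the model rotates through the servers in list order, which is also an assumption. The names simulate_autolb, streams and lb_frequency are hypothetical.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
from collections import Counter


def simulate_autolb(streams, servers, lb_frequency):
    """Toy model of time-based auto load balancing (hypothetical, not Splunk code).

    streams: list of input streams; each is a list of event timestamps in seconds,
             increasing across the whole run.
    servers: indexer names, visited in round-robin order (an assumption).
    lb_frequency: seconds between load-balancing decisions (like autoLBFrequency).
    Returns a Counter of events delivered to each indexer.
    """
    counts = Counter()
    current = 0                  # index of the indexer currently receiving data
    next_switch = lb_frequency   # time at which the timer fires next
    switch_pending = False
    for stream in streams:
        for ts in stream:
            # The timer only marks a switch as pending; data is never moved mid-stream.
            while ts >= next_switch:
                switch_pending = True
                next_switch += lb_frequency
            counts[servers[current]] += 1
        # End of stream: the only point where this model lets the target change.
        if switch_pending:
            current = (current + 1) % len(servers)
            switch_pending = False
    return counts


if __name__ == "__main__":
    servers = ["pc-tdq-bst-04", "pc-tdq-bst-05", "pc-tdq-sfo-06"]
    # One never-ending stream: 300 seconds of events, one per second.
    print(simulate_autolb([list(range(300))], servers, 40))
    # The same events cut into 30 short streams of 10 seconds each.
    chunked = [list(range(start, start + 10)) for start in range(0, 300, 10)]
    print(simulate_autolb(chunked, servers, 40))
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In this model the single stream never reaches a boundary, so the pending switch is never applied and every event goes to pc-tdq-bst-04. The chunked input rotates across all three indexers.&lt;/P&gt;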

&lt;P&gt;The input is configured as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[script:///opt/splunkforwarder/bin/scripts/pbeast_injector.sh &amp;lt;parameters&amp;gt;]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The script never stops; it continuously reads data from an external online monitoring system.&lt;BR /&gt;
After indexing many events over a few days, we realized that the majority of them had been indexed by the first indexer in the list, pc-tdq-bst-04. For example, a typical query returns stats like these:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;dispatch.stream.remote.pc-tdq-bst-04.cern.ch    220 -   68,031,902
dispatch.stream.remote.pc-tdq-bst-05.cern.ch    2   -   4,584
dispatch.stream.remote.pc-tdq-sfo-06.cern.ch    1   -   2,386
&lt;/CODE&gt;&lt;/PRE&gt;
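
&lt;P&gt;To put the skew into numbers, this short Python sketch computes each indexer's share of the last column of the stats above. The values are copied verbatim. The column is treated simply as a per-indexer count, whatever unit the job inspector reports it in. The names stats and shares are hypothetical.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
# Last column of the dispatch.stream.remote stats quoted above, per indexer.
stats = {
    "pc-tdq-bst-04": 68_031_902,
    "pc-tdq-bst-05": 4_584,
    "pc-tdq-sfo-06": 2_386,
}


def shares(counts):
    """Return each key's fraction of the total count."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


if __name__ == "__main__":
    for name, share in shares(stats).items():
        print(f"{name}: {share:.4%}")
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;pc-tdq-bst-04 accounts for more than 99.98% of the total, so the other two indexers are practically idle.&lt;/P&gt;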

&lt;P&gt;The indexers are almost identical and have sufficient disk space. Earlier, when they were used to index files, the load was distributed randomly among them, but the script input behaves quite differently.&lt;/P&gt;

&lt;P&gt;Is there a way to enforce some sort of round-robin balancing for the [script] input, given that the script runs permanently?&lt;/P&gt;

&lt;P&gt;Thanks&lt;BR /&gt;
Andrei&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jul 2013 13:39:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83661#M17399</guid>
      <dc:creator>akazarov</dc:creator>
      <dc:date>2013-07-03T13:39:19Z</dc:date>
    </item>
    <item>
      <title>Re: indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83662#M17400</link>
      <description>&lt;P&gt;I believe that if your script injects EOF characters or null bytes into the output stream at appropriate points (e.g., between events), the Splunk forwarder will then be able to switch that input to another indexer.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jul 2013 19:36:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83662#M17400</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2013-07-03T19:36:23Z</dc:date>
    </item>
    <item>
      <title>Re: indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83663#M17401</link>
      <description>&lt;P&gt;Great idea, thanks!&lt;/P&gt;

&lt;P&gt;However, even with the present configuration, since our replication factor is 2, I expected 1/2 of the events to come from the bst-04 node and the remaining 1/4 + 1/4 from the other 2 nodes, because of replication. Replication should work even if I send all my data to one indexer, shouldn't it?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2013 08:08:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83663#M17401</guid>
      <dc:creator>akazarov</dc:creator>
      <dc:date>2013-07-04T08:08:34Z</dc:date>
    </item>
    <item>
      <title>Re: indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83664#M17402</link>
      <description>&lt;P&gt;Replication currently does not spread search requests across the replicated copies of a bucket unless there is some kind of failure. Until that happens, the primary indexer for the bucket containing an event remains the one that first indexed it. Even if replication gains this feature, you may have too few events for that level of granularity to be visible, since replication works in bucket-sized increments and a bucket can contain up to a few hundred million events.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2013 04:27:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83664#M17402</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2013-07-05T04:27:14Z</dc:date>
    </item>
    <item>
      <title>Re: indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83665#M17403</link>
      <description>&lt;P&gt;It turns out that adding an EOF means calling fclose(stdout) and opening it again, which is not feasible at an event rate in the kHz range. Note that there is no EOF "character".&lt;/P&gt;
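
&lt;P&gt;The point that EOF is a condition rather than a byte can be checked on a pipe. A pipe is presumably how a scripted input's stdout reaches the forwarder, though that is an assumption here. This Python sketch uses os.pipe instead of the script's real stdout. The 0x00 and 0x04 bytes arrive as ordinary data, and the reader only sees EOF (an empty read) after the write end is closed. The helper names read_all and demo are hypothetical.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import os


def read_all(fd):
    """Read from fd until EOF; os.read returns b"" only at end of file."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # empty read: EOF, i.e. every write end has been closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    read_fd, write_fd = os.pipe()
    # NUL (0x00) and EOT (0x04) are ordinary bytes on a pipe, not end-of-stream markers.
    os.write(write_fd, b"event1\n\x00\x04event2\n")
    # Closing the only write end is what produces EOF for the reader.
    os.close(write_fd)
    data = read_all(read_fd)
    os.close(read_fd)
    return data


if __name__ == "__main__":
    print(repr(demo()))
```
&lt;/CODE&gt;&lt;/PRE&gt;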

&lt;P&gt;Adding null bytes (0x00) between events did not help; Splunk just recorded them as part of the raw data.&lt;/P&gt;

&lt;P&gt;I also tried the EOT character (0x04), with no effect.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2013 14:01:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83665#M17403</guid>
      <dc:creator>akazarov</dc:creator>
      <dc:date>2013-07-05T14:01:44Z</dc:date>
    </item>
    <item>
      <title>Re: indexing load balancing with [script] input</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83666#M17404</link>
      <description>&lt;P&gt;Then you shouldn't need to close and reopen after every event. You only need to do it once every few seconds, i.e., every few thousand events: for example, keep a counter of events and only do it when counter % 20000 == 0.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2013 14:20:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/indexing-load-balancing-with-script-input/m-p/83666#M17404</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2013-07-05T14:20:58Z</dc:date>
    </item>
  </channel>
</rss>

