<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Slow running custom search command in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31521#M6508</link>
    <description>&lt;P&gt;can you post your &lt;CODE&gt;commands.conf&lt;/CODE&gt; entry as well?  Specifically you could see different performance with streaming vs not-streaming...&lt;/P&gt;</description>
    <pubDate>Tue, 17 Aug 2010 21:40:53 GMT</pubDate>
    <dc:creator>Lowell</dc:creator>
    <dc:date>2010-08-17T21:40:53Z</dc:date>
    <item>
      <title>Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31519#M6506</link>
      <description>&lt;P&gt;I have a requirement to provide histograms of performance through Splunk.  Essentially we have a field (for example Page_Load_Time), and we need to find out how many entries for that field (on a particular search) fall into certain fixed categories, e.g. &amp;lt;200ms, 200ms-2s, etc.&lt;/P&gt;

&lt;P&gt;To achieve this I've written a custom search command - splitbins&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import splunk.Intersplunk
import sys

def sortValueToBin(fieldValue, listOfBins):
    # Return the label of the first bin whose upper bound exceeds the value
    binNumber = 1
    for binRoof in listOfBins:
        if fieldValue &amp;lt; binRoof:
            return "Bin-" + str(binNumber)
        binNumber += 1
    # Larger than every upper bound: overflow bin
    return "Bin-" + str(binNumber)

fieldToSplit = sys.argv[1]
# Convert the bin upper bounds once rather than on every comparison
listOfBins = [float(binRoof) for binRoof in sys.argv[2:]]

eventsDict, dummyResults, dummySettings = splunk.Intersplunk.getOrganizedResults()

for event in eventsDict:
    # Check it's a number we're trying to split on, otherwise skip the event
    try:
        fieldValue = float(event[fieldToSplit])
    except (KeyError, TypeError, ValueError):
        continue
    event["Bin_Number"] = sortValueToBin(fieldValue, listOfBins)

splunk.Intersplunk.outputResults(eventsDict)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This is then being run through a search command like this:&lt;/P&gt;

&lt;P&gt;index="some_indexname" host="some_hostname" some_field="some_otherterm" | splitbins Page_Load_Time 200 2000 4000 8000 | chart count(Bin_Number) over some_other_field by Bin_Number | fields some_other_field Bin-1 Bin-2 Bin-3 Bin-4 Bin-5&lt;/P&gt;

&lt;P&gt;...and it works fine when the initial search returns events in the thousands.  However, as the number of events grows, two problems occur:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Results stop being produced once the total number of events processed goes over 50,000&lt;/LI&gt;
&lt;LI&gt;The search is S-L-O-W.  For example 20 minutes for 250K events.  If I write the splitbins code to take a direct dictionary with some random results, it can process hundreds of thousands of events in less than a second: so there is nothing innately slow about the splitbins code.&lt;/LI&gt;
&lt;/OL&gt;
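
&lt;P&gt;The standalone check mentioned in item 2 could look something like the sketch below. It is not the script that was actually run: the function names, the random test data and the use of the standard-library bisect module (which assumes the bin limits are given in ascending order) are all assumptions made for illustration.&lt;/P&gt;

```python
import bisect
import random
import time


def sort_value_to_bin(field_value, bin_roofs):
    # bisect_right counts the roofs that are at or below the value, so a value
    # strictly under the first roof lands in Bin-1, as in the loop above.
    # Assumes bin_roofs is sorted in ascending order.
    return "Bin-" + str(bisect.bisect_right(bin_roofs, field_value) + 1)


def split_bins(events, field_to_split, bin_roofs):
    roofs = [float(roof) for roof in bin_roofs]  # convert once, not per event
    for event in events:
        try:
            field_value = float(event[field_to_split])
        except (KeyError, TypeError, ValueError):
            continue  # skip events with a missing or non-numeric field
        event["Bin_Number"] = sort_value_to_bin(field_value, roofs)
    return events


if __name__ == "__main__":
    # Hypothetical test data: 250,000 fake events with random load times.
    random.seed(0)
    events = [{"Page_Load_Time": str(random.uniform(0, 10000))}
              for _ in range(250000)]
    start = time.time()
    split_bins(events, "Page_Load_Time", ["200", "2000", "4000", "8000"])
    print("binned %d events in %.2f s" % (len(events), time.time() - start))
```

&lt;P&gt;Run outside Splunk, a script like this times only the binning logic, which is the point of the comparison: whatever it reports excludes the cost of handing events to and from the external process.&lt;/P&gt;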

&lt;P&gt;I've tried raising every setting in limits.conf that is set to 50000, with no change to the number of events processed.  I've also tried adding a fields pipe after the initial search string to slim the search objects down earlier, and it is still slow.&lt;/P&gt;

&lt;P&gt;Running v4.1.2 on Windows, with plenty of spare CPU and memory.&lt;/P&gt;

&lt;P&gt;Any ideas?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 19:39:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31519#M6506</guid>
      <dc:creator>sumnerm</dc:creator>
      <dc:date>2010-08-17T19:39:14Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31520#M6507</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Does the bucket command do what you need?&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.splunk.com/base/Documentation/4.1.4/SearchReference/Bucket" rel="nofollow"&gt;http://www.splunk.com/base/Documentation/4.1.4/SearchReference/Bucket&lt;/A&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;bucket field span=200 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If you need to aggregate some of those buckets into bigger ones then you could eval them together?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| stats ... | eval my_big_bucket= bucket_1 + bucket_2
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 17 Aug 2010 20:55:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31520#M6507</guid>
      <dc:creator>dart</dc:creator>
      <dc:date>2010-08-17T20:55:48Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31521#M6508</link>
      <description>&lt;P&gt;can you post your &lt;CODE&gt;commands.conf&lt;/CODE&gt; entry as well?  Specifically you could see different performance with streaming vs not-streaming...&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 21:40:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31521#M6508</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-08-17T21:40:53Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31522#M6509</link>
      <description>&lt;P&gt;Note that the bucket command (which is aliased as &lt;CODE&gt;bin&lt;/CODE&gt;) probably does something like what you want:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="some_indexname" host="some_hostname" some_field="some_otherterm" 
| bucket Page_Load_Time as Bin_Number span=1.6log2 
| chart count by Bin_Number
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I would suggest that this custom search command is basically entirely unnecessary. Even if &lt;CODE&gt;bucket&lt;/CODE&gt; doesn't give you the exact ranges you want, you can get the same effect with either a &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;rangemap field=Page_Load_Time bin1=0-199 bin2=200-1999 bin3=2000-3999 bin4=4000-7999 default=bin5 | rename range=Bin_Number 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;command or a line of &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eval Bin_Number=case(Page_Load_Time&amp;lt;200,"bin1",Page_Load_Time&amp;lt;2000,"bin2",...) 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;instead.&lt;/P&gt;

&lt;P&gt;For reference, the 50k results limit would be avoided by making the search command "streaming" (see &lt;CODE&gt;commands.conf.spec&lt;/CODE&gt;)&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 21:42:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31522#M6509</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-08-17T21:42:17Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31523#M6510</link>
      <description>&lt;P&gt;Thanks - great advice.&lt;/P&gt;

&lt;P&gt;It certainly solves the speed and limits issue - but I seem to have problems getting the eval functions to work with the lesser used buckets (beyond the first 9 + OTHER).&lt;/P&gt;

&lt;P&gt;I shall keep reading and fiddling for a bit first before I come back for help.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 21:55:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31523#M6510</guid>
      <dc:creator>sumnerm</dc:creator>
      <dc:date>2010-08-17T21:55:32Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31524#M6511</link>
      <description>&lt;P&gt;Thanks for the advice.  &lt;/P&gt;

&lt;P&gt;The rangemap command worked but was very slow. However, using the case statement in the eval not only works but is also fast.&lt;/P&gt;

&lt;P&gt;Excellent!&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 22:31:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31524#M6511</guid>
      <dc:creator>sumnerm</dc:creator>
      <dc:date>2010-08-17T22:31:22Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31525#M6512</link>
      <description>&lt;P&gt;rangemap is a default external search command, so it runs the same way as yours, while eval runs in-process in Splunk. This suggests to me that either your Splunk config is launching too many external search processes, or something in your OS/system is limiting communication or context-switching between splunkd and the external process.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 23:03:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31525#M6512</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-08-17T23:03:31Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31526#M6513</link>
      <description>&lt;P&gt;Thanks.  My custom search command ran a lot faster once the streaming was set to true - though haven't raced against rangemap yet or against eval.  Happy though that eval &amp;amp; case is the way to go.  If I get some time later in the week I'll race them off.&lt;/P&gt;
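
&lt;P&gt;For reference, a minimal sketch of the setting in question, as a &lt;CODE&gt;commands.conf&lt;/CODE&gt; stanza. The script name &lt;CODE&gt;splitbins.py&lt;/CODE&gt; is an assumption, since the thread never names the file; &lt;CODE&gt;filename&lt;/CODE&gt; and &lt;CODE&gt;streaming&lt;/CODE&gt; are the settings described in &lt;CODE&gt;commands.conf.spec&lt;/CODE&gt;.&lt;/P&gt;

```ini
[splitbins]
# Assumed script name; the thread does not say what the file is called
filename = splitbins.py
# Declare the command streamable so it can be run on chunks of results
streaming = true
```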

&lt;P&gt;No idea on the context-switching constraints, other than just to blame Windows.  The hardware is 64-bit eight processor cores, 16GB of memory - running very little activity, virtually no monitor traffic etc, no software other than Splunk, and no other searches.  Are there some performance risks with splunk on non *nix platforms?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2010 23:14:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31526#M6513</guid>
      <dc:creator>sumnerm</dc:creator>
      <dc:date>2010-08-17T23:14:36Z</dc:date>
    </item>
    <item>
      <title>Re: Slow running custom search command</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31527#M6514</link>
      <description>&lt;P&gt;Raced off the three methods over 250K events: eval/case - 28s, splitbins (with streaming) - 7m 55s, splitbins (without streaming) - 29m 30s (and only 50K events), rangemap - seemingly forever (got bored waiting - may be some other issue).&lt;/P&gt;
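
&lt;P&gt;Put as throughput, those timings come to roughly 8,900 events per second for eval/case, about 530 for streaming splitbins and about 28 for non-streaming splitbins. A small sketch of the arithmetic, using only the figures quoted above (the names are illustrative):&lt;/P&gt;

```python
# Timings as reported above: (events processed, elapsed seconds).
runs = {
    "eval/case": (250000, 28),
    "splitbins (streaming)": (250000, 7 * 60 + 55),
    "splitbins (non-streaming)": (50000, 29 * 60 + 30),  # stopped at 50K events
}


def events_per_second(event_count, seconds):
    return event_count / float(seconds)


for name, (event_count, seconds) in sorted(runs.items()):
    rate = events_per_second(event_count, seconds)
    print("%-26s %8.1f events/s" % (name, rate))
```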

&lt;P&gt;Will retire my function and use eval/case.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Aug 2010 01:50:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Slow-running-custom-search-command/m-p/31527#M6514</guid>
      <dc:creator>sumnerm</dc:creator>
      <dc:date>2010-08-18T01:50:05Z</dc:date>
    </item>
  </channel>
</rss>

