I have a requirement to provide histograms of performance through Splunk. Essentially we have a field (for example Page_Load_Time), and we need to find out how many entries for that field (on a particular search) fall into certain fixed categories, e.g. <200ms, 200ms-2s, etc.
To achieve this I've written a custom search command - splitbins
import sys

import splunk.Intersplunk


def sortValueToBin(fieldValue, listOfBins):
    # Return the label of the first bin whose upper bound ("roof") exceeds the value.
    binNumber = 1
    for binRoof in listOfBins:
        if fieldValue < float(binRoof):
            return "Bin-" + str(binNumber)
        binNumber += 1
    # Larger than every roof: the value falls into the final, open-ended bin.
    return "Bin-" + str(binNumber)


# Usage: splitbins <field> <roof1> <roof2> ...
fieldToSplit = sys.argv[1]
listOfBins = sys.argv[2:]

eventsDict, dummyResults, dummySettings = splunk.Intersplunk.getOrganizedResults()

for event in eventsDict:
    # Check it's a number we're trying to split on, otherwise skip the event.
    try:
        fieldValue = float(event[fieldToSplit])
    except (KeyError, ValueError, TypeError):
        continue
    event["Bin_Number"] = sortValueToBin(fieldValue, listOfBins)

splunk.Intersplunk.outputResults(eventsDict)
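The binning rule itself can be checked outside Splunk. Below is a minimal sketch, assuming the same "first roof strictly greater than the value" rule as sortValueToBin; the snake_case name and the sample values are made up for illustration, and the standard-library bisect module replaces the explicit loop.

```python
from bisect import bisect_right


def sort_value_to_bin(value, bin_roofs):
    """Return 'Bin-N', where N is the 1-based position of the first roof > value.

    bin_roofs must be sorted ascending. Values at or above the last roof
    land in the final, open-ended bin.
    """
    return "Bin-%d" % (bisect_right(bin_roofs, value) + 1)


# Roofs arrive as strings on the command line, so convert them once up front.
roofs = [float(r) for r in ["200", "2000", "4000", "8000"]]

# Hypothetical page load times in milliseconds.
samples = [150, 200, 1999.9, 8000, 12000]
labels = [sort_value_to_bin(v, roofs) for v in samples]
print(labels)
```

Converting the roofs once also avoids calling float() on every roof for every event, which the loop version does.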
This is then being run through a search command like this:
index="some_indexname" host="some_hostname" some_field="some_otherterm" | splitbins Page_Load_Time 200 2000 4000 8000 | chart count(Bin_Number) over some_other_field by Bin_Number | fields some_other_field Bin-1 Bin-2 Bin-3 Bin-4 Bin-5
...and it works fine if the number of events passed by the initial search terms is in the thousands. However, as the number of events grows, two problems occur:
I've tried raising every setting in limits.conf that is set to 50000 to a higher number, with no change to the number of events processed. I've tried adding a fields pipe after the initial search string to slim the search objects down earlier, and it is still slow.
Running v4.1.2 on Windows, with plenty of spare CPU and memory.
Does the bucket command do what you need?
bucket field span=200
If you need to aggregate some of those buckets into bigger ones then you could eval them together?
| stats ... | eval my_big_bucket= bucket_1 + bucket_2
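As a rough illustration of the idea in this answer (fixed-width buckets first, then summing some of them into a wider bucket), here is a minimal Python sketch. It is not Splunk code: the function name, the sample load times, and the choice to key each bucket by its lower bound are all assumptions made for the example.

```python
from collections import Counter


def bucket_start(value, span=200):
    """Return the lower bound of the fixed-width bucket containing value."""
    return int(value // span) * span


# Hypothetical page load times in milliseconds.
load_times = [50, 199, 200, 450, 1999, 2500]

# Step 1: count events per 200 ms bucket (the role bucket plus stats would play).
counts = Counter(bucket_start(t) for t in load_times)

# Step 2: add the small buckets covering 200 ms up to 2 s into one big bucket
# (the role the eval addition would play).
big_bucket = sum(n for start, n in counts.items() if 200 <= start < 2000)

print(sorted(counts.items()))
print(big_bucket)
```

With many narrow buckets the second step has a lot of terms to add, which is one reason a direct threshold test per event can be simpler when the target ranges are uneven.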
Thanks - great advice.
It certainly solves the speed and limits issue - but I seem to have problems getting the eval functions to work with the lesser used buckets (beyond the first 9 + OTHER).
I shall keep reading and fiddling for a bit first before I come back for help.
Can you post your commands.conf entry as well? Specifically, you could see different performance with streaming vs. non-streaming...
Note that the bucket command (which is aliased as bin) probably does something like what you want:
index="some_indexname" host="some_hostname" some_field="some_otherterm" | bucket Page_Load_Time as Bin_Number span=1.6log2 | chart count by Bin_Number
I would suggest that this custom search command is basically entirely unnecessary. Even if bucket doesn't give you the exact ranges you want, you can get the same effect with either a
rangemap field=Page_Load_Time bin1=0-199 bin2=200-1999 bin3=2000-3999 bin4=4000-7999 default=bin5 | rename range as Bin_Number
command or a line of eval.
For reference, the 50k results limit would be avoided by making the search command "streaming" (see
Thanks for the advice.
The rangemap command worked but was very slow. However, using the case statement in the eval not only works but is also fast.
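The two approaches also differ slightly in how they treat boundaries. The sketch below is plain Python rather than Splunk search language, and it assumes that the ranges are inclusive at both ends and that a case-style test picks the first condition that is true; the function names are hypothetical. Under those assumptions, a fractional value such as 199.5 ms falls between the integer ranges 0-199 and 200-1999 and ends up in the default bin, whereas a threshold test places it in the first bin.

```python
def range_lookup(value, ranges, default):
    """Return the name of the first inclusive [low, high] range containing value."""
    for name, (low, high) in ranges:
        if low <= value <= high:
            return name
    return default


def first_match(value, roofs):
    """Return 'binN' for the first roof strictly greater than value."""
    for n, roof in enumerate(roofs, start=1):
        if value < roof:
            return "bin%d" % n
    return "bin%d" % (len(roofs) + 1)


# The integer ranges from the rangemap example, and the equivalent thresholds.
RANGES = [
    ("bin1", (0, 199)),
    ("bin2", (200, 1999)),
    ("bin3", (2000, 3999)),
    ("bin4", (4000, 7999)),
]
ROOFS = [200, 2000, 4000, 8000]

for value in (150, 199.5, 8000):
    print(value, range_lookup(value, RANGES, "bin5"), first_match(value, ROOFS))
```

If Page_Load_Time is always a whole number of milliseconds, the two give the same answer; the difference only matters for fractional values.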
rangemap is a default external search command, so does the same as yours, while eval runs in-process in Splunk. This indicates to me that either your Splunk config is launching too many external search processes, or that something in your OS/system is limiting communication or context-switching between splunkd and the external process.
Thanks. My custom search command ran a lot faster once streaming was set to true, though I haven't raced it against rangemap or eval yet. Happy though that eval & case is the way to go. If I get some time later in the week I'll race them off.
No idea on the context-switching constraints, other than just to blame Windows. The hardware is 64-bit with eight processor cores and 16GB of memory, running very little activity: virtually no monitor traffic, no software other than Splunk, and no other searches. Are there some performance risks with Splunk on non-*nix platforms?
Raced off the three methods over 250K events: eval/case - 28s, splitbins (with streaming) - 7m 55s, splitbins (without streaming) - 29m 30s (and only 50K events), rangemap - seemingly forever (got bored waiting - may be some other issue).
Will retire my function and use eval/case.
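To put those timings on a common scale, the arithmetic can be worked through as events per second. This is only a back-of-the-envelope sketch using the figures quoted above; rangemap is left out because it never finished.

```python
# (events processed, elapsed seconds) for each run quoted above.
runs = {
    "eval/case": (250000, 28),
    "splitbins (streaming)": (250000, 7 * 60 + 55),
    "splitbins (non-streaming)": (50000, 29 * 60 + 30),
}

# Throughput in events per second.
rates = dict((name, events / float(seconds)) for name, (events, seconds) in runs.items())

for name in sorted(rates, key=rates.get, reverse=True):
    print("%-26s %8.1f events/s" % (name, rates[name]))

# How many times faster eval/case is than the streaming custom command.
speedup = rates["eval/case"] / rates["splitbins (streaming)"]
print("eval/case vs streaming splitbins: %.1fx" % speedup)
```

On these numbers eval/case handles roughly 8,900 events per second, the streaming custom command about 530, and the non-streaming one under 30.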