Splunk Search

Will streamstats bypass postprocess 50000 limit?

Builder

Would be great to know all the commands that will bypass the 50000 postProcess limit

1 Solution

SplunkTrust
SplunkTrust

No it won't.

The limit you're talking about is the one where, if your base search is just returning raw event rows, Splunk only keeps 50,000 events in the search result. This means that later when you run your postprocess there can be misleading results.

(it actually looks like in 5.0 the limit is at 10,000, not 50,000 - http://docs.splunk.com/Documentation/Splunk/5.0.2/AdvancedDev/PostProcess )

I'm not sure what the canonical list of non-streaming transforming commands is, but the real answer is that you should be using the stats command somewhere in your base search anyway to reduce the number of rows. As long as you do that, stats will also serve as your transforming command and there will be no truncation.
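As a rough sketch of that best practice (the index and field names here are made up for illustration), the idea is a transforming base search whose postprocesses only reshape the already-aggregated rows:

```
Base search:
  index=web | stats count by status, host

PostProcess 1:
  | stats sum(count) by status

PostProcess 2:
  | stats sum(count) by host
```

Because the base search ends in stats, it returns aggregated rows rather than raw events, so the raw-event truncation never comes into play.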

I'll admit that I am biased here, but the best and most detailed description of the various pitfalls, and the clearest explanation of the best practice here, is in the latest Sideview Utils app, under "Key Techniques > Using PostProcess > Introduction". http://sideviewapps.com/apps/sideview-utils

The official docs give only incomplete explanations, and they recommend the peculiar path of using the si* commands in your base search, which I really do not recommend.

Indeed, I ran some tests and it looks like in 5.0 the truncation happens at 10,000 rows. I ran this test and the table displays "10000".

<module name="Search">
  <param name="search"><![CDATA[
    * | head 500000 | streamstats count as rowIndex
  ]]></param>
  <module name="JobProgressIndicator"/>
  <module name="PostProcess">
    <param name="search"><![CDATA[
      | stats max(rowIndex)
    ]]></param>
    <module name="Table" />
  </module>
</module>


Builder

Thank you, your detailed explanations are always much appreciated. And Sideview Utils = Splunk magic!


Communicator

Hi,

I almost created my own post but I found this one which is close enough to my question.

I am using streamstats to find anomalies between my events. In my test case, I have 450K+ events. There is a process counter that starts at event=1 and increments by one (basically i++) until the end. I'm trying to find instances where this counter skips, identify the event number where it happens, and then timechart it.

I ran into the 10K limit of streamstats and it looks like I can't get around it, even though I changed max_stream_window in /Splunk/etc/system/local/limits.conf and restarted Splunk.

[stats]
# for streamstats's maximum window size
max_stream_window = 100000

I even verified it using btool to see if it was appearing in the config.

btool limits list
...
[spath]
extract_all = true
extraction_cutoff = 5000
[stats]
max_stream_window = 100000
maxresultrows = 50000
maxvalues = 0
maxvaluesize = 0
rdigest_k = 100
rdigest_maxnodes = 1
[subsearch]
maxout = 10000
maxtime = 60
ttl = 300
...

Can you suggest a different method than using streamstats? Every event in the 450K+ test case has this counter output.


SplunkTrust
SplunkTrust

Well, this is a little confusing, but you're actually talking about two pretty unrelated limits that just happen to both have 10,000 as the default.

This existing question was about the 10,000-row limit in the "postprocess" part of the Search API, which applies when the base search is a non-transforming search (aka a "raw event" search).

Whereas max_stream_window in limits.conf is a fairly obscure key that determines, for the streamstats command when it is used in its "windowed" mode (and possibly only when there is also a "by" clause), when Splunk should start truncating the rows factored into the calculation(s) for each "window".
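For contrast, a windowed streamstats, the only case where max_stream_window applies, looks something like this hypothetical example (the response_time field is made up for illustration), computing a 5-event moving average:

```
... | streamstats window=5 avg(response_time) as moving_avg
```

If you omit the window parameter, as in the searches discussed in this thread, max_stream_window never enters the picture.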

As an example you can run yourself: streamstats, when used without the windowed stuff, has no limit on the number of rows. Run this search over a time range where there are more than 70,000 events and you'll see streamstats happily counts up to 70,000.

index=_internal | head 70000 | streamstats count sum(kb) as kb | stats max(count) max(kb)

In case you are using postprocess as well, there are quite a few answers and explanations about its various pitfalls, including the 10,000 event row limit. A relatively succinct albeit ancient one is my answer here - https://answers.splunk.com/answers/8642/when-will-splunk-support-passing-more-than-10k-results-to-hi... and a longer, much more detailed and complete one can be found within the "Sideview Utils" app under "Key Techniques > Using PostProcess".
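As for the original counter problem: assuming your incrementing field is called counter (adjust the name to match your data), a non-windowed streamstats like the following sketch should flag the breaks without any max_stream_window concerns:

```
... | reverse
    | streamstats current=f last(counter) as prev_counter
    | where counter != prev_counter + 1
```

Here reverse puts events in chronological order, streamstats with current=f carries forward the previous event's counter value, and the where clause keeps only the events where the sequence skips. The very first event has no prev_counter and so is dropped by the where, which is usually what you want.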

Communicator

Thanks for responding so quickly. I did the remove-commands-one-by-one to see where the issue was. I had a

sort X,Y 

right before my streamstats. That caused the results to be truncated at 10K. I'm not sure why, but I don't think I have time to figure it out.


SplunkTrust
SplunkTrust

Yep. sort has a default limit of 10,000 results. If you want it to sort the whole set, you need to sneak in a "0", i.e. | sort 0 myFieldName