Hello
I have 3 searchmanagers like so (the actual queries are longer)
{% searchmanager id="s1" search="index=abc | top name" %}
{% postprocessmanager id="s2" managerid="ms" search=" | join name [index=abc | stats count(x) by name ]" %}
{% searchmanager id="s3" search="index=abc | top name | join name [index=abc | stats count(x) by name ]" %}
You can see how s3 does the same thing as s1 and s2 combined.
I have tables that show each search result, and they finish in the following order: s1, s3, s2.
How can post-processing not save me time here?
"Because join" ... Splunk's join command is useful and sometimes necessary. But for many use cases -- particularly for people coming from SQL background -- they are the worst way to solve the problem.
I realize these are just examples and you have more complex real-life searches but let's consider your "s3" search as an example:
index=abc | top name | join name [index=abc | stats count(x) by name ]
In this search, you have asked Splunk to run a dense search, index=abc, twice. Then you've asked it to join those results together. Before "s3" can return any results, it has to dispatch both of those searches, gather the events matching both, and then perform the join. It is possible (but unconfirmed) that Splunk can run those two searches in parallel, but it may just as well not.
However, in a postprocess, the first search absolutely has to finish before the second can begin. So you've asked Splunk to run a dense search, and then wait for it to finish before starting another dense search to join them together. It is just not going to end well.
Almost any use of join for the purpose of adding on stats data can be accomplished much more efficiently with the eventstats command, with the occasional eval here and there. I would suggest rewriting your search to be less SQL-like and taking advantage of the different tools that Splunk offers, so that you rely on join only where you definitely need it.
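As a rough sketch of what that could look like for the simplified "s3" example, the search below reads index=abc once, computes both counts in a single stats, and uses eventstats plus eval to rebuild the percentage that top would have reported. The field names x_count, total and percent are made up for illustration, the rounding differs from what top prints, and your real searches may need a different shape:

```spl
index=abc
| stats count AS count, count(x) AS x_count by name
| eventstats sum(count) AS total
| eval percent=round(100 * count / total, 2)
| sort - count
| head 10
| fields - total
```

Here sort and head 10 stand in for the default limit of top, and there is no subsearch at all, so nothing has to wait on a second pass over the index.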
"Because join" ... Splunk's join command is useful and sometimes necessary. But for many use cases -- particularly for people coming from SQL background -- they are the worst way to solve the problem.
I realize these are just examples and you have more complex real-life searches but let's consider your "s3" search as an example:
index=abc | top name | join name [index=abc | stats count(x) by name ]
In this search, you have asked Splunk to run a dense search - index=abc
- twice. Then you've asked it to join those results together. Before "s3" can return any results, it has to dispatch both of those searches, gather the events matching both, and then perform the join. It is possible (but unconfirmed) that Splunk can run those two searches in parallel. (But it just as well may not)
However, in a postprocess .. the first search definitely absolutely has to finish before the second can begin. So you've asked Splunk to run a dense search, and then wait for it to finish before starting another dense search to join them together. It is just not going to end well.
Almost any use of join
for the purpose of adding on stats
data can be accomplished much more efficiently through use of the eventstats
command, with the occasional eval
here and there. I would suggest rewriting your search to be less SQL-like and take advantage of the different tools that Splunk offers to avoid relying on join
except where you definitely need it.
That makes sense, thanks.
Postprocesses are primarily for code/query reuse (they help you maintain it efficiently). Performance is not guaranteed in terms of time, though they do save resources. Also, a postprocess should only be doing further filtering/processing on existing data.
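To illustrate that last point, here is a hedged sketch using the same template tags as the question. It assumes a single base search that already aggregates the data, with each postprocess only sorting or filtering the rows the base search returned. The ids base, top10 and with_x and the field x_count are hypothetical names, not taken from the original dashboard:

```django
{% searchmanager id="base" search="index=abc | stats count AS count, count(x) AS x_count by name" %}
{% postprocessmanager id="top10" managerid="base" search=" | sort - count | head 10" %}
{% postprocessmanager id="with_x" managerid="base" search=" | where x_count > 0" %}
```

Neither postprocess goes back to the index: both work on the small result set from base, which is where the resource savings come from.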