Using join in postprocess takes longer than duplicating the search

ahmetcepoglu
Engager

Hello

I have 3 searchmanagers like so (the actual queries are longer)

{% searchmanager id="s1" search="index=abc | top name" %}
{% postprocessmanager id="s2" managerid="s1" search=" | join name [index=abc | stats count(x) by name ]" %}
{% searchmanager id="s3" search="index=abc | top name | join name [index=abc | stats count(x) by name ]" %}

As you can see, s3 does the same thing as s1 and s2 combined.
I have tables that show each search result, and they finish in the following order: s1, s3, s2.

How can post-processing not save me time here?

1 Solution

dwaddle
SplunkTrust

"Because join" ... Splunk's join command is useful and sometimes necessary. But for many use cases -- particularly for people coming from a SQL background -- it is the worst way to solve the problem.

I realize these are just examples and you have more complex real-life searches but let's consider your "s3" search as an example:

index=abc | top name | join name [index=abc | stats count(x) by name ]

In this search, you have asked Splunk to run a dense search, index=abc, twice, and then to join the results together. Before "s3" can return any results, it has to dispatch both of those searches, gather the events matching each, and then perform the join. It is possible (though I have not confirmed it) that Splunk can run those two searches in parallel, but it may just as well not.

However, in a post-process, the first search absolutely has to finish before the second can begin. So you've asked Splunk to run a dense search, wait for it to finish, and only then start another dense search to join against it. It is just not going to end well.

Almost any use of join for the purpose of adding on stats data can be accomplished much more efficiently through use of the eventstats command, with the occasional eval here and there. I would suggest rewriting your search to be less SQL-like and take advantage of the different tools that Splunk offers to avoid relying on join except where you definitely need it.
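
As a rough, untested sketch of that idea applied to the "s3" example: a single pass over index=abc can compute both the per-name event count and count(x), and eventstats can then add the overall total needed for a percentage, so no subsearch is required. The field names x_count and total are made up for illustration, and the percent here only approximates what top reports; top's default limit of 10 is imitated with sort.

```spl
index=abc
| stats count AS count, count(x) AS x_count BY name
| eventstats sum(count) AS total
| eval percent = round(count / total * 100, 2)
| sort 10 -count
| fields name count percent x_count
```

One difference from the join version is that the extra column is called x_count rather than count(x), so any table or panel that refers to the old field name would need adjusting.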

ahmetcepoglu
Engager

That makes sense, thanks.

somesoni2
Revered Legend

Post-processes are primarily for code/query reuse (they help you maintain it efficiently). They are not guaranteed to save time, though they do save resources. Also, a post-process should do further filtering/processing on the existing data.
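
A minimal sketch of that split, reusing the simplified index=abc example from the question (x_count is a hypothetical field name): the base search does the single expensive pass and aggregates, and the post-process only trims and reshapes rows that already exist. The base search might look like this:

```spl
index=abc
| stats count AS count, count(x) AS x_count BY name
```

and a post-process attached to it would then need nothing more than:

```spl
| sort 10 -count
| fields name count x_count
```

Because the post-process never goes back to the index, it has no second dense search to wait for.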
