Splunk Search

Using join in postprocess takes longer than duplicating the search

ahmetcepoglu
Engager

Hello

I have 3 searchmanagers like so (the actual queries are longer)

{% searchmanager id="s1" search="index=abc | top name" %}
{% postprocessmanager id="s2" managerid="ms" search=" | join name [index=abc | stats count(x) by name ]" %}
{% searchmanager id="s3" search="index=abc | top name | join name [index=abc | stats count(x) by name ]" %}

You can see that s3 does the same thing as s1 and s2 combined.
I have tables that show each search result, and they finish in the following order: s1, s3, s2.

Why doesn't post-processing save me time here?

dwaddle
SplunkTrust

"Because join" ... Splunk's join command is useful and sometimes necessary. But for many use cases, particularly for people coming from a SQL background, it is the worst way to solve the problem.

I realize these are just examples and you have more complex real-life searches but let's consider your "s3" search as an example:

index=abc | top name | join name [index=abc | stats count(x) by name ]

In this search, you have asked Splunk to run a dense search - index=abc - twice. Then you've asked it to join those results together. Before "s3" can return any results, it has to dispatch both of those searches, gather the events matching both, and then perform the join. It is possible (but unconfirmed) that Splunk can run those two searches in parallel, but it may just as well not.

However, in a postprocess, the first search absolutely has to finish before the second can begin. So you've asked Splunk to run a dense search, and then wait for it to finish before starting another dense search to join them together. It is just not going to end well.

Almost any use of join for the purpose of adding on stats data can be accomplished much more efficiently through use of the eventstats command, with the occasional eval here and there. I would suggest rewriting your search to be less SQL-like and take advantage of the different tools that Splunk offers to avoid relying on join except where you definitely need it.
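
As a rough sketch of that suggestion (not the poster's real query, which is longer), the "s3" example could be written as a single pass over index=abc with stats, eventstats and eval instead of top plus join. The field names x_count, total and percent are made up for illustration, and head 10 assumes top's default limit of 10:

```
index=abc
| stats count AS count, count(x) AS x_count BY name
| eventstats sum(count) AS total
| eval percent = round(100 * count / total, 2)
| sort - count
| head 10
| fields name, count, percent, x_count
```

Here stats produces one row per name, eventstats adds the grand total to every row so eval can approximate the percent column that top would have produced, and x_count takes the place of the count(x) column that the join was pulling in. The index is only searched once.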

ahmetcepoglu
Engager

That makes sense, thanks.

somesoni2
Revered Legend

Postprocesses are primarily for code/query reuse (they help you maintain it efficiently). Performance is not guaranteed in terms of time, though they do save resources. Also, a postprocess should be doing more filtering/processing on existing data.
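
For illustration only, here is what that kind of postprocess might look like using the same template tags as in the question. The ids "base" and "top_names" and the field name x_count are hypothetical. The base search does the expensive work once, and the postprocess only sorts and trims the rows it already produced, without searching the index again:

```
{% searchmanager id="base" search="index=abc | stats count, count(x) AS x_count by name" %}
{% postprocessmanager id="top_names" managerid="base" search="| sort - count | head 10" %}
```

Other postprocess managers could point at the same base search to show different slices of the same rows.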
