I would like to ask a question about a search.
I have a search that includes a couple of joins.
| join xxxx [ index=B ] *2
| join xxxx [ index=C ] *3
| join xxxx [ index=D ] *4
Each index has more than 20 million records.
According to search.log,
when I execute the search once, it runs through the flow,
then goes back to *2 and processes the same flow again.
I have confirmed this in search.log in the dispatch directory.
I found a line in the log that says,
Stats Processor - reached limit maxmemusage_mb=200 , results may be incomplete
but I couldn't confirm whether this is related.
Is this normal behavior for Splunk?
I would appreciate it if someone could give me advice.
The join command should be only a last resort, and in this case it will definitely truncate the rows of one or more of your searches. You can read about the limitations of join here: http://answers.splunk.com/answers/822/simulating-a-sql-join-in-splunk
The good news is that there are generally much better, more efficient, and more Splunk-like ways to do the same thing.
Check out this page, and the flow chart therein.
The best and most common way to do your "join" here without the join command is this:
index=A OR index=B OR index=C OR index=D | stats count sum(foo) last(aField) as aField values(bField) as bField by xxxx
As for what goes in between the stats and the by xxxx, you obviously have to think about which fields and aggregations you need.
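To make the pattern above a little more concrete, here is a minimal sketch rather than a drop-in replacement for your search. It assumes that xxxx is the shared key, that foo, aField and bField are the fields you actually need (they are placeholders carried over from the example above), and that you want inner-join behavior, meaning only keys that appear in all four indexes. The names total_foo and index_count are made up for illustration.

```
index=A OR index=B OR index=C OR index=D
| fields index xxxx foo aField bField
| stats count sum(foo) as total_foo last(aField) as aField values(bField) as bField dc(index) as index_count by xxxx
| where index_count=4
```

The dc(index) term counts how many distinct indexes contributed events for each xxxx value, so the final where keeps only keys found in all four. If you want the equivalent of a left or outer join instead, relax or drop that last line. The early fields command is optional; it just limits what gets carried through the pipeline to the fields the stats needs.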
These docs can tell you all about the options in the stats command:
Often there will be small idiosyncratic bits of search language and normalization tricks inside each of your separate joined searches. You can usually redo that logic in the single search pipeline by using Splunk's
eval command, often using the