Is there any performance benefit in using one eval with several chained statements versus using separate eval statements (which may be split up to improve readability in extremely large SPL searches)? For example:
| eval A = "OM"
| eval B = " NOM"
| eval C = " NOM"
| eval D = " NOM"
| eval E = " NOM"
or
| eval A = "OM" , B = " NOM" , C = " NOM" , D = " NOM" , E = " NOM"
TL;DR: It appears chained evals are slightly faster than separated evals.
Methodology:
We can measure this ourselves using the Job Inspector.
The following run-anywhere example is scaled up and run in verbose mode so that any differences become visible.
The gentimes command generates one event per second, each with a unique timestamp, so we get distinct events every time we run the search. The command below generates 8,640,000 events (the number of seconds in 100 days: 100 × 86,400).
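Before timing anything, it may be worth a quick sanity check that gentimes really produces the expected number of events. The field names generated_events and expected_events below are just illustrative; depending on how gentimes aligns start and end to day boundaries, the two values might not match exactly.

```spl
| gentimes start=-100 end=0 increment=1s
| stats count AS generated_events
| eval expected_events = 100 * 86400
```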
Chained Command:
| gentimes start=-100 end=0 increment=1s
| eval A = "OM" , B = "NOM" , C = "NOM" , D = "NOM" , E = "NOM" , F = "OM" , G = "NOM" , H = "NOM" , I = "NOM" , J = "NOM" , K = "OM" , L = "NOM" , M = "NOM" , N = "NOM" , O = "NOM" , P = "OM" , Q = "NOM" , R = "NOM" , S = "NOM" , T = "NOM" , U = "OM" , V = "NOM" , W = "NOM" , X = "NOM" , Y = "NOM" , Z = "OM"
Separated Command:
| gentimes start=-100 end=0 increment=1s
| eval A = "OM"
| eval B = "NOM"
| eval C = "NOM"
| eval D = "NOM"
| eval E = "NOM"
| eval F = "OM"
| eval G = "NOM"
| eval H = "NOM"
| eval I = "NOM"
| eval J = "NOM"
| eval K = "OM"
| eval L = "NOM"
| eval M = "NOM"
| eval N = "NOM"
| eval O = "NOM"
| eval P = "OM"
| eval Q = "NOM"
| eval R = "NOM"
| eval S = "NOM"
| eval T = "NOM"
| eval U = "OM"
| eval V = "NOM"
| eval W = "NOM"
| eval X = "NOM"
| eval Y = "NOM"
| eval Z = "OM"
Results:
Chained Evals Search Time = 325.429 seconds (80.13 seconds for command.eval)
Separated Evals Search Time = 348.053 seconds (98.77 seconds for command.eval)
I seem to recall, though I could not find a reference, that every pipe has a cost. In this example of 8.64 million events, separated evals took at least 18 seconds longer in command.eval than chained evals (98.77 - 80.13 = 18.64 seconds); most of the remaining search time is spent in command.gentimes. YMMV depending on your needs and your infrastructure, but the readability gained by separating the evals may be worth the extra text and time.
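To put the 18-second difference in per-event terms, the numbers above can be worked through in a throwaway makeresults search. The extra_pipes value of 25 is an assumption for illustration: it attributes the whole command.eval difference to the 25 additional eval pipes in the separated version (26 evals versus 1), which the Job Inspector does not report directly.

```spl
| makeresults
| eval chained_total = 325.429, separated_total = 348.053, chained_eval = 80.13, separated_eval = 98.77
| eval events = 100 * 86400, extra_pipes = 26 - 1
| eval total_diff = round(separated_total - chained_total, 3), eval_diff = round(separated_eval - chained_eval, 2)
| eval us_per_event = round(eval_diff / events * 1000000, 3)
| eval ns_per_event_per_pipe = round(eval_diff / events / extra_pipes * 1000000000, 1)
| table total_diff eval_diff us_per_event ns_per_event_per_pipe
```

With these figures the overhead works out to roughly 2.16 microseconds per event, or about 86 nanoseconds per event per extra eval pipe, so the difference only adds up to something noticeable at millions of events.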
Thanks for the check, efavreau; I did something similar myself.
But the test numbers vary too much because of environmental factors.
Right, thanks efavreau. Yes, it makes sense that it may take longer for larger datasets.
You can use this methodology to see the impact of the different eval constructions. Run it multiple times and look at the trend between the two; that's what I am providing here. There is an observable difference on large data sets.
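If you do run both variants several times, a sketch like the following could summarise the runs from the audit index instead of reading each Job Inspector by hand. It assumes your role can search _audit and that the search and total_run_time fields are extracted there; the way it tells "chained" from "separated" (counting "| eval" occurrences in the search string) is a hypothetical heuristic of mine. Note that total_run_time is the whole search, not just command.eval.

```spl
index=_audit action=search info=completed search="*gentimes start=-100*" search="*eval A = *" NOT search="*index=_audit*"
| eval eval_pipes = mvcount(split(search, "| eval")) - 1
| eval variant = if(eval_pipes > 1, "separated", "chained")
| stats count AS runs, avg(total_run_time) AS avg_runtime, min(total_run_time) AS min_runtime, max(total_run_time) AS max_runtime BY variant
```

Comparing min and max per variant gives a rough feel for how much of the difference is environmental noise.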
@stanwin I also feel they are only for making eval more readable, and the same goes for the rename command. But maybe someone from Splunk can confirm whether it is just for readability or more!
Yes, but consider cases such as normalizing data with a huge number of eval statements, e.g. 150+. Would chaining be more efficient/performant there? Only Splunk can comment on that.
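For very long normalization searches, one possible middle ground (not benchmarked in this thread) is a single chained eval laid out with one assignment per line, so it stays a single pipe but still reads like a list. This assumes your Splunk version accepts line breaks inside a command's arguments.

```spl
| gentimes start=-100 end=0 increment=1s
| eval
    A = "OM",
    B = "NOM",
    C = "NOM",
    D = "NOM",
    E = "NOM"
```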
@stanwin,
I'm not sure about the performance benefit of using chained eval.
Chained eval is supported in Splunk version 6.4. My suggestion is to go with separate eval statements for backwards compatibility and readability.