Hi, I'm doing some research for our new architecture and am currently doing some house keeping on our props and transforms.
If we have a regex for a sourcetype, is it faster to put the regex in the transforms or in the props and use line_breaker? Traditionally, we just set should_linemerge to false in the props and then put the regex in the transforms and link it via report in the props. However, I just read something that said doing regular expressions in line_breaker causes a performance increase.
Could someone provide a clear explanation on this for me?
Thanks. I guess my question is could I put the regex in the line_break and not even need anything in the transforms. But, I'm going to use TRANSFORM-### to link to the transforms to ensure our data gets parsed at index time instead of search time. Thanks for letting me ponder.
Thanks. I guess my question is could I put the regex in the line_break and not even need anything in the transforms. But, I'm going to use TRANSFORM-### to link to the transforms to ensure our data gets parsed at index time instead of search time. Thanks for letting me ponder.
Using LINE_BREAKER=
and SHOULD_LINEMERGE=false
will always be WAAAAAAAY faster than using SHOULD_LINEMERGE=true
. Obviously the better the RegEx
in your LINE_BREAKER
, the more efficient event processing will be so always spend extra time optimizing your LINE_BREAKER
.
Where did you read about regex in LINE_BREAKER causing a performance increase?
FWIW, LINE_BREAKER defaults to the regex ([\r\n]+)
so performance shouldn't be affected by regex.
NOTE: You get a significant boost to processing speed when you use
LINE_BREAKER to delimit multi-line events (as opposed to using
SHOULD_LINEMERGE to reassemble individual lines into multi-line events).
* When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set
to false, to ensure no further combination of delimited events occurs.
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf