What is the main difference between LINE_BREAKER and BREAK_ONLY_BEFORE, and what is each one used for?
LINE_BREAKER and BREAK_ONLY_BEFORE are both props.conf settings, and they're used in different parts of the parsing / indexing process.
You can see a detailed chart of this on the Splunk Wiki. In short, LINE_BREAKER defines what ends a "line" in an input file; by default it is any sequence of CR and LF characters. Depending on the format of your input, this may need to be altered for correctness. Alternatively, if your log can be separated into events by a simple regex, LINE_BREAKER can be set to find the event boundary itself and SHOULD_LINEMERGE can be set to false to skip the next step of the process, as in the sketch below.
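A minimal props.conf sketch of that approach; the sourcetype name and timestamp pattern are hypothetical and assume each event begins with a line starting "YYYY-MM-DD HH:MM:SS":

    [my_custom_log]
    # The contents of the first capture group are discarded and the break
    # happens around it, so each event starts at the next line's timestamp.
    LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
    SHOULD_LINEMERGE = false
    TIME_FORMAT = %Y-%m-%d %H:%M:%S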
During the next phase, Splunk takes the individual lines and merges them back together to form events. (Certain log formats have multi-line events, especially stack traces.) BREAK_ONLY_BEFORE is one of several attributes used to determine where the event boundaries are.
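For comparison, a sketch of the line-merge approach for the same hypothetical log, using BREAK_ONLY_BEFORE (again, the sourcetype name and patterns are assumptions, not from the original answer):

    [my_java_app]
    # Keep the default line breaking on newlines, then merge lines back into events
    SHOULD_LINEMERGE = true
    # Start a new event only when a line begins with a timestamp; continuation
    # lines such as stack trace frames stay attached to the previous event
    BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
    # Raise the merge cap (default 256 lines) if stack traces can get very long
    MAX_EVENTS = 1000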
Thank you acharlieh
Is there an indexing performance gain by using one over the other? For instance, JSON formatted events?
In terms of parsing events, you may see some gains if you can split events with a simple LINE_BREAKER regex and SHOULD_LINEMERGE = false, as you essentially skip a step. That said, if your logs can't be split with a simple enough regex, you could wind up spending more time than you would with the other aggregation settings (but that would be something to measure as you try things out).
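One hedged way to measure it, assuming the usual fields in Splunk's internal metrics.log, is to compare CPU time spent in the line-breaking and line-merging processors before and after the change:

    index=_internal source=*metrics.log* group=pipeline (processor=linebreaker OR processor=aggregator)
    | timechart span=5m sum(cpu_seconds) by processor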
For JSON-formatted and other structured events (like CSV and IIS/W3C logs), you actually have another option to play with as well: you can offload the parsing to your Universal Forwarders and eliminate search-time parsing by using INDEXED_EXTRACTIONS and the _json sourcetype.
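A minimal sketch of that option, with a hypothetical sourcetype name; INDEXED_EXTRACTIONS = json goes in props.conf on the Universal Forwarder so the structured parsing happens there:

    [my_json_events]
    # The forwarder parses the JSON structure and sends already-extracted fields
    INDEXED_EXTRACTIONS = json
    SHOULD_LINEMERGE = false
    # On the search head, KV_MODE = none for this sourcetype avoids extracting
    # the same JSON fields a second time at search time

Alternatively, use the _json sourcetype mentioned above, which has this behavior configured out of the box.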