Wow, finding any related questions on this has proven very difficult, as every search for "Splunk grouping events together" points to transactions, etc.
For some reason Splunk is merging multiple events into single events, and I cannot find a pattern that explains why.
Here is an example of our events that are grouped together:
2018-08-27T14:23:32.345136+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery 2018-08-27T14:23:32.483302+00:00 host01 FOO: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000000 2018-08-27T14:23:32.483325+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery 2018-08-27T14:23:32.483302+00:00 host01 FOO: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000 2018-08-27T14:23:32.483325+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
That grouping ends there, and the next one starts as shown below. As you can see, it's not grouped by second. Has anyone seen anything like this? Any hints?
2018-08-27T14:23:28.325135+00:00 host01 FOO: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000 2018-08-27T14:23:28.325157+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=0000000 being dispatched to SMSC XYZA for delivery 2018-08-27T14:23:28.325135+00:00 host01 FOO: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
I think there are some problems with event line breaking that cause the events to be grouped.
Interestingly, when I run the checks from the documentation, they do not show any line-breaking issues:
To confirm that your Splunk software has event breaking issues, do one or more of the following:
View the Monitoring Console Data Quality dashboard.
Search for events that are multiple events combined into one.
Check splunkd.log for messages such as the following:
I think the issue is with line breaking. Could you please use LINEBREAKERENABLE in props.conf?
The props.conf excerpt and link below may help:
# Use the following attributes to define the length of a line.

TRUNCATE =
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often a sign of garbage data).
* Defaults to 10000 bytes.

LINE_BREAKER =
* Specifies a regex that determines how the raw text stream is broken into initial events, before line merging takes place. (See the SHOULD_LINEMERGE attribute, below)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by any number of carriage return or newline characters.
* The regex must contain a capturing group -- a pair of parentheses which defines an identified subcomponent of the match.
* Wherever the regex matches, Splunk considers the start of the first capturing group to be the end of the previous event, and considers the end of the first capturing group to be the start of the next event.
* The contents of the first capturing group are discarded, and will not be present in any event. You are telling Splunk that this text comes between lines.
* NOTE: You get a significant boost to processing speed when you use LINE_BREAKER to delimit multi-line events (as opposed to using SHOULD_LINEMERGE to reassemble individual lines into multi-line events).
  * When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set to false, to ensure no further combination of delimited events occurs.
  * Using LINE_BREAKER to delimit events is discussed in more detail in the web documentation at the following url:
    http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking

** Special considerations for LINE_BREAKER with branched expressions **

When using LINE_BREAKER with completely independent patterns separated by pipes, some special issues come into play.
    EG. LINE_BREAKER = pattern1|pattern2|pattern3

Note, this is not about all forms of alternation, eg there is nothing particular special about
    example: LINE_BREAKER = ([\r\n])+(one|two|three)
where the top level remains a single expression.

A caution: Relying on these rules is NOT encouraged. Simpler is better, in both regular expressions and the complexity of the behavior they rely on.
If possible, it is strongly recommended that you reconstruct your regex to have a leftmost capturing group that always matches.

It may be useful to use non-capturing groups if you need to express a group before the text to discard.
    EG. LINE_BREAKER = (?:one|two)([\r\n]+)
    * This will match the text one, or two, followed by any amount of newlines or carriage returns. The one-or-two group is non-capturing via the ?: prefix and will be skipped by LINE_BREAKER.

* A branched expression can match without the first capturing group matching, so the line breaker behavior becomes more complex.
  Rules:
  1: If the first capturing group is part of a match, it is considered the linebreak, as normal.
  2: If the first capturing group is not part of a match, the leftmost capturing group which is part of a match will be considered the linebreak.
  3: If no capturing group is part of the match, the linebreaker will assume that the linebreak is a zero-length break immediately preceding the match.

Example 1: LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3
  * A line ending with 'end' followed a line beginning with 'begin' would match the first branch, and the first capturing group would have a match according to rule 1. That particular newline would become a break between lines.
  * A line ending with 'end2' followed by a line beginning with 'begin2' would match the second branch and the second capturing group would have a match. That second capturing group would become the linebreak according to rule 2, and the associated newline would become a break between lines.
  * The text 'begin3' anywhere in the file at all would match the third branch, and there would be no capturing group with a match. A linebreak would be assumed immediately prior to the text 'begin3' so a linebreak would be inserted prior to this text in accordance with rule 3. This means that a linebreak will occur before the text 'begin3' at any point in the text, whether a linebreak character exists or not.

Example 2: Example 1 would probably be better written as follows. This is not equivalent for all possible files, but for most real files would be equivalent.
    LINE_BREAKER = end2?(\n)begin(2|3)?

LINE_BREAKER_LOOKBEHIND =
* When there is leftover data from a previous raw chunk, LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of the raw chunk (with the next chunk concatenated) that Splunk applies the LINE_BREAKER regex. You may want to increase this value from its default if you are dealing with especially large or multi-line events.
* Defaults to 100 (bytes).

# Use the following attributes to specify how multi-line events are handled.

SHOULD_LINEMERGE = [true|false]
* When set to true, Splunk combines several lines of data into a single multi-line event, based on the following configuration attributes.
* Defaults to true.

# When SHOULD_LINEMERGE is set to true, use the following attributes to
# define how Splunk builds multi-line events.

BREAK_ONLY_BEFORE_DATE = [true|false]
* When set to true, Splunk creates a new event only if it encounters a new line with a date.
  * Note, when using DATETIME_CONFIG = CURRENT or NONE, this setting is not meaningful, as timestamps are not identified.
* Defaults to true.

BREAK_ONLY_BEFORE =
* When set, Splunk creates a new event only if it encounters a new line that matches the regular expression.
* Defaults to empty.

MUST_BREAK_AFTER =
* When set and the regular expression matches the current line, Splunk creates a new event for the next input line.
* Splunk may still break before the current line if another rule matches.
* Defaults to empty.

MUST_NOT_BREAK_AFTER =
* When set and the current line matches the regular expression, Splunk does not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
* Defaults to empty.

MUST_NOT_BREAK_BEFORE =
* When set and the current line matches the regular expression, Splunk does not break the last event before the current line.
* Defaults to empty.

MAX_EVENTS =
* Specifies the maximum number of input lines to add to any event.
* Splunk breaks after the specified number of lines are read.
* Defaults to 256 (lines).

# Use the following attributes to handle better load balancing from UF.
# Please note the EVENT_BREAKER properties are applicable for Splunk Universal
# Forwarder instances only.

EVENT_BREAKER_ENABLE = [true|false]
* When set to true, Splunk will split incoming data with a light-weight chunked line breaking processor so that data is distributed fairly evenly amongst multiple indexers. Use this setting on the UF to indicate that data should be split on event boundaries across indexers especially for large files.
* Defaults to false

# Use the following to define event boundaries for multi-line events
# For single-line events, the default settings should suffice

EVENT_BREAKER =
* When set, Splunk will use the setting to define an event boundary at the end of the first matching group instance.

https://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Propsconf
The solution turned out to be in props.conf: the line-breaking regex for this particular sourcetype was wrong. I had been looking at the props.conf on the host where the data was being generated instead of the one on the indexer.
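For anyone who wants to test a line-breaking regex outside Splunk before editing props.conf, here is a rough Python sketch of the LINE_BREAKER behaviour described in the quoted spec: the text matched by the first capturing group is discarded and becomes the boundary between events. This is only an approximation, not Splunk's actual implementation. It ignores the branched-expression rules, SHOULD_LINEMERGE and chunk lookbehind. The helper name break_events and both regexes are hypothetical; the thread never shows the regex that was actually wrong, and the sample assumes the raw events were separated by newlines.

```python
import re


def break_events(raw, line_breaker):
    """Approximate LINE_BREAKER: split raw text wherever the regex matches,
    dropping whatever the first capturing group matched."""
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        raise ValueError("LINE_BREAKER needs at least one capturing group")
    events = []
    start = 0
    for match in pattern.finditer(raw):
        if match.group(1) is None:
            # Branched-expression rules from the spec are not modelled here.
            continue
        events.append(raw[start:match.start(1)])  # previous event ends here
        start = match.end(1)                      # next event starts here
    events.append(raw[start:])
    return [event for event in events if event]


# Sample lines taken from the question, assumed to be newline-separated.
RAW = "\n".join([
    "2018-08-27T14:23:32.345136+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery",
    "2018-08-27T14:23:32.483302+00:00 host01 FOO: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000000",
    "2018-08-27T14:23:32.483325+00:00 host01 FOO: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery",
])

# Hypothetical regex that fits this format: newlines followed by an ISO timestamp.
GOOD_BREAKER = r"([\r\n]+)(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
# Hypothetical wrong regex: expects a bracketed timestamp that never appears.
BAD_BREAKER = r"([\r\n]+)(?=\[\d{4}-\d{2}-\d{2})"

good_events = break_events(RAW, GOOD_BREAKER)
bad_events = break_events(RAW, BAD_BREAKER)

print(len(good_events), "events with the matching regex")
print(len(bad_events), "event(s) with the non-matching regex")
```

With a regex that never matches the data, everything stays in one blob, which is similar to the grouping shown in the question. Whichever regex you settle on, remember that it has to go in the props.conf on the instance that actually parses the data (the indexer in our case), not only on the host producing the logs.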