Finding related questions on this has proven very difficult, as any search for "Splunk grouping events together" points to transactions, etc.
Splunk is grouping events together for some reason into single events and I cannot seem to find a pattern as to why it is doing this.
Here is an example of our events that are grouped together:
2018-08-27T14:23:32.345136+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:32.483302+00:00 host01 FOO[28683]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000000
2018-08-27T14:23:32.483325+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:32.483302+00:00 host01 FOO[28683]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
2018-08-27T14:23:32.483325+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
That grouping ends there, and this is the start of the next one. As you can see, it's not grouped by second. Has anyone ever seen anything like this? Any hints?
2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
2018-08-27T14:23:28.325157+00:00 host01 FOO[5060]: FOO6004: SMS from <UNKNOWN> for MDN=0000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
The solution was in props.conf: we had the regex wrong for this particular sourcetype. I had been looking at the props.conf on the host where the data was being generated instead of the one on the indexer.
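For anyone hitting the same thing, here is a rough way to sanity-check a line-breaking regex against sample data before putting it in props.conf on the indexer. This is only a sketch: the actual regex from the fix is not shown in this thread, so the pattern below is a hypothetical one that breaks on newlines followed by an ISO 8601 timestamp, and `split_events` is a made-up helper that approximates how LINE_BREAKER discards its first capturing group. Python's `re` is not PCRE, so treat the result as a hint rather than proof.

```python
import re

# Hypothetical pattern (not the one from the original fix): break on newlines
# that are followed by a timestamp such as 2018-08-27T14:23:32.
GOOD = r"([\r\n]+)(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
# A pattern written for a different date format never matches this data.
WRONG = r"([\r\n]+)(?=\d{2}/\d{2}/\d{4})"


def split_events(raw, pattern):
    """Approximate LINE_BREAKER: the text matched by group 1 is dropped between events."""
    events, pos = [], 0
    for m in re.finditer(pattern, raw):
        events.append(raw[pos:m.start(1)])  # event ends where group 1 starts
        pos = m.end(1)                      # next event starts where group 1 ends
    if pos < len(raw):
        events.append(raw[pos:])
    return events


SAMPLE = "\n".join([
    "2018-08-27T14:23:32.345136+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery",
    "2018-08-27T14:23:32.483302+00:00 host01 FOO[28683]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000000",
    "2018-08-27T14:23:32.483325+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery",
])

events = split_events(SAMPLE, GOOD)
print(len(events), "events with the matching pattern")
print(len(split_events(SAMPLE, WRONG)), "event(s) with the non-matching pattern")
```

With the matching pattern the three sample lines come out as three events. With the non-matching one the whole sample stays in a single chunk, which is then left to whatever line-merging settings apply to the sourcetype.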
Use this:
# Use the following attributes to define the length of a line.
TRUNCATE =
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
a sign of garbage data).
* Defaults to 10000 bytes.
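To illustrate the "rounded down" wording above, here is a small sketch of byte-limited truncation that never cuts a multi-byte character in half. It assumes UTF-8 input, and `truncate_bytes` is a hypothetical helper, not Splunk's implementation.

```python
def truncate_bytes(line: str, limit: int = 10000, encoding: str = "utf-8") -> str:
    """Cut `line` to at most `limit` bytes without splitting a multi-byte character."""
    if limit == 0:  # 0 means "never truncate" in the description above
        return line
    raw = line.encode(encoding)
    if len(raw) <= limit:
        return line
    # errors="ignore" drops an incomplete trailing character, i.e. rounds down.
    return raw[:limit].decode(encoding, errors="ignore")


# "é" takes two bytes in UTF-8, so a 2-byte limit on "aé" keeps only "a".
print(truncate_bytes("aé", 2))
```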
LINE_BREAKER =
* Specifies a regex that determines how the raw text stream is broken into
initial events, before line merging takes place. (See the SHOULD_LINEMERGE
attribute, below)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line,
delimited by any number of carriage return or newline characters.
* The regex must contain a capturing group -- a pair of parentheses which
defines an identified subcomponent of the match.
* Wherever the regex matches, Splunk considers the start of the first
capturing group to be the end of the previous event, and considers the end
of the first capturing group to be the start of the next event.
* The contents of the first capturing group are discarded, and will not be
present in any event. You are telling Splunk that this text comes between
lines.
* NOTE: You get a significant boost to processing speed when you use
LINE_BREAKER to delimit multi-line events (as opposed to using
SHOULD_LINEMERGE to reassemble individual lines into multi-line events).
* When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set
to false, to ensure no further combination of delimited events occurs.
* Using LINE_BREAKER to delimit events is discussed in more detail in the web
documentation at the following url:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
** Special considerations for LINE_BREAKER with branched expressions **
When using LINE_BREAKER with completely independent patterns separated by
pipes, some special issues come into play.
EG. LINE_BREAKER = pattern1|pattern2|pattern3
Note, this is not about all forms of alternation, eg there is nothing
particularly special about
example: LINE_BREAKER = ([\r\n])+(one|two|three)
where the top level remains a single expression.
A caution: Relying on these rules is NOT encouraged. Simpler is better, in
both regular expressions and the complexity of the behavior they rely on.
If possible, it is strongly recommended that you reconstruct your regex to
have a leftmost capturing group that always matches.
It may be useful to use non-capturing groups if you need to express a group
before the text to discard.
EG. LINE_BREAKER = (?:one|two)([\r\n]+)
* This will match the text one, or two, followed by any amount of
newlines or carriage returns. The one-or-two group is non-capturing
via the ?: prefix and will be skipped by LINE_BREAKER.
* A branched expression can match without the first capturing group
matching, so the line breaker behavior becomes more complex.
Rules:
1: If the first capturing group is part of a match, it is considered the
linebreak, as normal.
2: If the first capturing group is not part of a match, the leftmost
capturing group which is part of a match will be considered the linebreak.
3: If no capturing group is part of the match, the linebreaker will assume
that the linebreak is a zero-length break immediately preceding the match.
Example 1: LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3
* A line ending with 'end' followed by a line beginning with 'begin' would
match the first branch, and the first capturing group would have a match
according to rule 1. That particular newline would become a break
between lines.
* A line ending with 'end2' followed by a line beginning with 'begin2'
would match the second branch and the second capturing group would have
a match. That second capturing group would become the linebreak
according to rule 2, and the associated newline would become a break
between lines.
* The text 'begin3' anywhere in the file at all would match the third
branch, and there would be no capturing group with a match. A linebreak
would be assumed immediately prior to the text 'begin3' so a linebreak
would be inserted prior to this text in accordance with rule 3. This
means that a linebreak will occur before the text 'begin3' at any
point in the text, whether a linebreak character exists or not.
Example 2: Example 1 would probably be better written as follows. This is
not equivalent for all possible files, but for most real files
would be equivalent.
LINE_BREAKER = end2?(\n)begin(2|3)?
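The three rules for branched expressions can be tried out with a short script. This is a sketch under assumptions: it uses Python's `re` rather than the PCRE engine Splunk uses, and `find_breaks` and `split_events` are hypothetical helpers that only mirror the rules as written above.

```python
import re


def find_breaks(raw, pattern):
    """Return (end_of_previous_event, start_of_next_event) pairs per the three rules."""
    regex = re.compile(pattern)
    breaks = []
    for m in regex.finditer(raw):
        # Rules 1 and 2: use the leftmost capturing group that took part in the match.
        for i in range(1, regex.groups + 1):
            if m.start(i) != -1:
                breaks.append((m.start(i), m.end(i)))
                break
        else:
            # Rule 3: no group took part, so assume a zero-length break before the match.
            breaks.append((m.start(), m.start()))
    return breaks


def split_events(raw, pattern):
    """Cut `raw` at each break, discarding the text inside the break itself."""
    events, pos = [], 0
    for end_prev, start_next in find_breaks(raw, pattern):
        events.append(raw[pos:end_prev])
        pos = start_next
    events.append(raw[pos:])
    return events


# The pattern from Example 1, applied to a made-up input string.
PATTERN = r"end(\n)begin|end2(\n)begin2|begin3"
raw = "a end\nbegin b end2\nbegin2 c begin3 d"

for event in split_events(raw, PATTERN):
    print(repr(event))
```

On this input the first two breaks consume a newline (rules 1 and 2), while the break before 'begin3' removes nothing because no newline is there (rule 3).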
LINE_BREAKER_LOOKBEHIND =
* When there is leftover data from a previous raw chunk,
LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
the raw chunk (with the next chunk concatenated) that Splunk applies the
LINE_BREAKER regex. You may want to increase this value from its default
if you are dealing with especially large or multi-line events.
* Defaults to 100 (bytes).
# Use the following attributes to specify how multi-line events are handled.
SHOULD_LINEMERGE = [true|false]
* When set to true, Splunk combines several lines of data into a single
multi-line event, based on the following configuration attributes.
* Defaults to true.
# When SHOULD_LINEMERGE is set to true, use the following attributes to
# define how Splunk builds multi-line events.
BREAK_ONLY_BEFORE_DATE = [true|false]
* When set to true, Splunk creates a new event only if it encounters a new
line with a date.
* Note, when using DATETIME_CONFIG = CURRENT or NONE, this setting is not
meaningful, as timestamps are not identified.
* Defaults to true.
BREAK_ONLY_BEFORE =
* When set, Splunk creates a new event only if it encounters a new line that
matches the regular expression.
* Defaults to empty.
MUST_BREAK_AFTER =
* When set and the regular expression matches the current line, Splunk
creates a new event for the next input line.
* Splunk may still break before the current line if another rule matches.
* Defaults to empty.
MUST_NOT_BREAK_AFTER =
* When set and the current line matches the regular expression, Splunk does
not break on any subsequent lines until the MUST_BREAK_AFTER expression
matches.
* Defaults to empty.
MUST_NOT_BREAK_BEFORE =
* When set and the current line matches the regular expression, Splunk does
not break the last event before the current line.
* Defaults to empty.
MAX_EVENTS =
* Specifies the maximum number of input lines to add to any event.
* Splunk breaks after the specified number of lines are read.
* Defaults to 256 (lines).
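To see how these merge settings can produce the kind of grouping described in the question, here is a deliberately simplified model. It covers only BREAK_ONLY_BEFORE and MAX_EVENTS and ignores timestamp recognition and the MUST_* rules. `merge_lines` is a hypothetical helper, and both regexes are assumptions chosen for the demo.

```python
import re


def merge_lines(lines, break_before, max_events=256):
    """Group lines into events, starting a new event at each line matching break_before."""
    regex = re.compile(break_before)
    events, current = [], []
    for line in lines:
        # Break before a matching line, or once the current event is full.
        if current and (regex.search(line) or len(current) >= max_events):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


LINES = [
    "2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000",
    "2018-08-27T14:23:28.325157+00:00 host01 FOO[5060]: FOO6004: SMS from <UNKNOWN> for MDN=0000000 being dispatched to SMSC XYZA for delivery",
    "2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000",
]

matching = merge_lines(LINES, r"^\d{4}-\d{2}-\d{2}T")       # one event per line
non_matching = merge_lines(LINES, r"^\d{2}/\d{2}/\d{4}")    # everything merged
print(len(matching), len(non_matching))
```

A regex that never matches the start of a line leaves MAX_EVENTS as the only thing that ends an event in this model, so unrelated lines pile up into one event.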
# Use the following attributes to handle better load balancing from UF.
# Please note the EVENT_BREAKER properties are applicable for Splunk Universal
# Forwarder instances only.
EVENT_BREAKER_ENABLE = [true|false]
* When set to true, Splunk will split incoming data with a light-weight
chunked line breaking processor so that data is distributed fairly evenly
amongst multiple indexers. Use this setting on the UF to indicate that
data should be split on event boundaries across indexers especially
for large files.
* Defaults to false
# Use the following to define event boundaries for multi-line events
# For single-line events, the default settings should suffice
EVENT_BREAKER =
* When set, Splunk will use the setting to define an event boundary at the
end of the first matching group instance.
https://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Propsconf
Hello @EricLyoyd9
I think the issue is with line breaking. Could you please use LINE_BREAKER_ENABLE in props.conf?
The link below may help:
http://docs.splunk.com/Documentation/Splunk/7.1.1/Data/Resolvedataqualityissues#Event_breaking.2C_or_aggregation.2C_issues
Did you just steal my link? :horse:
@pyro_wood: Coincidence, I also found the same link 😛
Hi @EricLloyd79,
I think there are some problems with event line breaking, which are causing the events to be grouped.
Interestingly, when I follow the documentation, it does not report any line-breaking issues:
To confirm that your Splunk software has event breaking issues, do one or more of the following:
View the Monitoring Console Data Quality dashboard.
Search for events that are multiple events combined into one.
Check splunkd.log for messages such as the following:
@EricLyoyd79
Did you set LINE_BREAKER_ENABLE in props.conf?