Solved: Re: Why would Splunk be grouping events together when I haven't told it to?

EricLloyd79 · ‎08-27-2018

Wow, finding any related questions on this has proven very difficult, because every search for "Splunk grouping events together" points to transactions, etc.
Splunk is merging separate log lines into single events for some reason, and I cannot find a pattern as to why it is doing this.
Here is an example of our events that are grouped together:

2018-08-27T14:23:32.345136+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:32.483302+00:00 host01 FOO[28683]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000000
2018-08-27T14:23:32.483325+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:32.483302+00:00 host01 FOO[28683]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
2018-08-27T14:23:32.483325+00:00 host01 FOO[28683]: FOO6004: SMS from <UNKNOWN> for MDN=00000000 being dispatched to SMSC XYZA for delivery

Then that grouping ends, and this is the start of the next one. As you can see, it's not grouped by second. Has anyone ever seen anything like this? Got any hints?

2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000
2018-08-27T14:23:28.325157+00:00 host01 FOO[5060]: FOO6004: SMS from <UNKNOWN> for MDN=0000000 being dispatched to SMSC XYZA for delivery
2018-08-27T14:23:28.325135+00:00 host01 FOO[5060]: FOO6002: Received SMS request from HTTPD @ <UNKNOWN> for destination MDN=00000000

EricLloyd79 · ‎09-06-2018

The solution turned out to be in props.conf: we had the wrong regex for this particular sourcetype. I had been looking at the props.conf on the host where the data was being generated instead of the one on the indexer.
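
For anyone hitting the same symptom, here is a minimal sketch of an indexer-side props.conf stanza that should break the sample events above one per line. The thread never shows the original (wrong) regex or the sourcetype name, so the stanza name `foo_sms` and every value below are assumptions derived only from the sample lines, not the actual fix that was applied.

```ini
# props.conf on the indexer (or the first full Splunk instance that parses the data)
# "foo_sms" is a hypothetical sourcetype name used for illustration only.
[foo_sms]
# Break only on LINE_BREAKER and do not re-merge lines afterwards.
SHOULD_LINEMERGE = false
# Break on newlines that are followed by an ISO 8601 timestamp such as
# 2018-08-27T14:23:32.345136+00:00. Only the first capturing group is discarded.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}\s
# The timestamp sits at the very start of each event.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
MAX_TIMESTAMP_LOOKAHEAD = 32
```

Settings like these generally take effect only for data indexed after a restart; events that were already merged stay as they are.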


vishaltaneja070 · ‎08-27-2018

Use this:

# Use the following attributes to define the length of a line.

TRUNCATE = <non-negative integer>
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
  otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
  a sign of garbage data).
* Defaults to 10000 bytes.

LINE_BREAKER = <regular expression>
* Specifies a regex that determines how the raw text stream is broken into
  initial events, before line merging takes place. (See the SHOULD_LINEMERGE
  attribute, below)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line,
  delimited by any number of carriage return or newline characters.
* The regex must contain a capturing group -- a pair of parentheses which
  defines an identified subcomponent of the match.
* Wherever the regex matches, Splunk considers the start of the first
  capturing group to be the end of the previous event, and considers the end
  of the first capturing group to be the start of the next event.
* The contents of the first capturing group are discarded, and will not be
  present in any event.  You are telling Splunk that this text comes between
  lines.
* NOTE: You get a significant boost to processing speed when you use
  LINE_BREAKER to delimit multi-line events (as opposed to using
  SHOULD_LINEMERGE to reassemble individual lines into multi-line events).
  * When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set
    to false, to ensure no further combination of delimited events occurs.
  * Using LINE_BREAKER to delimit events is discussed in more detail in the web
    documentation at the following url:
    http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking

** Special considerations for LINE_BREAKER with branched expressions  **

When using LINE_BREAKER with completely independent patterns separated by
pipes, some special issues come into play.
    EG. LINE_BREAKER = pattern1|pattern2|pattern3

Note, this is not about all forms of alternation, eg there is nothing
particularly special about
    example: LINE_BREAKER = ([\r\n])+(one|two|three)
where the top level remains a single expression.

A caution: Relying on these rules is NOT encouraged.  Simpler is better, in
both regular expressions and the complexity of the behavior they rely on.
If possible, it is strongly recommended that you reconstruct your regex to
have a leftmost capturing group that always matches.

It may be useful to use non-capturing groups if you need to express a group
before the text to discard.
    EG. LINE_BREAKER = (?:one|two)([\r\n]+)
    * This will match the text one, or two, followed by any amount of
      newlines or carriage returns.  The one-or-two group is non-capturing
      via the ?: prefix and will be skipped by LINE_BREAKER.

* A branched expression can match without the first capturing group
  matching, so the line breaker behavior becomes more complex.
  Rules:
  1: If the first capturing group is part of a match, it is considered the
     linebreak, as normal.
  2: If the first capturing group is not part of a match, the leftmost
     capturing group which is part of a match will be considered the linebreak.
  3: If no capturing group is part of the match, the linebreaker will assume
     that the linebreak is a zero-length break immediately preceding the match.

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed by a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.
  * A line ending with 'end2' followed by a line beginning with 'begin2'
    would match the second branch and the second capturing group would have
    a match.  That second capturing group would become the linebreak
    according to rule 2, and the associated newline would become a break
    between lines.
  * The text 'begin3' anywhere in the file at all would match the third
    branch, and there would be no capturing group with a match.  A linebreak
    would be assumed immediately prior to the text 'begin3' so a linebreak
    would be inserted prior to this text in accordance with rule 3.  This
    means that a linebreak will occur before the text 'begin3' at any
    point in the text, whether a linebreak character exists or not.

Example 2: Example 1 would probably be better written as follows.  This is
           not equivalent for all possible files, but for most real files
           would be equivalent.

           LINE_BREAKER = end2?(\n)begin(2|3)?

LINE_BREAKER_LOOKBEHIND = <integer>
* When there is leftover data from a previous raw chunk,
  LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
  the raw chunk (with the next chunk concatenated) that Splunk applies the
  LINE_BREAKER regex. You may want to increase this value from its default
  if you are dealing with especially large or multi-line events.
* Defaults to 100 (bytes).

# Use the following attributes to specify how multi-line events are handled.

SHOULD_LINEMERGE = [true|false]
* When set to true, Splunk combines several lines of data into a single
  multi-line event, based on the following configuration attributes.
* Defaults to true.

# When SHOULD_LINEMERGE is set to true, use the following attributes to
# define how Splunk builds multi-line events.

BREAK_ONLY_BEFORE_DATE = [true|false]
* When set to true, Splunk creates a new event only if it encounters a new
  line with a date.
  * Note, when using DATETIME_CONFIG = CURRENT or NONE, this setting is not
    meaningful, as timestamps are not identified.
* Defaults to true.

BREAK_ONLY_BEFORE = <regular expression>
* When set, Splunk creates a new event only if it encounters a new line that
  matches the regular expression.
* Defaults to empty.

MUST_BREAK_AFTER = <regular expression>
* When set and the regular expression matches the current line, Splunk
  creates a new event for the next input line.
* Splunk may still break before the current line if another rule matches.
* Defaults to empty.

MUST_NOT_BREAK_AFTER = <regular expression>
* When set and the current line matches the regular expression, Splunk does
  not break on any subsequent lines until the MUST_BREAK_AFTER expression
  matches.
* Defaults to empty.

MUST_NOT_BREAK_BEFORE = <regular expression>
* When set and the current line matches the regular expression, Splunk does
  not break the last event before the current line.
* Defaults to empty.

MAX_EVENTS = <integer>
* Specifies the maximum number of input lines to add to any event.
* Splunk breaks after the specified number of lines are read.
* Defaults to 256 (lines).
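
As an illustration of how the line-merging attributes above fit together, the sketch below applies them to timestamp-prefixed lines like those in the question. The sourcetype name `foo_sms` and the regex are assumptions made for this example. Per the NOTE under LINE_BREAKER, this approach is slower than breaking with LINE_BREAKER alone, so treat it as an alternative rather than a recommendation.

```ini
# "foo_sms" is a hypothetical sourcetype name used for illustration only.
[foo_sms]
# Let the aggregator decide which lines belong together.
SHOULD_LINEMERGE = true
# Rely on the explicit regex below rather than automatic date detection.
BREAK_ONLY_BEFORE_DATE = false
# Start a new event at every line that begins with an ISO 8601 timestamp.
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
# Upper bound on lines per event (this is the documented default).
MAX_EVENTS = 256
```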

# Use the following attributes to handle better load balancing from UF.
# Please note the EVENT_BREAKER properties are applicable for Splunk Universal
# Forwarder instances only.

EVENT_BREAKER_ENABLE = [true|false]
* When set to true, Splunk will split incoming data with a light-weight
  chunked line breaking processor so that data is distributed fairly evenly
  amongst multiple indexers. Use this setting on the UF to indicate that
  data should be split on event boundaries across indexers especially
  for large files.
* Defaults to false

# Use the following to define event boundaries for multi-line events
# For single-line events, the default settings should suffice

EVENT_BREAKER = <regular expression>
* When set, Splunk will use the setting to define an event boundary at the
  end of the first matching group instance.


https://docs.splunk.com/Documentation/Splunk/7.1.2/Admin/Propsconf
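
To make the forwarder-side settings concrete, here is a minimal sketch of how EVENT_BREAKER might be enabled for single-line events. The stanza name `foo_sms` is a hypothetical sourcetype, and the regex simply treats each run of newline characters as an event boundary. As the spec text above notes, these properties apply to Universal Forwarder instances only; they affect how data is distributed across indexers and do not replace LINE_BREAKER on the indexer.

```ini
# props.conf on the Universal Forwarder
# "foo_sms" is a hypothetical sourcetype name used for illustration only.
[foo_sms]
EVENT_BREAKER_ENABLE = true
# One event per line: the boundary is any run of CR/LF characters.
EVENT_BREAKER = ([\r\n]+)
```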

vishaltaneja070 · ‎08-27-2018

Hello @EricLloyd79

I think the issue is with line breaking. Could you please set LINE_BREAKER in props.conf?

The link below may help:

http://docs.splunk.com/Documentation/Splunk/7.1.1/Data/Resolvedataqualityissues#Event_breaking.2C_or_aggregation.2C_issues

horsefez · ‎08-27-2018

Did you just steal my link? :horse:

vishaltaneja070 · ‎08-27-2018

@pyro_wood: Coincidence, I also found the same link 😛

horsefez · ‎08-27-2018

Hi @EricLloyd79,

I think there is a problem with event line breaking, which is causing the events to be grouped.

http://docs.splunk.com/Documentation/Splunk/7.1.1/Data/Resolvedataqualityissues#Event_breaking.2C_or...

EricLloyd79 · ‎08-27-2018

Interestingly, when I follow the documentation, it does not report any line-breaking issues:

To confirm that your Splunk software has event breaking issues, do one or more of the following:

View the Monitoring Console Data Quality dashboard.
Search for events that are multiple events combined into one.
Check splunkd.log for messages such as the following:

vishaltaneja070 · ‎08-27-2018

@EricLloyd79

Did you set LINE_BREAKER in props.conf?
