Fireeye json truncating lines

fletch13 · ‎11-06-2015

It appears that Splunk is truncating FireEye (7.4) extended JSON messages. There are 90 lines in the message, but it only extracts 81.
I don't see the missing ones in the "all fields" section either. I need help getting the other 9 lines indexed.

It is not indexing the fields below:
},
"occurred": "2015-11-05 20:48:26+00",
"id": "9200",
"action": "notified",
"dst": {
"ip": "127.0.0.20",
"mac": "00:22:44:66:88:aa",
"port": "20"
},
"name": "infection-match"
}
}

We are using the latest FireEye Add-on (3.0.7).

Our props.conf settings:
[fe_json]
TRUNCATE=0
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?!))
KV_MODE = JSON
TIME_PREFIX = \"occurred\":\s
TIME_FORMAT = \"%Y-%m-%d %H:%M:%S+00\"
TZ = UTC
FIELDALIAS-dest_ip_for_fireeye_app = alert.dst.ip as dest_ip
FIELDALIAS-dest_for_fireeye = alert.dst.ip as dest
FIELDALIAS-dest_port_for_fireeye = alert.dst.port as dest_port
FIELDALIAS-dest_mac_for_fireeye = alert.dst.mac as dest_mac

jkat54 · ‎11-11-2015

Did we get anywhere?

jkat54 · ‎11-06-2015

It appears it's breaking on timestamps.

TIME_PREFIX = "occurred": s 
#looks like a typo.

...And you may need a line breaker too.

LINE_BREAKER = <regex>

LINE_BREAKER = <regular expression>
* Specifies a regex that determines how the raw text stream is broken into
  initial events, before line merging takes place. (See the SHOULD_LINEMERGE
  attribute, below)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line,
  delimited by any number of carriage return or newline characters.
* The regex must contain a capturing group -- a pair of parentheses which
  defines an identified subcomponent of the match.
* Wherever the regex matches, Splunk considers the start of the first
  capturing group to be the end of the previous event, and considers the end
  of the first capturing group to be the start of the next event.
* The contents of the first capturing group are discarded, and will not be
  present in any event.  You are telling Splunk that this text comes between
  lines.
* NOTE: You get a significant boost to processing speed when you use
  LINE_BREAKER to delimit multiline events (as opposed to using
  SHOULD_LINEMERGE to reassemble individual lines into multiline events).
  * When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set
    to false, to ensure no further combination of delimited events occurs.
  * Using LINE_BREAKER to delimit events is discussed in more detail in the web
    documentation at the following url:
    http://docs.splunk.com/Documentation/Splunk/latest/Data/indexmulti-lineevents
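
The capturing-group rule above can be tried outside Splunk. The following is a minimal Python sketch of that rule only, not Splunk's implementation: it ignores TRUNCATE, LINE_BREAKER_LOOKBEHIND and chunking, and it uses Python's `re` module rather than Splunk's regex engine. The names `break_events` and `MSG_BREAKER` are hypothetical, and the `MSG_BREAKER` pattern is an untested illustration, not a recommended setting.

```python
import json
import re

DEFAULT_BREAKER = r"([\r\n]+)"
# Hypothetical breaker: newlines that are followed by the start of a new alert.
MSG_BREAKER = r'([\r\n]+)\{[\r\n]+"msg"'

def break_events(raw, line_breaker):
    """Split raw text at group 1 of each match and discard the group's text."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, raw):
        if match.start(1) == -1:
            continue  # group 1 did not take part in this match
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Drop empty leftovers such as the text after a trailing newline.
    return [event for event in events if event.strip()]

# Two shortened, made-up alerts laid out one key per line.
raw = (
    '{\n"msg": "extended",\n"alert": {\n"id": "1"\n}\n}\n'
    '{\n"msg": "extended",\n"alert": {\n"id": "2"\n}\n}\n'
)

per_line = break_events(raw, DEFAULT_BREAKER)
per_alert = break_events(raw, MSG_BREAKER)
print(len(per_line), len(per_alert))
print(json.loads(per_alert[1])["alert"]["id"])
```

On this two-alert sample the default pattern produces one event per line, while the hypothetical pattern keeps each alert whole, so each event still parses as JSON.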

** Special considerations for LINE_BREAKER with branched expressions  **

When using LINE_BREAKER with completely independent patterns separated by
pipes, some special issues come into play.
    EG. LINE_BREAKER = pattern1|pattern2|pattern3

Note, this is not about all forms of alternation, eg there is nothing
particularly special about
    example: LINE_BREAKER = ([\r\n])+(one|two|three)
where the top level remains a single expression.

A caution: Relying on these rules is NOT encouraged.  Simpler is better, in
both regular expressions and the complexity of the behavior they rely on.
If possible, it is strongly recommended that you reconstruct your regex to
have a leftmost capturing group that always matches.

It may be useful to use non-capturing groups if you need to express a group
before the text to discard.
    EG. LINE_BREAKER = (?:one|two)([\r\n]+)
    * This will match the text one, or two, followed by any amount of
      newlines or carriage returns.  The one-or-two group is non-capturing
      via the ?: prefix and will be skipped by LINE_BREAKER.

* A branched expression can match without the first capturing group
  matching, so the line breaker behavior becomes more complex.
  Rules:
  1: If the first capturing group is part of a match, it is considered the
     linebreak, as normal.
  2: If the first capturing group is not part of a match, the leftmost
     capturing group which is part of a match will be considered the linebreak.
  3: If no capturing group is part of the match, the linebreaker will assume
     that the linebreak is a zero-length break immediately preceding the match.

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed by a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.
  * A line ending with 'end2' followed by a line beginning with 'begin2'
    would match the second branch and the second capturing group would have
    a match.  That second capturing group would become the linebreak
    according to rule 2, and the associated newline would become a break
    between lines.
  * The text 'begin3' anywhere in the file at all would match the third
    branch, and there would be no capturing group with a match.  A linebreak
    would be assumed immediately prior to the text 'begin3' so a linebreak
    would be inserted prior to this text in accordance with rule 3.  This
    means that a linebreak will occur before the text 'begin3' at any
    point in the text, whether a linebreak character exists or not.

Example 2: Example 1 would probably be better written as follows.  This is
           not equivalent for all possible files, but for most real files
           would be equivalent.

           LINE_BREAKER = end2?(\n)begin(2|3)?

LINE_BREAKER_LOOKBEHIND = <integer>
* When there is leftover data from a previous raw chunk,
  LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
  the raw chunk (with the next chunk concatenated) that Splunk applies the
  LINE_BREAKER regex. You may want to increase this value from its default
  if you are dealing with especially large or multiline events.
* Defaults to 100 (bytes).

# Use the following attributes to specify how multiline events are handled.

SHOULD_LINEMERGE = [true|false]
* When set to true, Splunk combines several lines of data into a single
  multiline event, based on the following configuration attributes.
* Defaults to true.

# When SHOULD_LINEMERGE is set to true, use the following attributes to
# define how Splunk builds multiline events.

BREAK_ONLY_BEFORE_DATE = [true|false]
* When set to true, Splunk creates a new event only if it encounters a new
  line with a date.
  * Note, when using DATETIME_CONFIG = CURRENT or NONE, this setting is not
    meaningful, as timestamps are not identified.
* Defaults to true.

BREAK_ONLY_BEFORE = <regular expression>
* When set, Splunk creates a new event only if it encounters a new line that
  matches the regular expression.
* Defaults to empty.

MUST_BREAK_AFTER = <regular expression>
* When set and the regular expression matches the current line, Splunk
  creates a new event for the next input line.
* Splunk may still break before the current line if another rule matches.
* Defaults to empty.

MUST_NOT_BREAK_AFTER = <regular expression>
* When set and the current line matches the regular expression, Splunk does
  not break on any subsequent lines until the MUST_BREAK_AFTER expression
  matches.
* Defaults to empty.

MUST_NOT_BREAK_BEFORE = <regular expression>
* When set and the current line matches the regular expression, Splunk does
  not break the last event before the current line.
* Defaults to empty.

MAX_EVENTS = <integer>
* Specifies the maximum number of input lines to add to any event.
* Splunk breaks after the specified number of lines are read.
* Defaults to 256 (lines).

fletch13 · ‎11-06-2015

Not sure what happened to the TIME_PREFIX in the question. I double-checked my props.conf and it is
"TIME_PREFIX = \"occurred\":\s". I'm not sure why it pasted as "TIME_PREFIX = \"occurred\":s", so I assume the setting itself is correct.
I have an elementary understanding of how to write regexes to capture data in our other systems (flat files, etc.), but I'm not sure how to write an expression that matches EOF in Splunk.

As for the line breaker, the default is "((?!))", which is a negative lookahead. But it doesn't make sense to me, because there is nothing for it to look ahead to.

Would something like this be better: "(\$(?!}))"?
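
Both patterns can be checked against a shortened copy of the alert. This is a small sketch using Python's `re` module, on the assumption that these two constructs behave the same way in Splunk's regex engine. An empty negative lookahead can never succeed, so "((?!))" never matches and never breaks the stream. The proposed pattern looks for a literal dollar sign that is not followed by "}", and the alert contains no dollar sign.

```python
import re

# Shortened stand-in for the alert payload.
sample = '{\n"msg": "extended",\n"alert": {\n"name": "infection-match"\n}\n}\n'

never_breaks = re.compile(r"((?!))")   # empty negative lookahead: always fails
proposed = re.compile(r"(\$(?!}))")    # literal "$" not followed by "}"

never_hits = never_breaks.findall(sample)
proposed_hits = proposed.findall(sample)
print(len(never_hits), len(proposed_hits))

# The proposed pattern only fires when a dollar sign is actually present.
print(proposed.findall("a$b"), proposed.findall("a$}"))
```

Neither pattern finds a break point in the sample, so swapping one for the other would not change how the alert is split.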

Below is the alert sent to Splunk:
{
"msg": "extended",
"product": "Web MPS",
"version": "7.4.0.254758",
"appliance": "my-fireeye-pri.company.net",
"alert": {
"src": {
"mac": "00:00:00:00:00:00",
"ip": "169.250.0.1",
"host": "IM-testing.fe-notify-examples.com",
"vlan": "0",
"port": "10"
},
"severity": "minr",
"alert-url": "https://127.0.0.1/event_stream/events_for_bot?ev_id=9200&lms_iden=00:24:91:7A:5D:F4",
"explanation": {
"malware-detected": {
"malware": {
"name": "FireEye-TestEvent-SIG-IM",
"stype": "bot-command",
"sid": "30"
}
},
"protocol": "tcp",
"analysis": "content"
},
"occurred": "2015-11-05 20:48:26+00",
"id": "9200",
"action": "notified",
"dst": {
"ip": "127.0.0.20",
"mac": "00:44:44:66:44:BB",
"port": "20"
},
"name": "infection-match"
}
}
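
To see which field names a complete extraction should produce, the alert can be flattened into dotted paths such as alert.dst.ip, which is the naming the FIELDALIAS lines in the props.conf rely on. This is a rough Python sketch, assuming KV_MODE = JSON names nested keys this way. The `flatten` helper is hypothetical and the payload is cut down to the part reported as missing.

```python
import json

def flatten(obj, prefix=""):
    """Turn nested JSON objects into a flat dict keyed by dotted paths."""
    fields = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, path))
        else:
            fields[path] = value
    return fields

# Trimmed copy of the alert, keeping the tail that is not being indexed.
alert_text = """
{
"msg": "extended",
"alert": {
"occurred": "2015-11-05 20:48:26+00",
"id": "9200",
"action": "notified",
"dst": {
"ip": "127.0.0.20",
"mac": "00:44:44:66:44:BB",
"port": "20"
},
"name": "infection-match"
}
}
"""

fields = flatten(json.loads(alert_text))
for name in sorted(fields):
    print(name, "=", fields[name])
```

Comparing this list with the fields Splunk shows for an event is one way to tell whether the tail of the payload was dropped.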

jkat54 · ‎11-06-2015

Just make this your time prefix and let's see what happens:

    "occurred"
     or
      occurred

According to the documentation, this is where Splunk starts looking for date patterns, and it should automatically skip the colon and quotes while looking for a date on the same line as the prefix.
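
For comparison, here is how the original TIME_PREFIX and TIME_FORMAT line up against the alert text. This is a Python sketch of the idea only: find the prefix, then parse what follows with the format. It does not model Splunk's timestamp processor, including whatever it does with the shorter prefix. The `extract_time` helper and the fixed 24-character slice are assumptions made for the example.

```python
import re
from datetime import datetime, timezone

TIME_PREFIX = r'"occurred":\s'
TIME_FORMAT = '"%Y-%m-%d %H:%M:%S+00"'   # quotes and "+00" are literal text

def extract_time(event, prefix=TIME_PREFIX, fmt=TIME_FORMAT):
    """Find the prefix, then parse the quoted timestamp that follows it."""
    match = re.search(prefix, event)
    if match is None:
        return None
    # The sample timestamp, quotes included, is 24 characters long.
    candidate = event[match.end():match.end() + 24]
    return datetime.strptime(candidate, fmt).replace(tzinfo=timezone.utc)

event = '"analysis": "content"\n},\n"occurred": "2015-11-05 20:48:26+00",\n"id": "9200",'

ts = extract_time(event)
print(ts.isoformat())

# The prefix as it appeared in the post, with the backslash stripped,
# does not match the alert text at all.
stripped = extract_time(event, prefix=r'"occurred": s')
print(stripped)
```

In this sketch the prefix with the backslash intact finds the timestamp, while the stripped version finds nothing, which is consistent with the setting being fine and only the forum rendering being off.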

fletch13 · ‎11-06-2015

Do I have to restart splunkd every time I make a change to the FireEye add-on's props.conf file?

Change from 
TIME_PREFIX = \"occurred\"\:\s

To 
TIME_PREFIX = occurred

jkat54 · ‎11-07-2015

Good question. I should have mentioned this sooner.

Some of the props changes we're making affect search-time extraction. So the first thing to note is that some of these props/transforms settings belong on the search heads (search-time settings such as KV_MODE and FIELDALIAS) and some belong on the forwarders / indexers (index-time settings such as LINE_BREAKER, TRUNCATE, and TIME_PREFIX). You can use the same props/transforms in both locations and be fine, but you do need them in both locations.

Another thing I didn't mention is that when you change these, sometimes even after a restart it takes a few minutes for the change to start working. You can run this search to help force the reload, but even that doesn't work every time.

| extract reload=T

Also, this URL is the mother of all refreshes:

 http[s]://[splunkweb hostname]:[splunkweb port]/debug/refresh

jkat54 · ‎11-07-2015

I generally push props changes, then sit on the server running a search with |extract reload=T for a few minutes until the changes show up.

fletch13 · ‎11-06-2015

Not sure what's going on, but the "\" keeps getting removed when I post.
"\" there is one backslash in these quotes. "\" there are two backslashes in these quotes.

jkat54 · ‎11-06-2015

I don't understand the "s" in your time prefix... I guess you're using

     \s

but it's not showing the \ because you're not in code blocks.

jkat54 · ‎11-06-2015

Use the code block button (1010101) in the menu bar, or press space 5 times on a new line before typing.

jkat54 · ‎11-06-2015

Are the timestamps right on the data? If so, then this isn't the problem and you need line breaks instead. It's been a while since I did JSON in Splunk, and it wasn't fun.

Try this in your props as well.

BREAK_ONLY_BEFORE = {.*"msg"
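
Two things are worth checking before relying on this setting. The spec text quoted earlier in the thread lists BREAK_ONLY_BEFORE among the attributes used when SHOULD_LINEMERGE is true, and the props.conf in the question sets it to false. Also, the spec describes the regex as applying to "a new line", and in the alert as pasted the opening brace and "msg" sit on separate lines. The Python sketch below tests the pattern line by line under that assumption. The real payload on the wire may be laid out differently, and `lines_that_break` is a hypothetical helper.

```python
import re

# Brace escaped for Python's re module; the setting in the post leaves it bare.
BREAK_ONLY_BEFORE = re.compile(r'\{.*"msg"')

# Alert start as pasted in the thread: one key per line.
multi_line = '{\n"msg": "extended",\n"product": "Web MPS",\n"alert": {\n"name": "infection-match"\n}\n}'
# The same start if the payload arrived on a single line.
single_line = '{"msg": "extended", "product": "Web MPS"}'

def lines_that_break(text):
    """Return the lines on which the pattern matches."""
    return [line for line in text.splitlines() if BREAK_ONLY_BEFORE.search(line)]

print(lines_that_break(multi_line))
print(lines_that_break(single_line))
```

If the payload really is one key per line, no individual line matches, so the pattern would only help when the brace and "msg" share a line.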

fletch13 · ‎11-06-2015

Timestamps are correct, and I will add BREAK_ONLY_BEFORE as well.
