Creating Custom Field Extractions

snix
Communicator

I am trying to add field extractions for a log file created by the Entrust IdentityGuard authentication solution. Currently I read it in with a sourcetype of log4j, since that is the format the application says it writes its logs in. Things look okay, but the fields specific to this log are not being extracted. I am looking into how to build a custom extraction myself because I have always wanted to learn how it works, but I figured I would also post the question here to get some tips and best practices.

Here is an example of one event in the log file:
[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] [IG.AUDIT] [AUD6012] [UserNameHere] EventMessageHere

Basically, all the fields I want are wrapped in square brackets [] and the message itself is just added at the end with no square brackets.

I think I will have to build my own custom sourcetype in SplunkHome\etc\system\local\props.conf that is a copy of the log4j stanza, but with either a REPORT key that references a corresponding extraction in transforms.conf, or an EXTRACT key that holds the regex directly. Am I on the right path?

darrenfuller
Contributor

You are absolutely on the right path.

Your sourcetype definition in props.conf would look something like this:

[SOURCETYPENAME]
disabled = false
# Break on every line (LINE_BREAKER needs a capturing group)
LINE_BREAKER = ([\r\n]+)
# Do not merge lines back together after breaking
SHOULD_LINEMERGE = false
# What comes before the timestamp
TIME_PREFIX = ^\[
# strptime-style representation of the timestamp
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# Stop looking for the timestamp after 25 characters
MAX_TIMESTAMP_LOOKAHEAD = 25

EXTRACT-01-Fields = ^\[[^\]]+\]\s+\[(?<firstfieldname>[^\]]+)\]\s+\[(?<secondfieldname>[^\]]+)\]\s+\[(?<thirdfieldname>[^\]]+)\]\s+\[(?<fourthfieldname>[^\]]+)\]\s+\[(?<username>[^\]]+)\]\s+(?<message>.+)$

Here is how the regex looks in regex101: https://regex101.com/r/LOwRwN/1
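
For comparison, the REPORT route mentioned in the question could look roughly like the sketch below. It uses the same pattern with plain capture groups and names the fields in FORMAT. The transform name (ig_audit_fields) and the field names (thread, log_level, logger, event_code) are placeholders chosen for illustration, not names that come from Entrust or Splunk.

```conf
# props.conf
[SOURCETYPENAME]
# Point the sourcetype at a transform stanza (hypothetical name)
REPORT-ig_audit_fields = ig_audit_fields

# transforms.conf
[ig_audit_fields]
# Same pattern as the EXTRACT above, but with numbered groups
REGEX = ^\[[^\]]+\]\s+\[([^\]]+)\]\s+\[([^\]]+)\]\s+\[([^\]]+)\]\s+\[([^\]]+)\]\s+\[([^\]]+)\]\s+(.+)$
# Map each group to a field name (all names here are assumptions)
FORMAT = thread::$1 log_level::$2 logger::$3 event_code::$4 username::$5 message::$6
```

EXTRACT is usually enough when a single sourcetype needs the regex. REPORT is mainly useful when the same transform should be shared by several sourcetypes. With either approach, the sample event's level is captured as "INFO " including the trailing space, because the space sits inside the brackets.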

Hope this helps..

./D


snix
Communicator

Holy you know what... That is exactly what I am looking for. Thank you for such a great and specific example! You even built out how to pull in the time from the logs which I had no idea how to do but was going to be the next part to figure out.

I was able to implement it and verify it works exactly how I wanted.
Thank you!!!!

snix
Communicator

After looking closer at it, I found that most of the events actually contained multiple log lines merged into one event. Not sure why, because I would think what you have would work. I don't pretend to understand much about carriage returns and new lines from the little programming I have to deal with, but it looked good.

I took some of the output from the log file, pasted it into Notepad++, and turned on Show All Characters. It showed CR LF at the end of each line, so that looks good to me.

That said I commented out the LINE_BREAKER line and replaced it with "BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d" which I found under the log4j stanza and it worked. Since I don't grasp 100% what I am doing I am sure this is not the best way to do it but it did get the results I was looking for.

If someone understands what is going on and would like to explain it, I am all ears. I think this will end up being a good post in general for others who are trying to do something similar and just need a useful example of what it would look like.
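
For anyone landing here with the same merged-event symptom, here is a hedged sketch of the two line-breaking styles discussed in this thread. It is a guess at the usual causes, not a confirmed diagnosis of this particular setup. Two details of props.conf are easy to trip over: comments are only recognized on a line of their own (text after a value on the same line is read as part of the value), and the LINE_BREAKER regex needs a capturing group around the characters to discard between events. BREAK_ONLY_BEFORE, by contrast, only takes effect when SHOULD_LINEMERGE is true. The anchored pattern in option B is my own variation, not the one from the log4j stanza.

```conf
[SOURCETYPENAME]
# Option A: break on newlines only. The capturing group marks the
# characters that are thrown away between events.
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false

# Option B (use instead of A, not together with it): let Splunk merge
# lines and start a new event only at a line opening with "[YYYY-MM-DD".
# SHOULD_LINEMERGE = true
# BREAK_ONLY_BEFORE = ^\[\d{4}-\d{2}-\d{2}\s
```

Option A tends to be the cheaper choice at index time because no merge pass runs. Option B may be worth it if a message can span several lines, such as a stack trace.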
