I am trying to add some field extractions for a log file created by the Entrust IdentityGuard authentication solution. Currently I read it in with a sourcetype of log4j, since that is the format the application says it uses for its logs. Things look okay, but the fields specific to the log are not being extracted. I am looking into how I can build a custom extraction myself because I have always wanted to learn how it works, but figured I would also post the question here to get some tips and best practices.
Here is an example of one event in the log file:
[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] [IG.AUDIT] [AUD6012] [UserNameHere] EventMessageHere
Basically, all the fields I want are wrapped in square brackets [] and the message itself is just added at the end with no square brackets.
I think I will have to build out my own custom sourcetype in SplunkHome\etc\system\local\props.conf that is a copy of the log4j stanza, but with either a REPORT key that references a corresponding extraction in transforms.conf, or an EXTRACT key with the regex placed directly in props.conf. Am I on the right path?
You are absolutely on the right path.
Your sourcetype definition in props.conf would look something like this:
[SOURCETYPENAME]
disabled = false
# Break on every line (LINE_BREAKER requires a capturing group)
LINE_BREAKER = ([\r\n]+)
# Do not merge lines back together; rely on LINE_BREAKER alone
SHOULD_LINEMERGE = false
# What comes before the timestamp
TIME_PREFIX = ^\[
# strptime-style representation of the timestamp
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# Stop looking for the timestamp after 25 chars
MAX_TIMESTAMP_LOOKAHEAD = 25
EXTRACT-01-Fields = ^\[[^\]]+\]\s+\[(?<firstfieldname>[^\]]+)\]\s+\[(?<secondfieldname>[^\]]+)\]\s+\[(?<thirdfieldname>[^\]]+)\]\s+\[(?<fourthfieldname>[^\]]+)\]\s+\[(?<username>[^\]]+)\]\s+(?<message>.+)$
Here is how the regex looks in regex101: https://regex101.com/r/LOwRwN/1
Hope this helps..
./D
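For anyone who wants to check the extraction outside Splunk and regex101, here is a minimal Python sketch. It is not how Splunk runs the extraction internally. It only applies the same pattern to the sample event from the question. Two assumptions: the named groups are rewritten from (?<name>...) to (?P<name>...) because Python's re module requires that spelling, and %f is used as a stand-in for Splunk's %3N when parsing the timestamp. The helper names extract_fields and parse_timestamp are made up for illustration.

```python
import re
from datetime import datetime

# Sample event copied from the question.
event = (
    "[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] "
    "[IG.AUDIT] [AUD6012] [UserNameHere] EventMessageHere"
)

# Same pattern as EXTRACT-01-Fields, split across lines for readability.
# Field names are the placeholders used in the answer.
FIELDS = re.compile(
    r"^\[[^\]]+\]\s+"
    r"\[(?P<firstfieldname>[^\]]+)\]\s+"
    r"\[(?P<secondfieldname>[^\]]+)\]\s+"
    r"\[(?P<thirdfieldname>[^\]]+)\]\s+"
    r"\[(?P<fourthfieldname>[^\]]+)\]\s+"
    r"\[(?P<username>[^\]]+)\]\s+"
    r"(?P<message>.+)$"
)


def extract_fields(line):
    """Return the named groups as a dict, or an empty dict if there is no match."""
    match = FIELDS.match(line)
    return match.groupdict() if match else {}


def parse_timestamp(line):
    """Rough analogue of TIME_PREFIX = ^\\[ followed by TIME_FORMAT."""
    match = re.match(r"^\[([^\]]+)\]", line)
    if not match:
        return None
    # %f accepts the three millisecond digits after the comma.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


fields = extract_fields(event)
for name, value in fields.items():
    print(f"{name} = {value!r}")
print(parse_timestamp(event))
```

One thing this makes visible: the log level is captured as 'INFO ' with a trailing space, because the padding sits inside the brackets. If that matters for searches, the value would need trimming at search time or the pattern would need adjusting.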
Holy you know what... That is exactly what I am looking for. Thank you for such a great and specific example! You even built out how to pull in the time from the logs which I had no idea how to do but was going to be the next part to figure out.
I was able to implement it and verify it works exactly how I wanted.
Thank you!!!!
After looking closer at it, I found that most of the events were actually multiple log lines combined into one event. Not sure why, because I would think what you have would work. I don't pretend to understand much about carriage returns and new lines from the little programming I have to deal with, but it looked good.
I took some of the output from the log file, pasted it into Notepad++ and turned on Show All Characters; it showed CR LF at the end of each line, so that looks good to me.
That said I commented out the LINE_BREAKER line and replaced it with "BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d" which I found under the log4j stanza and it worked. Since I don't grasp 100% what I am doing I am sure this is not the best way to do it but it did get the results I was looking for.
If someone understands what is going on and would like to explain it I am all ears. I think this will end up being a good post in general for others trying to do something similar and just needs a useful example of what it would look like.
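Two props.conf details can produce merged events like this and may be worth checking, though I can't say for certain they are the cause here. First, LINE_BREAKER needs a capturing group, e.g. ([\r\n]+). Second, comments have to be on their own line, because anything after a # on a setting line becomes part of the value. BREAK_ONLY_BEFORE only applies while line merging is on, so the fact that it worked suggests SHOULD_LINEMERGE was not actually false.

To show the difference between the two strategies, here is a rough Python simulation. It is not Splunk's implementation, the sample lines and user names are made up, and the function names break_on_newlines and break_only_before are hypothetical.

```python
import re

# Made-up sample: two events, the first followed by a continuation line
# that has no timestamp of its own. Lines end in CR LF like the real file.
raw = (
    "[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] [IG.AUDIT] [AUD6012] [UserA] First message\r\n"
    "    continuation of the first message\r\n"
    "[2020-03-29 18:37:52,114] [IG Audit Writer] [INFO ] [IG.AUDIT] [AUD6012] [UserB] Second message\r\n"
)


def break_on_newlines(text):
    """One event per physical line, in the spirit of LINE_BREAKER = ([\\r\\n]+)."""
    return [line for line in re.split(r"[\r\n]+", text) if line]


def break_only_before(text, pattern):
    """Start a new event only at a line matching pattern; merge other lines
    into the previous event, in the spirit of BREAK_ONLY_BEFORE."""
    events = []
    for line in break_on_newlines(text):
        if not events or re.search(pattern, line):
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


per_line = break_on_newlines(raw)
merged = break_only_before(raw, r"\d\d?:\d\d:\d\d")
print(len(per_line), "events when breaking on every line")
print(len(merged), "events when breaking only before a timestamp")
```

With this sample, breaking on every line gives three events, while breaking only before a time pattern gives two, with the continuation line attached to the first event. For a log where every line starts with a timestamp, both approaches should give the same result, and breaking on every line is generally the cheaper option.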