I have a regular expression that works on part of my data.
Given the log entry:
pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I can use the regular expression: [\>\:]*\s+(.*?)\:?\s\<(.+?)\>
and get the result I am looking for. (http://regexr.com/3fatg)
Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob
Unfortunately, when I was building this regular expression, I was ignoring a vital part of the log -- the first part.
The log actually looks like this:
Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
My extraction no longer works right -- it is thrown off by the first part. (http://regexr.com/3fbod)
How would I exclude the beginning information from this log file?
**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I think I need to start my search after the last occurrence of a ] (right before pam_vas), but I can't figure out how to exclude everything before it.
If the only thing that you must do is to skip past all closing square brackets ( ] ), then you need a leading positive lookahead that asserts that everything from the match position to the end of the line contains anything EXCEPT that character. Try this RegEx:
(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>
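To see what the lookahead does outside of regexr, here is a minimal sketch in Python (my choice for illustration only; the thread itself targets Splunk). It assumes the sample event from the question and shows that the first key still carries the pam_vas prefix:

```python
import re

# Sample event copied from the question.
LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

# (?=[^\]]*$) only succeeds at positions where no "]" remains before the end
# of the string, so a match cannot begin inside the syslog header.
LOOKAHEAD = re.compile(r"(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

pairs = LOOKAHEAD.findall(LOG)
for key, value in pairs:
    print(f"{key} = {value}")
```

The first pair comes back as "pam_vas: Authentication" = "succeeded" because the lazy key group starts right after the last ] and runs up to the first angle bracket.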
This one works as well, except for capturing the pam_vas text. But you call that out, of course, as it does not fit the others and the ] standard item. It is an odd variable.
So does this provide the solution or not?
It does -- I'm not quite sure what to do when there is more than one valid answer, though.
I can only accept one.
IMHO, you should always up-vote correct answers and then select the BEST one by clicking Accept.
Upvoted! I will keep that in mind for next time.
Looks like this might work
\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>
on that site (regexr.com), it would be like this
\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
I designed this regex by building up from the right. The only decent/clear/permanent boundary was the value in angle brackets, so I started with
\<(?<value1>[^\>]+)\>
That works to grab anything in angle brackets, NOT including other angle brackets.
Then I wanted to extend back to grab the colon, if any, getting this
(?:\:?\s\<)(?<value1>[^\>]+)\>
The "?:" was because we didn't want to capture that group, but I really wanted to think of it as a group.
Next, we had to deal with that last key field -- "Access cont(upn)" -- having a space and parenthesis in it.
Reviewing the rest of the key fields, I ended up deciding that the characters really could be anything but a colon or an angle bracket, getting this.
(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>
Note that excluding the colon also made sure that pam_vas would not be grabbed. That regex was grabbing everything I wanted, but also grabbing one space before the key field. So the final version became this.
\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>
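For anyone who wants to try the final version outside of Splunk or regexr, here is a minimal sketch in Python. One assumption to flag: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the group syntax is adapted; the rest of the pattern is unchanged. The sample event is the one from the question.

```python
import re

# Sample event copied from the question.
LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

# Same pattern as the final version, with (?<name>...) rewritten as
# (?P<name>...) because that is the only named-group spelling re accepts.
KV = re.compile(r"\s+(?P<key1>[^\:\<\>]+)(?:\:?\s\<)(?P<value1>[^\>]+)\>")

# Collect every key/value pair into a dict, in order of appearance.
fields = {m.group("key1"): m.group("value1") for m in KV.finditer(LOG)}

for key, value in fields.items():
    print(f"{key} = {value}")
```

Nothing in the syslog header matches, because every candidate key there runs into a colon that is not followed by a space and an opening angle bracket.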
I paste these into a site that explains the entire regex to me, and it just overwhelms me with what people can do with this tool. This looks like it will work -- testing now.
Given a Solaris BSM authentication log, I was able to extract key/value pairs using the following:
transforms.conf
[MyKVP]
REGEX = \s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
FORMAT = $1::$2
props.conf
# my sourcetype in this test
[sol_bsm]
REPORT-MyKVP = MyKVP
Thank you!
You are quite welcome.
Yes, a complex regex still often looks like gobbledygook to me, and understanding what changes need to be made to use it in a .conf file instead of in a search is an adventure. This was a chance to explore positive and negative lookaheads, but I ended up not needing them to meet your needs.
Can you provide a couple more examples? Specifically, do all relevant log events contain "pam_vas:" or are there other items that potentially appear there?
At this time, all logs seem to have pam_vas. I can't be completely sure they will all have it, though.
The format DOES seem consistent -- the "] " (not "] [") seems to be a good breaker (right before pam_vas)
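If the "] " breaker really is reliable, one alternative is to cut the header off before applying the original expression. This is only a sketch in Python, outside of Splunk, and it assumes the last "] " in the event always sits right before pam_vas:

```python
import re

# Sample event copied from the question.
LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

# The original expression from the question, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# Keep only the text after the last "] ". If the separator is missing,
# rpartition returns the whole string as the last element, so body == LOG.
_, _, body = LOG.rpartition("] ")

pairs = ORIGINAL.findall(body)
for key, value in pairs:
    print(f"{key} = {value}")
```

With the header gone, the remaining text is the same shape as the short sample the original expression was built against, so the keys stay generic and nothing is hard-coded.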
I ended up focusing in on the angle brackets as the only "fixed" item, and from there it expanded pretty easily to what you needed.
Also, are the field names and their order always the same (authentication, user, account, etc.)?
Although the format stays the same, the log content may change. I need the "pairs" to be generic.
Edit: Looking at more logs, they all seem to be the same. I do hate to hard-code the key though, just in case things update.