Complex RegEx Capturing Group Assistance
I have a couple of similar cases where I am struggling to get the desired fields extracted with RegEx capturing groups. Please take a look at both cases and share your wisdom.
Thanks!
CASE #1
I am looking for some RegEx help to capture the USERID from log sources where the USERID may be DOMAIN/USERID or just USERID. I do not want to capture 'DOMAIN/'. That way, the Field Extractions will not produce two different versions of the same user ID.
Sample (loginID=s.buttercup-shopping.com/bcs234):
Jan 1 01:1:10 10.10.10.10 CEF:0|Proxy1|Something|1.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=<redacted>
Sample (loginID=bcs234):
Jan 1 09:1:10 10.10.10.10 CEF:0|Proxy2|Something|2.8.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=<redacted>
Desired Field Extraction:
loginID=bcs234
Progress:
RegEx:
loginID=(?P<userid>.*)(?= destination)
The following RegEx seems to work outside of Splunk, but Splunk does not let me repeat the same named capturing group (e.g., (?P<userid>...)) in every alternative (where the (.*) groups are).
RegEx:
(?<=\.com\/)(.*)(?= destination)|(?<=\.corp\/)(.*)(?= destination)|(?<=loginID=)([A-Za-z0-9_-]{1,})(?= destination)
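A minimal sketch of a single-group alternative, checked with Python's re module rather than in Splunk itself. Instead of one alternative per domain suffix, the optional DOMAIN/ prefix goes into a non-capturing group, (?:[^/\s]+/)?, so only one named group, userid, is needed. This assumes the user ID never contains '/' or whitespace, and that Splunk's PCRE engine treats the pattern the same way (it uses only basic constructs, but it has not been run in Splunk here). The helper extract_userid and the trimmed sample strings are illustrative only.

```python
import re

# Original attempt: everything between "loginID=" and " destination", so a
# DOMAIN/ prefix is captured together with the user ID.
ORIGINAL_USERID = re.compile(r"loginID=(?P<userid>.*)(?= destination)")

# Sketch: an optional, non-capturing "DOMAIN/" prefix, then ONE named group.
# (?:[^/\s]+/)? only consumes text that is followed by "/", so a bare user ID
# falls straight through to the userid group.
SINGLE_GROUP = re.compile(r"loginID=(?:[^/\s]+/)?(?P<userid>[^/\s]+)")


def extract_userid(pattern, line):
    """Return the userid group from pattern, or None if it does not match."""
    match = pattern.search(line)
    return match.group("userid") if match else None


# The part of each Case 1 sample around loginID= (rest of the event omitted).
CASE1_SAMPLES = [
    "LastName loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=<redacted>",
    "LastName loginID=bcs234 destinationTranslatedPort=<redacted>",
]

for line in CASE1_SAMPLES:
    print(extract_userid(ORIGINAL_USERID, line), "->", extract_userid(SINGLE_GROUP, line))
# s.buttercup-shopping.com/bcs234 -> bcs234
# bcs234 -> bcs234
```

In Splunk, the same idea would be loginID=(?:[^/\s]+/)?(?P<userid>[^/\s]+). PCRE also has a branch-reset group, (?|...), which lets alternatives share group numbers; Python's re does not support it, so it is not shown above, and whether your Splunk version accepts it is worth checking before relying on it.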
CASE #2
I was trying to capture the domain and IP address from three similar log formats.
The Field Extractions below worked for the most part, but I still needed a sed statement to remove a trailing '.': in the '. ' case, both lookahead alternatives match, and the greedy .* stops at the later one, so the '.' ends up in dest_domain. It seems that when there are more than two cases for a match, getting the capturing groups right is fairly difficult or even impossible.
Sample (email address + '.' + ' ')
relay=user@buttercup-games.com. [1.1.1.1]
Sample (email address + ' ')
relay=user@buttercup-games.com [1.1.1.1]
Sample (email address + '.')
relay=user@buttercup-games.com.[1.1.1.1]
FIELD EXTRACTIONS
relay=(?P<dest_domain>.*)(?=(\.[\[\s])|(\s\[))
^(?:[^\[\n]*\[){2}(?P<dest_ip>[^\]]+)
SED
| rex field=dest_domain mode=sed "s/\.$//g"
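A sketch that reproduces the trailing-dot issue and tries a single-pattern fix, again using Python's re rather than Splunk. The greedy .* in the original extraction backs off only to the last position where a lookahead alternative matches, which in the '. ' sample is the space before '[', so the '.' stays in dest_domain. Moving the optional '.' and optional whitespace outside the group, after a lazy domain match, removes the need for the sed step and captures dest_ip in the same match. Assumptions: one relay= field per event, and dest_domain keeps the user@ part exactly as the original extraction does. The helper name extract is illustrative.

```python
import re

# The three relay= variants from Case 2.
CASE2_SAMPLES = [
    "relay=user@buttercup-games.com. [1.1.1.1]",  # '.' + ' '
    "relay=user@buttercup-games.com [1.1.1.1]",   # ' '
    "relay=user@buttercup-games.com.[1.1.1.1]",   # '.'
]

# Original extraction: the greedy .* backs off only as far as the LAST spot
# where a lookahead alternative matches, so the '. ' case keeps its '.'.
ORIGINAL_DOMAIN = re.compile(r"relay=(?P<dest_domain>.*)(?=(\.[\[\s])|(\s\[))")

# Sketch: lazy domain, then an optional '.', optional whitespace, and '['.
# The '.' and space sit outside the group, so dest_domain never ends in '.'.
# dest_ip comes from the same match, anchored on relay= rather than on
# counting '[' characters.
COMBINED = re.compile(r"relay=(?P<dest_domain>[^\s\[]+?)\.?\s?\[(?P<dest_ip>[^\]]+)\]")


def extract(line):
    """Return (dest_domain, dest_ip) using COMBINED, or None if no match."""
    match = COMBINED.search(line)
    if match is None:
        return None
    return match.group("dest_domain"), match.group("dest_ip")


for line in CASE2_SAMPLES:
    print(repr(ORIGINAL_DOMAIN.search(line).group("dest_domain")), extract(line))
# 'user@buttercup-games.com.' ('user@buttercup-games.com', '1.1.1.1')
# 'user@buttercup-games.com' ('user@buttercup-games.com', '1.1.1.1')
# 'user@buttercup-games.com' ('user@buttercup-games.com', '1.1.1.1')
```

If Splunk's PCRE engine behaves the same way (not verified here), the equivalent extraction would be relay=(?P<dest_domain>[^\s\[]+?)\.?\s?\[(?P<dest_ip>[^\]]+)\], with no sed step afterwards.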