Splunk Enterprise Security

Complex RegEx Capturing Group Assistance

draracle
Engager

I have a couple of similar cases where I am struggling to extract the desired fields with RegEx capturing groups. Please take a look at both cases and share your wisdom.

Thanks!

CASE #1
I am looking for some RegEx help to capture the USERID from log sources where the USERID may appear as DOMAIN/USERID or just USERID. I do not want to capture 'DOMAIN/', so that the field extraction does not end up with two different versions of the user ID.

Sample (loginID=s.buttercup-shopping.com/bcs234):

Jan 1 01:1:10 10.10.10.10 CEF:0|Proxy1|Something|1.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=<redacted>

Sample (loginID=bcs234):

Jan 1 09:1:10 10.10.10.10 CEF:0|Proxy2|Something|2.8.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=<redacted>

Desired Field Extraction:

loginID=bcs234 

Progress:
RegEx:

loginID=(?P<userid>.*)(?= destination)

The following RegEx seems to work outside of Splunk, but Splunk does not support using the same named capturing group (e.g. (?P<userid>...)) over and over again, once in each alternative (where the (.*) groups reside).
RegEx:

(?<=\.com\/)(.*)(?= destination)|(?<=\.corp\/)(.*)(?= destination)|(?<=loginID=)([A-Za-z0-9_-]{1,})(?= destination)

CASE #2

I was trying to capture the domain and IP addresses from 3 similar logs.

The field extractions below worked for the most part, but I still needed a sed statement to remove a trailing '.', since both scenarios with a '.' matched. It seems that when there are more than two cases to match, getting the capturing groups right is fairly difficult or even impossible.

Sample (email address + '.' + ' ')

relay=user@buttercup-games.com. [1.1.1.1]

Sample (email address + ' ')

relay=user@buttercup-games.com [1.1.1.1]

Sample (email address + '.')

relay=user@buttercup-games.com.[1.1.1.1]

FIELD EXTRACTIONS

relay=(?P<dest_domain>.*)(?=(\.[\[\s])|(\s\[))
^(?:[^\[\n]*\[){2}(?P<dest_ip>[^\]]+)

SED

| rex field=dest_domain mode=sed "s/\.$//g"

FrankVl
Ultra Champion

The first case can be solved by adding a non-capturing group for the part you want to ignore and making that group optional, so it occurs 0 or 1 times (?):

loginID=(?:[^\/]+\/)?(?<userid>\S*)

https://regex101.com/r/DO74m7/1
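
For anyone who wants to check the optional non-capturing group outside Splunk, here is a minimal sketch in Python. It makes some assumptions: Python's re module needs the (?P<name>...) spelling for named groups, the sample events are shortened versions of the two logs in the question, and extract_userid is just a hypothetical helper name.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) named-group syntax.
# (?:[^/]+/)? optionally consumes "DOMAIN/" without capturing it.
LOGIN_RE = re.compile(r"loginID=(?:[^/]+/)?(?P<userid>\S*)")

# Shortened versions of the two samples from the question (assumption: trimmed for brevity).
WITH_DOMAIN = "act=permitted loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=<redacted>"
WITHOUT_DOMAIN = "act=permitted loginID=bcs234 destinationTranslatedPort=<redacted>"


def extract_userid(event):
    """Return the userid captured from an event, or None if loginID= is absent."""
    match = LOGIN_RE.search(event)
    return match.group("userid") if match else None


for event in (WITH_DOMAIN, WITHOUT_DOMAIN):
    print(extract_userid(event))
```

Both shortened samples yield bcs234, because the rest of each line contains no further / for the optional group to latch onto.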

Second case (trick is to end the capturing group for the domain with a \w, to prevent it from grabbing the .):

relay=(?<dest_domain>.*\w+)[\.\s]+\[(?<dest_ip>[^\]]+)

https://regex101.com/r/yjTluC/1
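
The same kind of check can be run for the second case. This is a sketch in Python under the same assumptions as any off-Splunk test: named groups use the (?P<name>...) spelling, and extract_relay is a hypothetical helper name. The three inputs are the relay samples from the question.

```python
import re

# .*\w+ forces dest_domain to end on a word character, so a trailing "." is left
# for [.\s]+ to consume before the opening bracket of the IP address.
RELAY_RE = re.compile(r"relay=(?P<dest_domain>.*\w+)[.\s]+\[(?P<dest_ip>[^\]]+)")

SAMPLES = [
    "relay=user@buttercup-games.com. [1.1.1.1]",  # '.' followed by a space
    "relay=user@buttercup-games.com [1.1.1.1]",   # space only
    "relay=user@buttercup-games.com.[1.1.1.1]",   # '.' only
]


def extract_relay(event):
    """Return (dest_domain, dest_ip) for an event, or None if nothing matches."""
    match = RELAY_RE.search(event)
    if match is None:
        return None
    return match.group("dest_domain"), match.group("dest_ip")


for sample in SAMPLES:
    print(extract_relay(sample))
```

All three samples give the same pair of values, so the separate sed step to strip the trailing '.' should no longer be needed.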

draracle
Engager

Thank you! The second one worked flawlessly. The first one is not picking up logs where the domain is missing, such as the one below, or simply: loginid=userid. What is being matched in these cases is 'xml' from text/xml. Is there still hope? Thanks in advance!

 Jan 1 09:35:37 10.10.10.10 CEF:0|Appliance|Security|8.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.1.1.1 dhost=dict.buttercup-shopping.com dpt=80 src=10.20.30.40 spt=20912 suser=LDAP://usldap.s.buttercup-games.com OU\=J,OU\=C,OU\=Users,OU\=A,DC\=s,DC\=buttercup-games,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=28213 rt=1529393737 in=395 out=848 requestMethod=GET requestClientApplication=buttercup-shopping Desktop Dict (Windows NT 6.1) reason=- cs1Label=Policy cs1=Super Administrator**Domain Base,Super Administrator**s Default cs2Label=DynCat cs2=0 cs3Label=ContentType cs3=text/xml; charset\=utf-8 cn1Label=DispositionCode cn1=1026 cn2Label=ScanDuration cn2=0 request=http://site.com/fsearch?keyfrom\=sdf.setqw.cd.http.0&q\=%20N&pos\=1&doctype\=xml&xmlVersion\=3.2&dogVersion\=1.0&client\=deskdict&id\=0ef47d7cdd3941d96&vendor\=qiang.buttercup-shopping&in\=buttercup-shoppingDictFull&appVer\=6.3.69.8341&appZengqiang\=1&abTest\=8&le\=eng&scradv\=1&wstate\=yes&LTH\=890&LWH\=0&LSDH\=-1&proc\=some.exe&headTxt\=2B05

FrankVl
Ultra Champion

The problem is that there is a / further down the line (in text/xml), which causes my regex to look in the wrong place.

This should fix that (added a \s to prevent it from reading beyond whitespace):

loginID=(?:[^\/\s]+\/)?(?<userid>\S*)
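
To see why the extra \s matters, the two patterns can be compared side by side. This is a sketch in Python, assuming the (?P<name>...) spelling for named groups and a shortened version of the failing event above that keeps only the loginID field and the cs3=text/xml field. The names ORIGINAL, FIXED and userid are hypothetical.

```python
import re

# First attempt: [^/]+ may run across whitespace until it finds any later "/".
ORIGINAL = re.compile(r"loginID=(?:[^/]+/)?(?P<userid>\S*)")
# Fix: [^/\s]+ stops at the first space, so the optional group stays inside the loginID value.
FIXED = re.compile(r"loginID=(?:[^/\s]+/)?(?P<userid>\S*)")

# Shortened version of the failing event (assumption: most fields trimmed).
NO_DOMAIN = r"loginID=bcs234 destinationTranslatedPort=28213 cs3Label=ContentType cs3=text/xml; charset\=utf-8"
WITH_DOMAIN = r"loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=28213 cs3=text/xml; charset\=utf-8"


def userid(pattern, event):
    """Return the userid group for the given pattern and event, or None."""
    match = pattern.search(event)
    return match.group("userid") if match else None


print(userid(ORIGINAL, NO_DOMAIN))   # picks up text from after the later "/"
print(userid(FIXED, NO_DOMAIN))      # stays within the loginID value
print(userid(FIXED, WITH_DOMAIN))    # domain prefix is still skipped
```

With the original pattern, the optional group swallows everything up to the / in text/xml, so the capture starts at xml. With the fix, both the domain and no-domain forms return bcs234.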

draracle
Engager

That worked! You are a true RegEx genius! Thank you very much!
