<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Complex RegEx Capturing Group Assistance in Splunk Enterprise Security</title>
    <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386889#M4073</link>
    <description>&lt;P&gt;Complex RegEx Capturing Group Assistance&lt;/P&gt;

&lt;P&gt;I have a couple of similar cases where I am struggling to extract the desired fields with RegEx capturing groups. Please take a look at both cases and share your wisdom.&lt;/P&gt;

&lt;P&gt;Thanks!  &lt;/P&gt;

&lt;P&gt;CASE #1&lt;BR /&gt;
I am looking for RegEx help to capture the USERID from log sources where the value may be DOMAIN/USERID or just USERID. I do not want to capture 'DOMAIN/', so that the field extractions do not produce two different versions of the user ID.&lt;/P&gt;

&lt;P&gt;Sample (loginID=s.buttercup-shopping.com/bcs234):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Jan 1 01:1:10 10.10.10.10 CEF:0|Proxy1|Something|1.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=&amp;lt;redacted&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (loginID=bcs234):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Jan 1 09:1:10 10.10.10.10 CEF:0|Proxy2|Something|2.8.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=&amp;lt;redacted&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Desired Field Extraction:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=bcs234 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Progress:&lt;BR /&gt;
RegEx:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=(?P&amp;lt;userid&amp;gt;.*)(?= destination)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The following RegEx seems to work outside of Splunk, but Splunk does not support repeating the same named capturing group (e.g. (?P&amp;lt;userid&amp;gt;)) in each alternative, which is where the (.*) groups sit.&lt;BR /&gt;
RegEx:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?&amp;lt;=\.com\/)(.*)(?= destination)|(?&amp;lt;=\.corp\/)(.*)(?= destination)|(?&amp;lt;=loginID=)([A-Za-z0-9_-]{1,})(?= destination)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;CASE #2&lt;/P&gt;

&lt;P&gt;I was trying to capture the domain and IP addresses from 3 similar logs.&lt;/P&gt;

&lt;P&gt;The field extractions below worked for the most part, but I still needed a sed statement to remove a trailing '.', since both scenarios with a '.' matched. It seems that when there are more than two cases to match, getting the capturing groups right is fairly difficult or even impossible.&lt;/P&gt;

&lt;P&gt;Sample (email address + '.' + ' ') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com. [1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (email address + ' ') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com [1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (email address + '.') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com.[1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;FIELD EXTRACTIONS&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=(?P&amp;lt;dest_domain&amp;gt;.*)(?=(\.[\[\s])|(\s\[))
^(?:[^\[\n]*\[){2}(?P&amp;lt;dest_ip&amp;gt;[^\]]+)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;SED&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| rex field=dest_domain mode=sed "s/\.$//g"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Mon, 18 Jun 2018 14:41:32 GMT</pubDate>
    <dc:creator>draracle</dc:creator>
    <dc:date>2018-06-18T14:41:32Z</dc:date>
    <item>
      <title>Complex RegEx Capturing Group Assistance</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386889#M4073</link>
      <description>&lt;P&gt;Complex RegEx Capturing Group Assistance&lt;/P&gt;

&lt;P&gt;I have a couple of similar cases where I am struggling to extract the desired fields with RegEx capturing groups. Please take a look at both cases and share your wisdom.&lt;/P&gt;

&lt;P&gt;Thanks!  &lt;/P&gt;

&lt;P&gt;CASE #1&lt;BR /&gt;
I am looking for RegEx help to capture the USERID from log sources where the value may be DOMAIN/USERID or just USERID. I do not want to capture 'DOMAIN/', so that the field extractions do not produce two different versions of the user ID.&lt;/P&gt;

&lt;P&gt;Sample (loginID=s.buttercup-shopping.com/bcs234):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Jan 1 01:1:10 10.10.10.10 CEF:0|Proxy1|Something|1.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=s.buttercup-shopping.com/bcs234 destinationTranslatedPort=&amp;lt;redacted&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (loginID=bcs234):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Jan 1 09:1:10 10.10.10.10 CEF:0|Proxy2|Something|2.8.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.2.3.4 dhost=host.buttercup-games.com dpt=80 src=10.20.30.40 spt=19491 suser=LDAP://usldap.s.buttercup-shopping.com OU\=
City,OU\=Country,OU\=Users,OU\=Region,DC\=s,DC\=buttercup-shopping,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=&amp;lt;redacted&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Desired Field Extraction:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=bcs234 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Progress:&lt;BR /&gt;
RegEx:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=(?P&amp;lt;userid&amp;gt;.*)(?= destination)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The following RegEx seems to work outside of Splunk, but Splunk does not support repeating the same named capturing group (e.g. (?P&amp;lt;userid&amp;gt;)) in each alternative, which is where the (.*) groups sit.&lt;BR /&gt;
RegEx:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?&amp;lt;=\.com\/)(.*)(?= destination)|(?&amp;lt;=\.corp\/)(.*)(?= destination)|(?&amp;lt;=loginID=)([A-Za-z0-9_-]{1,})(?= destination)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;CASE #2&lt;/P&gt;

&lt;P&gt;I was trying to capture the domain and IP addresses from 3 similar logs.&lt;/P&gt;

&lt;P&gt;The field extractions below worked for the most part, but I still needed a sed statement to remove a trailing '.', since both scenarios with a '.' matched. It seems that when there are more than two cases to match, getting the capturing groups right is fairly difficult or even impossible.&lt;/P&gt;

&lt;P&gt;Sample (email address + '.' + ' ') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com. [1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (email address + ' ') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com [1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample (email address + '.') &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=user@buttercup-games.com.[1.1.1.1]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;FIELD EXTRACTIONS&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=(?P&amp;lt;dest_domain&amp;gt;.*)(?=(\.[\[\s])|(\s\[))
^(?:[^\[\n]*\[){2}(?P&amp;lt;dest_ip&amp;gt;[^\]]+)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;SED&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| rex field=dest_domain mode=sed "s/\.$//g"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 18 Jun 2018 14:41:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386889#M4073</guid>
      <dc:creator>draracle</dc:creator>
      <dc:date>2018-06-18T14:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: Complex RegEx Capturing Group Assistance</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386890#M4074</link>
      <description>&lt;P&gt;The first case can be solved by adding a non-capturing group for the part you want to ignore and making that group optional (0 or 1 occurrences, via the trailing ?):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=(?:[^\/]+\/)?(?&amp;lt;userid&amp;gt;\S*)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;A href="https://regex101.com/r/DO74m7/1"&gt;https://regex101.com/r/DO74m7/1&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Second case (trick is to end the capturing group for the domain with a \w, to prevent it from grabbing the .):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;relay=(?&amp;lt;dest_domain&amp;gt;.*\w+)[\.\s]+\[(?&amp;lt;dest_ip&amp;gt;[^\]]+)
&lt;/CODE&gt;&lt;/PRE&gt;
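
&lt;P&gt;The same kind of check can be run against the three relay samples from the question. This is a sketch, not Splunk itself: Python's re module is assumed to behave like Splunk's PCRE for this pattern, and positional groups 1 and 2 stand in for the named dest_domain and dest_ip groups.&lt;/P&gt;

```python
import re

# Group 1 must end in a word character, so a trailing "." is left to the
# separator class [.\s]+ instead of being captured with the domain.
RELAY = re.compile(r"relay=(.*\w+)[.\s]+\[([^\]]+)")


def parse_relay(event):
    """Return (dest_domain, dest_ip) from a relay= fragment, or None."""
    match = RELAY.search(event)
    return match.groups() if match else None


samples = [
    "relay=user@buttercup-games.com. [1.1.1.1]",  # '.' then space
    "relay=user@buttercup-games.com [1.1.1.1]",   # space only
    "relay=user@buttercup-games.com.[1.1.1.1]",   # '.' only
]

for sample in samples:
    print(parse_relay(sample))  # ('user@buttercup-games.com', '1.1.1.1')
```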

&lt;P&gt;&lt;A href="https://regex101.com/r/yjTluC/1"&gt;https://regex101.com/r/yjTluC/1&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 14:53:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386890#M4074</guid>
      <dc:creator>FrankVl</dc:creator>
      <dc:date>2018-06-18T14:53:00Z</dc:date>
    </item>
    <item>
      <title>Re: Complex RegEx Capturing Group Assistance</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386891#M4075</link>
      <description>&lt;P&gt;Thank you! The second one worked flawlessly. The first one does not work for logs where the domain is missing, such as the one below, or simply loginid=userid. What gets matched in these cases is 'xml' from text/xml. Is there still hope? Thanks in advance!&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; Jan 1 09:35:37 10.10.10.10 CEF:0|Appliance|Security|8.4.0|121|Transaction permitted|1| act=permitted app=http dvc=10.10.10.10 dst=1.1.1.1 dhost=dict.buttercup-shopping.com dpt=80 src=10.20.30.40 spt=20912 suser=LDAP://usldap.s.buttercup-games.com OU\=J,OU\=C,OU\=Users,OU\=A,DC\=s,DC\=buttercup-games,DC\=com/FirstName LastName loginID=bcs234 destinationTranslatedPort=28213 rt=1529393737 in=395 out=848 requestMethod=GET requestClientApplication=buttercup-shopping Desktop Dict (Windows NT 6.1) reason=- cs1Label=Policy cs1=Super Administrator**Domain Base,Super Administrator**s Default cs2Label=DynCat cs2=0 cs3Label=ContentType cs3=text/xml; charset\=utf-8 cn1Label=DispositionCode cn1=1026 cn2Label=ScanDuration cn2=0 request=http://site.com/fsearch?keyfrom\=sdf.setqw.cd.http.0&amp;amp;q\=%20N&amp;amp;pos\=1&amp;amp;doctype\=xml&amp;amp;xmlVersion\=3.2&amp;amp;dogVersion\=1.0&amp;amp;client\=deskdict&amp;amp;id\=0ef47d7cdd3941d96&amp;amp;vendor\=qiang.buttercup-shopping&amp;amp;in\=buttercup-shoppingDictFull&amp;amp;appVer\=6.3.69.8341&amp;amp;appZengqiang\=1&amp;amp;abTest\=8&amp;amp;le\=eng&amp;amp;scradv\=1&amp;amp;wstate\=yes&amp;amp;LTH\=890&amp;amp;LWH\=0&amp;amp;LSDH\=-1&amp;amp;proc\=some.exe&amp;amp;headTxt\=2B05
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 19 Jun 2018 18:46:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386891#M4075</guid>
      <dc:creator>draracle</dc:creator>
      <dc:date>2018-06-19T18:46:32Z</dc:date>
    </item>
    <item>
      <title>Re: Complex RegEx Capturing Group Assistance</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386892#M4076</link>
      <description>&lt;P&gt;The problem is that there is a &lt;CODE&gt;/&lt;/CODE&gt; further along in the event, which causes my regex to match in the wrong place.&lt;/P&gt;
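
&lt;P&gt;To make the failure concrete, here is a small sketch that reproduces it. The assumptions: Python's re module stands in for Splunk's PCRE engine, a positional group replaces the named userid group, and the event is a shortened version of the log from the previous post. Because the domain part may contain anything except a slash, it runs across the spaces up to the slash in text/xml, and the capture then continues to the next whitespace.&lt;/P&gt;

```python
import re

# The pattern from the first reply: the optional "DOMAIN/" part may contain
# any character except "/", including spaces.
ORIGINAL = re.compile(r"loginID=(?:[^/]+/)?(\S*)")

# Shortened event with no domain in loginID but a "/" later in the line.
event = (
    "loginID=bcs234 destinationTranslatedPort=28213 "
    "cs3Label=ContentType cs3=text/xml; charset=utf-8"
)

captured = ORIGINAL.search(event).group(1)
print(captured)  # xml;
```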

&lt;P&gt;This should fix that (I added &lt;CODE&gt;\s&lt;/CODE&gt; to the negated character class so the domain part cannot extend past whitespace):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;loginID=(?:[^\/\s]+\/)?(?&amp;lt;userid&amp;gt;\S*)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Jun 2018 07:38:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386892#M4076</guid>
      <dc:creator>FrankVl</dc:creator>
      <dc:date>2018-06-20T07:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: Complex RegEx Capturing Group Assistance</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386893#M4077</link>
      <description>&lt;P&gt;That worked!  You are a true RegEx genius!  Thank you very much!&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jun 2018 12:53:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise-Security/Complex-RegEx-Capturing-Group-Assistance/m-p/386893#M4077</guid>
      <dc:creator>draracle</dc:creator>
      <dc:date>2018-06-21T12:53:45Z</dc:date>
    </item>
  </channel>
</rss>

