Splunk Search

Regex/Rex - Non Capture Groups

Curlyshrew
Observer

Hi all. New here.

 

So I have been working with some data strings that contain varied asset numbers for computers and servers.

Unfortunately, our naming conventions are all over the place for a small number of assets, and server asset names differ considerably from endpoint PC names, so I have been left using the following to capture the sequence:

(?:Computer Name:)(.*)(?:,)                                  -- (Yucky Wildcard)

For Splunk, this would be:

| rex "(?<Computer>(?:Computer Name:)(.*?)(?:,))"

 

The dilemma is that the non-capture group (?:Computer Name:) is being captured in the results.

 

I am unsure, but I assume it is due to the first capture group "(?<Computer>)".

 

From my limited experience playing with rex, I do know that non-capture groups work in front of a capture group, but I have had no success in having them before a capture group.

 

Thanks for listening to my TED talk


_gkollias
Builder

Could you please provide a small dummy sample of your data so we can review and test the regex?

On a side note, you may want to normalize the naming conventions.   This can be done at the source (maybe?), at index time, or at search time (e.g. create a lookup that outputs friendly names for server assets  and endpoint PCs, or use the coalesce function of the eval command).  

Thanks!


Curlyshrew
Observer

Apologies for the delay.

 

Here's some example raw data that I am working with. Again, the computer names vary in length and format from result to result.

 

I have marked the fields in bold that I am attempting to extract.

 

2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,Computer name: GHCC01SFG435,Source: Scheduled scan,Risk name: Heur.AdvML.B,Occurrences: 1,File path: T:\Tower\Installer\fgg5cfef.msi,Description: Still contains 1 infected items,Actual action: Quarantined,Requested action: Quarantined,Secondary action: Left alone,Event time: 2020-07-04 21:33:58,Event Insert Time: 2020-07-04 21:36:33,End Time: 2020-07-04 21:33:58,Last update time: 2020-07-04 21:36:33,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - Development,Server Name: FGTY1ADA02,User Name: SYSTEM,Source Computer Name: ,Source Computer IP: ,Disposition: Good,Download site: null,Web domain: null,Downloaded by: null,Prevalence: Reputation was not used in this detection.,Confidence: Reputation was not used in this detection.,URL Tracking Status: Off,First Seen: Reputation was not used in this detection.,Sensitivity: Low,Permitted application reason: Not on the permitted application list,Application hash: ,Hash type: SHA1,Company name: ,Application name: ,Application version: ,Application type: -1,File size (bytes): 0,Category set: Malware,Category type: Heuristic Virus,Location: Default,Intensive Protection Level: 0,Certificate issuer: ,Certificate signer: ,Certificate thumbprint: ,Signing timestamp: ,Certificate serial number:

 

2020-07-08 11:59:34,Virus found,IP Address: 172.16.10.151,Computer name: U1135713,Source: Auto-Protect scan,Risk name: Heur.AdvML.C,Occurrences: 1,File path: C:\Windows\DVV\v4.0.6\namespace\hodw.tergtaw.fnd\user\user0\IUWGR\personal work\maliciousfile.exe,Description: ,Actual action: Deleted,Requested action: Cleaned,Secondary action: Deleted,Event time: 2020-07-08 11:55:57,Event Insert Time: 2020-07-08 11:59:34,End Time: 2020-07-08 11:55:57,Last update time: 2020-07-08 11:59:34,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - HODW\HODW - Windows 10\HODW - BHTPN - Online Default,Server Name: FGTY1ADA02,User Name: IUWGR,Source Computer Name: U1135713. hodw.tergtaw.fnd,Source Computer IP: 127.0.0.1,Disposition: Bad,Download site: ,Web domain: ,Downloaded by: svchost.exe,Prevalence: This file has been seen by hundreds of Symantec users.,Confidence: This file is untrustworthy.,URL Tracking Status: On,First Seen: Symantec has known about this file for more than 1 year.,Sensitivity: ,Permitted application reason: Not on the permitted application list,Application hash: 500D8BB5500663G76016C16C377518E700287332406A5FAF3FDC8E87FBF51273,Hash type: SHA2,"Company name: W3i, LLC",Application name: Brueze.com Installation Utility,Application version: 1, 0, 36, 0,Application type: 127,File size (bytes): 12680312,Category set: Malware,Category type: Heuristic Virus,Location: BHTPN - TPN Connected (Wireless-Mobile),Intensive Protection Level: 0,"Certificate issuer: W3i,LLC",Certificate signer: VeriSign Class 3 Code Signing 2004 CA,Certificate thumbprint: C1102EA03313E71D4E3C771A823E152375CDEF4E,Signing timestamp: 0,Certificate serial number: 391B1DE3FDF7D68124136D1483C16B21


gcusello
SplunkTrust

Hi @Curlyshrew ,

Please, try this regex:

| rex "Computer\s+name:\s+(?<Computer_Name>[^,]+)"

that you can test at https://regex101.com/r/pJSLDQ/1
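For readers without a Splunk instance at hand, here is a minimal Python sketch of the same pattern run against shortened copies of the two sample events above. It is only an approximation of what rex does: Python's re module writes named groups as (?P<name>...) where rex uses (?<name>...), and the helper name extract_computer_name is made up for this illustration.

```python
import re

# Python's re module spells named groups (?P<name>...);
# Splunk's rex uses the PCRE-style (?<name>...) form shown above.
PATTERN = re.compile(r"Computer\s+name:\s+(?P<Computer_Name>[^,]+)")

# Shortened copies of the two sample events posted earlier in the thread.
events = [
    "2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,"
    "Computer name: GHCC01SFG435,Source: Scheduled scan",
    "2020-07-08 11:59:34,Virus found,IP Address: 172.16.10.151,"
    "Computer name: U1135713,Source: Auto-Protect scan",
]


def extract_computer_name(event):
    """Return the text after 'Computer name:' up to the next comma, or None."""
    match = PATTERN.search(event)
    return match.group("Computer_Name") if match else None


names = [extract_computer_name(event) for event in events]
print(names)  # ['GHCC01SFG435', 'U1135713']
```

The match is case-sensitive, so the later "Source Computer Name:" field (capital N) in the full events is not picked up by this pattern.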

Ciao.

Giuseppe 

Curlyshrew
Observer

Mate that's awesome. Solves my issue and much more.

It never occurred to me once that (?<Name_Of_Field>) could be positioned anywhere within the regex.

I thought it always had to sit at the front of the sequence.
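A small sketch of why the first attempt returned the label as well, written in Python as a stand-in for rex (so named groups are spelled (?P<name>...)). A non-capturing group only avoids creating its own group; the text it matches is still part of any group that encloses it. The sample line is a shortened copy of the first event, and the label is written with a lowercase "name" to match that data.

```python
import re

line = "IP Address: 192.168.1.1,Computer name: GHCC01SFG435,Source: Scheduled scan"

# Original shape: the named group wraps the whole expression, so the text
# matched by the non-capturing groups is included in the named group's value.
wrapped = re.search(r"(?P<Computer>(?:Computer name:)(.*?)(?:,))", line)

# Reworked shape: the label stays outside the named group, so only the
# value itself is captured.
inner = re.search(r"Computer name:\s*(?P<Computer>[^,]+)", line)

wrapped_value = wrapped.group("Computer")
inner_value = inner.group("Computer")

print(repr(wrapped_value))  # 'Computer name: GHCC01SFG435,'
print(repr(inner_value))    # 'GHCC01SFG435'
```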

That opens up a whole lot of other options with some other work I have on.

Out of curiosity, are you able to explain how the section "[^,]+" works?

Much appreciated.


gcusello
SplunkTrust

Hi @Curlyshrew ,

regex101 explains every part of the regex (in the panel on the right side).

Anyway, [^,]+ means that you take one or more characters (spaces included) up to, but not including, the next ",".

It's a very useful way to manage regex capture groups!

Only one point of attention: always remember to escape special characters. In other words, if instead of "," you have to take all the characters up to "?", you should use [^\?]*.
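A minimal Python sketch of the negated character class, for anyone who wants to try it outside Splunk. The field text comes from the sample events; the URL is a made-up example. As far as I know, "?" is already literal inside a character class, so the backslash there is optional but harmless, which the last two lines illustrate.

```python
import re

text = "Risk name: Heur.AdvML.B,Occurrences: 1"

# [^,]+ : one or more characters that are not a comma, so the match
# stops just before the first ",".
until_comma = re.match(r"[^,]+", text).group(0)

# Same idea with "?" as the stop character (hypothetical URL for illustration).
url = "https://example.com/search?q=splunk"
escaped = re.match(r"[^\?]*", url).group(0)
unescaped = re.match(r"[^?]*", url).group(0)

# With * instead of +, an empty value is still a match, which matters for
# fields such as "Source Computer Name: ," in the first sample event.
empty = re.search(r"Source Computer Name: (?P<value>[^,]*)",
                  "Source Computer Name: ,Source Computer IP: ").group("value")

print(until_comma)           # Risk name: Heur.AdvML.B
print(escaped == unescaped)  # True
print(repr(empty))           # ''
```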

Ciao, and see you next time!

Giuseppe
P.S.: Karma Points are valued 😉
