Splunk Search

Regex/Rex - Non Capture Groups

Curlyshrew
Observer

Hi all. New here.

 

So I have been working with some data strings that contain varied asset numbers for computers and servers.

Unfortunately, our naming conventions are all over the place for a small number of assets, and server asset names differ considerably from those of endpoint PCs, so I have been left with the following to capture the sequence:

(?:Computer Name:)(.*)(?:,)                                  -- (Yucky Wildcard)

For Splunk, this would be:

| rex "(?<Computer>(?:Computer Name:)(.*?)(?:,))"

 

The dilemma is that the non-capture group (?:Computer Name:) is being captured in the results.

 

I am unsure, but I assume it is due to the first capture group "(?<Computer>)".

 

From my limited experience playing with rex, I do know that non-capture groups work in front of a capture group, but I have had no success in having them before a capture group.

 

Thanks for listening to my TED talk


_gkollias
Builder

Could you please provide a small dummy sample of your data to review and test the regex?

On a side note, you may want to normalize the naming conventions. This can be done at the source (maybe?), at index time, or at search time (e.g. create a lookup that outputs friendly names for server assets and endpoint PCs, or use the coalesce function of the eval command).

Thanks!


Curlyshrew
Observer

Apologies for the delay.

 

Here's some example raw data that I am working with. Again, the computer names vary in length and consistency from result to result.

 

I have marked the fields in bold that I am attempting to extract.

 

2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,Computer name: GHCC01SFG435,Source: Scheduled scan,Risk name: Heur.AdvML.B,Occurrences: 1,File path: T:\Tower\Installer\fgg5cfef.msi,Description: Still contains 1 infected items,Actual action: Quarantined,Requested action: Quarantined,Secondary action: Left alone,Event time: 2020-07-04 21:33:58,Event Insert Time: 2020-07-04 21:36:33,End Time: 2020-07-04 21:33:58,Last update time: 2020-07-04 21:36:33,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - Development,Server Name: FGTY1ADA02,User Name: SYSTEM,Source Computer Name: ,Source Computer IP: ,Disposition: Good,Download site: null,Web domain: null,Downloaded by: null,Prevalence: Reputation was not used in this detection.,Confidence: Reputation was not used in this detection.,URL Tracking Status: Off,First Seen: Reputation was not used in this detection.,Sensitivity: Low,Permitted application reason: Not on the permitted application list,Application hash: ,Hash type: SHA1,Company name: ,Application name: ,Application version: ,Application type: -1,File size (bytes): 0,Category set: Malware,Category type: Heuristic Virus,Location: Default,Intensive Protection Level: 0,Certificate issuer: ,Certificate signer: ,Certificate thumbprint: ,Signing timestamp: ,Certificate serial number:

 

2020-07-08 11:59:34,Virus found,IP Address: 172.16.10.151,Computer name: U1135713,Source: Auto-Protect scan,Risk name: Heur.AdvML.C,Occurrences: 1,File path: C:\Windows\DVV\v4.0.6\namespace\hodw.tergtaw.fnd\user\user0\IUWGR\personal work\maliciousfile.exe,Description: ,Actual action: Deleted,Requested action: Cleaned,Secondary action: Deleted,Event time: 2020-07-08 11:55:57,Event Insert Time: 2020-07-08 11:59:34,End Time: 2020-07-08 11:55:57,Last update time: 2020-07-08 11:59:34,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - HODW\HODW - Windows 10\HODW - BHTPN - Online Default,Server Name: FGTY1ADA02,User Name: IUWGR,Source Computer Name: U1135713. hodw.tergtaw.fnd,Source Computer IP: 127.0.0.1,Disposition: Bad,Download site: ,Web domain: ,Downloaded by: svchost.exe,Prevalence: This file has been seen by hundreds of Symantec users.,Confidence: This file is untrustworthy.,URL Tracking Status: On,First Seen: Symantec has known about this file for more than 1 year.,Sensitivity: ,Permitted application reason: Not on the permitted application list,Application hash: 500D8BB5500663G76016C16C377518E700287332406A5FAF3FDC8E87FBF51273,Hash type: SHA2,"Company name: W3i, LLC",Application name: Brueze.com Installation Utility,Application version: 1, 0, 36, 0,Application type: 127,File size (bytes): 12680312,Category set: Malware,Category type: Heuristic Virus,Location: BHTPN - TPN Connected (Wireless-Mobile),Intensive Protection Level: 0,"Certificate issuer: W3i,LLC",Certificate signer: VeriSign Class 3 Code Signing 2004 CA,Certificate thumbprint: C1102EA03313E71D4E3C771A823E152375CDEF4E,Signing timestamp: 0,Certificate serial number: 391B1DE3FDF7D68124136D1483C16B21


gcusello
SplunkTrust

Hi @Curlyshrew ,

Please, try this regex:

| rex "Computer\s+name:\s+(?<Computer_Name>[^,]+)"

that you can test at https://regex101.com/r/pJSLDQ/1
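
For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch run against a shortened copy of the first sample event from this thread. It assumes Python's `re` module, which spells named groups `(?P<name>...)` rather than the `(?<name>...)` form that rex accepts; the variable names are illustrative only.

```python
import re

# Shortened copy of the first sample event posted in this thread.
event = (
    "2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,"
    "Computer name: GHCC01SFG435,Source: Scheduled scan,"
    "Risk name: Heur.AdvML.B,Server Name: FGTY1ADA02,"
    "Source Computer Name: ,Source Computer IP: "
)

# Same pattern as the rex above; only the named-group syntax differs in Python.
pattern = re.compile(r"Computer\s+name:\s+(?P<Computer_Name>[^,]+)")

match = pattern.search(event)
computer_name = match.group("Computer_Name") if match else None
print(computer_name)  # GHCC01SFG435
```

The pattern is case-sensitive, so the lowercase "name" keeps it from matching the separate "Source Computer Name:" field later in the event.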

Ciao.

Giuseppe 

Curlyshrew
Observer

Mate that's awesome. Solves my issue and much more.

It never occurred to me once that (?<Name_Of_Field>) could be positioned anywhere within the regex.

I thought it always had to sit at the front of the sequence.

That opens up a whole lot of other options with some other work I have on.
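
To make the difference concrete, here is a small sketch comparing a named group wrapped around the whole expression with one placed only around the value. It uses Python's `re` module as a stand-in for rex (so named groups are written `(?P<name>...)`), the event is a shortened copy of the second sample, and the label is written "Computer name:" to match that sample; treat it as an illustration under those assumptions.

```python
import re

# Shortened copy of the second sample event from this thread.
event = "IP Address: 172.16.10.151,Computer name: U1135713,Source: Auto-Protect scan"

# Named group wrapped around everything: the label and the trailing comma
# belong to the named group, even though the inner groups are non-capturing.
wrapped = re.search(r"(?P<Computer>(?:Computer name:)(.*?)(?:,))", event)

# Named group placed only around the value: the label is matched but not kept.
targeted = re.search(r"Computer name:\s*(?P<Computer>[^,]+)", event)

wrapped_value = wrapped.group("Computer")
targeted_value = targeted.group("Computer")
print(repr(wrapped_value))   # 'Computer name: U1135713,'
print(repr(targeted_value))  # 'U1135713'
```

A non-capturing group only avoids creating its own numbered group; it does not remove text from an enclosing group that contains it.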

Out of curiosity, are you able to explain how the section "[^,]+" works?

Much appreciated.


gcusello
SplunkTrust

Hi @Curlyshrew ,

regex101 describes every part of the regex (in the panel on the right side).

Anyway, [^,]+ means that you take all the characters (also spaces) until ",", and at least one character is required; with [^,]* an empty value also matches.

It's a very useful way to manage regex capture groups!

Only one point of attention: remember to escape special chars when they appear outside square brackets. Inside the brackets most of them are already literal, so if instead of "," you have to take all the chars until "?", both [^?]* and [^\?]* work.
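
A short sketch of how the negated character class behaves, again using Python's `re` module as a stand-in for rex. The sample strings are taken from or modelled on the events in this thread, except `query`, which is a made-up value for illustration.

```python
import re

text = "Computer name: U1135713,Source: Auto-Protect scan"

# [^,]+ : one or more characters that are not a comma (spaces included).
plus_value = re.search(r"Source: ([^,]+)", text).group(1)

# With an empty value, * still matches (zero characters) while + does not.
empty = "Source Computer Name: ,Source Computer IP: "
star_match = re.search(r"Source Computer Name: ([^,]*)", empty)
plus_match = re.search(r"Source Computer Name: ([^,]+)", empty)

# Inside [...] a "?" is literal, so the escaped and unescaped forms agree.
query = "path/to/page?id=7"  # hypothetical value, not from the sample data
escaped = re.match(r"[^\?]*", query).group(0)
unescaped = re.match(r"[^?]*", query).group(0)

print(plus_value)                 # Auto-Protect scan
print(repr(star_match.group(1)))  # ''
print(plus_match)                 # None
print(escaped, unescaped)         # path/to/page path/to/page
```

Choosing + over * means events with an empty value simply produce no field rather than an empty one, which is usually easier to filter on.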

Ciao, and until next time!

Giuseppe
P.S.: Karma Points are valued 😉
