Hi all. New here.
So I have been working with some data strings that contain varied asset numbers for computers and servers.
Unfortunately, our naming conventions are all over the place for a small number of assets, and the server asset names are considerably different from those of endpoint PCs, so I have been left with using the following to capture the sequence:
(?:Computer Name:)(.*)(?:,) -- (Yucky Wildcard)
For Splunk, this would be:
| rex "(?<Computer>(?:Computer Name:)(.*?)(?:,))"
The dilemma is that the non-capture group (?:Computer Name:) is being captured in the results.
I am unsure, but I assume it is due to the first capture group, "(?<Computer>)", which wraps everything else.
From my little experience with playing with rex, I do know that non-capture groups work in-front of a capture group but I have had no success in having them before a capture group.
Thanks for listening to my TED talk
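A minimal sketch of why the prefix shows up in the result, written in Python rather than SPL so it can be run outside Splunk. Two assumptions to flag: Python spells named groups (?P<name>...) where Splunk's rex uses (?<name>...), and the sample line is a shortened, hypothetical event with the label written as "Computer name:". A named group returns everything matched between its parentheses, so a non-capture group nested inside it is still part of the value.

```python
import re

# Shortened, hypothetical event line used only for illustration.
line = "IP Address: 192.168.1.1,Computer name: GHCC01SFG435,Source: Scheduled scan"

# Named group wraps the whole pattern, so the label and the trailing comma
# are part of the captured text even though they sit in non-capture groups.
wrapped = re.search(r"(?P<Computer>(?:Computer name:)(.*?)(?:,))", line)
wrapped_value = wrapped.group("Computer")
print(repr(wrapped_value))  # 'Computer name: GHCC01SFG435,'

# Named group wraps only the value; the label and comma stay outside it.
inner = re.search(r"(?:Computer name:)\s*(?P<Computer>.*?)(?:,)", line)
inner_value = inner.group("Computer")
print(repr(inner_value))  # 'GHCC01SFG435'
```

The only change between the two patterns is where the named group opens and closes.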
Could you please provide a small dummy sample of your data to review and test the regex?
On a side note, you may want to normalize the naming conventions. This can be done at the source (maybe?), at index time, or at search time (e.g. create a lookup that outputs friendly names for server assets and endpoint PCs, or use the coalesce function of the eval command).
Thanks!
Apologies for the delay.
Here's some example raw data that I am working with. Again, the computer names vary in length and consistency from result to result.
I have marked the fields in bold that I am attempting to extract.
2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,Computer name: GHCC01SFG435,Source: Scheduled scan,Risk name: Heur.AdvML.B,Occurrences: 1,File path: T:\Tower\Installer\fgg5cfef.msi,Description: Still contains 1 infected items,Actual action: Quarantined,Requested action: Quarantined,Secondary action: Left alone,Event time: 2020-07-04 21:33:58,Event Insert Time: 2020-07-04 21:36:33,End Time: 2020-07-04 21:33:58,Last update time: 2020-07-04 21:36:33,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - Development,Server Name: FGTY1ADA02,User Name: SYSTEM,Source Computer Name: ,Source Computer IP: ,Disposition: Good,Download site: null,Web domain: null,Downloaded by: null,Prevalence: Reputation was not used in this detection.,Confidence: Reputation was not used in this detection.,URL Tracking Status: Off,First Seen: Reputation was not used in this detection.,Sensitivity: Low,Permitted application reason: Not on the permitted application list,Application hash: ,Hash type: SHA1,Company name: ,Application name: ,Application version: ,Application type: -1,File size (bytes): 0,Category set: Malware,Category type: Heuristic Virus,Location: Default,Intensive Protection Level: 0,Certificate issuer: ,Certificate signer: ,Certificate thumbprint: ,Signing timestamp: ,Certificate serial number:
2020-07-08 11:59:34,Virus found,IP Address: 172.16.10.151,Computer name: U1135713,Source: Auto-Protect scan,Risk name: Heur.AdvML.C,Occurrences: 1,File path: C:\Windows\DVV\v4.0.6\namespace\hodw.tergtaw.fnd\user\user0\IUWGR\personal work\maliciousfile.exe,Description: ,Actual action: Deleted,Requested action: Cleaned,Secondary action: Deleted,Event time: 2020-07-08 11:55:57,Event Insert Time: 2020-07-08 11:59:34,End Time: 2020-07-08 11:55:57,Last update time: 2020-07-08 11:59:34,Domain Name: Default,Group Name: My Company\HODW - Server\HODW - HODW\HODW - Windows 10\HODW - BHTPN - Online Default,Server Name: FGTY1ADA02,User Name: IUWGR,Source Computer Name: U1135713. hodw.tergtaw.fnd,Source Computer IP: 127.0.0.1,Disposition: Bad,Download site: ,Web domain: ,Downloaded by: svchost.exe,Prevalence: This file has been seen by hundreds of Symantec users.,Confidence: This file is untrustworthy.,URL Tracking Status: On,First Seen: Symantec has known about this file for more than 1 year.,Sensitivity: ,Permitted application reason: Not on the permitted application list,Application hash: 500D8BB5500663G76016C16C377518E700287332406A5FAF3FDC8E87FBF51273,Hash type: SHA2,"Company name: W3i, LLC",Application name: Brueze.com Installation Utility,Application version: 1, 0, 36, 0,Application type: 127,File size (bytes): 12680312,Category set: Malware,Category type: Heuristic Virus,Location: BHTPN - TPN Connected (Wireless-Mobile),Intensive Protection Level: 0,"Certificate issuer: W3i,LLC",Certificate signer: VeriSign Class 3 Code Signing 2004 CA,Certificate thumbprint: C1102EA03313E71D4E3C771A823E152375CDEF4E,Signing timestamp: 0,Certificate serial number: 391B1DE3FDF7D68124136D1483C16B21
Hi @Curlyshrew ,
Please, try this regex:
| rex "Computer\s+name:\s+(?<Computer_Name>[^,]+)"
that you can test at https://regex101.com/r/pJSLDQ/1
Ciao.
Giuseppe
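For anyone who wants to try the pattern outside Splunk, here is a rough Python sketch. It assumes Python's (?P<name>...) spelling for the named group (rex uses (?<name>...)), and the two events are shortened copies of the sample lines above. The helper name extract_computer_name is hypothetical.

```python
import re

# Same pattern as the rex command, with Python's named-group syntax.
PATTERN = re.compile(r"Computer\s+name:\s+(?P<Computer_Name>[^,]+)")

# Shortened copies of the two sample events from this thread.
events = [
    "2020-07-04 21:36:33,Compressed File,IP Address: 192.168.1.1,"
    "Computer name: GHCC01SFG435,Source: Scheduled scan",
    "2020-07-08 11:59:34,Virus found,IP Address: 172.16.10.151,"
    "Computer name: U1135713,Source: Auto-Protect scan",
]

def extract_computer_name(event):
    """Return the first Computer_Name match in the event, or None."""
    match = PATTERN.search(event)
    return match.group("Computer_Name") if match else None

names = [extract_computer_name(event) for event in events]
print(names)  # ['GHCC01SFG435', 'U1135713']
```

Note that the match is case-sensitive, so "Source Computer Name:" (capital N) in the full events is not picked up by this pattern.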
Mate that's awesome. Solves my issue and much more.
It never occurred to me once that (?<Name_Of_Field>) could be positioned anywhere within the regex.
I thought it always had to sit at the front of the sequence.
That opens up a whole lot of other options with some other work I have on.
Out of curiosity, are you able to explain how the section "[^,]+" works?
Much appreciated.
Hi @Curlyshrew ,
regex101 describes every part of the regex (in the panel on the right side).
Anyway, [^,]+ means that you take one or more characters (spaces included) up to, but not including, the next ",".
It's a very useful way to manage regex capture groups!
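Only one point of attention: always remember to escape special chars when they appear outside square brackets. Inside square brackets most of them are already literal, so if instead of "," you have to take all the chars until "?", you can use [^?]+; the escaped form [^\?]+ also works.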
Ciao, and until next time!
Giuseppe
P.S.: Karma Points are valued 😉
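A small Python sketch of the negated character class, in case it helps to see it run. The sample text is a shortened, hypothetical piece of an event, and Python's (?P<name>...) stands in for rex's (?<name>...). It shows three things: [^,]+ stops at the next comma, + and * differ when the field is empty, and "?" behaves the same inside brackets whether or not it is escaped.

```python
import re

# Shortened, hypothetical fragment of an event; note the empty Description.
text = "Risk name: Heur.AdvML.B,Occurrences: 1,Description: ,Actual action: Quarantined"

# [^,]+ : one or more characters that are not a comma.
risk = re.search(r"Risk name:\s*(?P<v>[^,]+)", text).group("v")
print(risk)  # Heur.AdvML.B

# With +, an empty field gives no match at all; with *, it gives an empty string.
plus_match = re.search(r"Description: (?P<v>[^,]+)", text)
star_match = re.search(r"Description: (?P<v>[^,]*)", text)
print(plus_match)                   # None
print(repr(star_match.group("v")))  # ''

# Inside square brackets "?" is literal, so both forms match the same text.
url = "page.html?id=5"
plain = re.match(r"[^?]*", url).group()
escaped = re.match(r"[^\?]*", url).group()
print(plain, escaped)  # page.html page.html
```

Whether + or * is the better choice depends on whether an empty field should produce no value or an empty one.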