Splunk Search

How to write a regular expression to extract this field from my sample sysmon log?

kinkster
Explorer

I cannot quite get the regex I am looking for to work. I want to extract AcroRd32.exe

Here is the sample text.

<Data Name='ProcessId'>12784</Data>
<Data Name='Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe</Data>
<Data Name='CurrentDirectory'>C:\Users\abcv\AppData\Local\Microsoft\Windows\Temporary Internet Files\</Data>
<Data Name='TerminalSessionId'>1</Data><Data Name='IntegrityLevel'>Low</Data>

(?s)'Image'(?P[^<]+)
This pulls out this

'Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe

Working with this text now

'Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe

This regex pulls out what I want
[^\\]+$
AcroRd32.exe

I need to put these two together, or do the regex gods have a better way?
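
For readers who want to see the two steps combined, here is a minimal Python sketch, not a Splunk config. It assumes the event text looks like the sample above; the group name `process` is chosen for illustration only. The greedy `[^<]*\\` consumes the path up to the last backslash before the closing tag, and the named group captures what remains.

```python
import re

# One line from the sample event (raw string so backslashes stay literal).
sample = r"<Data Name='Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe</Data>"

# [^<]*\\ greedily consumes the directory part up to the last backslash
# before '<'; the named group then captures only the file name.
# The group name "process" is an illustrative assumption.
pattern = re.compile(r"'Image'>[^<]*\\(?P<process>[^\\<]+)<")

match = pattern.search(sample)
process = match.group("process") if match else None
print(process)
```

Note that Python spells named groups `(?P<name>...)`, while Splunk's PCRE also accepts `(?<name>...)`.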

1 Solution

woodcock
Esteemed Legend

Like this:

REGEX = (?:^|[\r\n])<Data Name='(.*?)'>(.*?)<\/Data>
FORMAT = $1::$2

And then you use the other answers on field=Image
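
As a rough Python analogue of this name/value extraction (a sketch only, not how Splunk applies transforms), the snippet below builds a dict of every Data element and then takes the file name from the Image value. It deliberately drops the line-start anchor so that two elements on the same line are both picked up; the variable names are illustrative.

```python
import re

# The sample event from the question, as a raw string.
event = r"""<Data Name='ProcessId'>12784</Data>
<Data Name='Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe</Data>
<Data Name='CurrentDirectory'>C:\Users\abcv\AppData\Local\Microsoft\Windows\Temporary Internet Files\</Data>
<Data Name='TerminalSessionId'>1</Data><Data Name='IntegrityLevel'>Low</Data>"""

# Group 1 is the field name, group 2 the field value.
pair = re.compile(r"<Data Name='([^']*)'>([^<]*)</Data>")

# findall returns (name, value) tuples, which dict() turns into a mapping.
fields = dict(pair.findall(event))

# Second step: keep only what follows the last backslash of the Image path.
image_name = fields["Image"].rsplit("\\", 1)[-1]
print(image_name)
```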


horsefez
SplunkTrust

Hi kinkster,

I have another solution for the regular expression you asked for.

(?=\'Image\').+\\(?<program>[^\<]+)(?=\<\/Data\>$)

https://regex101.com/r/Y0hINA/2

PS: Please don't forget to upvote and click accept as woodcock already said, thx!

dstaulcu
Builder

Check out the add-on for Microsoft Sysmon. This app includes field extractions such as the above as well as some helpful workflow actions. If you have a LOT of computers pushing sysmon data you may want to consider creation of an accelerated data model.

mwk1000
Path Finder

As a general safety practice, I would replace the lazy .*? with a negated character class such as [^>]+ and, where you can, make it possessive (++) or wrap it in an atomic group (?>...). This syntax is quite common in Splunk configs because it eliminates backtracking and improves performance. Splunk may apply your regex to huge numbers of events that will NOT match, and this speeds up those failures (and the matches as well).
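
To illustrate the substitution, here is a small Python sketch that checks the lazy and negated-class forms capture the same values on the sample line. It makes no timing claims. Possessive quantifiers and atomic groups are left out because Python's `re` module only gained them in version 3.11; the variable names are illustrative.

```python
import re

line = r"<Data Name='Image'>C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe</Data>"

# Lazy version: .*? expands one character at a time, retrying the rest
# of the pattern at each step.
lazy = re.compile(r"<Data Name='(.*?)'>(.*?)</Data>")

# Negated character classes: each class stops at its delimiter, so the
# engine cannot wander past the quote or the next '<'.
negated = re.compile(r"<Data Name='([^']*)'>([^<]*)</Data>")

lazy_result = lazy.search(line).groups()
negated_result = negated.search(line).groups()
same = lazy_result == negated_result
print(same)

# An input with no closing tag: both patterns fail to match.
no_close = "<Data Name='Image'>no closing tag here"
lazy_miss = lazy.search(no_close)
negated_miss = negated.search(no_close)
```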

kinkster
Explorer

I do have the latest TA for sysmon. The TA does extract the CurrentDirectory, for instance, but nothing for the Image filename. I can look at the TA and see if that helps. Great idea on the accelerated data model. I will take a look.

dstaulcu
Builder

The app should extract the file name of the Image as "process".

See line 23 of the props configuration file in the TA.

kinkster
Explorer

Alrighty then! You are correct again. It is right there as "process". This is what I needed. Thanks for your help!

woodcock
Esteemed Legend

Don't forget to UpVote anything helpful and click Accept on the best answer.

somesoni2
Revered Legend

Give this a try

Image\'([^\\]+\\)+(?<Image>\w+\.\w+)\<

OR (I believe this will work in Splunk configs)

Image\'([^\\\]+\\\)+(?<Image>\w+\.\w+)\<

Test result on your sample data.
https://regex101.com/r/JTpEXe/1
