Why is my REGEX and MV_ADD=true in transforms.conf not working as expected to extract fields from Windows event logs?

fairje
Communicator

I am trying to parse the EMET (Enhanced Mitigation Experience Toolkit) logs. (Note: when I get this whole thing working, I plan to share it far and wide so MS will stop trying to sell you on their crappy products to monitor these same logs.) In any case, we currently have the GPO/Registry configurations being written to EventCode 50, and they look similar to the following:

01/12/2016 05:00:05 PM
LogName=Application
SourceName=EMET
EventCode=50
EventType=4
Type=Information
ComputerName=host001.com
TaskCategory=%1
OpCode=Info
RecordNumber=267548
Keywords=Classic
Message=EMET settings were refreshed successfully.

EMET configuration for Application mitigations (Registry) is:
<ConfigAppmitREG>
</ConfigAppmitREG>

EMET configuration for Application mitigations (GPO) is:
<ConfigAppmitGPO>
7z.exe *\7-Zip  DEP SEHOP NullPage HeapSpray MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
7zFM.exe *\7-Zip  DEP SEHOP NullPage HeapSpray MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
7zG.exe *\7-Zip  DEP SEHOP NullPage HeapSpray MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
Acrobat.exe *\Adobe\Acrobat*\Acrobat  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib Caller SimExecFlow StackPivot
AcroRd32.exe *\Adobe\Reader*\Reader  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib Caller SimExecFlow StackPivot
chrome.exe *\Google\Chrome\Application  DEP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
communicator.exe *\Microsoft Lync  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
EXCEL.EXE *\OFFICE1*  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
firefox.exe *\Mozilla Firefox  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot
...snip more apps...
</ConfigAppmitGPO>

There are a couple other events generated from EMET on Event 50, but this is the important one because it tells you how you are getting certain settings (registry keys or GPO) and it also tells you what your hosts are configured as (in case you have different configs in your environment for different reasons).

Now here is the nightmare: working out the REGEX for transforms.conf that parses all this information out. To start with, I was toying around with the rex search command and succeeded in pulling out all the application names like this:

| rex max_match=0 field=Message "(?m)^(?<App_Name>.*\.[exeEXE]{3})"
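For anyone who wants to poke at this pattern outside Splunk, here is a minimal sketch of what that rex call does, written in Python. The assumptions: Python's re module stands in for Splunk's PCRE engine, the named group has to be spelled (?P<App_Name>...) in Python, and MESSAGE is a shortened copy of the event above with its newlines intact.

```python
import re

# Shortened copy of the Message text from the event above, newlines intact.
MESSAGE = "\n".join([
    "EMET settings were refreshed successfully.",
    "",
    "EMET configuration for Application mitigations (GPO) is:",
    "<ConfigAppmitGPO>",
    r"7z.exe *\7-Zip  DEP SEHOP NullPage",
    r"EXCEL.EXE *\OFFICE1*  DEP SEHOP NullPage",
    r"firefox.exe *\Mozilla Firefox  DEP SEHOP NullPage",
    "</ConfigAppmitGPO>",
])

# Same pattern as the rex command; only the named-group spelling differs.
APP_NAME = re.compile(r"(?m)^(?P<App_Name>.*\.[exeEXE]{3})")

# findall returns every match, which is the role max_match=0 plays in rex.
app_names = APP_NAME.findall(MESSAGE)
print(app_names)
```

With (?m) in effect, ^ anchors at the start of every line, so each application line contributes one value.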

I am relying on the fact that the Message field is already extracted by Splunk, since the Windows TA is installed with its general key=value extractions. Mostly that gives me everything after Message= in the logs. The regex above works and pulls out all the app names in a single event (especially with max_match=0, i.e. unlimited). When I put the same regex in transforms.conf, it falls apart and doesn't work, with no apparent reason why.

[emet_event50_app_from_Message]
SOURCE_KEY = Message
REGEX = (?m)^(?<App_Name>.*\.[exeEXE]{3})
MV_ADD = true

Essentially, MV_ADD should make it pull all the matches, not just the first one. Instead, the result I get is the entire message text again, e.g.:

EMET settings were refreshed successfully. EMET configuration for Application mitigations (Registry) is: <ConfigAppmitREG> </ConfigAppmitREG> EMET configuration for Application mitigations (GPO) is: <ConfigAppmitGPO> 7z.exe *\7-Zip DEP SEHOP NullPage HeapSpray MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot 7zFM.exe *\7-Zip DEP SEHOP NullPage HeapSpray MandatoryASLR BottomUpASLR LoadLib MemProt Caller... and so on

I have never really tried working with a multiline event in Splunk from the transforms file before, so I am not sure what I am missing here. And reading other Splunk Answers seems to indicate that the above should be right, but it just isn't working.

Thanks for the assist!

0 Karma
1 Solution

fairje
Communicator

So I have worked around the issue with the following:

[emet_event50_app_from_Message]
SOURCE_KEY = Message
REGEX = \n(?<App_Name>.*\.[exeEXE]{3})\s\S
MV_ADD = true

Clearly the newline character is there, because this works on the logs, but the transform doesn't seem to accept the (?m) option at the front, which would let you use the caret "^" character instead. This is frustrating because I have used the (?m) option on other logs.

As has been suggested, it may have something to do with the way Splunk extracts the "Message" field. I haven't tried an extraction in transforms.conf using _raw; maybe that would also be a solution.

A note about the regex above: I have to use .* for this to work correctly, since some application names contain whitespace. As long as you anchor to the newline and stop at "exe" or "EXE", that should be sufficient for grabbing this data from EMET logs.
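The workaround regex can be sanity-checked outside Splunk. This is a minimal sketch in Python, assuming Python's re is a fair stand-in for Splunk's PCRE here (named groups are spelled (?P<name>...) in Python) and using a shortened sample of the Message text. The "notes.xee" string is a made-up example, not something from the logs.

```python
import re

# Shortened sample of the Message text; newlines are real "\n" characters.
MESSAGE = "\n".join([
    "<ConfigAppmitGPO>",
    r"7z.exe *\7-Zip  DEP SEHOP",
    r"Foxit Reader.exe *\Foxit Reader  DEP SEHOP",
    r"EXCEL.EXE *\OFFICE1*  DEP SEHOP",
    "</ConfigAppmitGPO>",
])

# Anchor on the newline instead of (?m)^, as in the transforms.conf stanza.
WORKAROUND = re.compile(r"\n(?P<App_Name>.*\.[exeEXE]{3})\s\S")
names = WORKAROUND.findall(MESSAGE)
print(names)

# [exeEXE]{3} is a character class, so it accepts any three of e/x/E/X,
# not only "exe" or "EXE". Hypothetical file name for illustration:
loose = re.fullmatch(r".*\.[exeEXE]{3}", "notes.xee") is not None
print(loose)
```

Because .* stays on one line, names with a space in them ("Foxit Reader.exe") come out whole. If the loose character class ever matters, a case-insensitive literal such as (?i:\.exe) might be a tighter choice, though that is untested against Splunk here.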

Thank you gcato for the assistance on getting to the bottom of this. Your responses were appreciated!

0 Karma

gcato
Contributor

Good result fairje. It is strange and I wonder if there is a bug here.

I found an old comment by "itinney" here: https://answers.splunk.com/answers/38753/regex-for-multiline-events.html

He indicates that using (?m) seems to behave like using (?sm), i.e. (?s) gets turned on if (?m) is used. Note that I've not proved this, but it would be strange behaviour, as it defeats the purpose of (?m), which is to make ^ and $ match at the beginning and end of each line (not only the beginning and end of the string). Something to watch out for anyway.
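The (?m) versus (?sm) difference described above can be reproduced outside Splunk. This is a minimal sketch in Python, assuming Python's re behaves like Splunk's PCRE for these flags (named groups are spelled (?P<name>...) in Python) and using a shortened sample of the Message text. It does not prove that Splunk turns on (?s). It only shows that, if dot-matches-newline were in effect, the symptom would look like the one reported.

```python
import re

# Shortened sample of the Message text with real newlines.
MESSAGE = "\n".join([
    "EMET settings were refreshed successfully.",
    r"7z.exe *\7-Zip  DEP SEHOP",
    r"chrome.exe *\Google\Chrome\Application  DEP NullPage",
])

# (?m) only: "." stops at newlines, so each line is matched on its own.
line_mode = re.findall(r"(?m)^(?P<App_Name>.*\.[exeEXE]{3})", MESSAGE)

# (?sm): "." also crosses newlines, so the greedy .* swallows everything
# from the start of the string up to the last ".exe".
dotall_mode = re.findall(r"(?sm)^(?P<App_Name>.*\.[exeEXE]{3})", MESSAGE)

print(line_mode)
print(dotall_mode)
```

In line mode the result is one value per application. In dot-all mode there is a single value that starts at "EMET settings..." and runs through the last application name, which resembles the "regrab of the entire message" in the question.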

0 Karma

alemarzu
Motivator

Hi fairje,

Try this regex,

(?i)(?<App_Name>\S+(?:\.exe))
0 Karma

gcato
Contributor

Hi fairje,

It would appear that the newlines in your Message field are no longer there (i.e. it is not multiline anymore, but one long string) so your regex no longer works. Or at least the logic no longer works. It matches from the Message string beginning to the last .exe it finds and that's what you see returned.

To fix this, you need a different REGEX. This is a perfect place to use regex lookahead, the (?=...) syntax. Try the following REGEX, which should find all .exe files in the string (assuming no whitespace in file names).

REGEX = \s(?<App_Name>\w*(?=\.[exeEXE]{3}( |\z))\.[exeEXE]{3})

I tried this at regex101 and it works on your example data. You can find it here if you want to check what the regex syntax means: https://regex101.com/r/hO9iD8/2

This is also a great regex resource if you get stuck: http://www.rexegg.com/regex-lookarounds.html
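For a quick check without regex101, here is a minimal sketch of the same lookahead pattern in Python. The assumptions: Python's re stands in for PCRE, so the named group becomes (?P<App_Name>...) and \z becomes \Z, and FLAT is a made-up single-line version of the Message text with newlines already replaced by spaces.

```python
import re

# Hypothetical flattened Message: newlines replaced by single spaces.
FLAT = (
    r"is: <ConfigAppmitGPO> 7z.exe *\7-Zip DEP SEHOP "
    r"Foxit Reader.exe *\Foxit Reader DEP SEHOP "
    r"firefox.exe *\Mozilla Firefox DEP"
)

# Python spelling of the lookahead REGEX above.
LOOKAHEAD = re.compile(
    r"\s(?P<App_Name>\w*(?=\.[exeEXE]{3}( |\Z))\.[exeEXE]{3})"
)

# finditer is used because the pattern has two groups and only the
# named one is wanted.
found = [m.group("App_Name") for m in LOOKAHEAD.finditer(FLAT)]
print(found)
```

Note the "no whitespace in file names" caveat in action: "Foxit Reader.exe" comes back as just "Reader.exe", because \w* cannot cross the space.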

Hope this helps.

0 Karma

gcato
Contributor

Is "Reader.exe" the first or the last field? Maybe MV_ADD = true wasn't picked up correctly.

0 Karma

fairje
Communicator

Neither; it looks like it was picking up on this line:

Foxit Reader.exe *\Foxit Reader  DEP SEHOP NullPage HeapSpray EAF MandatoryASLR BottomUpASLR LoadLib MemProt Caller SimExecFlow StackPivot

By default, I think \s in a regex does not match whitespace that is the newline character, and since the regex you provided relies on \s without (?m) in front, my guess is that it was never going to match on the newline. So the only thing that matches that regex is the one application that has a space in its name.
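The guess above is easy to test outside Splunk, and it is worth double-checking: in PCRE-style engines \s normally includes the newline character, and (?m) only changes how ^ and $ behave. This is a minimal sketch in Python, assuming Python's re matches Splunk's PCRE on this point. The sample string is made up.

```python
import re

# \s on its own matches a newline; no (?m) is involved.
newline_is_whitespace = re.fullmatch(r"\s", "\n") is not None

# A name that directly follows a newline is still found via \s.
hit = re.search(r"\s(?P<App_Name>\w+\.exe)", "<ConfigAppmitGPO>\n7z.exe *")
after_newline = hit.group("App_Name") if hit else None

print(newline_is_whitespace, after_newline)
```

This does not explain why only "Reader.exe" was extracted. It only suggests that \s failing on newlines is probably not the cause, and that the newlines may simply not be present in the field at the point where the transform runs.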

0 Karma

fairje
Communicator

Hrmmm, doesn't seem to work as expected. I now only have one extraction which is "Reader.exe" from the events 😞

I'm going to try changing it up from a \s to a \n (for new line) and see if that works since it has worked elsewhere in other events.

0 Karma

gcato
Contributor

Hi fairje,

Did you manage to get the extraction working okay? It would be good to know if the answer worked so it may be useful for other users.

0 Karma

fairje
Communicator

Sorry for the delay getting back. I am reloading my configuration now and will post back when I get more.

I'm confused though why this doesn't read the newline character... I might try what I did in another REGEX on the same logs as well, which looked like this:

REGEX = (?:\nEMET configuration for |\nEMET )(?<EMETEvent50Type>(?:\w+ status|\w+ Trust|\w+)) (?:is|mitigations)

Note that REGEX does work on these same exact logs. Since in the above log example I provided it would extract:

EMETEvent50Type = "Application"
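That extraction can be reproduced outside Splunk as well. This is a minimal sketch in Python, assuming Python's re is a reasonable stand-in for Splunk's PCRE (the named group is spelled (?P<EMETEvent50Type>...) in Python) and that the text being searched still contains real newline characters. MESSAGE is a shortened copy of the sample event.

```python
import re

# Shortened copy of the sample event text with real newlines.
MESSAGE = "\n".join([
    "EMET settings were refreshed successfully.",
    "",
    "EMET configuration for Application mitigations (Registry) is:",
    "<ConfigAppmitREG>",
    "</ConfigAppmitREG>",
])

# Same REGEX as above, split over two lines for readability.
EVENT_TYPE = re.compile(
    r"(?:\nEMET configuration for |\nEMET )"
    r"(?P<EMETEvent50Type>(?:\w+ status|\w+ Trust|\w+)) (?:is|mitigations)"
)

match = EVENT_TYPE.search(MESSAGE)
event_type = match.group("EMETEvent50Type") if match else None
print(event_type)
```

The pattern only matches when a literal \n precedes "EMET", so the fact that it works in Splunk is at least a hint that newlines survive in whatever source that transform reads.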

It's strange that the (?m) doesn't work, when I totally use that in another transforms on a different file. And I think I have either used the (?m) or the (?s) option on a different windows event log before... ::confused::

0 Karma

gcato
Contributor

Hmmm... does this one use the Message field as SOURCE_KEY, though, or the _raw data (the default)? I guess it comes back to how the Message field is auto-extracted by Splunk. If it's something like (?ms)Message=(?<Message>.+), then newlines are matched by the dot and Message ends up as a single-line field. Though the rex search command example in your question indicates newlines are in the Message field. How does the Message field appear when you table it, i.e.

...search ... | table Message

Maybe something like the following would cover both options:

REGEX = (?:^|\s)(?<App_Name>\w*(?=\.[exeEXE]{3})\.[exeEXE]{3})(?: |\z)+

Try it without (?m) at the start also

0 Karma

fairje
Communicator

By the way, both a ... | table _raw and a ... | table Message returns the same unformatted text stripping away any newline characters. So unfortunately that doesn't tell me anything about what is going on in the background...

0 Karma