
Why Does Regex Not Match Ampersand?

Communicator

I have an event field in the format of fieldTitle=Type: This is a description. Sometimes this event field contains an ampersand (&) in it, and when extracting the value of that field Splunk will stop and not pull the rest of the field. For example:

fieldTitle=Type: This & That

Splunk will display the value of fieldTitle as This.

In my regex I've tried escaping the ampersand, I've tried its hex and unicode equivalent values, and I've even tried a .* which should match on everything regardless. None of these result in a match beyond the ampersand.

I've also tried the field extraction tool, and aside from it generating a very long and static regex that isn't as dynamic as I need, it also does not work when I call it in a search.

Has anyone had this same issue? I'm on Splunk 6.2.


Re: Why Does Regex Not Match Ampersand?

Motivator

Can I get your sample event?
I think you can do something like this:

...|rex field=_raw "fieldTitle\=Type\: (?<fieldname>[^\n]+)"

depending on how your description ends. If that doesn't work, please share your sample event.
Thanks


Re: Why Does Regex Not Match Ampersand?

Communicator

It's sensor event data from another SIEM. Below is a sample:

May  7 2015 15:36:21 forwarding-system-hostname.domain.com 2015-05-07T15:36:21.201Z|ESM|CEF|358|McAfee NTR Incident start= 1430987152 end= 1430987152 rt=1430990752 deviceExternalId=Sensor-A eventId=1234 nitroNormId=123588 nitroObjectId=Malware: Botnet nitroBehavior=Botnet: GB Custom Signature C&C Traffic From DNS src=1.2.3.4 dst=5.6.7.8 nitroCat=Misc nitroDom=Domain

The above is a representation of the event format I'm having an issue with. The field in question is "nitroBehavior". Splunk auto-parses the field; however, it extracts the value as "Botnet: Custom Signature C". I've tried numerous regular expressions, including | rex field=nitroBehavior "(?P<fieldname>.*)" and | rex field=nitroBehavior "(?P<fieldname>[^nitro])", along with other variations that should work, including the hex and Unicode representations of the ampersand. Every time, it captures "Botnet: Custom Signature C" and never goes beyond the ampersand.


Re: Why Does Regex Not Match Ampersand?

Communicator

I did not try field=_raw in the rex command; instead I specified field=nitroBehavior, which is the field I wanted to run the regex against. I may try _raw tomorrow and just ignore everything up to the field I want, to see if that changes the result.


Re: Why Does Regex Not Match Ampersand?

Splunk Employee

Excellent moniker IngloriousSplunker!
You need to show sample data and your regex. It's not Splunk stopping on the ampersand... it's your regex syntax and the event. The & isn't special in any way...
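To illustrate the point that & is not special to the regex engine, here is a minimal sketch in Python rather than SPL. It assumes Python's re module behaves like Splunk's PCRE for these simple constructs (note that Python requires the (?P<name>...) spelling for named groups). The input string is a fragment of the sample event posted in this thread.

```python
import re

# Fragment of the sample event posted earlier in this thread.
raw = "nitroBehavior=Botnet: GB Custom Signature C&C Traffic From DNS src=1.2.3.4"

# "&" needs no escaping: it matches itself as a plain literal.
literal = re.search(r"C&C", raw)

# "." also matches "&", so a greedy ".*" runs straight through it.
greedy = re.search(r"nitroBehavior=(?P<value>.*)", raw)

print(literal.group(0))
print(greedy.group("value"))
```

If a capture still stops at the ampersand, the likely cause is the text the regex is being run against, not the pattern itself.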

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

Re: Why Does Regex Not Match Ampersand?

Splunk Employee

It's a bit confusing as to what exactly you want in the new fieldname because of your second example... but if the src= field always follows the nitroBehavior= field, you can use this:

nitroBehavior=(?<nitro>.+)\ssrc

Basically, I think that when Splunk automagically grabs the key-value pairs (which it will do when it sees an =), it sees the ampersand as another delimiter and stops. So first, you want to re-assign the nitroBehavior field. (I called the field nitro above, but you can call it nitroBehavior and it will take precedence over the auto-assigned one.)

You can't use the field as is, since the text isn't surrounded by double quotes and it's in a space-delimited event (not nice, 3rd-party SIEM!). Splunk really just has to go with its "best guess", and in this case that's not good enough.
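A rough Python sketch of why a regex applied to the auto-extracted field can never recover the missing text, while the same idea applied to the raw event can. The truncation step is a hypothetical stand-in that simply cuts the value at the ampersand to mirror the reported behavior; it is not Splunk's actual key-value extraction logic.

```python
import re

raw = ("nitroObjectId=Malware: Botnet nitroBehavior=Botnet: GB Custom Signature "
       "C&C Traffic From DNS src=1.2.3.4 dst=5.6.7.8")

# Hypothetical stand-in for the auto-extracted field: cut the value at "&",
# mirroring the truncation reported in this thread (not Splunk's real KV logic).
full_value = re.search(r"nitroBehavior=(.+)\ssrc=", raw).group(1)
auto_field = full_value.split("&")[0]

# Rough equivalent of: rex field=nitroBehavior "(?P<fieldname>.*)"
# The regex only ever sees the already-truncated value.
from_field = re.search(r"(?P<fieldname>.*)", auto_field).group("fieldname")

# Rough equivalent of running the extraction against _raw instead.
from_raw = re.search(r"nitroBehavior=(?P<nitro>.+)\ssrc", raw).group("nitro")

print(repr(from_field))
print(repr(from_raw))
```

However permissive the pattern is, it cannot match characters that are no longer in the field it is given, which is why the extraction has to start from the raw event.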

So grab the nitroBehavior field:
nitroBehavior=(?<nitroBehavior>.+)\ssrc
And then you can say
...|rex field=nitroBehavior "Botnet:\s(?<botnet>.+)"

Or if that subfield is a pattern, you can grab it in transforms with a dynamic field name

[nitroBehaviorInsides]
# SOURCE_KEY is the new nitroBehavior field
SOURCE_KEY = nitroBehavior
REGEX = (\w+):\s(.+)
FORMAT = $1::$2

That'll grab both key and value pair for all the different messages.
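As a rough Python analogy for the $1::$2 idea (first capture becomes the field name, second becomes its value), assuming the source value is the re-extracted nitroBehavior text with no trailing " src" left on it:

```python
import re

# Assumed input: the re-extracted nitroBehavior value from the sample event.
nitro_behavior = "Botnet: GB Custom Signature C&C Traffic From DNS"

# Group 1 plays the role of $1 (field name), group 2 the role of $2 (value).
match = re.match(r"(\w+):\s(.+)", nitro_behavior)
fields = {match.group(1): match.group(2)} if match else {}

print(fields)
```

This only mimics the naming behavior; how Splunk applies the transform at search time is governed by its own configuration.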


Re: Why Does Regex Not Match Ampersand?

Communicator

I don't believe that field always precedes a specific field; going back through my event data, I've also seen it at the very end of the alert. I will try the above regex, and perhaps provide more examples of variance in the events.


Re: Why Does Regex Not Match Ampersand?

Communicator

With the above suggestion:

I'm doing |eval nitroBehavior=(?P<nitroBehavior>.+\ssrc and it's throwing an error saying "An unexpected character is reached at ?P<nitroBehavior>.+"


Re: Why Does Regex Not Match Ampersand?

Communicator

This regex |rex field=_raw "(?:nitroBehavior=)(?<behavior>.+[^\ssrc])" captures the full value, but it does not stop at the next match of "src". It prints: "Botnet: GB Custom Signature C&C Traffic From DNS src", and the same happens if I just do |rex field=_raw "(?:nitroBehavior=)(?<behavior>.+\ssrc)"
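A small Python sketch of two things that may be going on here, assuming Python's re matches PCRE for these constructs. First, anything placed inside the named group's parentheses is part of the capture, so "\ssrc" has to sit after the closing parenthesis to act purely as a delimiter. Second, "[^\ssrc]" is a character class that matches a single character, not the literal text " src".

```python
import re

raw = ("nitroBehavior=Botnet: GB Custom Signature C&C Traffic From DNS "
       "src=1.2.3.4 dst=5.6.7.8 nitroCat=Misc nitroDom=Domain")

# "\ssrc" inside the named group: the delimiter becomes part of the capture.
inside = re.search(r"(?:nitroBehavior=)(?P<behavior>.+\ssrc)", raw)

# "\ssrc" after the closing parenthesis: it must still match, but is not captured.
outside = re.search(r"(?:nitroBehavior=)(?P<behavior>.+)\ssrc", raw)

# "[^\ssrc]" means: one character that is not whitespace, "s", "r", or "c".
class_accepts_x = re.fullmatch(r"[^\ssrc]", "x") is not None
class_accepts_s = re.fullmatch(r"[^\ssrc]", "s") is not None

print(repr(inside.group("behavior")))
print(repr(outside.group("behavior")))
print(class_accepts_x, class_accepts_s)
```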


Re: Why Does Regex Not Match Ampersand?

Communicator

This regular expression seems to have fixed it, however, it will not work if this field is at the end of the event. In that case I could probably add a \n match as well.

| rex field=_raw "(?:nitroBehavior=)(?<behavior>(.*?)(?=src))"

Thanks for the help, everyone, and for pointing me in the right direction.
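For the end-of-event case mentioned above, one possible approach is a lookahead that accepts either the next " key=" token or the end of the event. The sketch below is in Python and untested in Splunk; the variant pattern and the extract helper are hypothetical, and the variant assumes the value itself never contains a space followed by word characters and an equals sign.

```python
import re

mid_event = ("nitroBehavior=Botnet: GB Custom Signature C&C Traffic From DNS "
             "src=1.2.3.4 dst=5.6.7.8")
end_of_event = ("src=1.2.3.4 "
                "nitroBehavior=Botnet: GB Custom Signature C&C Traffic From DNS")

# The pattern as posted: lazy match up to the literal text "src".
posted = re.compile(r"(?:nitroBehavior=)(?P<behavior>(.*?)(?=src))")

# Hypothetical variant: stop before the next " key=" token or at end of event.
variant = re.compile(r"nitroBehavior=(?P<behavior>.+?)(?=\s\w+=|$)")

def extract(pattern, event):
    """Return the captured behavior value, or None if the pattern fails."""
    m = pattern.search(event)
    return m.group("behavior") if m else None

for event in (mid_event, end_of_event):
    print(repr(extract(posted, event)), repr(extract(variant, event)))
```

In this sketch the posted pattern keeps the space before "src" in the capture and finds nothing when the field is last, while the variant returns the same clean value in both positions.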
