Trouble with regex in transforms.conf

sc0tt
Builder

I am filtering events in transforms.conf, but I cannot get the regex to match. The regex works as expected when I test it in Search, and also when I test it at http://gskinner.com/RegExr/.

I'm trying to match on the MsgType tag.

Sample event:

2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?><INTERFACE><MsgType>SendMessage</MsgType><Emailaddress>user@example.com</Emailaddress><Userid>9999999999999</Userid><FolderName>inbox</FolderName><Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>

Below are the variations I tried. All of them seem to work in testing, but not when used in transforms.conf:

^(.*<MsgType>(SendMessage|ReplyMessage)\b<\/MsgType>).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<\/).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<.MsgType.).*$

^(.*MsgType.(SendMessage|ReplyMessage)\b..MsgType).*$

^(.*<[^<]*MsgType[^>]*>(SendMessage|ReplyMessage)\b<\/[^<\/]*MsgType[^>]*>).*$

This works but isn't ideal: ^(.*MsgType.(SendMessage|ReplyMessage)\b).*$

What's the proper way to escape the opening/closing tags?

Ayn
Legend

First of all, there's no need to anchor your matches with ^.* and .*$. The regex engine will find what you're after anyway. You also don't need to escape any of the characters you're escaping.

<MsgType>(SendMessage|ReplyMessage)</MsgType>

should work just fine.
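
A quick way to sanity-check this outside Splunk is the minimal sketch below. It uses Python's re module as a stand-in for the PCRE engine Splunk uses, which is an assumption: the two engines are not identical, although they should agree for patterns this simple. It runs the unanchored pattern and the first anchored variation from the question against the sample event.

```python
import re

# Sample event from the question, split across string literals for readability.
event = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    "<INTERFACE><MsgType>SendMessage</MsgType>"
    "<Emailaddress>user@example.com</Emailaddress>"
    "<Userid>9999999999999</Userid><FolderName>inbox</FolderName>"
    "<Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>"
)

# Unanchored pattern from the answer: no ^.* / .*$ and no escaping.
simple = re.compile(r"<MsgType>(SendMessage|ReplyMessage)</MsgType>")

# First variation from the question: anchored, with the slash escaped.
anchored = re.compile(r"^(.*<MsgType>(SendMessage|ReplyMessage)\b<\/MsgType>).*$")

simple_match = simple.search(event)
anchored_match = anchored.search(event)

# Both patterns capture the same message type; the anchors add nothing.
print(simple_match.group(1))
print(anchored_match.group(2))

# An event with a different MsgType is not matched by the unanchored pattern.
other_event = event.replace("SendMessage", "DeleteMessage")
print(simple.search(other_event))
```

If both patterns match here, the regex itself is unlikely to be the reason the transform fails, which points to something else in the index-time processing.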

sc0tt
Builder

It looks like my issue was due to the fact that SEDCMD-* entries are executed before TRANSFORMS-* entries.

sc0tt
Builder

As a follow-up, certain sed scripts seem to work without issue, while others cause the event to never get indexed. For example, placing SEDCMD-format= s/Emailaddress/Email/g after TRANSFORMS-set= setnull,keep in props.conf works, but SEDCMD-format= s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/ does not, and the event is never indexed. Any ideas?
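
One possible explanation, consistent with the ordering noted elsewhere in this thread (SEDCMD applied before TRANSFORMS), is that the second sed script removes the very tags the filter looks for. The sketch below illustrates this with Python's re.sub standing in for Splunk's sed-style replacement. It assumes, hypothetically, that the keep transform uses the <MsgType>...</MsgType> regex from the accepted answer and that setnull sends everything else to the null queue; the actual transforms.conf was not posted, and the event is shortened for brevity.

```python
import re

# Shortened version of the sample event (assumption: trimmed for brevity).
event = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    "<INTERFACE><MsgType>SendMessage</MsgType>"
    "<Emailaddress>user@example.com</Emailaddress></INTERFACE>"
)

# Assumed regex for the "keep" transform (taken from the accepted answer).
keep_regex = re.compile(r"<MsgType>(SendMessage|ReplyMessage)</MsgType>")

# Equivalent of s/Emailaddress/Email/g: the MsgType tags are left untouched.
renamed = re.sub(r"Emailaddress", "Email", event)

# Equivalent of s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/ (no g flag,
# so only the first match is replaced).
rewritten = re.sub(
    r"(.*)<MsgType>(.*)</MsgType>.*", r"\1 MsgType=\2", event, count=1
)

# After the rename, the keep regex still matches.
print(keep_regex.search(renamed) is not None)

# After the rewrite, the tags are gone, so the keep regex no longer matches.
print(rewritten)
print(keep_regex.search(rewritten) is not None)
```

Under those assumptions, the rewritten event would match only setnull and be discarded. Changing the keep regex to match the rewritten form (for example, MsgType=SendMessage) may be one way to make the two settings work together.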

sc0tt
Builder

Thank you. You are correct, and this does work just fine. It seems that a sed script running after the transforms was the issue; I had thought the regex was the problem.
