Regex help

sc0tt
Builder

I'm using a sed script to clean up some events before Splunk indexes them, in order to reduce license usage. My raw data contains XML tags, which I reformat as key=value pairs prior to indexing. The sed script below was working correctly. However, a recent change to the log introduces an angle bracket character (<) inside a value, and the data is no longer indexed as desired.

Sed script in props.conf

s/<([^\s\>]*)[^\>]*\>([^<].*?)\<\/\1\>/ \1="\2"/g

Sample data

2014-03-20 09:35:46,193 Outgoing UserSessionLog <UserId>55555555555</UserId><MsgType>Menu</MsgType><Title>My Title</Title><MenuId>1</MenuId><Text>This is some text</Text><MenuId>2</MenuId><Text><This is text with an angle bracket</Text><Internal>User Menu</Internal><IsActive>true</IsActive><SessionID>1000</SessionID>

The above sample data is indexed as:

2014-03-20 09:35:46,193 Outgoing UserSessionLog UserId="55555555555" MsgType="Menu" Title="My Title" MenuId="1" Text="This is some text" MenuId="2"<Text><This is text with an angle bracket</Text> Internal="User Menu" IsActive="true" SessionID="1000"

As you can see, the regular expression does not match the second Text key because its value starts with an angle bracket (<), so the value is not assigned properly. It should be Text="<This is text with an angle bracket". I have been unable to modify the regular expression to handle this scenario.
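The failure can be reproduced outside Splunk with a minimal Python sketch. It assumes Python's re module handles these constructs (character classes, lazy quantifier, backreference) the same way as the regex engine Splunk applies to the sed script; the function name to_key_value is made up for illustration, and the redundant backslashes before <, > and / are dropped.

```python
import re

# The original expression from props.conf, rewritten as a Python pattern.
# The second group starts with [^<], so a value may not begin with "<".
ORIGINAL = re.compile(r'<([^\s>]*)[^>]*>([^<].*?)</\1>')

# A shortened piece of the sample event (timestamp and other fields omitted).
sample = (
    "<MenuId>2</MenuId>"
    "<Text><This is text with an angle bracket</Text>"
    "<Internal>User Menu</Internal>"
)

def to_key_value(text: str) -> str:
    """Hypothetical helper: apply the substitution the sed script performs."""
    return ORIGINAL.sub(r' \1="\2"', text)

result = to_key_value(sample)
print(result)
# The Text element is left untouched; the elements around it are converted.
```

The Text element survives unchanged because the first character after <Text> is "<", which [^<] refuses to match.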

Any help or suggestions would be greatly appreciated!

somesoni2
Revered Legend

Try one of these sed expressions.

To remove the "<" from the value of Text:

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" in the value of Text:

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g
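To compare the two expressions side by side, here is a small Python sketch. It assumes Python's re module behaves like Splunk's engine for these patterns; the constant names are made up for illustration. The replacement is written with plain quotes because Python would keep the backslash from \" in the output.

```python
import re

# First suggestion: [<]* consumes any leading "<" before the captured value.
STRIP_BRACKET = re.compile(r'<([^\s>]*)[^>]*>[<]*([^<].*?)</\1>')
# Second suggestion: (.*?) captures the value as-is, including a leading "<".
KEEP_BRACKET = re.compile(r'<([^\s>]*)[^>]*>(.*?)</\1>')
REPLACEMENT = r' \1="\2"'

sample = "<MenuId>2</MenuId><Text><This is text with an angle bracket</Text>"

stripped = STRIP_BRACKET.sub(REPLACEMENT, sample)
kept = KEEP_BRACKET.sub(REPLACEMENT, sample)

print(stripped)
print(kept)

# Side effect of the second pattern: an empty element now matches as well.
empty = KEEP_BRACKET.sub(REPLACEMENT, "<Note></Note>")
print(empty)
```

One difference worth checking against real data: because (.*?) can match nothing, the second expression also converts empty elements into key="" pairs, which the original expression left alone.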

somesoni2
Revered Legend

Try the second option.

sc0tt
Builder

Thanks! Is there a way to keep the "<" if it is part of the value? Other than that, this seems to work, so I may use it anyway and just discard the bracket.
