Splunk Search

Regex help

sc0tt
Builder

I'm using a sed script to clean up some events before they are indexed by Splunk in order to reduce the license usage. My raw data has some XML tags. Prior to indexing, I reformat these tags as key=value pairs. The below sed script was working correctly. However, there has been a change to the log that introduces an angle bracket character (<) which is causing the data to not be indexed as desired.

Sed script in props.conf

s/<([^\s\>]*)[^\>]*\>([^<].*?)\<\/\1\>/ \1="\2"/g

Sample data

2014-03-20 09:35:46,193 Outgoing UserSessionLog <UserId>55555555555</UserId><MsgType>Menu</MsgType><Title>My Title</Title><MenuId>1</MenuId><Text>This is some text</Text><MenuId>2</MenuId><Text><This is text with an angle bracket</Text><Internal>User Menu</Internal><IsActive>true</IsActive><SessionID>1000</SessionID>

The above sample data is indexed as:

2014-03-20 09:35:46,193 Outgoing UserSessionLog UserId="55555555555" MsgType="Menu" Title="My Title" MenuId="1" Text="This is some text" MenuId="2"<Text><This is text with an angle bracket</Text> Internal="System Menu" IsActive="true" SessionID="1000"

As you can see, the regular expression is not matching the second Text key because of the angle bracket (<) so the value is not getting assigned properly. It should be Text="<This is text with an angle bracket". I have been unable to modify the regular expression to handle this scenario.

Any help or suggestions would be greatly appreciated!

0 Karma
1 Solution

somesoni2
Revered Legend

Try this SED

To remove "<" from value of Text:-

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" with value of Text:-

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g

View solution in original post

somesoni2
Revered Legend

Try this SED

To remove "<" from value of Text:-

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" with value of Text:-

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g

somesoni2
Revered Legend

Try second option.

0 Karma

sc0tt
Builder

Thanks! Is there a way to keep the "<" if it is part of the value? Other than that, this seems to work so I may use it anyways and just discard the bracket.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

How to find the worst searches in your Splunk environment and how to fix them

Everyone knows Splunk is a powerful platform for running searches and doing data analytics. Your ...

Share Your Feedback: On Admin Config Service (ACS)!

Help Us Build a Better Admin Config Service Experience (ACS)   We Want Your Feedback on Admin Config Service ...

Build the Future of Agentic AI: Join the Splunk Agentic Ops Hackathon

AI is changing how teams investigate incidents, detect threats, automate workflows, and build intelligent ...