Dashboards & Visualizations

Another regular expression question -- need help with "</"

sc0tt
Builder

This is similar to a question that I previously asked here.

I am using a sed script to format a log at index time. The raw data contains some XML tags whose content can include any type of character. With earlier help from this forum and a lot of trial and error, I can extract everything I need except when a </ appears inside a tag's content. I don't care about the outer tags (e.g. <Message>), just the opening and closing tags that immediately follow each other.

Sample data:

2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>

Sed script:

s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g

This almost works, but fails to capture the first Item value because of </. Is there a way that I can get this to work? The final pairs should be

UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text &#13;" MsgType="Menu"

Any help would be greatly appreciated!
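
For anyone who wants to reproduce the failure outside Splunk, here is a minimal Python sketch. It assumes Python's re module handles the negative lookahead and the \1 backreference the same way Splunk's PCRE-based sed mode does, and it drops the backslashes before < and > since they are redundant there. The variable names are only for illustration.

```python
import re

# Sample event from the question; a raw string keeps the backslash literal.
raw = r"2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>"

# The pattern from the sed script. (?!</) stops the value at ANY "</",
# so a value that itself contains "</" can never reach its closing tag.
pattern = re.compile(r"<([^\s>]*)[^>]*>(((?!</).)+)</\1>")

# Same replacement as the sed script: tag name, then the quoted value.
result = pattern.sub(r' \1="\2"', raw)
print(result)
```

In this sketch the first Item element comes back unchanged, while UserId, both Key elements, the second Item and MsgType are rewritten into name="value" pairs.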

UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.

index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
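
The same two steps can be checked outside Splunk with a short Python sketch. It assumes Python's re module behaves like Splunk's PCRE-based sed mode for these constructs; the names pair and wrappers are made up for the example, and the redundant backslashes before < and > are dropped.

```python
import re

# Sample event from the search above; a raw string keeps the backslash literal.
raw = r"2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"

# Step 1: same pattern as the first rex call. (?!</\w) only stops the value
# at "</" followed by a word character, i.e. something shaped like a closing tag.
pair = re.compile(r"<([^\s>]*)[^>]*>(((?!</\w).)+)</\1>")

# Step 2: same alternation as the second rex call; strips the outer wrapper tags.
wrappers = re.compile(r"<INTERFACE>|<Message>|</Message>|<Action/>")

result = wrappers.sub("", pair.sub(r' \1="\2"', raw))
print(result)
```

Both Item values are captured here, including the one containing </ and the one starting with <. A value containing </ followed by a letter, digit or underscore would still stop the match early, so this is a workaround for this data rather than a general XML parser.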

MuS
SplunkTrust

Hi sc0tt,

UPDATE: try this as matching regex in your sed command:

<\w+>(((?![</]\w).)+)\<\/\w+>

hope that helps ...

cheers, MuS

MuS
SplunkTrust

You're welcome and thx for accepting the answer 🙂

sc0tt
Builder

Many thanks! It wasn't exactly what I needed, but it helped me get it working for my needs. Another part of the issue was that the stuff between the tags could also start with a "<" character which caused more issues. I've updated the original question with a final working solution.

MuS
SplunkTrust

Now it's perfect from my understanding... some regex gurus would probably find some tuning possibilities 🙂

sc0tt
Builder

Thanks for your help, patiently waiting...
