
Another regular expression question -- need help with "</"

sc0tt
Builder

This is similar to a question that I previously asked here.

I am using a sed script to format a log at index time. The raw data contains XML tags whose content can include any type of character. With earlier help from this forum and a lot of trial and error, I can extract everything I need except when a </ appears inside a tag's content. I don't care about the outer tags (e.g. <Message>), only about opening and closing tags that immediately follow each other.

Sample data:

2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>

Sed script:

s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g

This almost works, but it fails to capture the first Item value because of the </ inside it. Is there a way to get this to work? The final pairs should be:

UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text &#13;" MsgType="Menu"

Any help would be greatly appreciated!

UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.

index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
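
For anyone who wants to try the two patterns outside Splunk, here is a minimal sketch in Python's re module. It is not the Splunk implementation: the names RAW, ORIGINAL, WORKING, WRAPPERS and to_pairs are made up for illustration, and the inner capture group is written as non-capturing so that group 2 is still the tag content. It only assumes that Python's lookahead and backreference behave like the PCRE used by rex for these patterns.

```python
import re

# Sample event from the search above (second Item starts with "<").
RAW = (
    r'2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message>'
    r'<UserId>1234567</UserId><Key>1</Key>'
    r'<Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item>'
    r'<Key>2</Key>'
    r'<Item><This is some more text &#13; that starts with an angle bracket</Item>'
    r'</Message><Action/><MsgType>Menu</MsgType>'
)

# Original attempt: the lookahead stops the content at ANY "</".
ORIGINAL = re.compile(r'<([^\s>]*)[^>]*>((?:(?!</).)+)</\1>')

# Working version: the lookahead stops only at "</" followed by a word character,
# so a stray "</" that does not begin a closing tag is kept as content.
WORKING = re.compile(r'<([^\s>]*)[^>]*>((?:(?!</\w).)+)</\1>')

# Second sed step: drop the wrapper tags that carry no value.
WRAPPERS = re.compile(r'<INTERFACE>|<Message>|</Message>|<Action/>')


def to_pairs(pattern, raw):
    """Rewrite <Tag>value</Tag> as Tag="value", then strip the wrapper tags."""
    pairs = pattern.sub(r' \1="\2"', raw)
    return WRAPPERS.sub('', pairs)


original_out = to_pairs(ORIGINAL, RAW)
working_out = to_pairs(WORKING, RAW)
print(original_out)
print(working_out)
```

With the original lookahead the first Item is left as raw XML, while the \w version turns it into a key/value pair like the others.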

MuS
Legend

Hi sc0tt,

UPDATE: try this as matching regex in your sed command:

<\w+>(((?![</]\w).)+)\<\/\w+>

hope that helps ...

cheers, MuS

MuS
Legend

You're welcome and thx for accepting the answer 🙂


sc0tt
Builder

Many thanks! It wasn't exactly what I needed, but it helped me get it working for my needs. Another part of the issue was that the stuff between the tags could also start with a "<" character which caused more issues. I've updated the original question with a final working solution.
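
A small sketch of the leading "<" problem, again in Python rather than Splunk, and with hypothetical names (SUGGESTED, FINAL, sample). In the suggested pattern, [</] is a character class, so the lookahead rejects either "<" or "/" when followed by a word character. FINAL here is the suggested pattern with only the lookahead swapped for the one used in the updated question.

```python
import re

# Suggested pattern: "[</]" matches "<" OR "/", so content may not contain
# "<x" or "/x" (x being any word character).
SUGGESTED = re.compile(r'<\w+>((?:(?![</]\w).)+)</\w+>')

# Same pattern with the lookahead from the updated question:
# only the literal "</" followed by a word character ends the content.
FINAL = re.compile(r'<\w+>((?:(?!</\w).)+)</\w+>')

# Hypothetical input whose content starts with an angle bracket.
sample = '<Item><This starts with an angle bracket</Item>'

suggested_match = SUGGESTED.search(sample)
final_match = FINAL.search(sample)

print(suggested_match)
print(final_match.group(1) if final_match else None)
```

The first search finds nothing because the content's first two characters, "<T", already trip the lookahead, while the second keeps the leading "<" as part of the value.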


MuS
Legend

Now it's perfect, from my understanding... some regex gurus would probably find some tuning possibilities 🙂


sc0tt
Builder

Thanks for your help, patiently waiting...
