Splunk Search

How to use the rex command to extract fields that start with < and end with >

m7787580
Explorer
Hi Splunker,

I would like to learn how I can use rex to extract these field names and values. I don't want to extract startTimestamp and endTimestamp.

<activityName>TubeSales<activityName>
<activityStatus>Play<activityStatus>
<startTimestamp>Do not want to extract<startTimestamp>
<endTimestamp>Do not want to extract<endTimestamp>
<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>
<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>
<JourneyOrderPointsByProductCode>
<ProductCode>16<ProductCode>
<JourneyOrderPoints>130<JourneyOrderPoints>
<JourneyOrderPointsByProductCode>
<success>
<GetRequiredJourneyOrderPointsend>
</S:Body>

Thanks in advance

DalJeanis
Legend

Two things -
1) To be proper XML or HTML, the second time the field is named, to close the tag, it must have a slash in front of it. Example:

 <activityName>TubeSales</activityName>

I'm going to assume that is the case, because otherwise you have much bigger problems than how to write the rex.

This one here will extract all the individual fields, including the two timestamps you don't want, but not including the multi-line JourneyOrderPointsByProductCode...

 \<(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\>

Here it is, built up with a negative assertion to ignore the two Timestamps...

\<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\> 

Both of those regexes will work for any tags that are opened and closed, even if they lack the slash in the end tag. If you verify that your markup language has the proper slashes on the close tags, then remove the very last question mark from both regexes.
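
To try the two patterns above outside Splunk, here is a minimal Python sketch. It assumes the sample has been corrected so that closing tags carry a slash, and it uses Python's `(?P<name>...)` spelling for named groups because Python's `re` module does not accept the `(?<name>...)` form used in the rex patterns. The sample string, the `<success/>` line, and all variable names are illustrative assumptions, not part of the original data.

```python
import re

# Illustrative sample, assuming properly closed tags (slash in the end tag).
SAMPLE = """<activityName>TubeSales</activityName>
<activityStatus>Play</activityStatus>
<startTimestamp>Do not want to extract</startTimestamp>
<endTimestamp>Do not want to extract</endTimestamp>
<JourneyOrderPointsByProductCode>
<ProductCode>16</ProductCode>
<JourneyOrderPoints>130</JourneyOrderPoints>
</JourneyOrderPointsByProductCode>
<success/>
"""

# Same logic as the two rex patterns, written in Python's named-group syntax.
ALL_FIELDS = re.compile(r"<(?P<fieldname>\w+)>(?P<fieldvalue>[^<]+)</?\1>")
NO_TIMESTAMPS = re.compile(
    r"<(?!startTimestamp|endTimestamp)(?P<fieldname>\w+)>(?P<fieldvalue>[^<]+)</?\1>"
)


def extract(pattern, text):
    """Return a list of (fieldname, fieldvalue) pairs, one per match."""
    return [(m.group("fieldname"), m.group("fieldvalue")) for m in pattern.finditer(text)]


all_pairs = extract(ALL_FIELDS, SAMPLE)        # includes the two timestamps
wanted_pairs = extract(NO_TIMESTAMPS, SAMPLE)  # timestamps filtered out by the lookahead

for name, value in wanted_pairs:
    print(f"{name}={value}")
```

The multi-line JourneyOrderPointsByProductCode wrapper is skipped in both cases because its opening tag is followed by another tag rather than by a value and its own closing tag. In Splunk itself, rex stops after the first match by default; its max_match option (0 meaning unlimited) should return every pair as multivalue fieldname and fieldvalue fields.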


Now, that all being said, you are much better off using @nikeynilay's advice and using the spath command.

m7787580
Explorer

Hi DalJeanis,

It was great stuff, the query worked absolutely fine.

Just wanted to ask one question

<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)>(?<fieldvalue>[^<]+)<\/?\1>

--> <\/?\1> <----

What is this part actually doing? I understand the rest of the query, but not this last part and the \1 you have written at the end.

Thanks again, it was really awesome stuff.

Regards,
Tarun Malhotra

DalJeanis
Legend

That whole thing finds the closing tag that matches the opening tag. That is how we avoid picking up the <success> keyword: it is not followed by a value and a matching close tag, so it does not define a field name and value.

\< means "match only the opening < of the next html/xml tag"
\/? means "match an optional slash \/ if it is there, but due to the ? if it is not there then that's okay too."
\1 means "match another copy of the first group that was previously matched... in this case that would be the group called fieldname"
\> means "match only the ending > of the html/xml tag"
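
The behaviour of the optional slash and the \1 backreference can be checked with a small Python sketch. The pattern below is a simplified, unnamed-group version of the one discussed above, and the test strings and the helper name `is_pair` are made up for illustration.

```python
import re

# \1 refers back to whatever the first capture group (the tag name) matched.
PAIR = re.compile(r"<(\w+)>([^<]+)</?\1>")


def is_pair(text):
    """True when text is exactly: opening tag, value, closing tag with the same name."""
    return PAIR.fullmatch(text) is not None


with_slash = is_pair("<ProductCode>16</ProductCode>")    # slash present in the end tag
without_slash = is_pair("<ProductCode>16<ProductCode>")  # slash missing, still accepted because of /?
mismatched = is_pair("<ProductCode>16</JourneyID>")      # different closing name, rejected by \1
standalone = is_pair("<success>")                        # no value and no closing tag, rejected

print(with_slash, without_slash, mismatched, standalone)
```

Dropping the ? after the slash would make the second case fail as well, which is the stricter form suggested for data with proper closing tags.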

m7787580
Explorer

Hi DalJeanis,

Thanks for the explanation, it was really helpful. 🙂

niketn
Legend

@m7787580, you should use spath (which is meant to parse XML or JSON data) to output the fields you need (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath).

You should also check whether the XML data can be extracted at search time by setting KV_MODE = xml when defining the sourcetype (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf).
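
spath and KV_MODE = xml are Splunk features and cannot be demonstrated outside Splunk, so the following is only a rough analogy in Python of why a real XML parser tends to be more robust than a regex. It assumes the event is well-formed XML; the `<response>` wrapper element, the `SKIP` set, and the variable names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the sample; the <response> wrapper is invented.
DOC = """<response>
  <activityName>TubeSales</activityName>
  <startTimestamp>Do not want to extract</startTimestamp>
  <JourneyOrderPointsByProductCode>
    <ProductCode>16</ProductCode>
    <JourneyOrderPoints>130</JourneyOrderPoints>
  </JourneyOrderPointsByProductCode>
</response>"""

# Tag names the original poster does not want as fields.
SKIP = {"startTimestamp", "endTimestamp"}

root = ET.fromstring(DOC)

# Keep only leaf elements (no children) that carry text and are not in the skip list.
fields = {
    el.tag: el.text.strip()
    for el in root.iter()
    if len(el) == 0 and el.text and el.text.strip() and el.tag not in SKIP
}

print(fields)
```

Unlike the regex, the parser understands nesting, so the values inside JourneyOrderPointsByProductCode are found without any special handling. It also fails loudly on malformed input such as closing tags without a slash.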

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
horsefez
Motivator

Hi,

How about this one?

(?:\<activityName\>)(?<activityName>[^\<]+)(?:\<activityName\>)[\r\n](?:\<activityStatus\>)(?<activityStatus>[^\<]+)(?:\<activityStatus\>)[\r\n].+[\r\n].+[\r\n](?:\<JourneyID\>)(?<JourneyID>[^\<]+)(?:\<JourneyID\>)[\r\n](?:\<startID\>)(?<startID>[^\<]+)(?:\<startID\>)[\r\n].+[\r\n](?:\<ProductCode\>)(?<ProductCode>[^\<]+)(?:\<ProductCode\>)[\r\n](?:\<JourneyOrderPoints\>)(?<JourneyOrderPoints>[^\<]+)(?:\<JourneyOrderPoints\>)

https://regex101.com/r/AeXvXo/1

m7787580
Explorer
Hi Pyro_wood,

Thanks for the solution, I understood it.
But what if I don't want to write out all the field names again and again?
We can see that all the fields start with < and end with </

Would it be possible to write a single rex command like
rex field=_raw starting from <(capturing Name)>(Capturing Value)</

As we can see, all the fields follow the same format shown below, starting with < and ending with </

<ProductCode>16</ProductCode>
<JourneyOrderPoints>130</JourneyOrderPoints>

If I can have a single standard rex query, then I can run it on any service, irrespective of the field name and value.

Thanks in advance