Solved: how to use the rex command to rex out fields starting with < and ending with >

m7787580 · 06-20-2017

Hi Splunker,

I would like to learn how I can rex out these field names, but I don't want to rex out startTimestamp and endTimestamp.

<activityName>TubeSales<activityName>
<activityStatus>Play<activityStatus>
<startTimestamp>Do not want to extract<startTimestamp>
<endTimestamp>Do not want to extract<endTimestamp>
<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>
<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>
<JourneyOrderPointsByProductCode>
<ProductCode>16<ProductCode>
<JourneyOrderPoints>130<JourneyOrderPoints>
<JourneyOrderPointsByProductCode>
<success>
<GetRequiredJourneyOrderPointsend>
</S:Body>

Thanks in advance

DalJeanis · 06-20-2017

Two things:
1) To be proper XML or HTML, the closing tag (the second time the field name appears) must have a slash in front of the name. Example:

 <activityName>TubeSales</activityName>

I'm going to assume that is the case, because otherwise you have much bigger problems than how to write the rex.

2) This regex extracts all the individual fields, including the two timestamps you don't want, but not the multi-line JourneyOrderPointsByProductCode wrapper:

 \<(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\>
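To see what this first pattern actually picks up, here is a small Python sketch that runs it over the sample from the question, with the closing slashes missing exactly as posted. It assumes Python's re engine behaves like the PCRE engine behind rex for these constructs; the one visible difference is that Python spells named groups (?P<name>...) instead of (?<name>...). re.finditer stands in for rex with max_match=0, which keeps every match rather than only the first.

```python
import re

# Sample event from the question, closing tags without the slash, as posted.
SAMPLE = "\n".join([
    "<activityName>TubeSales<activityName>",
    "<activityStatus>Play<activityStatus>",
    "<startTimestamp>Do not want to extract<startTimestamp>",
    "<endTimestamp>Do not want to extract<endTimestamp>",
    "<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>",
    "<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>",
    "<JourneyOrderPointsByProductCode>",
    "<ProductCode>16<ProductCode>",
    "<JourneyOrderPoints>130<JourneyOrderPoints>",
    "<JourneyOrderPointsByProductCode>",
    "<success>",
    "<GetRequiredJourneyOrderPointsend>",
    "</S:Body>",
])

# Same pattern as in the answer, using Python's (?P<name>...) named-group syntax.
# \1 must repeat the tag name; \/? makes the closing slash optional.
PATTERN = re.compile(r"\<(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/?\1\>")

def extract_pairs(raw):
    """Return (fieldname, fieldvalue) for every match, similar to rex max_match=0."""
    return [(m.group("fieldname"), m.group("fieldvalue")) for m in PATTERN.finditer(raw)]

for name, value in extract_pairs(SAMPLE):
    print(f"{name} = {value}")
```

The wrapper, <success> and <GetRequiredJourneyOrderPointsend> produce nothing, because after their opening tag the next tag has a different name, so \1 cannot match.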

Here it is with a negative lookahead added to ignore the two timestamps:

 \<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\>
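A Python sketch of the lookahead version on the first five lines of the sample, again assuming Python's re as a stand-in for the PCRE engine used by rex (named groups written (?P<name>...)). The lookahead is checked right after the opening <, so no match can start at a tag whose name begins with startTimestamp or endTimestamp.

```python
import re

# First five lines of the sample from the question (closing slashes missing, as posted).
SAMPLE = "\n".join([
    "<activityName>TubeSales<activityName>",
    "<activityStatus>Play<activityStatus>",
    "<startTimestamp>Do not want to extract<startTimestamp>",
    "<endTimestamp>Do not want to extract<endTimestamp>",
    "<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>",
])

# (?!...) fails the match at this "<" when the tag name starts with either word,
# so both the opening and the closing timestamp tags are skipped.
PATTERN = re.compile(
    r"\<(?!startTimestamp|endTimestamp)(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/?\1\>"
)

fields = {m.group("fieldname"): m.group("fieldvalue") for m in PATTERN.finditer(SAMPLE)}
print(fields)
```

One side effect worth knowing: because the lookahead is not anchored to the end of the name, it would also skip a tag whose name merely begins with startTimestamp or endTimestamp. Writing it as (?!(?:startTimestamp|endTimestamp)\>) restricts it to those exact names.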

Both of those regexes work for any tag that is opened and closed, even if the end tag lacks the slash. If you verify that your markup has the proper slashes on the close tags, remove the very last question mark (the one after \/) from both regexes, so the slash becomes required.
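A quick Python check of what dropping that last question mark changes, using one tag from the question written both ways. As above, Python's re is assumed as a stand-in for the engine behind rex, with named groups written (?P<name>...).

```python
import re

# The lenient pattern from the answer: the slash in the closing tag is optional.
LENIENT = re.compile(r"\<(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/?\1\>")
# The strict variant: the last "?" is removed, so "</" is required.
STRICT = re.compile(r"\<(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/\1\>")

as_posted = "<ProductCode>16<ProductCode>"      # closing tag without the slash
well_formed = "<ProductCode>16</ProductCode>"  # proper XML closing tag

for label, text in [("as posted", as_posted), ("well formed", well_formed)]:
    print(label, bool(LENIENT.search(text)), bool(STRICT.search(text)))
```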

Now, all that being said, you are much better off following @niketnilay's advice and using the spath command.

m7787580 · 06-20-2017

Hi DalJeanis,

It was great stuff; the query worked absolutely fine.

I just wanted to ask one question:

 \<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\>

--> \<\/?\1\> <--

What is this part actually doing? I am able to understand the whole query except this last part and the \1 you wrote at the end.

Thanks again, it was really awesome stuff.

Regards,
Tarun Malhotra

DalJeanis · 06-20-2017

That whole part finds the closing tag that belongs to the same opening tag. That is how we avoid picking up the <success> keyword: it is not followed by a matching close tag, so it is not marking a field name and value.

\< means "match only the opening < of the next html/xml tag"
\/? means "match a slash if one is there; because of the ?, it is okay if it is not there."
\1 means "match the exact text that the first capture group already matched... in this case that is the group called fieldname, so the closing tag must repeat the opening tag's name."
\> means "match only the ending > of the html/xml tag"
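To make the role of \1 concrete, here is a Python sketch that runs the same pattern on three short inputs: a tag closed with its own name, the <success> line followed by a different tag (both taken from the question), and a mismatched close constructed only for this illustration. Python's re is assumed as a stand-in for the engine behind rex, with named groups written (?P<name>...).

```python
import re

# The closing part of the pattern, piece by piece:
#   \<    a literal "<"
#   \/?   an optional "/"
#   \1    the exact text captured by group 1 (the tag name)
#   \>    a literal ">"
PATTERN = re.compile(r"\<(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/?\1\>")

cases = {
    "same name closes": "<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>",
    "different tag follows": "<success>\n<GetRequiredJourneyOrderPointsend>",
    "mismatched close (constructed)": "<ProductCode>16</JourneyOrderPoints>",
}

for label, text in cases.items():
    m = PATTERN.search(text)
    print(label, "->", (m.group("fieldname"), m.group("fieldvalue")) if m else None)
```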

m7787580 · 06-21-2017

Hi DalJeanis,

Thanks for the explanation, it was really helpful. 🙂

niketn · 06-20-2017

@m7787580, you should use spath (which is meant to parse XML or JSON data) to output the fields you need. (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath)
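spath runs inside Splunk, so it cannot be demonstrated here directly. As a rough outside-Splunk analogue, this Python sketch parses the payload with a real XML parser (xml.etree.ElementTree) and flattens leaf elements into dotted paths, loosely similar to the path-style names spath produces; exact spath field naming is not reproduced. It assumes a cleaned-up payload: closing slashes added, a hypothetical <Response> root element invented here because a parser needs a single root, and the trailing <success>, <GetRequiredJourneyOrderPointsend> and </S:Body> fragments dropped, since they belong to an enclosing document that is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's payload: closing slashes added
# and a made-up <Response> root element, since an XML parser needs exactly one root.
XML = """<Response>
  <activityName>TubeSales</activityName>
  <activityStatus>Play</activityStatus>
  <startTimestamp>Do not want to extract</startTimestamp>
  <endTimestamp>Do not want to extract</endTimestamp>
  <JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1</JourneyID>
  <startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F</startID>
  <JourneyOrderPointsByProductCode>
    <ProductCode>16</ProductCode>
    <JourneyOrderPoints>130</JourneyOrderPoints>
  </JourneyOrderPointsByProductCode>
</Response>"""

SKIP = {"startTimestamp", "endTimestamp"}

def flatten(elem, prefix=""):
    """Yield (dotted.path, text) for every leaf element, skipping unwanted tags."""
    for child in elem:
        if child.tag in SKIP:
            continue
        path = f"{prefix}{child.tag}"
        if len(child):  # element has children: descend and extend the path
            yield from flatten(child, path + ".")
        else:
            yield path, (child.text or "").strip()

fields = dict(flatten(ET.fromstring(XML)))
for path, value in fields.items():
    print(f"{path} = {value}")
```

Unlike a regex, the parser understands nesting, so ProductCode and JourneyOrderPoints come out tied to their JourneyOrderPointsByProductCode parent.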

You should also consider extracting the XML fields automatically at search time by setting KV_MODE = xml in props.conf when you define the sourcetype (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf).

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

horsefez · 06-20-2017

Hi,

How about this one?

(?:\<activityName\>)(?<activityName>[^\<]+)(?:\<activityName\>)[\r\n](?:\<activityStatus\>)(?<activityStatus>[^\<]+)(?:\<activityStatus\>)[\r\n].+[\r\n].+[\r\n](?:\<JourneyID\>)(?<JourneyID>[^\<]+)(?:\<JourneyID\>)[\r\n](?:\<startID\>)(?<startID>[^\<]+)(?:\<startID\>)[\r\n].+[\r\n](?:\<ProductCode\>)(?<ProductCode>[^\<]+)(?:\<ProductCode\>)[\r\n](?:\<JourneyOrderPoints\>)(?<JourneyOrderPoints>[^\<]+)(?:\<JourneyOrderPoints\>)

https://regex101.com/r/AeXvXo/1
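This pattern is tied to the exact line order and to closing tags without slashes. The Python sketch below (named groups converted to (?P<name>...), and the pattern split into adjacent string literals only for readability) checks it against the first nine lines of the sample as posted, and again after proper </name> closing tags are added.

```python
import re

# horsefez's pattern with named groups rewritten as (?P<name>...) for Python.
# Adjacent string literals are joined, so this is one pattern split for reading.
HORSEFEZ = re.compile(
    r"(?:\<activityName\>)(?P<activityName>[^\<]+)(?:\<activityName\>)[\r\n]"
    r"(?:\<activityStatus\>)(?P<activityStatus>[^\<]+)(?:\<activityStatus\>)[\r\n]"
    r".+[\r\n].+[\r\n]"  # the two timestamp lines, matched and discarded
    r"(?:\<JourneyID\>)(?P<JourneyID>[^\<]+)(?:\<JourneyID\>)[\r\n]"
    r"(?:\<startID\>)(?P<startID>[^\<]+)(?:\<startID\>)[\r\n]"
    r".+[\r\n]"  # the <JourneyOrderPointsByProductCode> line
    r"(?:\<ProductCode\>)(?P<ProductCode>[^\<]+)(?:\<ProductCode\>)[\r\n]"
    r"(?:\<JourneyOrderPoints\>)(?P<JourneyOrderPoints>[^\<]+)(?:\<JourneyOrderPoints\>)"
)

# First nine lines of the sample, exactly as posted (no slashes in closing tags).
as_posted = "\n".join([
    "<activityName>TubeSales<activityName>",
    "<activityStatus>Play<activityStatus>",
    "<startTimestamp>Do not want to extract<startTimestamp>",
    "<endTimestamp>Do not want to extract<endTimestamp>",
    "<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>",
    "<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>",
    "<JourneyOrderPointsByProductCode>",
    "<ProductCode>16<ProductCode>",
    "<JourneyOrderPoints>130<JourneyOrderPoints>",
])
# The same lines with proper </name> closing tags on every one-line field.
with_slashes = re.sub(r"<(\w+)>([^<\n]+)<\1>", r"<\1>\2</\1>", as_posted)

m = HORSEFEZ.search(as_posted)
print(m.groupdict() if m else None)
print(HORSEFEZ.search(with_slashes))  # None: the literal closing tags no longer match
```

Because every field name and every skipped line is written out literally, any change in field order or tag format means editing the pattern.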

m7787580 · 06-20-2017

Hi Pyro_wood,

Thanks for the solution, I understood it.
But what if I don't want to write out all the field names again and again?
We can see that all the fields start with < and end with </.

Is it possible to write a single rex command like:
rex field=_raw starting from <(capturing Name)>(Capturing Value)</

As we can see, all the fields follow the same format shown below, starting with < and ending with </:

<ProductCode>16</ProductCode>
<JourneyOrderPoints>130</JourneyOrderPoints>

If I can have a single standard rex query, then I can run it on any service, irrespective of the field names and values.
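The generic pattern in DalJeanis's answer already covers this. To show the "any service, any field name" idea in isolation, here is a Python sketch of a hypothetical helper, extract_fields (a name invented for this illustration, not a Splunk feature), that requires the </ closing tag described above and takes an optional list of tag names to skip. Python's re is again assumed as a stand-in for the engine behind rex.

```python
import re

def extract_fields(raw, exclude=()):
    """Hypothetical helper: pull <name>value</name> pairs out of any payload.

    One generic pattern, no hard-coded field names. Tags listed in `exclude`
    are skipped by a negative lookahead built at call time. If a tag name
    repeats, the later value overwrites the earlier one in the returned dict.
    """
    lookahead = ""
    if exclude:
        # re.escape guards against regex metacharacters; \> pins the exact name.
        names = "|".join(re.escape(n) for n in exclude)
        lookahead = rf"(?!(?:{names})\>)"
    pattern = re.compile(
        rf"\<{lookahead}(?P<fieldname>\w+)\>(?P<fieldvalue>[^\<]+)\<\/\1\>"
    )
    return {m.group("fieldname"): m.group("fieldvalue") for m in pattern.finditer(raw)}

# Usage with the two lines from the post above.
payload = "<ProductCode>16</ProductCode>\n<JourneyOrderPoints>130</JourneyOrderPoints>"
print(extract_fields(payload))
print(extract_fields(payload, exclude=["ProductCode"]))
```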

Thanks in advance
