Re: Clarification on regular expression to extract...

alexandermunce · ‎12-20-2016

I am working with a datasource which contains multiple instances of an XML value which exists similarly to this:

(WITHOUT THE SPACES)

Sample Event 1:

< ab1:search-name >ABC-123-XYZ < / ab1:ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >

Sample Event 2:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >

I am wanting to write a REGEX expression to use with the rex command to extract the data contained within the < ab1:search-name > < / ab1:search-name > tags into a new field named search_name.

I understand that as there are multiple instances of this field in each event that the new field will have to be a multivalue field (which will require the rex max_match=0 argument).

Therefore I believe my command should look something similar to the below;

search query | rex [REGEX EXPRESSION] max_match=0

I have tried various REGEX expressions but am having trouble with the fact the data contains characters and symbols.

RELATED QUESTION

As you can see in the data sample above - each instance of the < ab1:search-name > < / ab1:search-name > tag has related values beneath it (search-response, search-relevance, etc).

My question is, once the rex command extracts the field, will there be any way to relate the data which fell beneath each instance of the tag to each individual value - or will the correlation be lost?

woodcock · ‎03-13-2017

Like this:

| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"

alexandermunce · ‎12-20-2016

I believe I have figured out the required REGEX expression - as below:

search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0

It seems to match as required.

alexandermunce · ‎12-20-2016

However still seem to be getting some strange results - will have to investigate further.
Edit: This was an issue with the data - the REGEX expression seems to be OK.

sundareshr · ‎12-20-2016

Try this "run-anywhere" sample. Use from the first rex till the end for your data.

| makeresults 
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>15</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>NO</ab1:search-response>
    <ab1:search-relevance>25</ab1:search-relevance>
    <ab1:analysis/>;
    <ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>10</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>20</ab1:search-relevance>
    <ab1:analysis/>" 
| makemv x delim=";" 
| mvexpand x 
| rex max_match=0 field=x "-name\>(?<name>[^\<]+)" 
| rex max_match=0 field=x "-response\>(?<response>[^\<]+)" 
| rex max_match=0 field=x "-relevance\>(?<relevance>[^\<]+)" 
| eval z=mvzip(name, mvzip(response, relevance)) 
| mvexpand z 
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)" 
| table name response relevance

Clarification on regular expression to extract XML field as multivalue field and related fields?

What the End of Support for Splunk Add-on Builder Means for You

Solve, Learn, Repeat: New Puzzle Channel Now Live

Building Reliable Asset and Identity Frameworks in Splunk ES

Are you a member of the Splunk Community?

Clarification on regular expression to extract XML field as multivalue field and related fields?

What the End of Support for Splunk Add-on Builder Means for You

Solve, Learn, Repeat: New Puzzle Channel Now Live

Building Reliable Asset and Identity Frameworks in Splunk ES