I am working with a data source in which each event contains multiple instances of an XML structure similar to the samples below.
(The spaces inside the tags were added only so the tags display here; the real data has none.)
Sample Event 1:
< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >
Sample Event 2:
< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >
I want to write a regular expression for the rex command that extracts the data between the < ab1:search-name > and < / ab1:search-name > tags into a new field named search_name.
I understand that, because each event contains multiple instances of this tag, the new field will have to be a multivalue field (which requires the rex max_match=0 argument).
I therefore believe my command should look something like this:
search query | rex [REGEX EXPRESSION] max_match=0
I have tried various regular expressions but am having trouble because the values contain a mix of characters and symbols (such as hyphens).
RELATED QUESTION
As you can see in the data sample above, each instance of the < ab1:search-name > < / ab1:search-name > tag has related values beneath it (search-response, search-relevance, etc.).
My question is: once the rex command has extracted the field, is there any way to relate each extracted search_name value to the values that appeared beneath that instance of the tag, or will the correlation be lost?
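The correlation can be kept by extracting each tag into its own multivalue field, zipping the fields together with mvzip, and then expanding the result into one row per instance. Like this: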
| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"
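A note on the cleanup steps above (the sed-mode rex and the len>0 filter), as far as I can tell: the pattern ab1:search-name> has no leading <, so it also matches the end of each closing tag, and [^<]+ then captures the line break that follows it. Those whitespace-only values are what the cleanup removes. Below is a minimal Python sketch of the same regex behavior for checking outside Splunk. It is not SPL, the sample string is only an illustration, and it says nothing about how Splunk stores the values.

```python
import re

# Illustrative sample: two instances of the tag, one tag per line.
text = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-response>YES</ab1:search-response>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>\n"
    "<ab1:search-response>NO</ab1:search-response>\n"
)

# Without the leading "<", the closing tag matches too, and [^<]+
# captures the newline that follows it.
loose = re.findall(r"ab1:search-name>([^<]+)", text)

# With the leading "<", only opening tags match, because the closing
# tag starts with "</".
anchored = re.findall(r"<ab1:search-name>([^<]+)", text)

print(loose)     # four values, two of them just a newline
print(anchored)  # two values
```

If that reading is right, anchoring each pattern on the opening tag should make the cleanup steps unnecessary, although leaving them in does no harm.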
I believe I have figured out the required regular expression:
search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0
It seems to match as required.
However, I still seem to be getting some strange results, so I will have to investigate further.
Edit: This was an issue with the data. The regular expression seems to be OK.
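One caveat about .+ in that pattern: it is greedy, and it only stops at the right place because . does not match a line break by default and each tag sits on its own line. If an event ever arrives on a single line, .+ would run on to the last < in the event. A negated class such as [^<]+ does not have that problem. Below is a small Python sketch for checking this outside Splunk. Note that Python's re module needs (?P<name>...) for named groups where rex uses (?<name>...), and the sample strings are only illustrations.

```python
import re

# Illustrative sample: the same tags, first on separate lines, then on one line.
multi_line = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-response>YES</ab1:search-response>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>\n"
    "<ab1:search-response>NO</ab1:search-response>\n"
)
single_line = multi_line.replace("\n", "")

# Greedy pattern, equivalent to the one in the post.
greedy = re.compile(r"ab1:search-name>(?P<search_name>.+)<")
# Negated-class pattern anchored on the opening tag.
negated = re.compile(r"<ab1:search-name>(?P<search_name>[^<]+)")

greedy_multi = greedy.findall(multi_line)
greedy_single = greedy.findall(single_line)
negated_multi = negated.findall(multi_line)
negated_single = negated.findall(single_line)

print(greedy_multi)    # two clean values when each tag is on its own line
print(greedy_single)   # one over-long value when everything is on one line
print(negated_single)  # two clean values either way
```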
Try this "run-anywhere" sample. For your own data, use everything from the first rex command to the end
(and drop field=x so that rex reads _raw).
| makeresults
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:search-name>
<ab1:search-response>YES</ab1:search-response>
<ab1:search-relevance>15</ab1:search-relevance>
<ab1:analysis/>
<ab1:search-name>ABC-001-PROD</ab1:search-name>
<ab1:search-response>NO</ab1:search-response>
<ab1:search-relevance>25</ab1:search-relevance>
<ab1:analysis/>;
<ab1:search-name>ABC-123-XYZ</ab1:search-name>
<ab1:search-response>YES</ab1:search-response>
<ab1:search-relevance>10</ab1:search-relevance>
<ab1:analysis/>
<ab1:search-name>ABC-001-PROD</ab1:search-name>
<ab1:search-response>YES</ab1:search-response>
<ab1:search-relevance>20</ab1:search-relevance>
<ab1:analysis/>"
| makemv x delim=";"
| mvexpand x
| rex max_match=0 field=x "\<ab1:search-name\>(?<name>[^\<]+)"
| rex max_match=0 field=x "\<ab1:search-response\>(?<response>[^\<]+)"
| rex max_match=0 field=x "\<ab1:search-relevance\>(?<relevance>[^\<]+)"
| eval z=mvzip(name, mvzip(response, relevance))
| mvexpand z
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)"
| table name response relevance
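On the correlation question, one thing to keep in mind: zipping multivalue fields pairs the values purely by position. That holds up as long as every block contains all three tags. If a block is missing one (say, no search-response), every later value in that field shifts up by one and gets paired with the wrong name. Below is a minimal Python sketch of the effect, together with a per-block alternative that splits on the analysis tag first. It is Python rather than SPL, the values helper is made up for this example, and the third block (ABC-777-TEST) is invented for the illustration.

```python
import re

# Hypothetical event: the second block has no search-response tag.
event = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-response>YES</ab1:search-response>\n"
    "<ab1:search-relevance>15</ab1:search-relevance>\n"
    "<ab1:analysis/>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>\n"
    "<ab1:search-relevance>25</ab1:search-relevance>\n"
    "<ab1:analysis/>\n"
    "<ab1:search-name>ABC-777-TEST</ab1:search-name>\n"
    "<ab1:search-response>NO</ab1:search-response>\n"
    "<ab1:search-relevance>30</ab1:search-relevance>\n"
    "<ab1:analysis/>\n"
)


def values(tag, text):
    """Return the contents of every opening <ab1:tag> found in text."""
    return re.findall(rf"<ab1:{tag}>([^<]+)", text)


# Positional pairing, analogous to zipping three multivalue fields.
by_position = list(zip(
    values("search-name", event),
    values("search-response", event),
    values("search-relevance", event),
))

# Per-block pairing: split on the analysis tag, then extract inside each block.
by_block = []
for block in event.split("<ab1:analysis/>"):
    names = values("search-name", block)
    if not names:
        continue  # trailing remainder after the last block
    responses = values("search-response", block)
    relevances = values("search-relevance", block)
    by_block.append((
        names[0],
        responses[0] if responses else None,
        relevances[0] if relevances else None,
    ))

print(by_position)  # ABC-001-PROD is wrongly paired with "NO"
print(by_block)     # ABC-001-PROD keeps an empty response
```

If the data always carries all three tags per block, the positional approach in the search above is fine as it is.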