Splunk Search

Clarification on regular expression to extract XML field as multivalue field and related fields?

alexandermunce
Communicator

I am working with a data source whose events contain multiple instances of an XML element, similar to this:

(The spaces inside the tags below were added only so that the forum displays them; the real data contains no spaces.)

Sample Event 1:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >

Sample Event 2:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >

I want to write a regular expression for the rex command that extracts the data between the < ab1:search-name > < / ab1:search-name > tags into a new field named search_name.

Because each event contains multiple instances of this element, I understand that the new field will have to be a multivalue field, which requires the rex argument max_match=0 (0 means no limit on the number of matches).

So I believe my command should look something like this:

search query | rex [REGEX EXPRESSION] max_match=0

I have tried various regular expressions, but I am having trouble because the data contains symbols such as <, >, /, : and -.
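One way to check a candidate pattern outside Splunk is a small Python sketch. It assumes the real events have no spaces inside the tags, and it uses Python's (?P<name>...) spelling for named groups where rex uses (?<name>...); otherwise these particular patterns behave the same in Python's re module as in the PCRE engine rex uses. re.search returns only the first match, much like rex with its default max_match=1, while re.findall returns every match, like rex max_match=0. The sketch also shows why anchoring on the opening < helps: the closing tag </ab1:search-name> also contains ab1:search-name>, so an unanchored [^<]+ picks up the newline that follows it.

```python
import re

# Sample event 1 from the question, assuming the real data has no spaces inside the tags.
event = "\n".join([
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>",
    "<ab1:search-response>YES</ab1:search-response>",
    "<ab1:search-relevance>15</ab1:search-relevance>",
    "<ab1:analysis/>",
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>",
    "<ab1:search-response>NO</ab1:search-response>",
    "<ab1:search-relevance>25</ab1:search-relevance>",
    "<ab1:analysis/>",
])

# ':', '-', '/' and '>' are plain literals in this pattern, so none of them needs escaping.
# Anchoring on the opening '<' keeps the closing tag from matching.
anchored = re.compile(r"<ab1:search-name>(?P<search_name>[^<]+)")

# Without the leading '<', the closing tag </ab1:search-name> also matches,
# and [^<]+ then captures the newline that follows it.
loose = re.compile(r"ab1:search-name>(?P<search_name>[^<]+)")

first_only = anchored.search(event).group("search_name")  # like rex with the default max_match=1
all_values = anchored.findall(event)                       # like rex max_match=0
loose_values = loose.findall(event)

print(first_only)    # ABC-123-XYZ
print(all_values)    # ['ABC-123-XYZ', 'ABC-001-PROD']
print(loose_values)  # ['ABC-123-XYZ', '\n', 'ABC-001-PROD', '\n']
```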

RELATED QUESTION

As the samples above show, each < ab1:search-name > < / ab1:search-name > element is followed by related values (search-response, search-relevance, etc.).

My question is: once the rex command has extracted the field, is there any way to associate the values that follow each instance of the tag with that individual value, or will the correlation be lost?

woodcock
Esteemed Legend

Like this:

| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"
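The idea behind this answer can be sketched in plain Python: collect each element's values in document order, pair the i-th values together (the role mvzip plays, here joined with "::"), emit one row per pair (the role of mvexpand), and split each row back into fields. This is only an illustration; extract_all is a hypothetical helper, and because it anchors on the opening tag, no empty rows need to be filtered out afterwards. Pairing by position assumes every search-name block contains exactly one search-response and one search-relevance; if an element were missing from one block, the values of later blocks would shift out of alignment.

```python
import re

# Sample event 2 from the question, assuming no spaces inside the tags in the real data.
event = "\n".join([
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>",
    "<ab1:search-response>YES</ab1:search-response>",
    "<ab1:search-relevance>10</ab1:search-relevance>",
    "<ab1:analysis/>",
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>",
    "<ab1:search-response>YES</ab1:search-response>",
    "<ab1:search-relevance>20</ab1:search-relevance>",
    "<ab1:analysis/>",
])

def extract_all(tag, text):
    """Hypothetical helper: every value of <ab1:TAG>...</ab1:TAG>, in order (like rex max_match=0)."""
    return re.findall(rf"<ab1:{tag}>([^<]+)", text)

names = extract_all("search-name", event)
responses = extract_all("search-response", event)
relevances = extract_all("search-relevance", event)

# mvzip analogue: join the i-th value of each list with "::".
zipped = ["::".join(parts) for parts in zip(names, responses, relevances)]

# mvexpand analogue: one row per zipped value, then split each row back into fields.
fields = ("search_name", "search_response", "search_relevance")
rows = [dict(zip(fields, value.split("::"))) for value in zipped]

for row in rows:
    print(row)
```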

alexandermunce
Communicator

I believe I have figured out the required regular expression:

search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0

It seems to match as required.
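This pattern relies on each element sitting on its own line: by default '.' does not match a newline, so the greedy .+ can only run to the last '<' on the same line. If an event ever arrives with all of the XML on one line, .+ runs on to the last '<' in that line and swallows several elements into a single value. The Python sketch below is an illustration only (it uses Python's (?P<name>...) group syntax); it compares that pattern with a negated character class, [^<]+, which stops at the first '<' regardless of layout.

```python
import re

# Two search-name elements, first one per line, then the same XML on a single line.
one_per_line = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>"
)
single_line = one_per_line.replace("\n", "")

# The pattern from the post above: greedy .+ followed by '<'.
greedy = re.compile(r"ab1:search-name>(?P<search_name>.+)<")
# A layout-independent alternative: stop at the first '<'.
negated = re.compile(r"<ab1:search-name>(?P<search_name>[^<]+)")

print(greedy.findall(one_per_line))   # ['ABC-123-XYZ', 'ABC-001-PROD']
print(greedy.findall(single_line))    # ['ABC-123-XYZ</ab1:search-name><ab1:search-name>ABC-001-PROD']
print(negated.findall(one_per_line))  # ['ABC-123-XYZ', 'ABC-001-PROD']
print(negated.findall(single_line))   # ['ABC-123-XYZ', 'ABC-001-PROD']
```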

alexandermunce
Communicator

However, I still seem to be getting some strange results; I will have to investigate further.
Edit: This turned out to be an issue with the data; the regular expression seems to be OK.

sundareshr
Legend

Try this run-anywhere sample. For your own data, use everything from the first rex to the end.

| makeresults 
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>15</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>NO</ab1:search-response>
    <ab1:search-relevance>25</ab1:search-relevance>
    <ab1:analysis/>;
    <ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>10</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>20</ab1:search-relevance>
    <ab1:analysis/>" 
| makemv x delim=";" 
| mvexpand x 
| rex max_match=0 field=x "-name\>(?<name>[^\<]+)" 
| rex max_match=0 field=x "-response\>(?<response>[^\<]+)" 
| rex max_match=0 field=x "-relevance\>(?<relevance>[^\<]+)" 
| eval z=mvzip(name, mvzip(response, relevance)) 
| mvexpand z 
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)" 
| table name response relevance
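Because mvzip joins values with a comma unless a third argument supplies another delimiter, the final rex here assumes that no extracted value contains a comma. The sample values do not, but a hypothetical name such as ABC,123 would shift every field by one position. A short Python sketch of the nested join and the [^,]+ split shows the effect, along with a delimiter assumed not to occur in the data (the "::" used in the earlier answer serves the same purpose). nested_zip is a hypothetical helper for illustration only.

```python
import re

# The split pattern from the final rex: three comma-free fields separated by commas.
split_re = re.compile(r"(?P<name>[^,]+),(?P<response>[^,]+),(?P<relevance>[^,]+)")

def nested_zip(name, response, relevance, delim=","):
    """Hypothetical helper mirroring mvzip(name, mvzip(response, relevance)) for one row."""
    return delim.join([name, delim.join([response, relevance])])

good = split_re.match(nested_zip("ABC-123-XYZ", "YES", "10")).groupdict()
print(good)  # {'name': 'ABC-123-XYZ', 'response': 'YES', 'relevance': '10'}

# Hypothetical value containing a comma (not present in the sample data): fields shift.
shifted = split_re.match(nested_zip("ABC,123", "YES", "10")).groupdict()
print(shifted)  # {'name': 'ABC', 'response': '123', 'relevance': 'YES'}

# With a delimiter assumed not to occur in the data, a plain split recovers the values.
safe = nested_zip("ABC,123", "YES", "10", delim="|").split("|")
print(safe)  # ['ABC,123', 'YES', '10']
```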