Splunk Search

Clarification on regular expression to extract XML field as multivalue field and related fields?

alexandermunce
Communicator

I am working with a datasource which contains multiple instances of an XML value which exists similarly to this:

(WITHOUT THE SPACES)

Sample Event 1:

< ab1:search-name >ABC-123-XYZ < / ab1:ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >

Sample Event 2:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >

I am wanting to write a REGEX expression to use with the rex command to extract the data contained within the < ab1:search-name > < / ab1:search-name > tags into a new field named search_name.

I understand that as there are multiple instances of this field in each event that the new field will have to be a multivalue field (which will require the rex max_match=0 argument).

Therefore I believe my command should look something similar to the below;

search query | rex [REGEX EXPRESSION] max_match=0

I have tried various REGEX expressions but am having trouble with the fact the data contains characters and symbols.

RELATED QUESTION

As you can see in the data sample above - each instance of the < ab1:search-name > < / ab1:search-name > tag has related values beneath it (search-response, search-relevance, etc).

My question is, once the rex command extracts the field, will there be any way to relate the data which fell beneath each instance of the tag to each individual value - or will the correlation be lost?

0 Karma

woodcock
Esteemed Legend

Like this:

| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"
0 Karma

alexandermunce
Communicator

I believe I have figured out the required REGEX expression - as below:

search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0

It seems to match as required.

0 Karma

alexandermunce
Communicator

However still seem to be getting some strange results - will have to investigate further.
Edit: This was an issue with the data - the REGEX expression seems to be OK.

0 Karma

sundareshr
Legend

Try this "run-anywhere" sample. Use from the first rex till the end for your data.

| makeresults 
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>15</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>NO</ab1:search-response>
    <ab1:search-relevance>25</ab1:search-relevance>
    <ab1:analysis/>;
    <ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>10</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>20</ab1:search-relevance>
    <ab1:analysis/>" 
| makemv x delim=";" 
| mvexpand x 
| rex max_match=0 field=x "-name\>(?<name>[^\<]+)" 
| rex max_match=0 field=x "-response\>(?<response>[^\<]+)" 
| rex max_match=0 field=x "-relevance\>(?<relevance>[^\<]+)" 
| eval z=mvzip(name, mvzip(response, relevance)) 
| mvexpand z 
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)" 
| table name response relevance
0 Karma
Get Updates on the Splunk Community!

Index This | What is broken 80% of the time by February?

December 2025 Edition   Hayyy Splunk Education Enthusiasts and the Eternally Curious!    We’re back with this ...

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Hello Splunk Community,   We're thrilled to share an exciting update that will help you manage your data more ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Discover how the Splunk Model Context Protocol (MCP) Server can revolutionize the way your organization uses ...