Splunk Search

Clarification on regular expression to extract XML field as multivalue field and related fields?

alexandermunce
Communicator

I am working with a data source in which each event contains multiple instances of an XML element, similar to this:

(WITHOUT THE SPACES)

Sample Event 1:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >

Sample Event 2:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >

I want to write a regular expression for the rex command that extracts the data between the < ab1:search-name > and < / ab1:search-name > tags into a new field named search_name.

I understand that, because this element occurs multiple times in each event, the new field will have to be a multivalue field, which requires the rex argument max_match=0.

Therefore I believe my command should look something like this:

search query | rex [REGEX EXPRESSION] max_match=0

I have tried various regular expressions, but I am having trouble because the values contain a mix of letters, digits, and symbols.

RELATED QUESTION

As the data sample above shows, each instance of the < ab1:search-name > < / ab1:search-name > element is followed by related values (search-response, search-relevance, etc.).

My question is: once the rex command extracts the field, is there any way to relate the values that follow each instance of the tag to the corresponding search_name value, or will the correlation be lost?


woodcock
Esteemed Legend

Like this:

| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"
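
A possible simplification, offered as a sketch rather than a replacement for the search above. The text ab1:search-name> also appears inside each closing tag, so the patterns above can match a second time after a closing tag whenever it is followed by anything other than < (a newline, for example). That is presumably what the sed and len>0 cleanup steps are for. Anchoring each pattern on the opening < avoids those extra matches. The field names row and parts are made up for this illustration. The sketch assumes that every search-name has exactly one search-response and one search-relevance, and that no value contains "::". If a value is missing, mvzip will pair values from different groups.

```spl
| rex max_match=0 "<ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "<ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "<ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval row=mvzip(mvzip(search_name, search_response, "::"), search_relevance, "::")
| mvexpand row
| eval parts=split(row, "::")
| eval search_name=mvindex(parts, 0), search_response=mvindex(parts, 1), search_relevance=mvindex(parts, 2)
| table _time search_name search_response search_relevance
```

Each output row then carries one search_name together with its own search_response and search_relevance, so the relationship between them is preserved.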

alexandermunce
Communicator

I believe I have figured out the required regular expression:

search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0

It seems to match as required.
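
One caveat worth checking against your own events: .+ is greedy, so this pattern only behaves when each tag sits on its own line, because . does not match a newline by default. If several tags share a line, the capture runs on to the last < on that line. Below is a small run-anywhere comparison, using a made-up single-line event and the illustrative field names greedy and bounded:

```spl
| makeresults
| eval _raw="<ab1:search-name>ABC-123-XYZ</ab1:search-name><ab1:search-response>YES</ab1:search-response><ab1:search-name>ABC-001-PROD</ab1:search-name>"
| rex max_match=0 "ab1:search-name>(?<greedy>.+)<"
| rex max_match=0 "ab1:search-name>(?<bounded>[^<]+)"
| table greedy bounded
```

Here greedy should come back as one long value that swallows the intervening tags, while bounded should hold ABC-123-XYZ and ABC-001-PROD as two separate values.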


alexandermunce
Communicator

However, I still seem to be getting some strange results, so I will have to investigate further.
Edit: This was an issue with the data; the regular expression seems to be OK.


sundareshr
Legend

Try this "run-anywhere" sample. For your own data, use everything from the first rex onward.

| makeresults 
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>15</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>NO</ab1:search-response>
    <ab1:search-relevance>25</ab1:search-relevance>
    <ab1:analysis/>;
    <ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>10</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>20</ab1:search-relevance>
    <ab1:analysis/>" 
| makemv x delim=";" 
| mvexpand x 
| rex max_match=0 field=x "-name\>(?<name>[^\<]+)" 
| rex max_match=0 field=x "-response\>(?<response>[^\<]+)" 
| rex max_match=0 field=x "-relevance\>(?<relevance>[^\<]+)" 
| eval z=mvzip(name, mvzip(response, relevance)) 
| mvexpand z 
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)" 
| table name response relevance
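
Once each name, response, and relevance triple is on its own row, any later command can use the relationship between them. As a sketch, the line below could be appended to the search above to summarize per name. The output field names occurrences, yes_count, and avg_relevance are invented for this example.

```spl
| stats count AS occurrences, count(eval(response="YES")) AS yes_count, avg(relevance) AS avg_relevance BY name
```

If rows with blank-looking names show up, the patterns are probably also matching after the closing tags. Starting each pattern with the opening < (for example \<ab1:search-name\>) should prevent that.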