Splunk Search

Clarification on regular expression to extract XML field as multivalue field and related fields?

alexandermunce
Communicator

I am working with a data source in which each event contains multiple instances of an XML element, similar to this:

(The real data has no spaces inside the tags.)

Sample Event 1:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 15 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > NO < / ab1:search-response >
< ab1:search-relevance > 25 < / ab1:search-relevance >
< ab1:analysis / >

Sample Event 2:

< ab1:search-name >ABC-123-XYZ < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 10 < / ab1:search-relevance >
< ab1:analysis / >
< ab1:search-name >ABC-001-PROD  < / ab1:search-name >
< ab1:search-response > YES < / ab1:search-response >
< ab1:search-relevance > 20 < / ab1:search-relevance >
< ab1:analysis / >

I want to write a regular expression to use with the rex command that extracts the data contained within the < ab1:search-name > < / ab1:search-name > tags into a new field named search_name.

I understand that, because this field occurs multiple times in each event, the new field will have to be a multivalue field (which requires the rex max_match=0 argument).

Therefore I believe my command should look something like this:

search query | rex [REGEX EXPRESSION] max_match=0

I have tried various regular expressions but am having trouble because the values contain both alphanumeric characters and symbols such as hyphens.

RELATED QUESTION

As you can see in the data sample above, each instance of the < ab1:search-name > < / ab1:search-name > tag has related values beneath it (search-response, search-relevance, etc.).

Once the rex command extracts the field, is there any way to relate the values that fall beneath each instance of the tag to the corresponding search_name value, or will the correlation be lost?

woodcock
Esteemed Legend

Like this:

| rex max_match=0 "(?ms)ab1:search-name>(?<search_name>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-response>(?<search_response>[^<]+)"
| rex max_match=0 "(?ms)ab1:search-relevance>(?<search_relevance>[^<]+)"
| eval raw = mvzip(search_name, search_response, "::")
| eval raw = mvzip(raw, search_relevance, "::")
| rex field=raw mode=sed "s/[\s\r\n:]*$//"
| fields _time raw
| mvexpand raw
| rename raw AS _raw
| eval len=len(_raw)
| search len>0
| rex "^(?<search_name>.*)::(?<search_response>.*)::(?<search_relevance>.*)"
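
A note on why the sed and len steps appear in this pipeline: the pattern ab1:search-name> has no leading <, so it also matches the closing tag, and [^<]+ then captures the whitespace between that tag and the next one. The sketch below uses Python's re module purely as a stand-in for rex (the Python syntax and the variable names are assumptions for illustration, not part of the answer above). It shows the whitespace-only captures and two ways to get rid of them.

```python
import re

# Part of Sample Event 1 from the question, written without the extra spaces.
event = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-response>YES</ab1:search-response>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>\n"
    "<ab1:search-response>NO</ab1:search-response>\n"
)

# No leading "<": the closing tag matches too, capturing the newline after it.
unanchored = re.findall(r"(?ms)ab1:search-name>([^<]+)", event)

# Leading "<": only the opening tag matches, because the closing tag has "</".
anchored = re.findall(r"(?ms)<ab1:search-name>([^<]+)", event)

# Filtering out whitespace-only values, similar in spirit to the sed + len>0 steps.
cleaned = [value for value in unanchored if value.strip()]

print(unanchored)
print(anchored)
print(cleaned)
```

Either approach keeps the values in document order, which is what the later mvzip step relies on.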

alexandermunce
Communicator

I believe I have figured out the required regular expression:

search query | rex "ab1:search-name>(?<search_name>.+)<" max_match=0

It seems to match as required.
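
One caveat worth checking: this pattern works on the sample because . does not match a newline by default, so the greedy .+ stops at the end of each line and backtracks to the closing tag. If an event ever carries several tags on one line, .+ would run to the last < on that line. The sketch below uses Python's re module as a stand-in for rex (Python needs (?P<name>...) instead of (?<name>...); the single-line sample is hypothetical and not taken from the real data) and compares the greedy form with a [^<]+ form.

```python
import re

# Same layout as the question's sample: one tag per line.
multi_line = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>\n"
    "<ab1:search-response>YES</ab1:search-response>\n"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>\n"
)

# Hypothetical event with both records on a single line.
one_line = (
    "<ab1:search-name>ABC-123-XYZ</ab1:search-name>"
    "<ab1:search-name>ABC-001-PROD</ab1:search-name>"
)

greedy = re.compile(r"ab1:search-name>(?P<search_name>.+)<")
negated = re.compile(r"<ab1:search-name>(?P<search_name>[^<]+)")

# One tag per line: both patterns return the two names.
greedy_multi = greedy.findall(multi_line)
negated_multi = negated.findall(multi_line)

# Everything on one line: the greedy pattern swallows both records in one match.
greedy_values = greedy.findall(one_line)
negated_values = negated.findall(one_line)

print(greedy_multi, negated_multi)
print(greedy_values)
print(negated_values)
```

The [^<]+ form does not depend on where the line breaks fall, which makes it the safer choice if the event layout can vary.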

alexandermunce
Communicator

However, I still seem to be getting some strange results and will have to investigate further.
Edit: This was an issue with the data - the REGEX expression seems to be OK.

sundareshr
Legend

Try this "run-anywhere" sample. For your own data, use only the part from the first rex to the end.

| makeresults 
| eval x="<ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>15</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>NO</ab1:search-response>
    <ab1:search-relevance>25</ab1:search-relevance>
    <ab1:analysis/>;
    <ab1:search-name>ABC-123-XYZ</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>10</ab1:search-relevance>
    <ab1:analysis/>
    <ab1:search-name>ABC-001-PROD</ab1:search-name>
    <ab1:search-response>YES</ab1:search-response>
    <ab1:search-relevance>20</ab1:search-relevance>
    <ab1:analysis/>" 
| makemv x delim=";" 
| mvexpand x 
| rex max_match=0 field=x "\<ab1:search-name\>(?<name>[^\<]+)" 
| rex max_match=0 field=x "\<ab1:search-response\>(?<response>[^\<]+)" 
| rex max_match=0 field=x "\<ab1:search-relevance\>(?<relevance>[^\<]+)" 
| eval z=mvzip(name, mvzip(response, relevance)) 
| mvexpand z 
| rex field=z "(?<name>[^,]+),(?<response>[^,]+),(?<relevance>[^,]+)" 
| table name response relevance
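
On the related question about correlation: each extraction returns its values in document order, so the Nth name, Nth response, and Nth relevance belong to the same record. mvzip joins the values by position and mvexpand turns each joined value into its own row. The sketch below shows the same positional pairing in Python, as an illustration only (the extract helper and the row layout are hypothetical). It assumes every record contains all three tags; if one is missing, the positions shift and the pairing no longer lines up.

```python
import re

# Sample Event 1 from the question, written without the extra spaces.
event = """<ab1:search-name>ABC-123-XYZ</ab1:search-name>
<ab1:search-response>YES</ab1:search-response>
<ab1:search-relevance>15</ab1:search-relevance>
<ab1:analysis/>
<ab1:search-name>ABC-001-PROD</ab1:search-name>
<ab1:search-response>NO</ab1:search-response>
<ab1:search-relevance>25</ab1:search-relevance>
<ab1:analysis/>"""


def extract(tag, text):
    """Return every value of <ab1:TAG>...</ab1:TAG> in document order."""
    return re.findall(rf"<ab1:{re.escape(tag)}>([^<]+)", text)


names = extract("search-name", event)
responses = extract("search-response", event)
relevances = extract("search-relevance", event)

# zip pairs values by position, like mvzip; each tuple becomes one row,
# like the result of mvexpand followed by the final rex.
rows = [
    {"name": n, "response": r, "relevance": int(v)}
    for n, r, v in zip(names, responses, relevances)
]

for row in rows:
    print(row)
```

Note that zip silently stops at the shortest list, so comparing the three list lengths first is a cheap way to spot an event with a missing tag.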