I need help in regex for key and value to be extracted from raw data, below regex working with xml_kv_extraction. While its working in regex101 but not in splunk with rex, any suggesstions.
<(?<field_header>[^>]+)>(?<field_value>[^<]+)<\/\1>https://regex101.com/r/IBsMhK/1
eg: events with
<field_title><field_header1>field_value1</field_header1><field_header2>field_value2</field_header2></field_title>
Should appear fields as below.
field title = <field_header1>field_value1</field_header1><field_header2>field_value2</field_header2>
field_header1=field_value1
field_header2=field_value2
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:36:22.742479", LAST_UPDATE_DATE="1997-10-10 13:36:22.74", ACTION="externalFactor", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.388887", LAST_UPDATE_DATE="1997-10-10 13:03:58.388", ACTION="externalFactor.RESPONSE", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.384984", LAST_UPDATE_DATE="1997-10-10 13:03:58.384", ACTION="externalFactor.RESPONSE", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.384947", LAST_UPDATE_DATE="1997-10-10 13:03:58.384", ACTION="externalFactor.RESPONSE", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.378965", LAST_UPDATE_DATE="1997-10-10 13:03:58.378", ACTION="externalFactor", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.374242", LAST_UPDATE_DATE="1997-10-10 13:03:58.373", ACTION="externalFactor", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.374235", LAST_UPDATE_DATE="1997-10-10 13:03:58.373", ACTION="externalFactor", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>"
1997-10-10 15:35:13.046, CREATE_DATE="1997-10-10 13:03:58.325953", LAST_UPDATE_DATE="1997-10-10 13:03:58.325", ACTION="externalFactor.RESPONSE", STATUS="info", DATA_STRING="<?xml version="1.0" encoding="UTF-8"?>
<externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
I've tried identifying all individual fields in events and extracted using rex.
| rex "\s\<externalFactor\>(?<externalFactor>.*)\<\/externalFactor\>"
| rex "\s\<externalFactorReturn\>(?<externalFactorReturn>.*)\<\/externalFactorReturn\>"
| rex "\<current\>(?<current>.*)\<\/current\>"
| rex "\<encrypted\>(?<encrypted>.*)\<\/encrypted\>"
| rex "\<keywordp\>(?<keywordp>.*)\<\/keywordp\>"
| rex "\<pepres\>(?<pepres>.*)\<\/pepres\>"
| rex "\<roleName\>(?<roleName>.*)\<\/roleName\>"
| rex "\<boriskhan\>(?<boriskhan>.*)\<\/boriskhan\>"
| rex "\<sload\>(?<sload>.*)\<\/sload\>"
| rex "\<externalFactor\>(?<externalFactor>.*)\<\/externalFactor\>"
| rex "\<parkeristrator\>(?<parkeristrator>.*)\<\/parkeristrator\>"
I've tried identifying all individual fields in events and extracted using rex.
| rex "\s\<externalFactor\>(?<externalFactor>.*)\<\/externalFactor\>"
| rex "\s\<externalFactorReturn\>(?<externalFactorReturn>.*)\<\/externalFactorReturn\>"
| rex "\<current\>(?<current>.*)\<\/current\>"
| rex "\<encrypted\>(?<encrypted>.*)\<\/encrypted\>"
| rex "\<keywordp\>(?<keywordp>.*)\<\/keywordp\>"
| rex "\<pepres\>(?<pepres>.*)\<\/pepres\>"
| rex "\<roleName\>(?<roleName>.*)\<\/roleName\>"
| rex "\<boriskhan\>(?<boriskhan>.*)\<\/boriskhan\>"
| rex "\<sload\>(?<sload>.*)\<\/sload\>"
| rex "\<externalFactor\>(?<externalFactor>.*)\<\/externalFactor\>"
| rex "\<parkeristrator\>(?<parkeristrator>.*)\<\/parkeristrator\>"
You need to extract special capture groups from each match called _KEY_1 and _VAL_1
Where in Splunk are you using this regex? If it's in transforms.conf then please share the full stanza.
I have regex in description and also url from regex101.com. its not working in Splunk when i used rex with SPL query.
index=universe sourcetype=planet
| rex field=_raw "<(?<key>[^>]+)> (?<value>[^<]+)<\/\1>"
Results i got,
key
current
encrypted
keywordp
boriskhan
rolename
.
.
value
parker
true
******
role.customermanager
false
.
.
.
Result i need from raw data is,
current=parker
encrypted=true
keywordp=******
boriskhan=boriskhan1-cmx_prty
rolename=role.customermanager
.
.
.
.
See if this run-anywhere example query helps.
| makeresults
| eval data="1997-10-10 15:35:13.046, CREATE_DATE=\"1997-10-10 13:36:22.742479\", LAST_UPDATE_DATE=\"1997-10-10 13:36:22.74\", ACTION=\"externalFactor\", STATUS=\"info\", DATA_STRING=\"<?xml version=\"1.0\" encoding=\"UTF-8\"?><externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>\"
1997-10-10 15:35:13.046, CREATE_DATE=\"1997-10-10 13:03:58.388887\", LAST_UPDATE_DATE=\"1997-10-10 13:03:58.388\", ACTION=\"externalFactor.RESPONSE\", STATUS=\"info\", DATA_STRING=\"<?xml version=\"1.0\" encoding=\"UTF-8\"?><externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
| eval data=split(data,"
")
| mvexpand data
| eval _raw=data
| fields - data
``` Everything above sets up demo data. Delete IRL ```
``` Extract keys and values ```
| rex max_match=0 "<(?<key>[^>]+)>(?<value>[^<]+)<\/\1>"
``` Match keys and values so they stayed paired during mvexpand ```
| eval pairs=mvzip(key,value)
| mvexpand pairs
``` Separate key from value ```
| eval pairs=split(pairs,",")
``` Define key=value result ```
| eval pairs=mvindex(pairs,0) . "=" . mvindex(pairs,1)
| table pairs
Its great, i have all key as in field_header and all valuesas in field_value assigned to a field called paris. So, here all key and value extracted in one field called paris. But, i'm trying to extract field headers and field values from raw data. As to search for SPL query "index=index_name sourcetype=sourcetype |table current, encrypted, keywordp,boriskhan, rolename, prepres, externalFactor, parkeristrator".
so, what i got is field_header as paris and field_values as current=parker, encrypted=true, . . . .
paris |
current=parker |
encrypted=true |
keywordp=****** |
boriskhan=1-CMX_PRTY |
roleName=ROLE.CustomerManager |
pepres=false |
externalFactor=false |
parkeristrator=true |
current=parker |
What goal is, field_header_1 is current, field_header_2 is encrypted, field_header_3 is keyword as below and to get field_headers showup in Splunk interesting fields with its values.
current | encrypted | keywordp | boriskhan | roleName | pepres | externalFactor |
parker | true | ****** | 1-CMX_PRTY | ROLE.CustomerManager | false | false |
Try this query. The transpose command "rotates" the results table into the desired format.
| makeresults
| eval data="1997-10-10 15:35:13.046, CREATE_DATE=\"1997-10-10 13:36:22.742479\", LAST_UPDATE_DATE=\"1997-10-10 13:36:22.74\", ACTION=\"externalFactor\", STATUS=\"info\", DATA_STRING=\"<?xml version=\"1.0\" encoding=\"UTF-8\"?><externalFactor><current>parker</current><keywordp><encrypted>true</encrypted><keywordp>******</keywordp></keywordp><boriskhan>boriskhan1-CMX_PRTY</boriskhan></externalFactor>\"
1997-10-10 15:35:13.046, CREATE_DATE=\"1997-10-10 13:03:58.388887\", LAST_UPDATE_DATE=\"1997-10-10 13:03:58.388\", ACTION=\"externalFactor.RESPONSE\", STATUS=\"info\", DATA_STRING=\"<?xml version=\"1.0\" encoding=\"UTF-8\"?><externalFactorReturn><roleName>ROLE.CustomerManager</roleName><roleName>ROLE.DataSteward</roleName><pepres>false</pepres><externalFactor>false</externalFactor><parkeristrator>true</parkeristrator><current>parker</current></externalFactorReturn>"
| eval data=split(data,"
")
| mvexpand data
| eval _raw=data
| fields - data
``` Everything above sets up demo data. Delete IRL ```
``` Extract keys and values ```
| rex max_match=0 "<(?<key>[^>]+)>(?<value>[^<]+)<\/\1>"
``` Match keys and values so they stayed paired during mvexpand ```
| eval pairs=mvzip(key,value)
| mvexpand pairs
``` Separate key from value ```
| eval pairs=split(pairs,",")
``` Define key=value result ```
| eval key=mvindex(pairs,0), value=mvindex(pairs,1)
| fields key,value
| fields - _*
| transpose 0 header_field=key
But, i'm still not able to view extracted field_header properly in interesting fields. i can only view "key" and "value" in interesting fields. I'm searching in verbose mode.
Only the key and value fields are visible because that is what the fields command does. If you prefer different names then change "key" to "field_header" throughout the query. Likewise for "value".
i'm trying to find right regex for Splunk search time extraction first, via Splunk GUI.