Splunk Search

How to extract name/value pair from XML datasource?

rvoninski_splun
Splunk Employee
Splunk Employee

I'm asking this question on behalf of a customer. We are ingesting XML data and it comes in clean. Timestamp is being received correctly. When i do a search in Splunk the raw data looks like this as a typical event.

  <col name="time">1/15/2017 10:43:38 AM</col>
  <col name="TimeSpan">N/A</col>
  <col name="ThreadID">0x00000516</col>
  <col name="User"></col>
  <col name="HTTPSessionID">p21hued5eusxsxc3zl5ekbyo</col>
  <col name="SessionGUID">p21hued5eusxsxc3zl5ekbyo</col>
  <col name="SessionID">0</col>
  <col name="Datasource"></col>
  <col name="AppPoolName">AppNet</col>
  <col name="IpAddress">10.65.6.47</col>
  <col name="MachineName">10.65.6.47</col>
  <col name="Result">0xFFFFFFFF80131500</col>
  <col name="Message"></col>
  <col name="Module">Hyland.Applications.Web.Client</col>
  <col name="Class">Hyland.Applications.Web.Client.ProviderPages.WorkflowSOAProvider</col>
  <col name="Method">ReleaseLicense</col>
  <col name="SourceFile"></col>
  <col name="SourceLine">0</col>
  <col name="Severity">Error</col>
  <col name="ErrorId">3dc1d8b4-cbab-4649-a09e-8c3434484aaa</col>
</row>
<row>

We want Splunk to extract the name/value pairs or key-value pairs and add them as interesting fields. We have tried REX as an option with this string but it isn't pulling it out. (we built this string in regex101.com and it yanks things out correctly using the global option but doesn't seem to be working here).

rex max_match=0 field=_raw ".+?name="(?<_KEY_1>.+?)"\>(?<_VAL_1>.+?)\<\/col>"

I'm looking for an answer or advice on how we can get Splunk to extract these name/value pairs.

Thank you.

Rich

alt text

0 Karma
1 Solution

mattymo
Splunk Employee
Splunk Employee

Hey Rich!

Your regex is fine, I think the issue is that rex didn't like the leading underscores on your field and value names...(remember, leading underscores are reserved for Splunk internal use)

This worked for me:

| rex max_match=0 field=_raw ".+?name="(?<KEY_1>.+?)"\>(?<VAL_1>.+?)\<\/col>"

Once you have the rex the way you like it, you can implement this with props and transforms...
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Configureadvancedextractionswithfieldtra...

I tried using KV_MODE= xml but it didn't extract what I think you will want...the whole "col name =" got in the way, and made it messy..

the main thing is we need to keep the KV pair relationship to do the reporting you want...will play in the lab and see what I can come up with then update this post.

UPDATE:

Here is the props.conf and transforms.conf I used to parse the fields and keep the KV pairs. Big Up MuS for the optimized regex to deal with null values in the XML. Previous regex was too greedy for null fields.

props.conf

[answers494268]
BREAK_ONLY_BEFORE=<row>
CHARSET=UTF-8
KV_MODE=none
MAX_TIMESTAMP_LOOKAHEAD=125
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
category=Custom
disabled=false
pulldown_type=true
TIME_FORMAT=%m/%d/%Y %H:%M:%S %p
TIME_PREFIX=<col name="time">

REPORT-xml = answers494268xml

transforms.conf

[answers494268xml]
#REGEX = .+?name="(.+?)">(.+?)</col> 
#Above regex too greedy for null values
REGEX = .+?name=\"([^\"\>]+)\"\>([^\<]+)\<\/col>
FORMAT = $1::$2

One thing to note in this data...there are some fields with no values....not sure if this is because you were scrubbing the data to be shared...but just keep in mind the first regex wont match if there is no value for the key, thus messing up the transform...updated rex ensures no erroneous matches

ie.

<col name="User"></col>
<col name="Message"></col>
<col name="Datasource"></col>

alt text

- MattyMo

View solution in original post

ferdbiffle
Explorer

Update:
The props/transforms method allowed us to successfully extract KV pairs for all fields that are not Null. And for our use case that works just fine! 🙂

Thanks so much Matthew and Rich for your help in tackling this sticky issue!

Eric

0 Karma

ferdbiffle
Explorer

Hello!

I applied the modified REGEX into an inline search. The search successfully gives us a list of the desired field names under the "KEY_" field, and the desired values in a list under the "VAL_" field. Unfortunately not as individual key-val pairs. But getting much closer 🙂

I will give the Props/Transforms route a try next and post the results.

Thanks!

0 Karma

rvoninski_splun
Splunk Employee
Splunk Employee

Love to see a working version of the KV Pair also. But this is definitely on the rt track and provides access to the specific items. Thanks for helping us out. Rich

0 Karma

mattymo
Splunk Employee
Splunk Employee

posted working version of props/transforms above...playing with the "null" values now...

- MattyMo
0 Karma

mattymo
Splunk Employee
Splunk Employee

Yeah thats what I was eluding to with keeping the KV Pair relationship for reporting....will post as soon as I have a working version

- MattyMo
0 Karma

mattymo
Splunk Employee
Splunk Employee

Hey Rich!

Your regex is fine, I think the issue is that rex didn't like the leading underscores on your field and value names...(remember, leading underscores are reserved for Splunk internal use)

This worked for me:

| rex max_match=0 field=_raw ".+?name="(?<KEY_1>.+?)"\>(?<VAL_1>.+?)\<\/col>"

Once you have the rex the way you like it, you can implement this with props and transforms...
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Configureadvancedextractionswithfieldtra...

I tried using KV_MODE= xml but it didn't extract what I think you will want...the whole "col name =" got in the way, and made it messy..

the main thing is we need to keep the KV pair relationship to do the reporting you want...will play in the lab and see what I can come up with then update this post.

UPDATE:

Here is the props.conf and transforms.conf I used to parse the fields and keep the KV pairs. Big Up MuS for the optimized regex to deal with null values in the XML. Previous regex was too greedy for null fields.

props.conf

[answers494268]
BREAK_ONLY_BEFORE=<row>
CHARSET=UTF-8
KV_MODE=none
MAX_TIMESTAMP_LOOKAHEAD=125
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
category=Custom
disabled=false
pulldown_type=true
TIME_FORMAT=%m/%d/%Y %H:%M:%S %p
TIME_PREFIX=<col name="time">

REPORT-xml = answers494268xml

transforms.conf

[answers494268xml]
#REGEX = .+?name="(.+?)">(.+?)</col> 
#Above regex too greedy for null values
REGEX = .+?name=\"([^\"\>]+)\"\>([^\<]+)\<\/col>
FORMAT = $1::$2

One thing to note in this data...there are some fields with no values....not sure if this is because you were scrubbing the data to be shared...but just keep in mind the first regex wont match if there is no value for the key, thus messing up the transform...updated rex ensures no erroneous matches

ie.

<col name="User"></col>
<col name="Message"></col>
<col name="Datasource"></col>

alt text

- MattyMo

rvoninski_splun
Splunk Employee
Splunk Employee

Thank you very much. Was pulling my hair out trying to figure out. Customer was trying to use the variables with the underscores. We wont make that mistake again! Again TY 🙂 Props/Transforms is a good idea and we will put that into place.

0 Karma

kkrishnan_splun
Splunk Employee
Splunk Employee

Saved my day !! Thank you very much. I made a workaround if events had no value like so :

Using 2 options :

  1. SEDCMD to make it , generally not preferred because people don't want to mess with their data before being sent into Splunk.
  2. Using another report feature in props like so : .+?name=\"([^\">]+)\"()
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...