I'm asking this question on behalf of a customer. We are ingesting XML data and it comes in clean. Timestamp is being received correctly. When i do a search in Splunk the raw data looks like this as a typical event.
<col name="time">1/15/2017 10:43:38 AM</col>
<col name="TimeSpan">N/A</col>
<col name="ThreadID">0x00000516</col>
<col name="User"></col>
<col name="HTTPSessionID">p21hued5eusxsxc3zl5ekbyo</col>
<col name="SessionGUID">p21hued5eusxsxc3zl5ekbyo</col>
<col name="SessionID">0</col>
<col name="Datasource"></col>
<col name="AppPoolName">AppNet</col>
<col name="IpAddress">10.65.6.47</col>
<col name="MachineName">10.65.6.47</col>
<col name="Result">0xFFFFFFFF80131500</col>
<col name="Message"></col>
<col name="Module">Hyland.Applications.Web.Client</col>
<col name="Class">Hyland.Applications.Web.Client.ProviderPages.WorkflowSOAProvider</col>
<col name="Method">ReleaseLicense</col>
<col name="SourceFile"></col>
<col name="SourceLine">0</col>
<col name="Severity">Error</col>
<col name="ErrorId">3dc1d8b4-cbab-4649-a09e-8c3434484aaa</col>
</row>
<row>
We want Splunk to extract the name/value pairs or key-value pairs and add them as interesting fields. We have tried REX as an option with this string but it isn't pulling it out. (we built this string in regex101.com and it yanks things out correctly using the global option but doesn't seem to be working here).
rex max_match=0 field=_raw ".+?name="(?<_KEY_1>.+?)"\>(?<_VAL_1>.+?)\<\/col>"
I'm looking for an answer or advice on how we can get Splunk to extract these name/value pairs.
Thank you.
Rich
Hey Rich!
Your regex is fine, I think the issue is that rex didn't like the leading underscores on your field and value names...(remember, leading underscores are reserved for Splunk internal use)
This worked for me:
| rex max_match=0 field=_raw ".+?name="(?<KEY_1>.+?)"\>(?<VAL_1>.+?)\<\/col>"
Once you have the rex the way you like it, you can implement this with props and transforms...
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Configureadvancedextractionswithfieldtra...
I tried using KV_MODE= xml but it didn't extract what I think you will want...the whole "col name =" got in the way, and made it messy..
the main thing is we need to keep the KV pair relationship to do the reporting you want...will play in the lab and see what I can come up with then update this post.
UPDATE:
Here is the props.conf and transforms.conf I used to parse the fields and keep the KV pairs. Big Up MuS for the optimized regex to deal with null values in the XML. Previous regex was too greedy for null fields.
props.conf
[answers494268]
BREAK_ONLY_BEFORE=<row>
CHARSET=UTF-8
KV_MODE=none
MAX_TIMESTAMP_LOOKAHEAD=125
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
category=Custom
disabled=false
pulldown_type=true
TIME_FORMAT=%m/%d/%Y %H:%M:%S %p
TIME_PREFIX=<col name="time">
REPORT-xml = answers494268xml
transforms.conf
[answers494268xml]
#REGEX = .+?name="(.+?)">(.+?)</col>
#Above regex too greedy for null values
REGEX = .+?name=\"([^\"\>]+)\"\>([^\<]+)\<\/col>
FORMAT = $1::$2
One thing to note in this data...there are some fields with no values....not sure if this is because you were scrubbing the data to be shared...but just keep in mind the first regex wont match if there is no value for the key, thus messing up the transform...updated rex ensures no erroneous matches
ie.
<col name="User"></col>
<col name="Message"></col>
<col name="Datasource"></col>
Update:
The props/transforms method allowed us to successfully extract KV pairs for all fields that are not Null. And for our use case that works just fine! 🙂
Thanks so much Matthew and Rich for your help in tackling this sticky issue!
Eric
Hello!
I applied the modified REGEX into an inline search. The search successfully gives us a list of the desired field names under the "KEY_" field, and the desired values in a list under the "VAL_" field. Unfortunately not as individual key-val pairs. But getting much closer 🙂
I will give the Props/Transforms route a try next and post the results.
Thanks!
Love to see a working version of the KV Pair also. But this is definitely on the rt track and provides access to the specific items. Thanks for helping us out. Rich
posted working version of props/transforms above...playing with the "null" values now...
Yeah thats what I was eluding to with keeping the KV Pair relationship for reporting....will post as soon as I have a working version
Hey Rich!
Your regex is fine, I think the issue is that rex didn't like the leading underscores on your field and value names...(remember, leading underscores are reserved for Splunk internal use)
This worked for me:
| rex max_match=0 field=_raw ".+?name="(?<KEY_1>.+?)"\>(?<VAL_1>.+?)\<\/col>"
Once you have the rex the way you like it, you can implement this with props and transforms...
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Configureadvancedextractionswithfieldtra...
I tried using KV_MODE= xml but it didn't extract what I think you will want...the whole "col name =" got in the way, and made it messy..
the main thing is we need to keep the KV pair relationship to do the reporting you want...will play in the lab and see what I can come up with then update this post.
UPDATE:
Here is the props.conf and transforms.conf I used to parse the fields and keep the KV pairs. Big Up MuS for the optimized regex to deal with null values in the XML. Previous regex was too greedy for null fields.
props.conf
[answers494268]
BREAK_ONLY_BEFORE=<row>
CHARSET=UTF-8
KV_MODE=none
MAX_TIMESTAMP_LOOKAHEAD=125
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
category=Custom
disabled=false
pulldown_type=true
TIME_FORMAT=%m/%d/%Y %H:%M:%S %p
TIME_PREFIX=<col name="time">
REPORT-xml = answers494268xml
transforms.conf
[answers494268xml]
#REGEX = .+?name="(.+?)">(.+?)</col>
#Above regex too greedy for null values
REGEX = .+?name=\"([^\"\>]+)\"\>([^\<]+)\<\/col>
FORMAT = $1::$2
One thing to note in this data...there are some fields with no values....not sure if this is because you were scrubbing the data to be shared...but just keep in mind the first regex wont match if there is no value for the key, thus messing up the transform...updated rex ensures no erroneous matches
ie.
<col name="User"></col>
<col name="Message"></col>
<col name="Datasource"></col>
Thank you very much. Was pulling my hair out trying to figure out. Customer was trying to use the variables with the underscores. We wont make that mistake again! Again TY 🙂 Props/Transforms is a good idea and we will put that into place.
Saved my day !! Thank you very much. I made a workaround if events had no value like so :
Using 2 options :