How to extract multiple KV pairs from XML using FORMAT in transforms.conf to build the key-value pairing?

anewell · ‎01-17-2017

I am tasked with consuming a number of XML config files, which contain many key value pairs, but where the semantically useful KV pairs are obscured by literal KV pairs that are not useful. For example:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appSettings>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
    </appSettings>
</configuration>

I would like to extract the data as: John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums ... There are dozens of these keys.

I am attempting index-time extraction because I read that search-time extraction cannot concatenate event segments into a field name. If there were a search-time method, that would be preferred, of course.

props.conf

[fab]
BREAK_ONLY_BEFORE = NEVER_BREAK
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
description = Example XML
disabled = false
TRANSFORMS-xml1 = xml1
KV_MODE = none

transforms.conf

[xml1]
REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
FORMAT = $1::$2
WRITE_META = true
REPEAT_MATCH = true
LOOKAHEAD = 4096

Regex101.com validates that the regex captures the groups I want, but I'm not seeing any extracted fields in Splunk Web. What am I missing? I am working in Splunk Enterprise 6.5.1. TIA!

hunters_splunk · ‎01-17-2017

Hi anewell,

I think what you really need is a lookup to store key-value pairs. It is not appropriate to extract keys as field names since they are numerous and subject to change.

At search time, you can list the key-value pairs in a table and output them to a CSV or KV store lookup:

<search to display the key-value table>|outputlookup musicians.csv

or

<search to display the key-value table>|outputlookup kvstorecoll_lookup

For detailed information about the outputlookup command, please refer to the documentation:
http://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Outputlookup

Hope this helps. Thanks!
Hunter

DalJeanis · ‎01-17-2017

Looking at this:

 REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>

I notice a few things:

First, your regex requires a whitespace character (\s) after the close-quote for value and before the closing slash and angle bracket (/>). I don't see a space there in the data you posted.

Second, you are using different assumptions to pull the key and the value.

For the key, you are assuming only "word" characters (\w).
For the value, you are allowing ALL characters (.)

In the second case, it might be more efficient to scan with a character class that matches everything except a double quote, [^"] or [^\"], depending on the flavor of regular expressions you're using.
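Both points can be checked against the sample data outside Splunk. The following is a minimal sketch that uses Python's re module as a stand-in for Splunk's regex engine (an assumption; the two engines behave the same for these simple patterns as far as I know, but this is not a tested Splunk configuration). The names SAMPLE, ORIGINAL, REVISED, and extract_pairs are made up for illustration, and REVISED is only one possible rewrite of the pattern.

```python
import re

# Sample event copied from the question.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appSettings>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
    </appSettings>
</configuration>'''

# Regex as posted: it demands one whitespace character between the
# closing quote of the value and "/>", which the data does not contain.
ORIGINAL = r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>'

# Hypothetical revision: whitespace before "/>" is optional (\s*), and
# the value is anything up to the next double quote ([^"]*).
REVISED = r'<add\s+key="(\w+)"\s+value="([^"]*)"\s*/>'


def extract_pairs(pattern, text):
    """Return every (key, value) tuple the pattern captures in text."""
    return re.findall(pattern, text)


original_pairs = extract_pairs(ORIGINAL, SAMPLE)
revised_pairs = extract_pairs(REVISED, SAMPLE)

print(original_pairs)  # empty list: the required \s never matches
for key, value in revised_pairs:
    print(f"{key}={value}")
```

If the revised pattern behaves the same way in Splunk, the REGEX line in transforms.conf would be the only thing that needs to change; the FORMAT = $1::$2 line already maps the first capture group to the field name and the second to the value.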
