Splunk Search

How to extract multiple KV pairs from XML using FORMAT in transforms.conf to build the key-value pairing?

Path Finder

I am tasked with consuming a number of XML config files, which contain many key value pairs, but where the semantically useful KV pairs are obscured by literal KV pairs that are not useful. For example:

<?xml version="1.0" encoding="UTF-8"?>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>

I would like to extract the data as: John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums ... There are dozens of these keys.

I am attempting index-time extraction because I read that search-time extraction can not concatenate event segments into a field name If there were a search-time method, that would be preferred, of course.


category = Custom
description = Example XML
disabled = false
TRANSFORMS-xml1 = xml1
KV_MODE = none


REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
FORMAT = $1::$2

Regex101.com validates the regex does capture the groups I want, but I'm not seeing any extracted fields in Spunk Web. What am I missing? I am working in Splunk Enterprise 6.5.1. TIA!

0 Karma

Splunk Employee
Splunk Employee

Hi anewell,

I think what you really need is a lookup to store key-value pairs. It is not appropriate to extract keys as field names since they are numerous and subject to change.

At search time, you can list key-value pairs in a table and output them into csv or kv lookups.

<search to display the key-value table>|outputlookup musicians.csv


<search to display the key-value table>|outputlookup kvstorecoll_lookup

For detailed information about the outputlookup command, please refer to documentation:

Hope this helps. Thanks!

0 Karma


looking at this -

 REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>

I notice a few things -

First, your regex indicates a whitespace character (\s) after the close-quote for value and before the closing slash-brace. I don't see a space there in the data you posted.

Second, you are using different assumptions to pull the key and the value.

For the key, you are assuming only "word" characters (\w).
For the value, you are allowing ALL characters (.)

in the second case, it might be more efficient scanning to define a character class of everything except a double-quote [^"] or [^\"] depending on the version of regular expressions you're using.

0 Karma
Get Updates on the Splunk Community!

Data Preparation Made Easy: SPL2 for Edge Processor

By now, you may have heard the exciting news that Edge Processor, the easy-to-use Splunk data preparation tool ...

Introducing Edge Processor: Next Gen Data Transformation

We get it - not only can it take a lot of time, money and resources to get data into Splunk, but it also takes ...

Tips & Tricks When Using Ingest Actions

Tune in to learn about:Large scale architecture when using Ingest ActionsRegEx performance considerations ...