Splunk Search

How to extract multiple KV pairs from XML using FORMAT in transforms.conf to build the key-value pairing?

Path Finder

I am tasked with consuming a number of XML config files, which contain many key value pairs, but where the semantically useful KV pairs are obscured by literal KV pairs that are not useful. For example:

<?xml version="1.0" encoding="UTF-8"?>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>

I would like to extract the data as: John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums ... There are dozens of these keys.

I am attempting index-time extraction because I read that search-time extraction cannot concatenate event segments into a field name. If there were a search-time method, that would be preferred, of course.


props.conf:
category = Custom
description = Example XML
disabled = false
TRANSFORMS-xml1 = xml1
KV_MODE = none


transforms.conf:
REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
FORMAT = $1::$2

Regex101.com validates that the regex captures the groups I want, but I'm not seeing any extracted fields in Splunk Web. What am I missing? I am working in Splunk Enterprise 6.5.1. TIA!


Splunk Employee

Hi anewell,

I think what you really need is a lookup to store key-value pairs. It is not appropriate to extract keys as field names since they are numerous and subject to change.

At search time, you can list the key-value pairs in a table and write them to a CSV lookup or a KV store lookup.

<search to display the key-value table>|outputlookup musicians.csv


<search to display the key-value table>|outputlookup kvstorecoll_lookup

For detailed information about the outputlookup command, please refer to the Splunk documentation.
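To make the shape of that key-value table concrete, here is a minimal Python sketch that runs outside Splunk. It only illustrates what rows a file such as musicians.csv might contain for the sample data in the question. The column names `key` and `value`, the helper `to_lookup_csv`, and the regex are assumptions for illustration; in Splunk the file itself would be written by outputlookup.

```python
import csv
import io
import re

# Sample data copied from the question.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
'''

# Hypothetical pattern for this illustration: one <add .../> element per match.
PAIR = re.compile(r'<add\s+key="(\w+)"\s+value="([^"]*)"\s*/>')


def to_lookup_csv(xml_text):
    """Return CSV text with one row per key/value pair (column names assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])          # header row of the lookup table
    writer.writerows(PAIR.findall(xml_text))   # one row per (key, value) tuple
    return buffer.getvalue()


lookup_csv = to_lookup_csv(SAMPLE)
print(lookup_csv)
```

Each key becomes a row rather than a field name, so adding or renaming keys changes the lookup contents instead of the set of extracted fields.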

Hope this helps. Thanks!



Looking at this:

 REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>

I notice a few things.

First, your regex requires a whitespace character (\s) after the closing quote of the value and before the closing slash and angle bracket (/>). I don't see a space there in the data you posted.

Second, you are using different assumptions to pull the key and the value.

For the key, you are assuming only "word" characters (\w).
For the value, you are allowing ALL characters (.)

In the second case, it might be more efficient to use a character class that matches everything except a double quote, [^"] or [^\"], depending on the regular expression flavor you're using.
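Both points can be checked quickly outside Splunk. The sketch below uses Python's re module, which is not the same engine as Splunk's PCRE, although the constructs used here behave the same way in both. The revised pattern is only a suggestion based on the sample data posted above, not a verified transforms.conf setting.

```python
import re

# Sample data copied from the question.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
'''

# Pattern as posted: it requires whitespace between the closing quote and "/>".
ORIGINAL = re.compile(r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>')

# Suggested revision: optional whitespace before "/>" and a negated
# character class for the value, so the match stops at the closing quote.
REVISED = re.compile(r'<add\s+key="(\w+)"\s+value="([^"]*)"\s*/>')

original_pairs = ORIGINAL.findall(SAMPLE)
revised_pairs = REVISED.findall(SAMPLE)

print("original:", original_pairs)
print("revised: ", revised_pairs)
```

On the posted sample the original pattern finds nothing, because there is no space before "/>", while the revised pattern returns all four key/value pairs.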
