Splunk Search

How to extract multiple KV pairs from XML using FORMAT in transforms.conf to build the key-value pairing?

anewell
Path Finder

I am tasked with consuming a number of XML config files that contain many key-value pairs, but the semantically useful pairs are obscured by the literal attribute names (key and value), which are not useful on their own. For example:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appSettings>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
    </appSettings>
</configuration>

I would like to extract the data as: John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums ... There are dozens of these keys.

I am attempting index-time extraction because I read that search-time extraction cannot concatenate event segments into a field name. If there were a search-time method, that would of course be preferred.

Props.conf

[fab]
BREAK_ONLY_BEFORE = NEVER_BREAK
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
description = Example XML
disabled = false
TRANSFORMS-xml1 = xml1
KV_MODE = none

Transforms.conf

[xml1]
REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
FORMAT = $1::$2
WRITE_META = true
REPEAT_MATCH = true
LOOKAHEAD = 4096

Regex101.com validates that the regex captures the groups I want, but I'm not seeing any extracted fields in Splunk Web. What am I missing? I am working in Splunk Enterprise 6.5.1. TIA!


hunters_splunk
Splunk Employee

Hi anewell,

I think what you really need is a lookup to store key-value pairs. It is not appropriate to extract keys as field names since they are numerous and subject to change.

At search time, you can list the key-value pairs in a table and write them to a CSV or KV store lookup:

<search to display the key-value table>|outputlookup musicians.csv

or

<search to display the key-value table>|outputlookup kvstorecoll_lookup

For detailed information about the outputlookup command, please refer to documentation:
http://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Outputlookup
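To make the shape of that key-value table concrete, here is a rough sketch outside Splunk, in Python, that turns the posted XML into the kind of two-column CSV a lookup would hold. The column names key and value are assumptions, the XML declaration line is left out for brevity, and this is only an illustration of the table layout, not a replacement for outputlookup.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data from the question (XML declaration omitted).
xml = """<configuration>
  <appSettings>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
    </appSettings>
</configuration>"""

# Collect one (key, value) row per <add> element.
root = ET.fromstring(xml)
rows = [(el.get("key"), el.get("value")) for el in root.iter("add")]

# Write the rows as CSV text; the header names are hypothetical.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["key", "value"])
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

Each musician becomes a row rather than a field name, which is the point of the lookup approach: new keys add rows without changing the set of fields.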

Hope this helps. Thanks!
Hunter


DalJeanis
Legend

Looking at this:

 REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>

I notice a few things:

First, your regex requires a whitespace character (\s) after the closing quote of the value and before the closing "/>". I don't see a space there in the data you posted, so the pattern cannot match those lines.
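The whitespace point can be checked quickly outside Splunk. The sketch below uses Python's re module as a stand-in for the regex engine Splunk uses (the two should agree for a pattern this simple, but that is an assumption), and the relaxed pattern with \s* is a hypothetical adjustment, not something from the original post.

```python
import re

# One line copied from the posted XML.
line = '        <add key="John" value="Guitar"/>'

# The regex from transforms.conf, unchanged.
original = re.compile(r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>')

# Hypothetical adjustment: \s* makes the whitespace before "/>" optional.
relaxed = re.compile(r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s*\/>')

original_match = original.search(line)
relaxed_match = relaxed.search(line)

print(original_match)          # None, because no whitespace precedes "/>"
print(relaxed_match.groups())  # ('John', 'Guitar')
```

If the regex tester showed a match, the test string there probably had a space before "/>" that the real files do not have.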

Second, you are using different assumptions to pull the key and the value.

For the key, you are assuming only "word" characters (\w).
For the value, you are allowing ALL characters (.)

In the second case, it might be more efficient to scan with a character class that matches everything except a double quote: [^"] or [^\"], depending on the regex flavor you're using.
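As a rough illustration of the character-class suggestion, again using Python's re module as a stand-in for Splunk's regex engine: the pattern below is a hypothetical rewrite that uses [^"]* for the value and \s* before "/>". The single-line input with two elements is invented to show where a greedy .* over-captures; it does not appear in the posted data.

```python
import re

# Sample data copied from the question (XML declaration omitted).
xml = '''<configuration>
  <appSettings>
        <add key="John" value="Guitar"/>
        <add key="Paul" value="Bass"/>
        <add key="George" value="Guitar"/>
        <add key="Ringo" value="Drums"/>
    </appSettings>
</configuration>'''

# Hypothetical pattern: [^"]* stops at the closing quote of the value.
strict = re.compile(r'<add\s+key="(\w+)"\s+value="([^"]*)"\s*/>')

# Same pattern with a greedy .* for the value, for comparison.
greedy = re.compile(r'<add\s+key="(\w+)"\s+value="(.*)"\s*/>')

# findall plays the role of REPEAT_MATCH here: every match is returned.
pairs = strict.findall(xml)
print(pairs)

# Invented input: two elements on one line.
one_line = '<add key="John" value="Guitar"/> <add key="Paul" value="Bass"/>'
strict_pairs = strict.findall(one_line)
greedy_pairs = greedy.findall(one_line)
print(strict_pairs)  # two separate pairs
print(greedy_pairs)  # one pair whose value swallows the second element
```

On the posted data, where each element sits on its own line, both patterns capture the same values; the negated class mainly avoids backtracking and protects against elements sharing a line.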
