I am tasked with consuming a number of XML config files that contain many key-value pairs. The semantically useful KV pairs are obscured by literal KV pairs (the key and value attributes themselves) that are not useful on their own. For example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appSettings>
    <add key="John" value="Guitar"/>
    <add key="Paul" value="Bass"/>
    <add key="George" value="Guitar"/>
    <add key="Ringo" value="Drums"/>
  </appSettings>
</configuration>
I would like to extract the data as: John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums ... There are dozens of these keys.
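For reference, Python's standard xml.etree.ElementTree module can produce the target output outside Splunk. This sketch only makes the desired mapping concrete. It assumes the files look like the sample above, with <add> elements directly under <appSettings>. It is not a Splunk extraction, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample from the question, assuming one element per line in the real file.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appSettings>
    <add key="John" value="Guitar"/>
    <add key="Paul" value="Bass"/>
    <add key="George" value="Guitar"/>
    <add key="Ringo" value="Drums"/>
  </appSettings>
</configuration>"""


def app_settings(xml_text: str) -> dict:
    """Map each <add key=... value=...> under <appSettings> to key -> value."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {add.get("key"): add.get("value") for add in root.findall("appSettings/add")}


def as_pairs_line(settings: dict) -> str:
    """Render the mapping in the John=Guitar, Paul=Bass, ... form."""
    return ", ".join(f"{k}={v}" for k, v in settings.items())


if __name__ == "__main__":
    print(as_pairs_line(app_settings(SAMPLE)))
    # John=Guitar, Paul=Bass, George=Guitar, Ringo=Drums
```

New keys in the file simply become new entries in the dictionary. Nothing about the parsing changes as the list grows.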
I am attempting index-time extraction because I read that search-time extraction cannot concatenate event segments into a field name. If there were a search-time method, that would of course be preferred.
[fab]
BREAK_ONLY_BEFORE = NEVER_BREAK
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = Custom
description = Example XML
disabled = false
TRANSFORMS-xml1 = xml1
KV_MODE = none

[xml1]
REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
FORMAT = $1::$2
WRITE_META = true
REPEAT_MATCH = true
LOOKAHEAD = 4096
Regex101.com confirms that the regex captures the groups I want, but I'm not seeing any extracted fields in Splunk Web. What am I missing? I am working in Splunk Enterprise 6.5.1. TIA!
I think what you really need is a lookup to store key-value pairs. It is not appropriate to extract keys as field names since they are numerous and subject to change.
At search time, you can list the key-value pairs in a table and write them to a CSV lookup or a KV store lookup:
<search to display the key-value table>|outputlookup musicians.csv
<search to display the key-value table>|outputlookup kvstorecoll_lookup
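A lookup is essentially a table with one row per key. New or renamed keys therefore become new rows rather than new field names. The Python below only illustrates that shape with the standard csv module, roughly what a file such as musicians.csv would hold. It does not talk to Splunk. The column names key and value, and the helper names, are assumptions about what the table search would produce.

```python
import csv
import io

# Hypothetical rows, standing in for the output of the key-value table search.
PAIRS = [("John", "Guitar"), ("Paul", "Bass"), ("George", "Guitar"), ("Ringo", "Drums")]


def to_lookup_csv(pairs) -> str:
    """Write (key, value) rows under a key,value header, like a two-column lookup file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["key", "value"])
    writer.writerows(pairs)
    return buf.getvalue()


def read_lookup(csv_text: str) -> dict:
    """Load the table back as key -> value, i.e. the direction a lookup is used in."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {row["key"]: row["value"] for row in reader}


if __name__ == "__main__":
    table = to_lookup_csv(PAIRS)
    print(table)
    print(read_lookup(table)["Ringo"])  # Drums
```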
For detailed information about the outputlookup command, please refer to the Splunk documentation.
Hope this helps. Thanks!
Looking at this:
REGEX = [\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>
I notice a few things:
First, your regex requires a whitespace character (\s) after the closing quote of the value and before the closing slash and angle bracket (/>). I don't see a space there in the data you posted.
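One way to check this outside Splunk is to run the pattern through Python's re module. For the constructs used here (character classes, \s, \w, greedy .*) its behaviour is assumed to match PCRE. The sketch lays the sample out one element per line. RELAXED is a hypothetical variant that only makes the whitespace before /> optional.

```python
import re

# REGEX exactly as posted in the question; Python's re reads \" and \/ as a
# literal " and /, so the escapes behave as they would in PCRE.
ORIGINAL = r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s\/>'
# Hypothetical variant: identical except the \s before \/> becomes \s* (optional).
RELAXED = r'[\s+]<add\skey=\"(\w+)\"\svalue=\"(.*)\"\s*\/>'

# Sample from the question, assuming one <add> element per line in the real file.
SAMPLE = """<appSettings>
  <add key="John" value="Guitar"/>
  <add key="Paul" value="Bass"/>
  <add key="George" value="Guitar"/>
  <add key="Ringo" value="Drums"/>
</appSettings>"""

if __name__ == "__main__":
    print(re.findall(ORIGINAL, SAMPLE))  # [] because there is no space before "/>"
    print(re.findall(RELAXED, SAMPLE))
    # [('John', 'Guitar'), ('Paul', 'Bass'), ('George', 'Guitar'), ('Ringo', 'Drums')]
```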
Second, you are using different assumptions to pull the key and the value.
For the key, you are assuming only "word" characters (\w).
For the value, you are allowing ALL characters (.).
In the second case, it might be more efficient to scan with a character class of everything except a double quote, [^"] or [^\"], depending on the regular expression flavor you're using.
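The greedy (.*) also matters for correctness, not just speed. When several <add> elements share a line, as in the flattened sample in the question, .* runs on to the last "/> it can find, so the first match swallows all four elements. The sketch below compares the two forms on that one-line sample, using Python re (assumed equivalent to PCRE here). It also drops the leading [\s+], which is a character class matching a single whitespace character or a literal + (probably meant as \s+), and allows any amount of whitespace with \s+ and \s*. Those tweaks are assumptions added for the sketch, and format_fields is a hypothetical helper.

```python
import re

# The sample from the question, flattened onto a single line.
FLAT = ('<configuration> <appSettings> <add key="John" value="Guitar"/> '
        '<add key="Paul" value="Bass"/> <add key="George" value="Guitar"/> '
        '<add key="Ringo" value="Drums"/> </appSettings> </configuration>')

# Greedy value capture: .* runs as far as the LAST "/> on the line.
GREEDY = r'<add\s+key="(\w+)"\s+value="(.*)"\s*/>'
# Negated class: the value stops at the first closing double quote.
NEGATED = r'<add\s+key="(\w+)"\s+value="([^"]*)"\s*/>'


def format_fields(pattern: str, text: str) -> list:
    """Apply the pattern repeatedly and render each match as key::value.

    A rough offline stand-in for REPEAT_MATCH = true with FORMAT = $1::$2;
    it is not Splunk's indexing pipeline.
    """
    return [f"{key}::{value}" for key, value in re.findall(pattern, text)]


if __name__ == "__main__":
    print(re.findall(GREEDY, FLAT))  # one match that swallows all four elements
    print(format_fields(NEGATED, FLAT))
    # ['John::Guitar', 'Paul::Bass', 'George::Guitar', 'Ringo::Drums']
```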