I'm trying to extract a value from a fairly simple XML document. My regular expression works fine in search (rex) and in Python, but it does not work as a field extraction.
Here is an example, with some details and links omitted. The part I am interested in is the final "true" command outcome value. Note that the response can vary greatly, and there can be other XML elements before or after this command outcome value.
<?xml version="1.0" encoding="utf-8"?><d:SetBookingReference xmlns:d="..." xmlns:m="..." xmlns:georss="..." ..." m:type="Framework.CommandOutcome"><d:OK m:type="Edm.Boolean">true</d:OK>
Let's assume this field is attached to each event in my "Events" index and is called XMLField. The following search extracts the true/false value from all the responses perfectly.
index=Events | rex field=XMLField "CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome2>[^<>]*)"
Here is how my field extraction looks. It is assigned to the correct index and is an "inline" extraction.
CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome>[^<>]*) in XMLField
I've got tens of regular expressions working as field extractions, and I've got this particular expression working in search and in a Python script, so I'm out of ideas as to why it's not working as a field extraction. I originally matched on the quotes, but I replaced them with [^<>]* to avoid awkward-looking escape sequences. I've also tried escaping the "<" and ">" signs, but the regex still fails.
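For reference, the Python check mentioned above can be sketched roughly as follows. This is a minimal sketch, not the original script: the sample string is a shortened stand-in for the response (the "..." namespace values are left elided as in the post), and the function name extract_command_outcome is made up for illustration. One thing to note is that Python's re module writes named groups as (?P<name>...), whereas the rex command and inline extractions use the PCRE form (?<name>...).

```python
import re

# Shortened stand-in for the response shown above; the "..." namespace
# values are elided in the original post and are kept elided here.
SAMPLE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<d:SetBookingReference xmlns:d="..." xmlns:m="..." '
    'm:type="Framework.CommandOutcome">'
    '<d:OK m:type="Edm.Boolean">true</d:OK>'
)

# Same pattern as the rex search, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'CommandOutcome[^<>]*><[^<>]*>(?P<CommandOutcome>[^<>]*)'
)


def extract_command_outcome(xml_text):
    """Return the text of the element following the CommandOutcome tag, or None."""
    match = PATTERN.search(xml_text)
    return match.group("CommandOutcome") if match else None


if __name__ == "__main__":
    print(extract_command_outcome(SAMPLE))
```

Because the pattern only relies on the angle brackets, it does not care which elements come before the CommandOutcome tag, which matches the behaviour seen with rex.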
Any ideas? Thanks!
Hi, apologies if this is not relevant to you, but have you tried the spath command?
You can also do the following in your props.conf in order to let Splunk parse the XML automatically for you:
[yoursourcetype]
KV_MODE = xml
More about KV_MODE:
KV_MODE = [none|auto|auto_escaped|multi|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none: if you want no field/value extraction to take place.
  * auto: extracts field/value pairs separated by equal signs.
  * auto_escaped: extracts fields/value pairs separated by equal signs and honors \" and \\ as escaped sequences within quoted values, e.g field="value with \"nested\" quotes"
  * multi: invokes the multikv search command to expand a tabular event into multiple events.
  * xml : automatically extracts fields from XML data.
  * json: automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more user-created regexes are not overridden by automatic field/value extraction for a particular host, source, or source type, and also increases search performance.
* Defaults to auto.
* The 'xml' and 'json' modes will not extract any fields when used on data that isn't of the correct format (JSON or XML).
Curious... is XMLField a search time extraction or an index time extraction?
If XMLField is a search-time extraction, then it needs to happen in props.conf before the new extraction runs, because the new extraction reads from it.
Also, in your search the second CommandOutcome has a 2 at the end, but in your field extraction example there isn't a 2 at the end of the second CommandOutcome.
I gave one of the fields a trailing 2 so I can differentiate between the two and tell when the field extraction has worked.
XMLField is an index-time extraction, I think. This data source is JSON objects generated by a Python script, so I would think XMLField is created at index time and the extraction should work on it. But that might explain it, as I can't get any extraction to work on that field.
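To illustrate the ordering point outside Splunk, here is a small Python sketch of the two-stage dependency: the JSON object has to be parsed into fields first, and only then can a pattern be applied to XMLField. This is an analogy under stated assumptions, not a description of Splunk's internals; the event content is a made-up, shortened example and extract_from_raw is a hypothetical helper name.

```python
import json
import re

# Same pattern as the inline extraction, in Python's named-group syntax.
PATTERN = re.compile(
    r'CommandOutcome[^<>]*><[^<>]*>(?P<CommandOutcome>[^<>]*)'
)

# Hypothetical raw event: a JSON object whose XMLField value holds the XML.
RAW_EVENT = json.dumps({
    "XMLField": (
        '<d:SetBookingReference m:type="Framework.CommandOutcome">'
        '<d:OK m:type="Edm.Boolean">false</d:OK>'
    )
})


def extract_from_raw(raw):
    """Stage 1: parse the JSON. Stage 2: run the regex on XMLField only."""
    fields = json.loads(raw)
    xml_text = fields.get("XMLField")
    if xml_text is None:
        # Without stage 1 producing XMLField there is nothing to match against.
        return None
    match = PATTERN.search(xml_text)
    return match.group("CommandOutcome") if match else None


if __name__ == "__main__":
    print(extract_from_raw(RAW_EVENT))
```

If the first stage has not produced XMLField by the time the second stage runs, the second stage has no input, which would be consistent with no extraction working on that field.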