I'm trying to extract a value from a fairly simple XML document. My regular expression works fine in search (rex) and also in python, however, it is not working as a field extraction.
Here is an example, with some details and links omitted, the part I am interested in is simply the final "true" Command outcome value. Note that the response can vary greatly and there can be other xml elements before/after this command outcome value.
<?xml version="1.0" encoding="utf-8"?><d:SetBookingReference xmlns:d="..." xmlns:m="..." xmlns:georss="..." ..." m:type="Framework.CommandOutcome"><d:OK m:type="Edm.Boolean">true</d:OK>
Let's assume this field is attached to each event in my "Events" index, the field is called XMLField. The following search works perfectly in extracting the value true/false from all the responses.
index=Events | rex field=XMLField "CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome2>[^<>]*)"
Here is how my field extraction looks, it is assigned to the correct index and is an "inline" extraction.
CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome>[^<>]*) in XMLField
I've got tens of regular expressions working as field extractions, I've got this particular expression working in search and in a python script, I'm just really out of ideas as to why it's not working in the field extraction. I originally had quotes, but I replaced these with [^<>]* to avoid awkward looking escape sequences on the quotes, I've also tried escaping the "<" and ">" signs, but the regex still fails.
Any ideas? Thanks!
Hi, apologies if this is not relevant to you but have you tried the spath command?
http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/spath
You can also do the following in your props.conf in order to let Splunk parse the XML automatically for you:
[yoursourcetype]
KV_MODE = xml
*More about KV_MODE: *
KV_MODE = [none|auto|auto_escaped|multi|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
* none: if you want no field/value extraction to take place.
* auto: extracts field/value pairs separated by equal signs.
* auto_escaped: extracts fields/value pairs separated by equal signs and
honors \" and \\ as escaped sequences within quoted
values, e.g field="value with \"nested\" quotes"
* multi: invokes the multikv search command to expand a tabular event into
multiple events.
* xml : automatically extracts fields from XML data.
* json: automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more user-created regexes are not
overridden by automatic field/value extraction for a particular host,
source, or source type, and also increases search performance.
* Defaults to auto.
* The 'xml' and 'json' modes will not extract any fields when used on data
that isn't of the correct format (JSON or XML).
Curious... is XMLField a search time extraction or an index time extraction?
If the XMLField field is a search time extraction, then it needs to happen in the props prior to the new extraction.
Also in your search, the second CommandOutcome has a 2 at the end... but in your sedcmd example, there isnt a 2 at the end of the 2nd CommandOutcome
I have one of the fields named 2 so I can differentiate between the two fields and find out when the field extraction worked.
XMLField is an index time extraction, I think. This data source is JSON objects generated in a python script, so I would think the XMLField is created at index time so the extraction should work on it. But that might explain it as I can't get any extraction to work on the field.