Why does my regular expression work with rex, but ...

jpanderson · ‎01-20-2016

I'm trying to extract a value from a fairly simple XML document. My regular expression works fine in search (rex) and also in python, however, it is not working as a field extraction.

Here is an example, with some details and links omitted, the part I am interested in is simply the final "true" Command outcome value. Note that the response can vary greatly and there can be other xml elements before/after this command outcome value.

<?xml version="1.0" encoding="utf-8"?><d:SetBookingReference xmlns:d="..." xmlns:m="..." xmlns:georss="..." ..." m:type="Framework.CommandOutcome"><d:OK m:type="Edm.Boolean">true</d:OK>

Let's assume this field is attached to each event in my "Events" index, the field is called XMLField. The following search works perfectly in extracting the value true/false from all the responses.

index=Events | rex field=XMLField "CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome2>[^<>]*)"

Here is how my field extraction looks, it is assigned to the correct index and is an "inline" extraction.

CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome>[^<>]*) in XMLField

I've got tens of regular expressions working as field extractions, I've got this particular expression working in search and in a python script, I'm just really out of ideas as to why it's not working in the field extraction. I originally had quotes, but I replaced these with [^<>]* to avoid awkward looking escape sequences on the quotes, I've also tried escaping the "<" and ">" signs, but the regex still fails.

Any ideas? Thanks!

javiergn · ‎01-20-2016

Hi, apologies if this is not relevant to you but have you tried the spath command?

http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/spath

You can also do the following in your props.conf in order to let Splunk parse the XML automatically for you:

[yoursourcetype]
KV_MODE = xml

*More about KV_MODE: *

KV_MODE = [none|auto|auto_escaped|multi|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none: if you want no field/value extraction to take place.
  * auto: extracts field/value pairs separated by equal signs.
  * auto_escaped: extracts fields/value pairs separated by equal signs and
                  honors \" and \\ as escaped sequences within quoted
                  values, e.g field="value with \"nested\" quotes"
  * multi: invokes the multikv search command to expand a tabular event into
           multiple events.
  * xml : automatically extracts fields from XML data.
  * json: automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more user-created regexes are not
  overridden by automatic field/value extraction for a particular host,
  source, or source type, and also increases search performance.
* Defaults to auto.
* The 'xml' and 'json' modes will not extract any fields when used on data
  that isn't of the correct format (JSON or XML).

jkat54 · ‎01-20-2016

Curious... is XMLField a search time extraction or an index time extraction?

If the XMLField field is a search time extraction, then it needs to happen in the props prior to the new extraction.

Also in your search, the second CommandOutcome has a 2 at the end... but in your sedcmd example, there isnt a 2 at the end of the 2nd CommandOutcome

jpanderson · ‎01-20-2016

I have one of the fields named 2 so I can differentiate between the two fields and find out when the field extraction worked.

XMLField is an index time extraction, I think. This data source is JSON objects generated in a python script, so I would think the XMLField is created at index time so the extraction should work on it. But that might explain it as I can't get any extraction to work on the field.

Why does my regular expression work with rex, but not as a configured field extraction?

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Index This | What travels the world but is also stuck in place?

Discover New Use Cases: Unlock Greater Value from Your Existing Splunk Data

Continue Your Journey: Join Session 2 of the Data Management and Federation Bootcamp ...

Join the Conversation