Splunk Search

Why does my regular expression work with rex, but not as a configured field extraction?

jpanderson
Path Finder

I'm trying to extract a value from a fairly simple XML document. My regular expression works fine in search (rex) and in Python, but it does not work as a field extraction.

Here is an example with some details and links omitted. The part I am interested in is the final "true" command outcome value. Note that the response can vary greatly, and there can be other XML elements before or after this command outcome value.

<?xml version="1.0" encoding="utf-8"?><d:SetBookingReference xmlns:d="..." xmlns:m="..." xmlns:georss="..." ..." m:type="Framework.CommandOutcome"><d:OK m:type="Edm.Boolean">true</d:OK>

Let's assume this XML is attached to each event in my "Events" index in a field called XMLField. The following search correctly extracts the true/false value from all the responses.

index=Events | rex field=XMLField "CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome2>[^<>]*)"

Here is how my field extraction looks. It is assigned to the correct index and is an "inline" extraction.

CommandOutcome[^<>]*><[^<>]*>(?<CommandOutcome>[^<>]*) in XMLField

I have dozens of regular expressions working as field extractions, and this particular expression works in search and in a Python script, so I'm out of ideas as to why it fails as a field extraction. The pattern originally contained quotes, but I replaced them with [^<>]* to avoid awkward-looking escape sequences. I've also tried escaping the "<" and ">" signs, but the extraction still fails.
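For reference, a minimal sketch of the kind of Python check mentioned above. It is not the actual script: the function name is hypothetical, the sample is a trimmed copy of the example with the "..." values left elided, and the named group uses Python's (?P<name>...) syntax because the re module does not accept the (?<name>...) form that rex does.

```python
import re

# Same pattern as the rex search, with the named group written in
# Python's (?P<name>...) syntax.
PATTERN = re.compile(r"CommandOutcome[^<>]*><[^<>]*>(?P<CommandOutcome>[^<>]*)")

# Trimmed copy of the sample from the post; the "..." values are elided
# in the original and are kept as-is here.
SAMPLE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<d:SetBookingReference xmlns:d="..." xmlns:m="..." '
    'm:type="Framework.CommandOutcome">'
    '<d:OK m:type="Edm.Boolean">true</d:OK>'
)


def extract_command_outcome(xml_text):
    """Return the captured value, or None when the pattern does not match."""
    match = PATTERN.search(xml_text)
    return match.group("CommandOutcome") if match else None


print(extract_command_outcome(SAMPLE))
```

Run against the sample, this captures the text between the first element after the CommandOutcome attribute and the next "<", which suggests the pattern itself is not the problem.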

Any ideas? Thanks!


javiergn
SplunkTrust

Hi, apologies if this is not relevant to you, but have you tried the spath command?

http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/spath

You can also do the following in your props.conf in order to let Splunk parse the XML automatically for you:

[yoursourcetype]
KV_MODE = xml

More about KV_MODE:

KV_MODE = [none|auto|auto_escaped|multi|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none: if you want no field/value extraction to take place.
  * auto: extracts field/value pairs separated by equal signs.
  * auto_escaped: extracts fields/value pairs separated by equal signs and
                  honors \" and \\ as escaped sequences within quoted
                  values, e.g field="value with \"nested\" quotes"
  * multi: invokes the multikv search command to expand a tabular event into
           multiple events.
  * xml : automatically extracts fields from XML data.
  * json: automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more user-created regexes are not
  overridden by automatic field/value extraction for a particular host,
  source, or source type, and also increases search performance.
* Defaults to auto.
* The 'xml' and 'json' modes will not extract any fields when used on data
  that isn't of the correct format (JSON or XML).

jkat54
SplunkTrust

Curious... is XMLField a search time extraction or an index time extraction?

If XMLField is a search-time extraction, then it needs to be extracted in props.conf before the new extraction runs. Otherwise the new extraction has no XMLField to work on.

Also, in your search the second CommandOutcome has a 2 at the end, but in your field extraction example there isn't a 2 at the end of the second CommandOutcome.


jpanderson
Path Finder

I gave one of the fields a name ending in 2 so that I can tell the two fields apart and see when the field extraction works.

XMLField is an index-time extraction, I think. This data source is JSON objects generated by a Python script, so I would expect XMLField to be created at index time and the extraction to work on it. But a search-time XMLField might explain the problem, as I can't get any extraction to work on that field.
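To make the setup concrete, here is a rough sketch of such an event. The event shape and function names are assumptions (the thread only says the events are JSON objects carrying the XML in XMLField), and the code says nothing about how Splunk itself processes the event. It only shows that the XML sits inside a JSON string, so XMLField exists as a field only after the JSON has been parsed, while the same pattern also matches the raw, unparsed text.

```python
import json
import re

PATTERN = re.compile(r"CommandOutcome[^<>]*><[^<>]*>(?P<CommandOutcome>[^<>]*)")

# Hypothetical event shape: a JSON object with the XML stored as a string
# under the key XMLField.
xml_value = (
    '<d:SetBookingReference m:type="Framework.CommandOutcome">'
    '<d:OK m:type="Edm.Boolean">true</d:OK>'
)
raw_event = json.dumps({"XMLField": xml_value})


def outcome_from_raw(raw):
    """Apply the pattern to the raw JSON text, before any field exists."""
    match = PATTERN.search(raw)
    return match.group("CommandOutcome") if match else None


def outcome_from_field(raw):
    """Parse the JSON first, then apply the pattern to the XMLField value."""
    match = PATTERN.search(json.loads(raw).get("XMLField", ""))
    return match.group("CommandOutcome") if match else None


print(raw_event)
print(outcome_from_raw(raw_event), outcome_from_field(raw_event))
```

In the raw text the quotes appear as \" because of JSON escaping, which [^<>]* tolerates. Whether XMLField is available when the inline extraction runs depends on how the sourcetype is configured, which is the point of the question about index-time versus search-time extraction.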
