xml field extraction with a twist

manderson7
Contributor

Data example:

<Asset href="/company/rest-1.v1/Data/Story/2530981/6709286" id="Story:2530981:6709286"><Attribute name="Status.Name">Ready</Attribute><Attribute name="Number">B-107445</Attribute><Attribute name="Name">Upgrade Splunk Windows TA</Attribute><Attribute name="ChangeDate">2020-01-29T13:49:44.337</Attribute><Attribute name="CreateDate">2019-03-12T12:49:22.703</Attribute><Attribute name="Owners.Name"><Value>owner one</Value><Value>owner two</Value></Attribute></Asset>

and

<Asset href="/company/rest-1.v1/Data/Story/3644941/6720976" id="Story:3644941:6720976"><Attribute name="Status.Name">Ready</Attribute><Attribute name="Number">B-143465</Attribute><Attribute name="Name">Review/Upgrade Splunk_TA_Nix to v7</Attribute><Attribute name="ChangeDate">2020-01-30T12:54:07.103</Attribute><Attribute name="CreateDate">2020-01-15T10:40:49.307</Attribute><Attribute name="Owners.Name"><Value>owner one</Value></Attribute></Asset>

I've finally gotten my XML to separate into events, but I'm stuck on getting the field extractions to work. I'd like to have
Status.Name = Ready
Number = B-143465
ChangeDate = 2020-01-30T12:54:07.103
and so on

I created this regex using the field extractor and regex101:

^(?:[^>\n]*>){2}(?P<Status_Name>\w+\s+\w+|\w+)(?:[^>\n]*>){2}(?P<Number>\w+\-\d+)[^ \n]* \w+="\w+">(?P<Name>[^<]+)[^ \n]* \w+="\w+">(?P<ChangeDate>[^<]+)(?:[^"\n]*"){2}>(?<CreateDate>[^<]+)(?:[^"\n]*"){2}><\w+>(?P<Owners_Name>\w+\s+\w+)

which gets me most of the way there, but it doesn't work when there are multiple owner values.
Can someone suggest a fix? I'd also appreciate help implementing the regex in a transform. I think I can call it using
PROPS

...
REPORT-V1 = v1_fields

TRANSFORMS

[v1_fields]
REGEX = ^(?:[^>\n]*>){2}(?P<Status_Name>\w+\s+\w+|\w+)(?:[^>\n]*>){2}(?P<Number>\w+\-\d+)[^ \n]* \w+="\w+">(?P<Name>[^<]+)[^ \n]* \w+="\w+">(?P<ChangeDate>[^<]+)(?:[^"\n]*"){2}>(?<CreateDate>[^<]+)(?:[^"\n]*"){2}><\w+>(?P<Owners_Name>\w+\s+\w+)

But I don't know if I need to add a FORMAT = $1::$2 line (nor do I know what that line does ... )

Any help you can provide here would be great.
I've also tried KV_MODE=xml on the search head, but that doesn't give me the field names I want, just values for
Asset.Attribute
Asset.Attribute.Value
etc

Thanks


to4kawa
Ultra Champion

transforms.conf

  • For example, the following are equivalent for search-time field extractions:
    • Using FORMAT:
      • REGEX = ([a-z]+)=([a-z]+)
      • FORMAT = $1::$2
    • Without using FORMAT
      • REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    • When using either of the above formats, in a search-time extraction, the regular expression attempts to match against the source text, extracting as many fields as can be identified in the source text.
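As a rough illustration of what FORMAT = $1::$2 means, here is a minimal Python sketch. It is not Splunk code: the helper `extract_with_format` and the sample string are made up for the example, and keeping only the first value per field is an assumption about how an extraction behaves when MV_ADD is not set.

```python
import re


def extract_with_format(regex: str, text: str) -> dict:
    """Mimic FORMAT = $1::$2: capture group 1 names the field, group 2 is its value."""
    fields = {}
    # The regex is applied repeatedly across the text, one field per match.
    for match in re.finditer(regex, text):
        key, value = match.group(1), match.group(2)
        # Assumption: without MV_ADD, the first value seen for a field is kept.
        fields.setdefault(key, value)
    return fields


sample = "user=alice action=login"  # hypothetical event text
result = extract_with_format(r"([a-z]+)=([a-z]+)", sample)
print(result)
```

So the FORMAT line is what turns two anonymous capture groups into a field name and a field value, which is why one generic regex can produce many differently named fields.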

FORMAT version:

REGEX = \<Attribute name=\"([^\"]+)\"\>(?:\<Value\>)?(.*?)(?:\<\/Value\>)?\<\/Attribute\>
FORMAT = $1::$2

regexr.com/4vca1
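To see what the regex above captures, it can be tried outside Splunk. The sketch below uses Python's `re` module on a shortened copy of the first sample event. The backslashes before `<`, `>` and `"` are dropped because they are not needed in Python, and `dict(findall(...))` only approximates the name::value pairing that FORMAT = $1::$2 performs.

```python
import re

# Same pattern as the FORMAT version, without the unnecessary escapes.
ATTRIBUTE_RE = re.compile(
    r'<Attribute name="([^"]+)">(?:<Value>)?(.*?)(?:</Value>)?</Attribute>'
)

# Shortened copy of the first sample event from the question.
event = (
    '<Asset href="/company/rest-1.v1/Data/Story/2530981/6709286" '
    'id="Story:2530981:6709286">'
    '<Attribute name="Status.Name">Ready</Attribute>'
    '<Attribute name="Number">B-107445</Attribute>'
    '<Attribute name="Owners.Name"><Value>owner one</Value>'
    '<Value>owner two</Value></Attribute>'
    '</Asset>'
)

# Each match yields (attribute name, attribute body).
pairs = dict(ATTRIBUTE_RE.findall(event))
for name, value in pairs.items():
    print(f"{name} = {value}")
```

Single-valued attributes come out clean, but the lazy `(.*?)` has to stretch to the closing `</Attribute>`, so a multi-owner attribute keeps the inner `</Value><Value>` tags in its value.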

manderson7
Contributor

That works in regex101, to an extent.
The Owners.Name field keeps the closed/open tags between the names, like

owner one< /value> <value>owner two

Is there any way around this, or is this the best that can be done?
Also, since these are search-time field extractions, I understand I need to put them on the search head, not the ingest host. Thanks for that.

Also, thank you for your help, and for explaining the transforms.


to4kawa
Ultra Champion
[first trans]
REGEX = \<Attribute name=\"([^\"]+)\"\>(.*?)\<\/Attribute\>
FORMAT = $1::$2

[second trans]
SOURCE_KEY = "Owners.Name"
REGEX = \<value\>(.*?)\<\/value\> 
FORMAT = Owners_name::$1
MV_ADD = true
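
The two-step idea can be checked outside Splunk with a small Python sketch. `re.findall` stands in here for an extraction with MV_ADD = true (every match is kept), and the input string is assumed to be what the first transform left in the owner field. Note that the match is case-sensitive: the data uses `<Value>`, so a lowercase `<value>` pattern finds nothing.

```python
import re

# Assumed content of the owner field after the first transform.
owners_raw = "<Value>owner one</Value><Value>owner two</Value>"

# Rough analogue of MV_ADD = true: collect every match, not just the first.
owners = re.findall(r"<Value>(.*?)</Value>", owners_raw)

# The same pattern in lowercase does not match the capitalized tags.
owners_lower = re.findall(r"<value>(.*?)</value>", owners_raw)

print(owners)
print(owners_lower)
```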

manderson7
Contributor

Thanks for your help. Unfortunately, I'm still getting
< Value>name one< /Value>< Value>name two< /Value>
minus the spaces.

transforms.conf is:
[version1_fields]
REGEX = <Attribute name=\"([^\"]+)\">(.*?)<\/Attribute>
FORMAT=$1::$2

[v1_ownername]
SOURCE_KEY = "Owners.Name"
REGEX = \<Value\>(.*?)\<\/Value\>
FORMAT = Owners.Name::$1
MV_ADD = true

I made Value uppercase in the regex and changed FORMAT from Owners_name to Owners.Name, but that didn't help. props.conf looks like:

[version1_xml]
REPORT-v1 = version1_fields
REPORT-v12 = v1_ownername

Update:
I changed props to
[version1_xml]
REPORT-v1 = version1_fields,v1_ownername
and restarted, but unfortunately nothing changed: I'm still seeing the multiple values in a single field value, wrapped in < Value>< /Value> tags.


to4kawa
Ultra Champion

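The second transform is only meant to extract the owner field.
The value of Owners.Name still contains < Value>, but what does Owners_name contain?
I split the extraction into two transforms so that you can check whether the field name is correct.
If Owners_name is empty, that is what you should fix.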


manderson7
Contributor

This is the final props and transforms configuration that worked. Thanks again for all your help.

Transforms:

[version1_fields]
REGEX = \<Attribute name=\"([^\"]+)\"\>(.*?)\<\/Attribute\>
FORMAT=$1::$2

[v1_ownername]
SOURCE_KEY = Owners_Name
REGEX = \<Value\>(?<Owner>.*?)\<\/Value\>
MV_ADD = true
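
One plausible reason SOURCE_KEY = Owners_Name worked where "Owners.Name" did not: as far as I know, Splunk cleans field names at search time by default (the CLEAN_KEYS setting in transforms.conf), replacing non-alphanumeric characters with underscores. The Python sketch below only approximates that behavior. `clean_key` is a hypothetical helper, not a Splunk function, and the event is a shortened copy of the sample data.

```python
import re


def clean_key(name: str) -> str:
    """Approximate key cleaning: non-alphanumerics become underscores,
    then leading underscores and digits are stripped (assumed behavior)."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "_", name)
    return re.sub(r"^[_0-9]+", "", cleaned)


event = (
    '<Attribute name="Status.Name">Ready</Attribute>'
    '<Attribute name="Owners.Name"><Value>owner one</Value>'
    '<Value>owner two</Value></Attribute>'
)

# Step 1: like [version1_fields], one field per Attribute, with cleaned names.
fields = {}
for name, value in re.findall(r'<Attribute name="([^"]+)">(.*?)</Attribute>', event):
    fields[clean_key(name)] = value

# Step 2: like [v1_ownername], read the cleaned field name and keep every match.
owners = re.findall(r"<Value>(?P<Owner>.*?)</Value>", fields.get("Owners_Name", ""))

print(sorted(fields))
print(owners)
```

Under that assumption, the first transform never produces a field literally named Owners.Name, so a second transform keyed on that name has nothing to read, while one keyed on Owners_Name does.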