How to configure props and transforms to automatically extract XML fields from my data so I don't have to use spath?

ahmad_elkomey
Explorer

Hello all,

I'm new to Splunk and would love some help. I have an XML file (well, partial XML, as you will see) that I want to extract field values from, and I don't care whether that happens at index time or search time. What matters is that I can see the fields when I search the events. I have been searching for two days and have tried the different answers I came across, but in vain. Even the documentation isn't clear enough and lacks examples. I am able to extract the fields in the search (using spath and specifying tag paths), but that is not what I want. I want the event fields extracted as soon as I fetch the events by sourcetype, without using spath.

Here's a sample of the XML file:

POST /Air HTTP/1.1
Content-Length: 1048
Content-Type: text/xml
Date: Mon, 30 Aug 2004 13:17:39 MEST
Host: ws2258:10010
User-Agent: CX/4.3/1.0
Authorization: Basic dXNlcjpwYXNzd29yZA==

<?xml version="1.0" encoding="utf-8"?>
<methodCall>
    <methodName>UpdateServiceClass</methodName>
    <params>
        <param>
            <value>
                <struct>
                    <member>
                        <name>originNodeType</name>
                        <value>
                            <string>CX</string>
                        </value>
                    </member>
                    <member>
                        <name>originHostName</name>
                        <value>
                            <string>CX_1</string>
                        </value>
                    </member>
                    <member>
                        <name>originTransactionID</name>
                        <value>
                            <string>12013021310152350917</string>
                        </value>
                    </member>
                    <member>
                        <name>originTimeStamp</name>
                        <value>
                            <dateTime.iso8601>20130213T08:15:23+0200</dateTime.iso8601>
                        </value>
                    </member>
                    <member>
                        <name>subscriberNumberNAI</name>
                        <value>
                            <int>0</int>
                        </value>
                    </member>
                    <member>
                        <name>subscriberNumber</name>
                        <value>
                            <string>01101004157</string>
                        </value>
                    </member>
                    <member>
                        <name>originOperatorID</name>
                        <value>
                            <string>ericsson</string>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassAction</name>
                        <value>
                            <string>SetOriginal</string>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassNew</name>
                        <value>
                            <int>201</int>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassCurrent</name>
                        <value>
                            <int>202</int>
                        </value>
                    </member>
                </struct>
            </value>
        </param>
    </params>
</methodCall>

This file contains only one record with multiple XML tag values. What I want is to extract the field values without using spath. I want some configuration on the sourcetype so that the fields are extracted automatically.

Here's the configuration I tried so far:

In props.conf:

BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
FIELDALIAS-rootfields = methodCall.params.param.value.struct.member{1}.value.string as "Origin Node Type"  methodCall.params.param.value.struct.member{2}.value.string as "Origin Host Name"  methodCall.params.param.value.struct.member{3}.value.string as "Origin Transaction ID"  methodCall.params.param.value.struct.member{4}.value as "Origin Timestamp"  methodCall.params.param.value.struct.member{5}.value.int as "Subscriber Number NAI"  methodCall.params.param.value.struct.member{6}.value.string as "Subsrciber Number"  methodCall.params.param.value.struct.member{7}.value.string as "Origin Operator ID"  methodCall.params.param.value.struct.member{9}.value.int as "Service Class New"  methodCall.params.param.value.struct.member{8}.value.string as "Service Class Action"  methodCall.params.param.value.struct.member{10}.value.int as "Service Class Current"
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-kozbaraXML = kozbaraXML

In transforms.conf:

[kozbaraXML]
REGEX = <([^\s\>])[^\>]\>([^<]*)\<\/\1\>
FORMAT = $1::$2

That was as far as I was able to get based on what I found and understood from my reading. Any help would be much appreciated.

Update:

The output of the previous configurations:

[Image: screenshot of the search output]

With another configuration, as follows (props.conf only):

BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
FIELDALIAS-rootfields = methodCall.params.param.value.struct.member{1}.value.string as "Origin Node Type"  methodCall.params.param.value.struct.member{2}.value.string as "Origin Host Name"  methodCall.params.param.value.struct.member{3}.value.string as "Origin Transaction ID"  methodCall.params.param.value.struct.member{4}.value as "Origin Timestamp"  methodCall.params.param.value.struct.member{5}.value.int as "Subscriber Number NAI"  methodCall.params.param.value.struct.member{6}.value.string as "Subsrciber Number"  methodCall.params.param.value.struct.member{7}.value.string as "Origin Operator ID"  methodCall.params.param.value.struct.member{9}.value.int as "Service Class New"  methodCall.params.param.value.struct.member{8}.value.string as "Service Class Action"  methodCall.params.param.value.struct.member{10}.value.int as "Service Class Current"
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-xmlfield = xmlfield

The output is:

[Image: screenshot of the search output]

However, what I want is something like this:

Required Output:
https://drive.google.com/open?id=0B_dKbpAuqWHtNzk5amRqLUxrcE0

1 Solution

sundareshr
Legend

Try this for your transform. You can remove the FIELDALIAS from props.

[kozbaraXML]
REGEX = \<name\>(\w+).*\>\s*\<value\>[\R\r\t\n\s]*\<\w+\>([^\<]+)\<\/
FORMAT = $1::$2

ahmad_elkomey
Explorer

Nope! It didn't work.
By "didn't work" I mean the required output isn't reached. The current output is the same as the first image in the original post.

The current configuration in props.conf:

[CX_Request_AdvancedSettings4]
BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-kozbaraXML = kozbaraXML

The current configurations in transforms.conf:

[kozbaraXML]
REGEX = \<name\>(\w+).*\>\s*\<value\>[\R\r\t\n\s]*\<\w+\>([^\<]+)\<\/
FORMAT = $1::$2

Have you tried your suggestion? (You can create an xml file and fill it with the sample data in the original post). Your response is appreciated for sure!

Do you mind explaining the thoughts/logic behind your answer so I can have an idea of what is going on?

sundareshr
Legend

I did import the data, and that's how I was able to create the regex. I noticed you have KV_MODE = xml and supports_multivalues; remove those from your props. The idea is to treat the data as unstructured text and parse out the fields using a regex. You can test the regex at http://regex101.com.

The way I tested this was by importing the data with default settings and no KV_MODE. Then I went to the Field Transformations page to create the transform rule (regex + $1::$2).
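
One detail worth checking when testing a pattern against this data: the inner type tag `<dateTime.iso8601>` in the sample contains a dot, and `\w` does not match a dot. The following is only a rough local check written in Python, whose `re` module is not the PCRE engine Splunk uses; the variable names are made up for illustration. It compares a word-only tag pattern with an "anything up to `>`" pattern on the three type tags that appear in the sample.

```python
import re

# Inner type tags that appear in the sample XML from the original post.
tags = ["<string>", "<int>", "<dateTime.iso8601>"]

# Hypothetical names, for illustration only.
word_only = re.compile(r"<\w+>")   # tag name limited to letters, digits, underscore
any_name = re.compile(r"<[^>]+>")  # tag name is anything up to the closing '>'

for tag in tags:
    # Print whether each pattern matches the whole tag.
    print(tag, bool(word_only.fullmatch(tag)), bool(any_name.fullmatch(tag)))
```

If the word-only pattern rejects a tag, a member that uses that tag would be skipped by a regex built on it.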

sundareshr
Legend

Here are the props that work for me:

BREAK_ONLY_BEFORE = (<\?xml)
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = 
pulldown_type = true
REPORT-extractfields = kozbaraXML
disabled = false

and the corresponding transforms stanza:

[kozbaraXML]
CLEAN_KEYS = 0
FORMAT = $1::$2
REGEX = (?m)\<name\>(\w+)\<\/name\>\s*\<value\>\s*\<[^\>]+\>([^\<]+)\<\/
max_matches = 0
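
For anyone who wants to see what the REGEX and FORMAT = $1::$2 pair produces without loading data into Splunk, here is a minimal sketch in Python. It is an approximation only: Python's `re` is not the PCRE engine Splunk uses, although this particular pattern uses constructs that both understand. `SAMPLE` is a trimmed copy of the event from the original post (three of the ten members), and `extract_fields` is a made-up helper name, not anything Splunk provides.

```python
import re

# Same pattern as the REGEX line in the [kozbaraXML] stanza.
PATTERN = re.compile(
    r"(?m)\<name\>(\w+)\<\/name\>\s*\<value\>\s*\<[^\>]+\>([^\<]+)\<\/"
)

# Trimmed copy of the sample event: three of the ten <member> entries.
SAMPLE = """<struct>
    <member>
        <name>originNodeType</name>
        <value>
            <string>CX</string>
        </value>
    </member>
    <member>
        <name>originTimeStamp</name>
        <value>
            <dateTime.iso8601>20130213T08:15:23+0200</dateTime.iso8601>
        </value>
    </member>
    <member>
        <name>serviceClassNew</name>
        <value>
            <int>201</int>
        </value>
    </member>
</struct>
"""


def extract_fields(event):
    """Return {name: value}, pairing group 1 with group 2 of every match."""
    return dict(PATTERN.findall(event))


fields = extract_fields(SAMPLE)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Group 1 (the text inside `<name>`) becomes the field name and group 2 (the text inside the typed tag under `<value>`) becomes the field value, which is the pairing that `$1::$2` describes.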

ahmad_elkomey
Explorer

Thank you so much for your time and effort in helping me with my problem. The last configuration did work!

What I need now is to develop the mindset so I can do this myself from now on. How did you acquire your knowledge of props.conf, transforms.conf, and everything related to them? The official documentation is very basic, with no examples to help me understand, and I can't really find tutorials on Splunk :') Nothing but the paid training offered by Splunk! Can you refer me to any resources to learn more?

Thank you again for your response.

sundareshr
Legend

I'm glad this worked for you. If you would please accept the answer, it will help others with a similar issue.

As for learning, I find most of the information online on the Splunk documentation website. I also completed Splunk training (it's awesome, with great trainers) and certification. Most importantly, I spend a lot of time going through the questions in this community and learning from others' answers. This community is the best resource for learning Splunk.

In short, take the online training if possible. Refer to the Splunk online documentation every time. Look through this community for answers and, finally, keep trying and experimenting. Happy Splunking!

sundareshr
Legend

What do you get with these props.conf and transforms.conf settings?

ahmad_elkomey
Explorer

Thank you for your response; it's highly appreciated. I updated the original post. You should be notified.
