 
					
				
		
Hello all,
I'm new to Splunk and would love some help. I have an XML file (well, partial XML, as you will see) that I want to extract field values from, and I don't care whether that happens at index time or search time. What matters is that I can see the fields when I search the events. I have been searching for two days and have tried different answers I came across, but in vain; the documentation isn't clear enough and lacks examples. I can extract the fields in the search (using spath and specifying tag paths), but that is not what I want. I want the fields extracted as soon as I fetch the events by sourcetype, without using spath.
Here's a sample of the XML file:
POST /Air HTTP/1.1
Content-Length: 1048
Content-Type: text/xml
Date: Mon, 30 Aug 2004 13:17:39 MEST
Host: ws2258:10010
User-Agent: CX/4.3/1.0
Authorization: Basic dXNlcjpwYXNzd29yZA==
<?xml version="1.0" encoding="utf-8"?>
<methodCall>
    <methodName>UpdateServiceClass</methodName>
    <params>
        <param>
            <value>
                <struct>
                    <member>
                        <name>originNodeType</name>
                        <value>
                            <string>CX</string>
                        </value>
                    </member>
                    <member>
                        <name>originHostName</name>
                        <value>
                            <string>CX_1</string>
                        </value>
                    </member>
                    <member>
                        <name>originTransactionID</name>
                        <value>
                            <string>12013021310152350917</string>
                        </value>
                    </member>
                    <member>
                        <name>originTimeStamp</name>
                        <value>
                            <dateTime.iso8601>20130213T08:15:23+0200</dateTime.iso8601>
                        </value>
                    </member>
                    <member>
                        <name>subscriberNumberNAI</name>
                        <value>
                            <int>0</int>
                        </value>
                    </member>
                    <member>
                        <name>subscriberNumber</name>
                        <value>
                            <string>01101004157</string>
                        </value>
                    </member>
                    <member>
                        <name>originOperatorID</name>
                        <value>
                            <string>ericsson</string>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassAction</name>
                        <value>
                            <string>SetOriginal</string>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassNew</name>
                        <value>
                            <int>201</int>
                        </value>
                    </member>
                    <member>
                        <name>serviceClassCurrent</name>
                        <value>
                            <int>202</int>
                        </value>
                    </member>
                </struct>
            </value>
        </param>
    </params>
</methodCall>
This file contains only one record with multiple XML tag values. What I want is to extract the field values without using spath, through configuration on the sourcetype so that the data is extracted automatically.
Here's the configuration I have tried so far.
In props.conf:
BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
FIELDALIAS-rootfields = methodCall.params.param.value.struct.member{1}.value.string as "Origin Node Type"  methodCall.params.param.value.struct.member{2}.value.string as "Origin Host Name"  methodCall.params.param.value.struct.member{3}.value.string as "Origin Transaction ID"  methodCall.params.param.value.struct.member{4}.value as "Origin Timestamp"  methodCall.params.param.value.struct.member{5}.value.int as "Subscriber Number NAI"  methodCall.params.param.value.struct.member{6}.value.string as "Subsrciber Number"  methodCall.params.param.value.struct.member{7}.value.string as "Origin Operator ID"  methodCall.params.param.value.struct.member{9}.value.int as "Service Class New"  methodCall.params.param.value.struct.member{8}.value.string as "Service Class Action"  methodCall.params.param.value.struct.member{10}.value.int as "Service Class Current"
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-kozbaraXML = kozbaraXML
In transforms.conf:
[kozbaraXML]
REGEX = <([^\s\>])[^\>]\>([^<]*)\<\/\1\>
FORMAT = $1::$2
That was as far as I was able to get based on what I found and understood from my reading. Any help would be much appreciated.
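Editor's note: one way to see why the `kozbaraXML` transform above extracts nothing is to run its regex outside Splunk. The sketch below uses Python's `re` module instead of Splunk's PCRE engine, which should behave the same for these constructs. The `QUANTIFIED` pattern is only an assumption about what the author may have intended, not something taken from the thread.

```python
import re

# Transform regex exactly as posted. Without quantifiers, "([^\s\>])[^\>]"
# only matches tags whose name is exactly two characters long.
ORIGINAL = re.compile(r"<([^\s\>])[^\>]\>([^<]*)\<\/\1\>")

# Assumed intent (hypothetical): allow tag names of any length.
QUANTIFIED = re.compile(r"<([^\s>]+)[^>]*>([^<]*)</\1>")

sample = """<member>
    <name>originNodeType</name>
    <value>
        <string>CX</string>
    </value>
</member>"""

original_hits = ORIGINAL.findall(sample)
quantified_hits = QUANTIFIED.findall(sample)

print(original_hits)
print(quantified_hits)
```

Even with quantifiers added, the captured keys are the tag names (`name`, `string`) and not the values inside `<name>`, so this style of regex would not produce fields such as `originNodeType` on its own.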
Update:
The output of the previous configuration:
With another configuration (props.conf only):
BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
FIELDALIAS-rootfields = methodCall.params.param.value.struct.member{1}.value.string as "Origin Node Type"  methodCall.params.param.value.struct.member{2}.value.string as "Origin Host Name"  methodCall.params.param.value.struct.member{3}.value.string as "Origin Transaction ID"  methodCall.params.param.value.struct.member{4}.value as "Origin Timestamp"  methodCall.params.param.value.struct.member{5}.value.int as "Subscriber Number NAI"  methodCall.params.param.value.struct.member{6}.value.string as "Subsrciber Number"  methodCall.params.param.value.struct.member{7}.value.string as "Origin Operator ID"  methodCall.params.param.value.struct.member{9}.value.int as "Service Class New"  methodCall.params.param.value.struct.member{8}.value.string as "Service Class Action"  methodCall.params.param.value.struct.member{10}.value.int as "Service Class Current"
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-xmlfield = xmlfield
The output is:
However, what I want is something like this:
Required Output:
https://drive.google.com/open?id=0B_dKbpAuqWHtNzk5amRqLUxrcE0
 
					
				
		
Try this for your transform. You can remove the FIELDALIAS from props.
[kozbaraXML]
 REGEX = \<name\>(\w+).*\>\s*\<value\>[\R\r\t\n\s]*\<\w+\>([^\<]+)\<\/
 FORMAT = $1::$2
 
					
				
		
Nope! It didn't work.
By "didn't work" I mean the required output isn't reached. The current output is the same as the first image in the original post.
The current configuration in props.conf:
[CX_Request_AdvancedSettings4]
BREAK_ONLY_BEFORE = (<methodCall>)
DATETIME_CONFIG = CURRENT
KV_MODE = xml
LINE_BREAKER = (<methodCall>)
NO_BINARY_CHECK = true
REPORT-xmlkv = 
SHOULD_LINEMERGE = true
TRUNCATE = 0
category = Custom
disabled = false
pulldown_type = true
supports_multivalues = true
REPORT-kozbaraXML = kozbaraXML
The current configurations in transforms.conf:
[kozbaraXML]
REGEX = \<name\>(\w+).*\>\s*\<value\>[\R\r\t\n\s]*\<\w+\>([^\<]+)\<\/
FORMAT = $1::$2
Have you tried your suggestion? (You can create an xml file and fill it with the sample data in the original post). Your response is appreciated for sure!
Do you mind explaining the thoughts/logic behind your answer so I can have an idea of what is going on?
 
					
				
		
I did import the data, and that's how I was able to create the regex. I noticed you have KV_MODE = xml and supports_multivalues set; remove those from your props. The idea is to treat the data as unstructured text and parse out the fields using regex. You can test the regex at http://regex101.com.
I tested this by importing the data with default settings and no KV_MODE. Then I went to the Field Transformations page to create the transform rule (regex + $1::$2).
 
					
				
		
Here's the props.conf that works for me:
BREAK_ONLY_BEFORE = (<?xml)
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = 
pulldown_type = true
REPORT-extractfields = kozbaraXML
disabled = false
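Editor's note on the `BREAK_ONLY_BEFORE` value above: in a regex, `<?` means "an optional `<`", so `(<?xml)` also matches a bare `xml`, for example in the `Content-Type: text/xml` header line of the sample. Whether that changes event breaking for a given input is not confirmed in this thread. The sketch below only illustrates the regex behaviour with Python's `re` module; the escaped variant is a suggestion, not part of the original answer.

```python
import re

AS_POSTED = re.compile(r"(<?xml)")   # "<" is optional, so plain "xml" matches
ESCAPED = re.compile(r"(<\?xml)")    # matches the literal text "<?xml" only

header_line = "Content-Type: text/xml"
xml_declaration = '<?xml version="1.0" encoding="utf-8"?>'

posted_hits_header = bool(AS_POSTED.search(header_line))
escaped_hits_header = bool(ESCAPED.search(header_line))
posted_hits_decl = bool(AS_POSTED.search(xml_declaration))
escaped_hits_decl = bool(ESCAPED.search(xml_declaration))

print(posted_hits_header, escaped_hits_header)
print(posted_hits_decl, escaped_hits_decl)
```

If events turn out to be split at the HTTP header line, escaping the question mark as in `ESCAPED` is one thing to try.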
And the corresponding transforms.conf stanza:
[kozbaraXML]
CLEAN_KEYS = 0
FORMAT = $1::$2
REGEX = (?m)\<name\>(\w+)\<\/name\>\s*\<value\>\s*\<[^\>]+\>([^\<]+)\<\/
max_matches = 0
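Editor's note: the regex in this stanza can be checked outside Splunk as well. The sketch below applies it to a shortened copy of the sample with Python's `re` module, which is an assumption-laden stand-in for Splunk's own engine. `findall` collects every match, which approximates the repeated `$1::$2` name/value extraction the transform is meant to perform.

```python
import re

# REGEX value copied from the [kozbaraXML] stanza above.
KOZBARA = re.compile(
    r"(?m)\<name\>(\w+)\<\/name\>\s*\<value\>\s*\<[^\>]+\>([^\<]+)\<\/"
)

# Three <member> entries taken from the sample in the original post.
sample = """
<member>
    <name>originNodeType</name>
    <value>
        <string>CX</string>
    </value>
</member>
<member>
    <name>originTimeStamp</name>
    <value>
        <dateTime.iso8601>20130213T08:15:23+0200</dateTime.iso8601>
    </value>
</member>
<member>
    <name>subscriberNumberNAI</name>
    <value>
        <int>0</int>
    </value>
</member>
"""

# Group 1 is the text inside <name>, group 2 the text inside the typed
# child of <value> (<string>, <int>, <dateTime.iso8601>, ...).
fields = dict(KOZBARA.findall(sample))

for key, value in fields.items():
    print(f"{key}::{value}")
```

The pattern works because it anchors on the `<name>`/`<value>` pair and skips the type tag with `\<[^\>]+\>`, so the field name comes from the element content and not from the tag name.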
 
					
				
		
I want to thank you so much for your efforts and time trying to help me out with my problem. The last configurations did work!
What I need now is to develop the mindset so I can do this myself from now on. How did you acquire your knowledge of props.conf, transforms.conf, and everything related? The official documentation is very basic, with no examples to help me understand, and I can't really find tutorials on Splunk :') Nothing but the paid training offered by Splunk! Can you refer me to any resources to learn more?
Thank you again for your response.
 
					
				
		
I'm glad this worked for you. If you would please accept the answer, it will help others who have a similar issue.
As for learning, I find most of the information online on the Splunk documentation website. I also completed Splunk training (they're awesome, with great trainers) and certification. Most importantly, I spend a lot of time going through the questions in this community and learning from others' answers. This community is the best resource for learning Splunk.
In short, take the online training if possible. Refer to the Splunk online documentation every time. Look through this community for answers and, finally, keep trying and experimenting. Happy Splunking!
 
					
				
		
What do you get with these props.conf and transforms.conf settings?
 
					
				
		
Thank you for your response; it's highly appreciated. I updated the original post. You should be notified.
