Dashboards & Visualizations

Name - value pair - issue in extracting

jenipriya
Explorer

Hi I am trying to extract the values of certain field present in Log for a particular operation:

My Query:

Service="X1" Operation="Y1" AND AuditType="REQUEST_OUTBOUND" | sort _time | xmlunescape | xmlkv | fields + A, B

There are multiple values present in the log for fields A & B as below:

Message in Log:

<A>1000</A>
<A>29</A>
<A>30</A>

<B>B1</B>
<B>D2</B>
<B>C3</B>

whereas I need my output as :

A       B
1000    B1
29      D2
30      C3

But i am getting output result as :

A   B
30  C3

I am getting only the last value of the name-value pair but not all the values.

Could anyone please help in getting the desired output? Please let me know what command should i use and how to modify the search?

2 Solutions

Ron_Naken
Splunk Employee
Splunk Employee

This limitation seems to stem from the xmlkv command in the search app. It appears that the python script overwrites the previous value as it enumerates new values for the same key.

Here's the culprit:

for kvpair in XML_KV_RE.findall(rawOut):
    r[kvpair[0]] = kvpair[1]

It didn't take much time to create a "hack" to fix this, but it's not an ideal solution. This will work:

  1. In $SPLUNK_HOME/etc/apps/search/bin, copy xmlkv.py to myxml.py
  2. In myxml.py, replace the code above with the following, but be very careful to indent the code properly. The indentations don't display properly on this page, so you will have to eyeball them according to the rest of the code in myxml.py. Everything under the "for kvpair" line needs to be indented, and the lines under "if not" and under "else:" both need to be further indented. If you don't indent properly, you'll probably just generate an error in the Splunk UI when you run the command we're building.

        for kvpair in XML_KV_RE.findall(rawOut):
    if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
    else:
        r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]
    
  3. Create/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf and add the following:

    [myxml]
    filename = myxml.py
    retainsevents = true
    overrides_timeorder = false

  4. Restart Splunk

We have just created a new command, myxml, that you can use in place of xmlkv to collect the multivalue XML fields. The multivalue XML fields will now show up as comma-separated values. For instance, A = "1000, 29, 30" and "B = B1, D2, C3". If you need to split the fields into individual values, you can use makemv.

Here is a sample search:

sourcetype=xmlkv | myxml | makemv delim="," A | makemv delim="," B

When I perform this in the lab, I see A(3) and B(3) in the Field Picker, and I can perform operations on any of the individual values.

HTH
ron

View solution in original post

Ron_Naken
Splunk Employee
Splunk Employee

Here is a post of the myxml.py file for the above solution. I wasn't able to get this to format properly in the initial answer:

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0 import sys,splunk.Intersplunk import re import urllib import xml.sax.saxutils as sax

XML_KV_RE = re.compile("<(.?)(?:\s[^>])?>([^<]*)")

try: results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

for r in results:
    if "_raw" in r:
        raw = r["_raw"]
        rawOut = sax.unescape( raw )
        while( rawOut != raw ):
            raw = rawOut
            rawOut = sax.unescape( raw )                
        r["_raw"] = rawOut

        for kvpair in XML_KV_RE.findall(rawOut):
            if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
            else:
                r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

except: import traceback stack = traceback.format_exc() results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )

View solution in original post

jenipriya
Explorer

Hi Ron

It did work, and thanks a lot for your reply.

My sincere apologies for the late response, as i have some n/w connectivity issues and couldn't sign in for a long time.

Once again thanks for your valuable inputs.

  • Jeni
0 Karma

Ron_Naken
Splunk Employee
Splunk Employee

Here is a post of the myxml.py file for the above solution. I wasn't able to get this to format properly in the initial answer:

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0 import sys,splunk.Intersplunk import re import urllib import xml.sax.saxutils as sax

XML_KV_RE = re.compile("<(.?)(?:\s[^>])?>([^<]*)")

try: results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

for r in results:
    if "_raw" in r:
        raw = r["_raw"]
        rawOut = sax.unescape( raw )
        while( rawOut != raw ):
            raw = rawOut
            rawOut = sax.unescape( raw )                
        r["_raw"] = rawOut

        for kvpair in XML_KV_RE.findall(rawOut):
            if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
            else:
                r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

except: import traceback stack = traceback.format_exc() results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )

Ron_Naken
Splunk Employee
Splunk Employee

This limitation seems to stem from the xmlkv command in the search app. It appears that the python script overwrites the previous value as it enumerates new values for the same key.

Here's the culprit:

for kvpair in XML_KV_RE.findall(rawOut):
    r[kvpair[0]] = kvpair[1]

It didn't take much time to create a "hack" to fix this, but it's not an ideal solution. This will work:

  1. In $SPLUNK_HOME/etc/apps/search/bin, copy xmlkv.py to myxml.py
  2. In myxml.py, replace the code above with the following, but be very careful to indent the code properly. The indentations don't display properly on this page, so you will have to eyeball them according to the rest of the code in myxml.py. Everything under the "for kvpair" line needs to be indented, and the lines under "if not" and under "else:" both need to be further indented. If you don't indent properly, you'll probably just generate an error in the Splunk UI when you run the command we're building.

        for kvpair in XML_KV_RE.findall(rawOut):
    if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
    else:
        r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]
    
  3. Create/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf and add the following:

    [myxml]
    filename = myxml.py
    retainsevents = true
    overrides_timeorder = false

  4. Restart Splunk

We have just created a new command, myxml, that you can use in place of xmlkv to collect the multivalue XML fields. The multivalue XML fields will now show up as comma-separated values. For instance, A = "1000, 29, 30" and "B = B1, D2, C3". If you need to split the fields into individual values, you can use makemv.

Here is a sample search:

sourcetype=xmlkv | myxml | makemv delim="," A | makemv delim="," B

When I perform this in the lab, I see A(3) and B(3) in the Field Picker, and I can perform operations on any of the individual values.

HTH
ron

jenipriya
Explorer

Tried using "search .... | multikv fields A B" but still not getting any output. Not sure as why. Can someone please help me get this fixed.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...