Name - value pair - issue in extracting

jenipriya
Explorer

Hi, I am trying to extract the values of certain fields present in the log for a particular operation:

My Query:

Service="X1" Operation="Y1" AND AuditType="REQUEST_OUTBOUND" | sort _time | xmlunescape | xmlkv | fields + A, B

There are multiple values present in the log for fields A & B as below:

Message in Log:

<A>1000</A>
<A>29</A>
<A>30</A>

<B>B1</B>
<B>D2</B>
<B>C3</B>

whereas I need my output as :

A       B
1000    B1
29      D2
30      C3

But I am getting this output:

A   B
30  C3

I am getting only the last value of each name-value pair, not all of the values.

Could anyone please help me get the desired output? Please let me know which command I should use and how to modify the search.

2 Solutions

Ron_Naken
Splunk Employee

This limitation seems to stem from the xmlkv command in the search app. It appears that the python script overwrites the previous value as it enumerates new values for the same key.

Here's the culprit:

for kvpair in XML_KV_RE.findall(rawOut):
    r[kvpair[0]] = kvpair[1]
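
To see the overwrite in isolation, here is a minimal standalone sketch in plain Python (no Splunk imports). The regular expression and the function names `extract_last_wins` and `extract_all` are assumptions made for illustration; they are not taken from xmlkv.py. The sample text is the log fragment from the question.

```python
import re

# Assumed pattern: an opening tag, its text, and the matching closing tag.
XML_KV_RE = re.compile(r"<(.*?)(?:\s[^>]*)?>([^<]*)</\1>")

raw = "<A>1000</A>\n<A>29</A>\n<A>30</A>\n<B>B1</B>\n<B>D2</B>\n<B>C3</B>"


def extract_last_wins(text):
    """Mimic the overwrite: a repeated key replaces the earlier value."""
    fields = {}
    for key, value in XML_KV_RE.findall(text):
        fields[key] = value
    return fields


def extract_all(text):
    """Keep every value per key, in the order the tags appear."""
    fields = {}
    for key, value in XML_KV_RE.findall(text):
        fields.setdefault(key, []).append(value)
    return fields


print(extract_last_wins(raw))  # {'A': '30', 'B': 'C3'}
print(extract_all(raw))        # {'A': ['1000', '29', '30'], 'B': ['B1', 'D2', 'C3']}
```

The first function reproduces the single-row result described in the question; the second shows that the data is all there if repeated keys are accumulated instead of assigned.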

It didn't take much time to create a "hack" to fix this, but it's not an ideal solution. This will work:

  1. In $SPLUNK_HOME/etc/apps/search/bin, copy xmlkv.py to myxml.py
  2. In myxml.py, replace the code above with the following. Be very careful with the indentation: it must line up with the rest of the code in myxml.py. Everything under the "for kvpair" line needs to be indented, and the lines under "if not" and under "else:" need to be indented one level further. If the indentation is wrong, you'll probably just get an error in the Splunk UI when you run the command we're building.

        for kvpair in XML_KV_RE.findall(rawOut):
            if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
            else:
                r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

  3. Create/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf and add the following:

    [myxml]
    filename = myxml.py
    retainsevents = true
    overrides_timeorder = false

  4. Restart Splunk

We have just created a new command, myxml, that you can use in place of xmlkv to collect the multivalue XML fields. The multivalue XML fields will now show up as comma-separated values. For instance, A = "1000, 29, 30" and B = "B1, D2, C3". If you need to split the fields into individual values, you can use makemv.

Here is a sample search:

sourcetype=xmlkv | myxml | makemv delim="," A | makemv delim="," B

When I perform this in the lab, I see A(3) and B(3) in the Field Picker, and I can perform operations on any of the individual values.
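
As a rough illustration of what happens to the joined strings afterwards, the sketch below splits them in plain Python and pairs the values by position, which gives the table asked for in the question. It is not Splunk code; `split_values` and `pair_rows` are hypothetical helper names, and pairing by position assumes that A and B always carry the same number of values.

```python
def split_values(joined, delim=","):
    """Split a delimiter-joined string and trim surrounding whitespace."""
    return [part.strip() for part in joined.split(delim)]


def pair_rows(a_joined, b_joined):
    """Pair the n-th value of A with the n-th value of B."""
    return list(zip(split_values(a_joined), split_values(b_joined)))


rows = pair_rows("1000, 29, 30", "B1, D2, C3")
for a, b in rows:
    print(a, b)
# 1000 B1
# 29 D2
# 30 C3
```

Because the values are joined with ", ", splitting on "," alone leaves a leading space on every value after the first, which is why the sketch strips each part. It may be worth checking for the same leading space in the split Splunk fields.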

HTH
ron

Ron_Naken
Splunk Employee

Here is a post of the myxml.py file for the above solution. I wasn't able to get this to format properly in the initial answer:

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0
import sys,splunk.Intersplunk
import re
import urllib
import xml.sax.saxutils as sax

# This pattern should be identical to the XML_KV_RE line in your stock xmlkv.py; keep yours if it differs.
XML_KV_RE = re.compile("<(.*?)(?:\s[^>]*)?>([^<]*)</\\1>")

try:
    results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

    for r in results:
        if "_raw" in r:
            raw = r["_raw"]
            rawOut = sax.unescape( raw )
            # Unescape repeatedly in case the XML was escaped more than once
            while( rawOut != raw ):
                raw = rawOut
                rawOut = sax.unescape( raw )
            r["_raw"] = rawOut

            for kvpair in XML_KV_RE.findall(rawOut):
                if not kvpair[0] in r:
                    # First value seen for this key
                    r[kvpair[0]] = kvpair[1]
                else:
                    # Repeated key: append instead of overwriting
                    r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

except:
    import traceback
    stack = traceback.format_exc()
    results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )
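
The unescape loop in the script is easy to overlook, so here is a small standalone sketch of just that step. It uses only the standard library; the function name `unescape_fully` and the sample string are made up for illustration. The idea is that XML which was escaped more than once needs more than one pass, so the loop keeps going until a pass changes nothing.

```python
import xml.sax.saxutils as sax


def unescape_fully(text):
    """Unescape repeatedly until the text stops changing."""
    current = text
    unescaped = sax.unescape(current)
    while unescaped != current:
        current = unescaped
        unescaped = sax.unescape(current)
    return unescaped


# Escaped twice: "<" became "&lt;" and then "&amp;lt;".
double_escaped = "&amp;lt;A&amp;gt;1000&amp;lt;/A&amp;gt;"
print(unescape_fully(double_escaped))  # <A>1000</A>
```

A single call to `sax.unescape` on this input would stop at `&lt;A&gt;1000&lt;/A&gt;`, which a tag-matching pattern cannot parse.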

jenipriya
Explorer

Hi Ron

It did work, and thanks a lot for your reply.

My sincere apologies for the late response; I had some network connectivity issues and couldn't sign in for a long time.

Once again thanks for your valuable inputs.

- Jeni

jenipriya
Explorer

I tried using "search .... | multikv fields A B" but am still not getting any output, and I'm not sure why. Can someone please help me get this fixed?
