
XML Field Extraction

mwcooley
Explorer

Hi,

Here's a sample of my XML data. I want to get the username. I tried a field alias, but that's not working, nor is field extraction. When I open the field extractor tool, the data is truncated after the caller_profile tag. When I look at the event, it's all there. It's only when I try to use the field extractor that it gets truncated.

props.conf:
[conf_cdr_xml]
TRUNCATE = 0
KV_MODE = xml

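Data sample:

<cdr>
  <conference>
    <name>1235551234-101</name>
    <hostname>hostname.com</hostname>
    <rate>8000</rate>
    <interval>20</interval>
    <start_time type="UNIX-epoch">1510329526</start_time>
    <end_time endconference_forced="false" type="UNIX-epoch">1510329534</end_time>
    <members>
      <member type="caller">
        <join_time type="UNIX-epoch">1510329526</join_time>
        <leave_time type="UNIX-epoch">1510329534</leave_time>
        <flags>
          <is_moderator>true</is_moderator>
          <end_conference>true</end_conference>
          <was_kicked>false</was_kicked>
          <is_ghost>false</is_ghost>
        </flags>
        <caller_profile>
          <username>1235551010</username>
          <dialplan>XML</dialplan>
          <caller_id_name>Joe Boss</caller_id_name>
          <caller_id_number>1235551010</caller_id_number>
          <callee_id_name></callee_id_name>
          <callee_id_number></callee_id_number>
          <ani>1235551010</ani>
          <aniii></aniii>
          <network_addr>10.0.1.1</network_addr>
          <rdnis></rdnis>
          <destination_number>1235551234;conf=101;mod;tone=NO_SOUNDS</destination_number>
          <uuid>038fa0ce-c630-11e7-938f-b3cdceb36fa4</uuid>
          <source>mod_sofia</source>
          <context>public</context>
          <chan_name>sofia/internal/1235551010@10.10.1.1</chan_name>
        </caller_profile>
      </member>
    </members>
    <rejected></rejected>
  </conference>
</cdr>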

niketn
Legend

@mwcooley, when you say KV_MODE=xml is not working, do you mean search-time field discovery in Smart/Verbose mode is not working, i.e. the following table command does not return the field?

<YourBaseSearch>
|  table *username

Have you also tried:

<YourBaseSearch>
| spath
|  table *username
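To see why the wildcard `*username` matters after `spath`, here is a rough Python emulation (not Splunk code) of how spath names XML fields: each leaf element becomes a field named by its dotted element path, and a repeated element yields several values for the same field. The flattening rules are a simplified assumption (attributes and empty elements are skipped), and the sample is trimmed from the thread's two-member CDR.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Trimmed from the thread's two-member CDR sample
SAMPLE = """<cdr><conference><name>1235551234-101</name><members>
<member type="caller"><caller_profile><username>1235551010</username>
<caller_id_name>Joe Boss</caller_id_name></caller_profile></member>
<member type="caller"><caller_profile><username>1235557721</username>
<caller_id_name>Bob</caller_id_name></caller_profile></member>
</members></conference></cdr>"""


def flatten_xml(xml_text):
    """Simplified spath-style flattening: dotted element path -> list of leaf values.

    Attributes and empty elements are skipped (an assumption for brevity).
    """
    fields = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                # Repeated paths accumulate, like a multivalued field
                fields.setdefault(path, []).append(text)
        for child in children:
            walk(child, f"{path}.{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return fields


def table_wildcard(fields, pattern):
    """Keep only fields whose names match a wildcard, like `table *username`."""
    return {name: values for name, values in fields.items() if fnmatchcase(name, pattern)}


if __name__ == "__main__":
    for name, values in table_wildcard(flatten_xml(SAMPLE), "*username").items():
        print(name, values)
```

In this emulation the field comes out as `cdr.conference.members.member.caller_profile.username`, which is why a pattern ending in `username` finds it while the bare name `username` does not.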

If XML parsing is not working but you can see <username>1235551010</username> in the raw event, try the following rex command and see how it behaves:

<YourBaseSearch>
|  rex "<username>(?<username>[^\<]+)</username>"
|  table username
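The same regular expression can be tried outside Splunk in Python to check what it captures. This is an emulation that assumes rex's default of keeping only the first match per event (max_match=1); note that Python spells a named group `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Same pattern as the rex above, in Python's named-group syntax
USERNAME_RE = re.compile(r"<username>(?P<username>[^<]+)</username>")


def rex_first(raw_event):
    """Return the first <username> value in the raw event, or None if absent."""
    match = USERNAME_RE.search(raw_event)
    return match.group("username") if match else None


raw = "<caller_profile><username>1235551010</username></caller_profile>"
print(rex_first(raw))  # prints 1235551010
```

The class `[^<]+` stops at the next `<`, so each capture ends at its own closing tag instead of running on to a later `</username>`.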
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

maciep
Champion

So your events are already broken correctly and you're just working on field extractions? If so, the KV_MODE setting needs to be on your search head, since it is a search-time setting. Is it there?

mwcooley
Explorer

Ah, OK. I don't have access to the search head, only the forwarder. I thought I could put it in props.conf there and make it work.

mwcooley
Explorer

@niketnilay, that's closer. | spath | table *username works, and I get the usernames even when there are multiple. The rex command only returns the first one.

If I use spath, how do I get the username into eventstats?

mintucs
New Member

In case of XML, using the above solution, I am getting only a single result.

niketn
Legend

@mintucs, you might have to post a separate question with your sample XML data and the extraction you are using. If applicable, include your props.conf and transforms.conf as well. Mask any sensitive information before posting your question.

niketn
Legend

@mwcooley, your sample data had only one username in the event. When you say the rex command only returns the first match, do you mean that a single event may have multiple usernames? Can you add such a sample?

In any case, you can use max_match=0 in the rex command to return all matches within a single event. The username field will then be multivalued.

<YourBaseSearch>
|  rex "<username>(?<username>[^<]+)<\/username>" max_match=0
|  table username
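A Python emulation of what max_match changes (not Splunk code): with the default max_match=1 only the first username comes back, while max_match=0 keeps every match, which Splunk stores as one multivalued field. The two-member event is trimmed from the thread's sample, and the `rex_values` helper is hypothetical.

```python
import re

USERNAME_RE = re.compile(r"<username>(?P<username>[^<]+)</username>")

# Two <member> blocks trimmed from the thread's sample
RAW_EVENT = (
    "<member><caller_profile><username>1235551010</username></caller_profile></member>"
    "<member><caller_profile><username>1235557721</username></caller_profile></member>"
)


def rex_values(raw_event, max_match=1):
    """Mimic rex's max_match: 0 means unlimited, N > 0 keeps at most N matches."""
    values = USERNAME_RE.findall(raw_event)  # one string per match (single group)
    return values if max_match == 0 else values[:max_match]


print(rex_values(RAW_EVENT))               # ['1235551010']
print(rex_values(RAW_EVENT, max_match=0))  # ['1235551010', '1235557721']
```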

What do you want to use eventstats for? Which fields do you want to use, and what is the desired output? In other words, give the desired field names and expected values in tabular format.

mwcooley
Explorer

I want to count the users, so I was trying to feed the usernames to eventstats. The final search is in a comment below.

mwcooley
Explorer

Thanks @niketn, using max_match=0 worked. Here's the final search (it turned out I needed the caller ID, not the username):

index="myIndex" sourcetype="conf_cdr_xml" |
eval Conf_Start=strftime(start_time,"%H:%M:%S %m/%d/%y") |
eval Conf_End=strftime(end_time,"%H:%M:%S %m/%d/%y") |
eval Duration = tostring((end_time - start_time), "Duration") |
rex "<caller_id_name>(?<caller_id_name>[^<]+)<\/caller_id_name>" max_match=0 |
eventstats count(caller_id_name) as Attendees by Conf_Start |
table confName Conf_Start Conf_End Duration Attendees
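To sanity-check the values this search should produce, here is a Python emulation (not Splunk code) of its eval, rex and count steps on one event trimmed from the two-member sample. Assumptions: times are rendered in UTC (Splunk's strftime uses the user's timezone), the conference lasts under one day, and Attendees is taken as the number of caller_id_name values in the event, which lines up with the eventstats result only if count() tallies each value of the multivalued field and each Conf_Start has a single event. The `summarize` helper is hypothetical.

```python
import re
import time
import xml.etree.ElementTree as ET

# One conference event, trimmed from the thread's two-member sample
RAW_EVENT = """<cdr><conference><name>1235551234-101</name>
<start_time type="UNIX-epoch">1510329526</start_time>
<end_time endconference_forced="false" type="UNIX-epoch">1510329534</end_time>
<members>
<member type="caller"><caller_profile><caller_id_name>Joe Boss</caller_id_name></caller_profile></member>
<member type="caller"><caller_profile><caller_id_name>Bob</caller_id_name></caller_profile></member>
</members></conference></cdr>"""

CALLER_RE = re.compile(r"<caller_id_name>(?P<caller_id_name>[^<]+)</caller_id_name>")
TIME_FMT = "%H:%M:%S %m/%d/%y"  # same format string as the eval commands


def summarize(raw_event):
    conf = ET.fromstring(raw_event).find("conference")
    start_time = int(conf.findtext("start_time"))
    end_time = int(conf.findtext("end_time"))
    secs = end_time - start_time
    callers = CALLER_RE.findall(raw_event)  # like rex ... max_match=0
    return {
        # UTC here; Splunk renders strftime in the user's timezone
        "Conf_Start": time.strftime(TIME_FMT, time.gmtime(start_time)),
        "Conf_End": time.strftime(TIME_FMT, time.gmtime(end_time)),
        # HH:MM:SS, assuming the span is under one day
        "Duration": f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}",
        # Number of caller_id_name values found in this one event
        "Attendees": len(callers),
    }


print(summarize(RAW_EVENT))
```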

And here's the XML with multiple usernames/caller IDs:

<?xml version="1.0"?>
<cdr>
  <conference>
    <name>1235551234-101</name>
    <hostname>hostname.com</hostname>
    <rate>8000</rate>
    <interval>20</interval>
    <start_time type="UNIX-epoch">1510329526</start_time>
    <end_time endconference_forced="false" type="UNIX-epoch">1510329534</end_time>
    <members>
      <member type="caller">
        <join_time type="UNIX-epoch">1510329526</join_time>
        <leave_time type="UNIX-epoch">1510329534</leave_time>
        <flags>
          <is_moderator>true</is_moderator>
          <end_conference>true</end_conference>
          <was_kicked>false</was_kicked>
          <is_ghost>false</is_ghost>
        </flags>
        <caller_profile>
          <username>1235551010</username>
          <dialplan>XML</dialplan>
          <caller_id_name>Joe Boss</caller_id_name>
          <caller_id_number>1235551010</caller_id_number>
          <callee_id_name></callee_id_name>
          <callee_id_number></callee_id_number>
          <ani>1235551010</ani>
          <aniii></aniii>
          <network_addr>10.0.1.1</network_addr>
          <rdnis></rdnis>
          <destination_number>1235551234;conf=101;mod;tone=NO_SOUNDS</destination_number>
          <uuid>038fa0ce-c630-11e7-938f-b3cdceb36fa4</uuid>
          <source>mod_sofia</source>
          <context>public</context>
          <chan_name>sofia/internal/1235551010@10.10.1.1</chan_name>
        </caller_profile>
      </member>
      <member type="caller">
        <join_time type="UNIX-epoch">1510329526</join_time>
        <leave_time type="UNIX-epoch">1510329534</leave_time>
        <flags>
          <is_moderator>true</is_moderator>
          <end_conference>true</end_conference>
          <was_kicked>false</was_kicked>
          <is_ghost>false</is_ghost>
        </flags>
        <caller_profile>
          <username>1235557721</username>
          <dialplan>XML</dialplan>
          <caller_id_name>Bob</caller_id_name>
          <caller_id_number>1235557721</caller_id_number>
          <callee_id_name></callee_id_name>
          <callee_id_number></callee_id_number>
          <ani>1235557721</ani>
          <aniii></aniii>
          <network_addr>10.0.1.2</network_addr>
          <rdnis></rdnis>
          <destination_number>1235551234;conf=101;mod;tone=NO_SOUNDS</destination_number>
          <uuid>038fa0ce-c630-11e7-938f-b3cdceb36fa4</uuid>
          <source>mod_sofia</source>
          <context>public</context>
          <chan_name>sofia/internal/1235557721@10.10.1.2</chan_name>
        </caller_profile>
      </member>
    </members>
    <rejected></rejected>
  </conference>
</cdr>

niketn
Legend

Glad it worked. Do compare stats and eventstats and see which one you actually need.
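The difference can be sketched in Python (an emulation, not Splunk code): `stats count by X` replaces the events with one summary row per group, while `eventstats count by X` keeps every event and adds its group's count to each one. The events, the `conf` field and the helper functions below are hypothetical.

```python
from collections import Counter

# Hypothetical events: one row per caller, tagged with a conference name
EVENTS = [
    {"conf": "1235551234-101", "caller": "Joe Boss"},
    {"conf": "1235551234-101", "caller": "Bob"},
    {"conf": "1235551234-102", "caller": "Ann"},
]


def stats_count(events, by):
    """Like `stats count by <by>`: one row per group, the events are not kept."""
    counts = Counter(e[by] for e in events)
    return [{by: key, "count": n} for key, n in sorted(counts.items())]


def eventstats_count(events, by):
    """Like `eventstats count by <by>`: every event is kept and gains its group's count."""
    counts = Counter(e[by] for e in events)
    return [{**e, "count": counts[e[by]]} for e in events]


for row in stats_count(EVENTS, "conf"):
    print(row)
for row in eventstats_count(EVENTS, "conf"):
    print(row)
```

For a per-conference attendee table, stats yields one row per conference, whereas eventstats repeats the same count on every underlying event.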

FrankSPL
Path Finder

Try this in props.conf:

[conf_cdr_xml]
KV_MODE = xml
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ((--NEVER--))
MAX_EVENTS = 1000
NO_BINARY_CHECK = true
pulldown_type = true

mwcooley
Explorer

Hey. Didn't work. I forward to heavy forwarders, which forward to the indexers. I'm worried something in the heavy forwarder is messing me up. KV_MODE isn't working either. But then, I'm a complete noob.

mwcooley
Explorer

Dang it. The preview showed my XML as text. One more try:

<cdr>
  <conference>
    <name>1235551234-101</name>
    <hostname>hostname.com</hostname>
    <rate>8000</rate>
    <interval>20</interval>
    <start_time type="UNIX-epoch">1510329526</start_time>
    <end_time endconference_forced="false" type="UNIX-epoch">1510329534</end_time>
    <members>
      <member type="caller">
        <join_time type="UNIX-epoch">1510329526</join_time>
        <leave_time type="UNIX-epoch">1510329534</leave_time>
        <flags>
          <is_moderator>true</is_moderator>
          <end_conference>true</end_conference>
          <was_kicked>false</was_kicked>
          <is_ghost>false</is_ghost>
        </flags>
        <caller_profile>
          <username>1235551010</username>
          <dialplan>XML</dialplan>
          <caller_id_name>Joe Boss</caller_id_name>
          <caller_id_number>1235551010</caller_id_number>
          <callee_id_name></callee_id_name>
          <callee_id_number></callee_id_number>
          <ani>1235551010</ani>
          <aniii></aniii>
          <network_addr>10.0.1.1</network_addr>
          <rdnis></rdnis>
          <destination_number>1235551234;conf=101;mod;tone=NO_SOUNDS</destination_number>
          <uuid>038fa0ce-c630-11e7-938f-b3cdceb36fa4</uuid>
          <source>mod_sofia</source>
          <context>public</context>
          <chan_name>sofia/internal/1235551010@10.10.1.1</chan_name>
        </caller_profile>
      </member>
    </members>
    <rejected></rejected>
  </conference>
</cdr>