Getting Data In

Re-labeling and breaking up an XML stream?

jason0
Path Finder

Hello,

I have an incoming TCP stream of XML Call Data Records (CDRs). An example is enclosed at the end.

The CDR contains information about the caller and destination phones. There are several SEDCMD entries in props.conf that take lines like <party type="orig"... and convert them into <orig_party...

The problem is with lines whose only differentiation is their position in the data structure. For the XML lines that start with "<RTCPstats>", I need to modify the fields: the first line should be ingested as "<orig_PS>31367</orig_PS> <orig_OS>6273400</orig_OS>..." and the second line as "<term_PS>31366</term_PS> <term_OS>6273200</term_OS>...".

<flowinfo>     <RTCPstats>PS=31367, OS=6273400,...>   </flowinfo>

<flowinfo>     <RTCPstats>PS=31366, OS=6273200,...>    </flowinfo>

The standalone sed command "sed -r -e '0,/ ?([A-Z_]*)=([0-9]*)/s//<ORIG_\1>\2<\/ORIG_\1>/g'" will do this, but the same expression in a SEDCMD entry will not.

Altering a line of input is easy; altering only the FIRST instance in a record with embedded newlines is not.

What are my options?  

  • SEDCMD in props.conf?
    • Strip out all newlines so that SEDCMD treats the record as a single line? (I can't really do that with sed...)
  • A regex in transforms.conf?
  • Pass the input through a script that CAN do this?
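To make the last option concrete, here is a minimal Python sketch of a pre-processing step that relabels the RTCPstats fields by their position within one whole record. The function name `relabel_rtcp`, the choice to keep the `<RTCPstats>` wrapper, and the mapping of "/" to "_" in keys such as PC/RPS (since "/" is not valid in an XML element name) are all assumptions for illustration, not anything Splunk provides.

```python
import re

# One <RTCPstats>...</RTCPstats> element; DOTALL in case the body wraps lines.
RTCP_RE = re.compile(r"<RTCPstats>(.*?)</RTCPstats>", re.DOTALL)
# KEY=VALUE pairs such as PS=31367 or PC/RPS=31165.
PAIR_RE = re.compile(r"([A-Z/_]+)=(\d+)")

# Assumption: the first RTCPstats block is the orig side, the second is term.
PREFIXES = ("orig", "term")


def relabel_rtcp(record: str) -> str:
    """Rewrite the first two RTCPstats blocks of one record by position."""
    prefixes = iter(PREFIXES)

    def rewrite_block(block: re.Match) -> str:
        prefix = next(prefixes, None)
        if prefix is None:
            # More than two blocks: leave the extras untouched.
            return block.group(0)
        fields = []
        for pair in PAIR_RE.finditer(block.group(1)):
            # Assumption: replace "/" so the key is a legal element name.
            key = pair.group(1).replace("/", "_")
            fields.append(f"<{prefix}_{key}>{pair.group(2)}</{prefix}_{key}>")
        return "<RTCPstats>" + " ".join(fields) + "</RTCPstats>"

    return RTCP_RE.sub(rewrite_block, record)
```

Because the substitution runs over the whole record rather than line by line, the embedded newlines do not matter; whether this keeps up with the required volume would need to be measured.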

Some details:

  • The linebreaker is two carriage returns (literally \r\r, or 0d0d).  
  • There are embedded newlines in each record so any SEDCMD will be applied to each line, not the entire record all at once.
  • I get 60,000 records per minute: this transformation needs to be fast.
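For reference, this is roughly how a script in front of the indexer could cut the TCP stream into whole records on the \r\r delimiter, so that any later rewrite sees the complete record including its embedded newlines. It is only a sketch: `split_records` is a hypothetical helper, and yielding a trailing record that lacks the delimiter at end of stream is an assumption.

```python
from typing import Iterable, Iterator

DELIM = b"\r\r"  # record separator described above (0d0d)


def split_records(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete records from a stream of arbitrarily sized chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Everything before the last delimiter is complete; the final
        # piece may be a partial record (or half a delimiter), so keep it.
        *complete, buf = buf.split(DELIM)
        for record in complete:
            if record:
                yield record
    if buf:
        # Assumption: flush whatever is left when the stream ends.
        yield buf
```

The buffer carries over between chunks, so a delimiter that straddles two reads is still recognized.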

Sample record (spaced out neatly):

<e>
  <st>Perimeta XML CDR</st>
  <h>the perimeta hostname</h>
  <t>1664838107186</t>
  <tid>2814754955435820</tid>
  <sid>2082</sid>
  <eid>CDR</eid>
  <call starttime="1664837476918" starttime_local="2022-10-03T15:51:16-0700" endtime="1664838107179" endtime_local="2022-10-03T16:01:47-0700" duration="630261" release_side="term" bcid="a big string of numbers and letters">
    <party type="orig" phone="caller phonenumber" domain="ipaddr1" sig_address="ipaddr1" sig_port="5060" sig_transport="udp" trunk_group="trunkgroupname" trunk_context="nap" sip_call_id="anumber@adestination"/>
    <party type="term" phone="destination phonenumber" domain="0.0.0.0" routing_number="a routing number" sig_address="<an ip addr>" sig_port="5060" sig_transport="udp" trunk_group="6444" trunk_context="itg" edit_trunk_group="" edit_trunk_context="" sip_call_id="adifferentnumber@adifferentdestination"/>
    <adjacency type="orig" name="orig_adjacency_system" account="" vpnid="0X00000001" mediarealm="CoreMedia1"/>
    <adjacency type="term" name="dest_adjacency_system" account="" vpnid="0X00000004" mediarealm="CoreMedia1"/>
    <category name="cat.sbc.redirected"/>
    <connect time="1664837483144" time_local="2022-10-03T15:51:23-0700"/>
    <firstendrequest time="1664838107158" time_local="2022-10-03T16:01:47-0700"/>
    <disconnect time="1664838107179" time_local="2022-10-03T16:01:47-0700" reason="0"/>
    <redirector bcid="another string of letters and numbers" editphone="a phone number"/>
    <post_dial_delay duration="2895"/>
    <QoS stream_id="1" instance="0" reservetime="1664837476918" reservetime_local="2022-10-03T15:51:16-0700" committime="1664837483144" committime_local="2022-10-03T15:51:23-0700" releasetime="1664838107184" releasetime_local="2022-10-03T16:01:47-0700">
      <gate>
        <flowinfo>
          <local address="an ip address" port="63130"/>
          <remote address="another ip address" port="36214"/>
          <sd>m=audio 0 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
</sd>
          <RTCPstats>PS=31367, OS=6273400, PR=31366, OR=6273200, PD=0, OD=0, PL=0, JI=0, TOS=0, TOR=0, LA=0, PC/RPS=31165, PC/ROS=4986400, PC/RPR=31367, PC/RPL=0, PC/RJI=0, PC/RLA=0, RF=91, MOS=43, PC/RAJ=0, PC/RML=0</RTCPstats>
        </flowinfo>
        <flowinfo>
          <local address="an ip address" port="19648"/>
          <remote address="a diffrent ip address" port="26046"/>
          <sd>m=audio 0 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
</sd>
          <RTCPstats>PS=31366, OS=6273200, PR=31367, OR=6273400, PD=0, OD=0, PL=0, JI=0, TOS=0, TOR=0, LA=0, PC/RPS=0, PC/ROS=0, PC/RPR=0, PC/RPL=0, PC/RJI=0, PC/RLA=0, RF=82, MOS=41, PC/RAJ=0, PC/RML=0</RTCPstats>
        </flowinfo>
      </gate>
    </QoS>
  </call>
</e>

richgalloway
SplunkTrust

To change only the first match, remove the 'g' flag from the end of the sed command.

---
If this reply helps you, Karma would be appreciated.

jason0
Path Finder

Hello, 

I have tried that, but the problem is that SEDCMD still appears to be line-oriented, splitting on newlines. The second line is therefore treated as a separate record and still gets changed.

--jason


richgalloway
SplunkTrust
SplunkTrust

SEDCMD is event-oriented rather than line-oriented. That you see it changing each line implies that one line is one event. Either the line-breaking settings should be changed, or another means should be found to change the data, such as INGEST_EVAL.
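The difference can be modeled outside Splunk. The Python sketch below is only an analogy, not Splunk's implementation: it applies the same first-match-only substitution once to a whole record and once to each line separately. The replacement tag `<orig_RTCPstats>` and both function names are made up for the illustration.

```python
import re

PATTERN = re.compile(r"<RTCPstats>")
REPLACEMENT = "<orig_RTCPstats>"  # hypothetical label, for illustration only


def first_match_per_event(record: str) -> str:
    """Substitute once across the whole record (event-oriented)."""
    return PATTERN.sub(REPLACEMENT, record, count=1)


def first_match_per_line(record: str) -> str:
    """Substitute once on every line, as if each line were its own event."""
    lines = record.split("\n")
    return "\n".join(PATTERN.sub(REPLACEMENT, line, count=1) for line in lines)


record = "<RTCPstats>PS=31367</RTCPstats>\n<RTCPstats>PS=31366</RTCPstats>"
whole = first_match_per_event(record)
per_line = first_match_per_line(record)
```

Here `whole` has only its first `<RTCPstats>` tag rewritten, while `per_line` has both rewritten, which matches the symptom described when each line is broken into its own event.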
