Re-labeling and breaking up xml stream?

jason0
Path Finder

Hello,

I have an incoming TCP stream of XML Call Data Records (CDRs). An example is enclosed at the end.

Each CDR contains information about the calling and destination phones. There are several SEDCMDs in props.conf that take lines like <party type="orig"... and convert them into <orig_party...
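
For comparison, this is roughly what those existing SEDCMDs amount to, sketched in Python. The real SEDCMDs are not shown in the post, so the pattern and the name relabel_party are hypothetical. The rewrite works line by line because each <party> tag carries its own type="orig"/"term" label; the <RTCPstats> lines have no such label.

```python
import re

# Hypothetical stand-in for the existing SEDCMDs (the real ones are not shown):
# rewrite <party type="orig" ...> as <orig_party ...>, and likewise for "term".
PARTY_TYPE = re.compile(r'<party type="(orig|term)"')


def relabel_party(event: str) -> str:
    """Move the type attribute's value into the element name."""
    return PARTY_TYPE.sub(r'<\1_party', event)


sample = (
    '<party type="orig" phone="caller phonenumber" sig_port="5060"/>\n'
    '<party type="term" phone="destination phonenumber" sig_port="5060"/>'
)
print(relabel_party(sample))
```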

The problem is with lines that differ only by their position in the data structure. For the XML lines that start with <RTCPstats>, I need to rewrite the fields: the first line should be ingested as "<orig_PS>31367</orig_PS> <orig_OS>6273400</orig_OS>..." and the second line as "<term_PS>31366</term_PS> <term_OS>6273200</term_OS>...".

<flowinfo> <RTCPstats>PS=31367, OS=6273400, ...</RTCPstats> </flowinfo>

<flowinfo> <RTCPstats>PS=31366, OS=6273200, ...</RTCPstats> </flowinfo>
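
The target transformation itself is simple once the whole record is available as one string. A minimal Python sketch, not tied to Splunk: relabel_rtcp is a hypothetical helper that prefixes every KEY=value pair in the first <RTCPstats> element with orig_ and every pair in the second with term_. Two choices here are assumptions of the sketch: the <RTCPstats> wrapper is kept so the XML stays well-formed, and the "/" in names such as PC/RPS becomes "_", because "/" is not legal in an XML element name.

```python
import re

# One KEY=value pair; KEY may contain "/" (e.g. PC/RPS), value is digits.
FIELD = re.compile(r'([A-Z][A-Z/_]*)=([0-9]+)')
# One whole RTCPstats element; DOTALL lets it span newlines if present.
STATS = re.compile(r'<RTCPstats>(.*?)</RTCPstats>', re.DOTALL)


def relabel_rtcp(record: str, prefixes=("orig", "term")) -> str:
    """Tag fields of the 1st RTCPstats block with prefixes[0], the 2nd with prefixes[1]."""
    remaining = iter(prefixes)

    def rewrite_block(match):
        prefix = next(remaining, None)
        if prefix is None:  # more blocks than prefixes: leave the extra ones alone
            return match.group(0)
        parts = []
        for key, value in FIELD.findall(match.group(1)):
            tag = f"{prefix}_{key.replace('/', '_')}"
            parts.append(f"<{tag}>{value}</{tag}>")
        return "<RTCPstats>" + " ".join(parts) + "</RTCPstats>"

    # re.sub visits matches left to right, so block order decides the prefix.
    return STATS.sub(rewrite_block, record)


record = (
    "<flowinfo> <RTCPstats>PS=31367, OS=6273400, PC/RPS=31165</RTCPstats> </flowinfo>\n"
    "<flowinfo> <RTCPstats>PS=31366, OS=6273200, PC/RPS=0</RTCPstats> </flowinfo>"
)
print(relabel_rtcp(record))
```

Because the prefix comes from the block's position rather than from anything on the line itself, this only works when the function sees both blocks in one call.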

The GNU sed command "sed -r -e '0,/ ?([A-Z_]*)=([0-9]*)/s//<ORIG_\1>\2<\/ORIG_\1>/g'" does this on the command line, but the same expression in a SEDCMD does not.
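
Why the command-line version works: sed reads its input one line at a time, and the GNU-only address 0,/re/ selects lines up to and including the first line that matches re, so the g substitution runs on that one line only. As far as I know, SEDCMD accepts only the substitution (s) and transliteration (y) forms, not line addresses, so that part of the expression has nothing to map onto. The Python sketch that follows emulates the sed behaviour; sed_first_matching_line is a hypothetical name.

```python
import re

# The post's sed expression, split into its parts (GNU sed, -r):
#   0,/ ?([A-Z_]*)=([0-9]*)/ s//<ORIG_\1>\2<\/ORIG_\1>/g
PATTERN = re.compile(r' ?([A-Z_]*)=([0-9]*)')
REPLACEMENT = r'<ORIG_\1>\2</ORIG_\1>'


def sed_first_matching_line(text: str) -> str:
    """Emulate `sed -r '0,/RE/s//REPL/g'`: rewrite every match, but only on the
    first line that contains a match; later lines pass through untouched."""
    out = []
    done = False
    for line in text.split("\n"):  # sed works one line at a time
        if not done and PATTERN.search(line):
            line = PATTERN.sub(REPLACEMENT, line)  # the trailing "g": all matches
            done = True  # the 0,/RE/ range ends at this line
        out.append(line)
    return "\n".join(out)


two_lines = (
    "<flowinfo> <RTCPstats>PS=31367, OS=6273400</RTCPstats> </flowinfo>\n"
    "<flowinfo> <RTCPstats>PS=31366, OS=6273200</RTCPstats> </flowinfo>"
)
print(sed_first_matching_line(two_lines))
# Both groups may be empty, so a bare "=" in an XML attribute also matches:
print(sed_first_matching_line('<call starttime="1664837476918">'))
```

One caveat the second call shows: because ([A-Z_]*) and ([0-9]*) can both match empty text, the pattern matches any "=", including those in attributes such as starttime="...". On the full sample record, the first matching line would therefore be the <call ...> line, not the first <RTCPstats> line.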

Altering a line of input is easy: altering only the FIRST instance in a record with embedded newlines is not.

What are my options?  

  • SEDCMD in props.conf?
    • Strip out all newlines so SEDCMD treats the record as one line? (Can't really do that with sed...)
  • A regex in transforms.conf?
  • Pass the input through a script that CAN do this?

Some details:

  • The line breaker is two carriage returns (literally \r\r, or 0x0D 0x0D).
  • There are embedded newlines in each record, so any SEDCMD is applied to each line, not to the entire record at once.
  • I get 60,000 records per minute, so this transformation needs to be fast.
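
If the script route is taken, the script first has to cut the stream into records on \r\r instead of on newlines. At 60,000 records per minute that is about 1,000 records per second, so the sketch reads in large chunks rather than line by line. It assumes the records arrive as raw bytes on a file-like object (for example sys.stdin.buffer); iter_records and the 64 KiB chunk size are illustrative choices, not anything Splunk defines.

```python
import io
from typing import BinaryIO, Iterator

RECORD_SEP = b"\r\r"  # line breaker from the post: two carriage returns (0x0D 0x0D)


def iter_records(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield whole records from a byte stream, splitting only on \\r\\r.

    Embedded \\n characters stay inside each record. A separator cut in half
    by a chunk boundary is handled because the unfinished tail is kept in
    `pending` and re-split once the next chunk arrives.
    """
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        *complete, pending = pending.split(RECORD_SEP)
        for record in complete:
            if record.strip():  # skip empty pieces, e.g. from \r\r\r\r
                yield record
    if pending.strip():  # last record had no trailing separator
        yield pending


demo = io.BytesIO(b"<e>\n<t>1</t>\n</e>\r\r<e>\n<t>2</t>\n</e>\r\r")
for rec in iter_records(demo, chunk_size=5):
    print(rec)
```

Each yielded record still contains its embedded newlines, so a whole-record rewrite can be applied to it in one pass before it is written back out with a \r\r terminator.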

Sample record (indented for readability):

<e>
  <st>Perimeta XML CDR</st>
  <h>the perimeta hostname</h>
  <t>1664838107186</t>
  <tid>2814754955435820</tid>
  <sid>2082</sid>
  <eid>CDR</eid>
  <call starttime="1664837476918" starttime_local="2022-10-03T15:51:16-0700" endtime="1664838107179" endtime_local="2022-10-03T16:01:47-0700" duration="630261" release_side="term" bcid="a big string of numbers and letters">
    <party type="orig" phone="caller phonenumber" domain="ipaddr1" sig_address="ipaddr1" sig_port="5060" sig_transport="udp" trunk_group="trunkgroupname" trunk_context="nap" sip_call_id="anumber@adestination"/>
    <party type="term" phone="destination phonenumber" domain="0.0.0.0" routing_number="a routing number" sig_address="<an ip addr>" sig_port="5060" sig_transport="udp" trunk_group="6444" trunk_context="itg" edit_trunk_group="" edit_trunk_context="" sip_call_id="adifferentnumber@adifferentdestination"/>
    <adjacency type="orig" name="orig_adjacency_system" account="" vpnid="0X00000001" mediarealm="CoreMedia1"/>
    <adjacency type="term" name="dest_adjacency_system" account="" vpnid="0X00000004" mediarealm="CoreMedia1"/>
    <category name="cat.sbc.redirected"/>
    <connect time="1664837483144" time_local="2022-10-03T15:51:23-0700"/>
    <firstendrequest time="1664838107158" time_local="2022-10-03T16:01:47-0700"/>
    <disconnect time="1664838107179" time_local="2022-10-03T16:01:47-0700" reason="0"/>
    <redirector bcid="another string of letters and numbers" editphone="a phone number"/>
    <post_dial_delay duration="2895"/>
    <QoS stream_id="1" instance="0" reservetime="1664837476918" reservetime_local="2022-10-03T15:51:16-0700" committime="1664837483144" committime_local="2022-10-03T15:51:23-0700" releasetime="1664838107184" releasetime_local="2022-10-03T16:01:47-0700">
      <gate>
        <flowinfo>
          <local address="an ip address" port="63130"/>
          <remote address="another ip address" port="36214"/>
          <sd>m=audio 0 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
</sd>
          <RTCPstats>PS=31367, OS=6273400, PR=31366, OR=6273200, PD=0, OD=0, PL=0, JI=0, TOS=0, TOR=0, LA=0, PC/RPS=31165, PC/ROS=4986400, PC/RPR=31367, PC/RPL=0, PC/RJI=0, PC/RLA=0, RF=91, MOS=43, PC/RAJ=0, PC/RML=0</RTCPstats>
        </flowinfo>
        <flowinfo>
          <local address="an ip address" port="19648"/>
          <remote address="a different ip address" port="26046"/>
          <sd>m=audio 0 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
</sd>
          <RTCPstats>PS=31366, OS=6273200, PR=31367, OR=6273400, PD=0, OD=0, PL=0, JI=0, TOS=0, TOR=0, LA=0, PC/RPS=0, PC/ROS=0, PC/RPR=0, PC/RPL=0, PC/RJI=0, PC/RLA=0, RF=82, MOS=41, PC/RAJ=0, PC/RML=0</RTCPstats>
        </flowinfo>
      </gate>
    </QoS>
  </call>
</e>

richgalloway
SplunkTrust

To change only the first match, remove the 'g' flag from the end of the sed command.
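
One caveat with dropping g: in a SEDCMD the pattern space is the whole event, so a substitution without g stops after the first match in the event. With this pattern that is only the first KEY=value pair (PS=...), not the whole first <RTCPstats> line. A Python illustration, where re.sub(..., count=1) plays the role of sed without g and the two-line event is a simplified stand-in for the real record:

```python
import re

# The post's pattern and replacement, unchanged.
PATTERN = re.compile(r' ?([A-Z_]*)=([0-9]*)')
REPLACEMENT = r'<ORIG_\1>\2</ORIG_\1>'

# Simplified stand-in for one CDR held as a SINGLE event (two RTCPstats lines).
event = (
    "<RTCPstats>PS=31367, OS=6273400</RTCPstats>\n"
    "<RTCPstats>PS=31366, OS=6273200</RTCPstats>"
)

with_g = PATTERN.sub(REPLACEMENT, event)              # like s/RE/REPL/g
without_g = PATTERN.sub(REPLACEMENT, event, count=1)  # like s/RE/REPL/ (no g)

print(with_g)     # every pair on both lines is rewritten
print(without_g)  # only PS=31367 is rewritten; OS=6273400 is left alone
```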

---
If this reply helps you, Karma would be appreciated.

jason0
Path Finder

Hello, 

I have tried that, but SEDCMD is still line-oriented (it breaks on newlines), so the second line appears to be a separate record and still gets changed.
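
That symptom is what the event boundaries would predict: a no-g substitution is applied once per event, so if every line is its own event, every <RTCPstats> line gets its own "first" match. A small Python illustration with a simplified record; first_match_per_event is a hypothetical helper, and splitting on "\n" stands in for breaking events on newlines.

```python
import re

PATTERN = re.compile(r' ?([A-Z_]*)=([0-9]*)')
REPLACEMENT = r'<ORIG_\1>\2</ORIG_\1>'

record = (
    "<RTCPstats>PS=31367, OS=6273400</RTCPstats>\n"
    "<RTCPstats>PS=31366, OS=6273200</RTCPstats>"
)


def first_match_per_event(events):
    """Apply a no-'g' substitution once to each event independently."""
    return [PATTERN.sub(REPLACEMENT, e, count=1) for e in events]


# Events broken on newlines: each line has its own "first" match, so both change.
per_line = first_match_per_event(record.split("\n"))
# The whole record as one event: only the first match in the record changes.
per_record = first_match_per_event([record])

print(per_line)
print(per_record)
```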

--jason

richgalloway
SplunkTrust

SEDCMD is event-oriented rather than line-oriented. If you see it changing each line, then each line is being indexed as its own event. Either change the line-breaking settings or find another way to change the data, such as INGEST_EVAL.
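
A sketch of that fix in Python rather than Splunk configuration: break the raw stream only on \r\r so each CDR becomes one event, then, within that event, run two first-occurrence (no g) substitutions in sequence to rename the first <RTCPstats> element to <RTCPstats_orig> and the second to <RTCPstats_term>. In props.conf the breaking part would presumably be LINE_BREAKER = (\r\r) with SHOULD_LINEMERGE = false; check that against the props.conf spec before relying on it. The tag-renaming idea and the name label_blocks are an illustration only, not something confirmed in this thread or tested in Splunk.

```python
import re

# Simplified raw stream: two CDRs, each ending in \r\r, with \n inside each record.
raw = (
    "<e>\n<RTCPstats>PS=31367</RTCPstats>\n<RTCPstats>PS=31366</RTCPstats>\n</e>\r\r"
    "<e>\n<RTCPstats>PS=1</RTCPstats>\n<RTCPstats>PS=2</RTCPstats>\n</e>\r\r"
)

# Breaking on any run of CR/LF: every line becomes its own event.
by_newline = [e for e in re.split(r"[\r\n]+", raw) if e]
# Breaking only on \r\r: each CDR stays one event, embedded \n included.
by_record = [e for e in raw.split("\r\r") if e]

STATS = re.compile(r"<RTCPstats>(.*?)</RTCPstats>", re.DOTALL)


def label_blocks(event: str) -> str:
    """Rename the 1st RTCPstats element to RTCPstats_orig and the 2nd to RTCPstats_term.

    Each pass replaces only the first remaining <RTCPstats>...</RTCPstats>
    (count=1, i.e. no 'g'). Once block one is renamed, the plain <RTCPstats>
    tag no longer matches it, so the second pass lands on block two.
    """
    for suffix in ("orig", "term"):
        event = STATS.sub(rf"<RTCPstats_{suffix}>\1</RTCPstats_{suffix}>", event, count=1)
    return event


print(len(by_newline), len(by_record))
for ev in by_record:
    print(label_blocks(ev))
```

Renaming the element rather than every field keeps the substitution count fixed at two per event, regardless of how many fields each <RTCPstats> line carries.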
