<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parsing SIP multiline, multiformat events in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83955#M17452</link>
    <description>&lt;P&gt;I think there are several options here, as you seem to have a variable number of varying fields in each event. One solution is to use a combination of props &amp;amp; transforms definitions to pull out the major, high-level extractions on a first pass and then pull out additional fields on a second pass.&lt;/P&gt;

&lt;P&gt;You could have a props.conf like this to efficiently break events, extract the timestamp, and call the field extraction pieces:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[sipSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|
TIME_PREFIX=^
TIMESTAMP_LOOKAHEAD=28
TIME_FORMAT=%Y.%m.%d %H:%M:%S:%3N %Z
KV_MODE=none
REPORT-field_passes=pass_one, pass_two, pass_three
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;A corresponding transforms.conf like this first pulls out the static, known fields (pass_one) and then pulls out colon-separated values (pass_two). Additional passes against sipFields (extracted in pass_one) can handle anything else.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[pass_one]
REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t\n\r]+([^\|\t\n\r]+)(.*)?
#REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t]+([^\|\t\n\r]+)(?:[\s\|\t]+)?(.*)?
FORMAT=szDateTime::$1 logLevel::$2 logType::$3 sipFields::$4

[pass_two]
SOURCE_KEY=sipFields
REGEX=([^\:\t\n\r\|\d]+)\:([^\t\n\r\|]+)
FORMAT=$1::$2 
MV_ADD=true    

[pass_three]
# another iteration for variable number of pipe separated values, etc
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Mon, 28 Sep 2020 11:59:13 GMT</pubDate>
    <dc:creator>bwooden</dc:creator>
    <dc:date>2020-09-28T11:59:13Z</dc:date>
    <item>
      <title>Parsing SIP multiline, multiformat events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83954#M17451</link>
      <description>&lt;P&gt;Hi, I'm trying to parse some logs generated by Broadsoft SIP servers. The log formats follow a general pattern but the detail can vary from event to event and field meanings can be context-sensitive.&lt;/P&gt;

&lt;P&gt;The events are multiline, delimited by a datetime string, and the first portion is pipe-separated. The fields here can differ in number and meaning. If I use DELIMS on the pipe character, it works except for the last field, which flows into the remainder of the event.&lt;/P&gt;

&lt;P&gt;The first thing I'd like to do is stop the delims at a defined point which seems to be a newline character. The following transform using "| or newline" doesn't work. If I make it "| or tab", it works better for the first line but also matches unwanted fields in the remainder of the event (many of which start with tab).&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[transform-bsft-xslog-test1]
# delims are pipe OR newline.
DELIMS = "|
"
FIELDS = "szDateTime" logLevel logType sipField1 sipField2 sipField3
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Event sample:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint | +155512345678 | Service Delivery | localHost1234:5678

        Processing Event: com.broadsoft.events.sip.SipReferEvent

2012.06.21 02:48:15:157 EST | Info       | Accounting

        SERVICE INVOCATION ACCOUNTING EVENT
        Time Stamp: Thu Jun 21 02:48:15 EST 2012 (1340264895157)
        Accounting ID: [id]
        Service Name: Call Transfer
        Related Accounting ID: [id]


2012.06.21 02:48:14:773 EST | Info       | SipMedia | +155512345678 | localHost1234:5678

        udp 391 Bytes IN from 10.10.10.10:5060
SIP/2.0 200 OK
[various amounts (10 - 30+ lines) of SIP information trimmed]
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 26 Jun 2012 10:42:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83954#M17451</guid>
      <dc:creator>inglisn</dc:creator>
      <dc:date>2012-06-26T10:42:19Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing SIP multiline, multiformat events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83955#M17452</link>
      <description>&lt;P&gt;I think there are several options here, as you seem to have a variable number of varying fields in each event. One solution is to use a combination of props &amp;amp; transforms definitions to pull out the major, high-level extractions on a first pass and then pull out additional fields on a second pass.&lt;/P&gt;

&lt;P&gt;You could have a props.conf like this to efficiently break events, extract the timestamp, and call the field extraction pieces:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[sipSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|
TIME_PREFIX=^
TIMESTAMP_LOOKAHEAD=28
TIME_FORMAT=%Y.%m.%d %H:%M:%S:%3N %Z
KV_MODE=none
REPORT-field_passes=pass_one, pass_two, pass_three
&lt;/CODE&gt;&lt;/PRE&gt;
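
&lt;P&gt;To sanity-check the LINE_BREAKER pattern outside Splunk, a rough Python sketch like the one below can help. It only approximates Splunk's behaviour with Python's re module: the text matched by the first capture group is dropped, and the next event starts right after it. The helper name break_events and the trimmed sample text are assumptions made for illustration.&lt;/P&gt;

```python
import re

# Same pattern as LINE_BREAKER above. Group 1 is the run of newlines that
# separates two events and is discarded; the timestamp after it is kept.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|"
)


def break_events(raw):
    """Split raw text wherever LINE_BREAKER matches, dropping group 1."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)  # the next event begins at its timestamp
    events.append(raw[start:])
    return events


# Trimmed version of the sample from the question (hypothetical test input).
SAMPLE = (
    "2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint\n"
    "\n"
    "        Processing Event: com.broadsoft.events.sip.SipReferEvent\n"
    "\n"
    "2012.06.21 02:48:15:157 EST | Info       | Accounting\n"
    "\n"
    "        SERVICE INVOCATION ACCOUNTING EVENT\n"
    "        Service Name: Call Transfer\n"
)

events = break_events(SAMPLE)
for event in events:
    # Show the first line of each event to confirm where the breaks fall.
    print(repr(event.splitlines()[0]))
```

&lt;P&gt;With this sample, the indented lines stay attached to the event whose timestamp precedes them, because a break only happens where newlines are followed by a full timestamp and a pipe.&lt;/P&gt;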

&lt;P&gt;A corresponding transforms.conf like this first pulls out the static, known fields (pass_one) and then pulls out colon-separated values (pass_two). Additional passes against sipFields (extracted in pass_one) can handle anything else.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[pass_one]
REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t\n\r]+([^\|\t\n\r]+)(.*)?
#REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t]+([^\|\t\n\r]+)(?:[\s\|\t]+)?(.*)?
FORMAT=szDateTime::$1 logLevel::$2 logType::$3 sipFields::$4

[pass_two]
SOURCE_KEY=sipFields
REGEX=([^\:\t\n\r\|\d]+)\:([^\t\n\r\|]+)
FORMAT=$1::$2 
MV_ADD=true    

[pass_three]
# another iteration for variable number of pipe separated values, etc
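# A possible starting point, offered as an untested sketch rather than a
# known-good config: collect each pipe-separated value left in sipFields into
# a multivalue field. The field name sipField is an assumption; uncomment the
# settings below and adjust them to use this.
#SOURCE_KEY=sipFields
#REGEX=\|\s*([^\|\r\n]+?)\s*(?=\||$)
#FORMAT=sipField::$1
#MV_ADD=true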
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Sep 2020 11:59:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83955#M17452</guid>
      <dc:creator>bwooden</dc:creator>
      <dc:date>2020-09-28T11:59:13Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing SIP multiline, multiformat events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83956#M17453</link>
      <description>&lt;P&gt;Excellent, thanks.&lt;/P&gt;

&lt;P&gt;I came across a similar "2-phase" strategy in a question about FIX logs. It's a really useful way of working with ugly log formats. I can pull out other values with rex in the search command.&lt;/P&gt;

&lt;P&gt;You also resolved some other issues on linebreaking I was having.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2012 11:12:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Parsing-SIP-multiline-multiformat-events/m-p/83956#M17453</guid>
      <dc:creator>inglisn</dc:creator>
      <dc:date>2012-06-27T11:12:37Z</dc:date>
    </item>
  </channel>
</rss>

