
Parsing SIP multiline, multiformat events

inglisn — Tue, 26 Jun 2012 10:42:19 GMT

Hi, I'm trying to parse some logs generated by Broadsoft SIP servers. The log formats follow a general pattern but the detail can vary from event to event and field meanings can be context-sensitive.

The events are multiline, broken at a leading datetime string, and the first line of each event is pipe-separated. The fields on that line can differ in number and meaning. If I use DELIMS on the pipe character it works, except that the last field runs on into the remainder of the event.

The first thing I'd like to do is stop the delimiter split at a defined point, which seems to be the first newline character. The following transform using "| or newline" doesn't work. If I make it "| or tab", it works better for the first line but also matches unwanted fields in the remainder of the event (many of which start with a tab).

[transform-bsft-xslog-test1]
# delims are pipe OR newline.
DELIMS = "|
"
FIELDS = "szDateTime" logLevel logType sipField1 sipField2 sipField3
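
For illustration only, the behaviour being asked for (split the first line on pipes, leave the rest of the event alone) can be sketched outside Splunk. This is plain Python, not Splunk configuration; `split_header` is a hypothetical helper name, and the event text is the first sample from this post.

```python
# Sketch only: Python stands in for Splunk here.
# split_header is a hypothetical helper, not a Splunk function.
def split_header(event):
    """Split only the first line of an event on pipes, trimming padding."""
    first_line = event.split("\n", 1)[0]
    return [field.strip() for field in first_line.split("|")]


event = (
    "2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint | "
    "+155512345678 | Service Delivery | localHost1234:5678\n"
    "\n"
    "        Processing Event: com.broadsoft.events.sip.SipReferEvent\n"
)

fields = split_header(event)
print(fields)
```

The body of the event never reaches the split, so the last header field stays clean regardless of how many pipe-separated fields the line has.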

Event sample:

2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint | +155512345678 | Service Delivery | localHost1234:5678

        Processing Event: com.broadsoft.events.sip.SipReferEvent

2012.06.21 02:48:15:157 EST | Info       | Accounting

        SERVICE INVOCATION ACCOUNTING EVENT
        Time Stamp: Thu Jun 21 02:48:15 EST 2012 (1340264895157)
        Accounting ID: [id]
        Service Name: Call Transfer
        Related Accounting ID: [id]


2012.06.21 02:48:14:773 EST | Info       | SipMedia | +155512345678 | localHost1234:5678

        udp 391 Bytes IN from 10.10.10.10:5060
SIP/2.0 200 OK
[various amounts (10 - 30+ lines) of SIP information trimmed]

Re: Parsing SIP multiline, multiformat events

bwooden — Mon, 28 Sep 2020 11:59:13 GMT

I think there are several options here, as you seem to have a variable number of varying fields in each event. One solution is to combine props and transforms definitions: pull out the major, high-level extractions on a first pass, then pull out additional fields on a second pass.

You could have a props.conf like this to break events efficiently, extract the timestamp, and call the field extraction transforms:

[sipSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|
TIME_PREFIX=^
TIMESTAMP_LOOKAHEAD=28
TIME_FORMAT=%Y.%m.%d %H:%M:%S:%3N %Z
KV_MODE=none
REPORT-field_passes=pass_one, pass_two, pass_three
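
To see what the LINE_BREAKER pattern does to the sample data, here is a rough emulation in Python. It is a sketch under assumptions: Python's `re` stands in for Splunk's regex engine, `break_events` is a hypothetical helper, and the rule applied is that the text matched by the first capture group (the newlines) is discarded and the next event starts right after it. The timestamp is parsed without the zone abbreviation, because Python's own `%Z` handling depends on the platform.

```python
import re
from datetime import datetime

# Same pattern as LINE_BREAKER above; group 1 is the run of newlines
# that separates two events.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|"
)


def break_events(raw):
    """Hypothetical helper: split raw text wherever LINE_BREAKER matches."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # text before the newlines
        start = match.end(1)                      # next event starts at the date
    events.append(raw[start:])
    return events


raw = (
    "2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint\n"
    "\n"
    "        Processing Event: com.broadsoft.events.sip.SipReferEvent\n"
    "\n"
    "2012.06.21 02:48:15:157 EST | Info       | Accounting\n"
    "\n"
    "        SERVICE INVOCATION ACCOUNTING EVENT\n"
    "        Time Stamp: Thu Jun 21 02:48:15 EST 2012 (1340264895157)\n"
)

events = break_events(raw)

# First 23 characters are the date and time with milliseconds.
stamp = datetime.strptime(events[0][:23], "%Y.%m.%d %H:%M:%S:%f")
print(len(events), stamp)
```

Note that the "Time Stamp:" line inside the second event does not trigger a break, because the pattern requires the dotted date format followed by a pipe.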

A corresponding transforms.conf like this first pulls out the static, known fields (pass_one), then pulls out colon-separated key/value pairs (pass_two). Further passes against sipFields (extracted in pass_one) can handle anything else:

[pass_one]
REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t\n\r]+([^\|\t\n\r]+)(.*)?
#REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t]+([^\|\t\n\r]+)(?:[\s\|\t]+)?(.*)?
FORMAT=szDateTime::$1 logLevel::$2 logType::$3 sipFields::$4

[pass_two]
SOURCE_KEY=sipFields
REGEX=([^\:\t\n\r\|\d]+)\:([^\t\n\r\|]+)
FORMAT=$1::$2
MV_ADD=true

[pass_three]
# another iteration for variable number of pipe separated values, etc
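
As a rough check of the two passes, the same regexes can be run in Python against the accounting sample from the question. This is a sketch, not Splunk behaviour: Python's `re` stands in for Splunk's regex engine, and `re.DOTALL` is an assumption added here so that the last group of pass_one spans the remaining lines. If the dot does not cross newlines on the Splunk side, a `(?s)` prefix on the pass_one REGEX would be the equivalent, which is worth verifying. Whitespace is trimmed in Python for readability.

```python
import re

# pass_one regex from above; DOTALL is an assumption of this sketch.
PASS_ONE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)"
    r"[\s\|\t]+([^\|\t\n\r]+)[\s\|\t\n\r]+([^\|\t\n\r]+)(.*)?",
    re.DOTALL,
)
# pass_two regex from above, applied to sipFields only.
PASS_TWO = re.compile(r"([^\:\t\n\r\|\d]+)\:([^\t\n\r\|]+)")

event = (
    "2012.06.21 02:48:15:157 EST | Info       | Accounting\n"
    "\n"
    "        SERVICE INVOCATION ACCOUNTING EVENT\n"
    "        Time Stamp: Thu Jun 21 02:48:15 EST 2012 (1340264895157)\n"
    "        Service Name: Call Transfer\n"
)

# First pass: static header fields plus everything else as sipFields.
match = PASS_ONE.match(event)
fields = {
    "szDateTime": match.group(1),
    "logLevel": match.group(2).strip(),
    "logType": match.group(3).strip(),
    "sipFields": match.group(4),
}

# Second pass: colon-separated key/value pairs found inside sipFields.
extra = {
    key.strip(): value.strip()
    for key, value in PASS_TWO.findall(fields["sipFields"])
}
print(fields["logType"], extra)
```

The key pattern in pass_two excludes digits, so a value such as `localHost1234:5678` in a header field is not mistaken for a key/value pair.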

Re: Parsing SIP multiline, multiformat events

inglisn — Wed, 27 Jun 2012 11:12:37 GMT

Excellent, thanks.

I came across a similar "2-phase" strategy in a question about FIX logs. It's a really useful way of working with ugly log formats. I can pull out other values with rex in the search command.

You also resolved some other issues on linebreaking I was having.