How to configure timestamp and other data formatting for multiline Exchange Autodiscover logs?

wrangler2x
Motivator

I have been asked to onboard some logs that have a predictable format, but a one-shot test input shows that Splunk does not parse them correctly. Here is a sample log entry, which spans multiple lines:

20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.

Using Splunk > Manager > Data Inputs > Files & Directories > Data Preview, I chose the "Specify a pattern or regex to break before" option and gave it this regex:

(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.

This produced the following (first record shown):

1   10/14/01 4:15:28.000 PM 
20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.
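
As a sanity check on the break pattern itself, here is a minimal Python sketch (run outside Splunk) that starts a new event on every line matching the regex above. The function name `split_events` is made up for illustration, the first sample line is shortened, and the second request in the sample is invented test data, not taken from the real log. Python's `re` module is only a stand-in for Splunk's regex engine, so this shows what the pattern is meant to do, not what Splunk does internally.

```python
import re

# Same pattern that was entered in Data Preview.
BREAK_BEFORE = re.compile(
    r"(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\."
)

def split_events(lines):
    """Group lines into events, opening a new event before each matching line."""
    events = []
    for line in lines:
        if BREAK_BEFORE.search(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return events

sample = [
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0",
    "20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
    # Invented second request, only here to show where the break falls.
    "20141021_150240.101_128.200.22.14: Request Begin. User Agent: Microsoft Office/14.0",
    "20141021_150240.101_128.200.22.14: End Request. Took 12ms.",
]

events = split_events(sample)
print(len(events), [len(e) for e in events])
```

So the event boundaries are not the problem here; the timestamp on the first line of the record is.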

The format of the time/date-stamp & IP before each colon is:

YYYYmmdd_(24hr)(min)(sec).(millisec)_(ipnumber)

or, put another way

YYYYmmdd_HHMMss.mmm_(ipnumber)

So for this last example (20141021_150239.928_128.200.22.13) the expected timestamp is 10/21/2014 3:02:39.928 PM, but Splunk is not extracting it. It would also be nice to pull the IP out as a separate field, dropping the '_' so that it reads IP=128.200.22.13, and to strip the redundant timestamp/IP prefix from the remaining lines of each log entry.
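
To make the expected result concrete, here is a small Python sketch (not Splunk configuration) that parses the prefix the way I would like it interpreted. The names `PREFIX` and `parse_prefix` are made up for this example, and Python's `%f` directive is standing in for the millisecond part of the format.

```python
import re
from datetime import datetime

# YYYYmmdd_HHMMSS.mmm, then "_", then a dotted IPv4 address, then ":".
PREFIX = re.compile(r"^(\d{8}_\d{6}\.\d{3})_((?:\d{1,3}\.){3}\d{1,3}):")

def parse_prefix(line):
    """Return (timestamp, ip) for a log line, or None if the prefix is absent."""
    m = PREFIX.match(line)
    if m is None:
        return None
    # %f accepts 1 to 6 fractional digits, so "928" is read as 928 milliseconds.
    ts = datetime.strptime(m.group(1), "%Y%m%d_%H%M%S.%f")
    return ts, m.group(2)

line = "20141021_150239.928_128.200.22.13: End Request. Took 44ms."
ts, ip = parse_prefix(line)
print(ts.isoformat(timespec="milliseconds"), "IP=" + ip)
# prints: 2014-10-21T15:02:39.928 IP=128.200.22.13
```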

Any ideas?


somesoni2
Revered Legend

Give this a try (either directly in props.conf or in Data Preview > Advanced mode):

BREAK_ONLY_BEFORE=(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
SEDCMD-ipaddr=s/(\d{8}_\d{6}\.\d{3})_(.*)/\1 IP=\2/
SEDCMD-removeextra=s/(\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+\:\s*)//g
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y%m%d_%H%M%S.%3Q_
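
To show what the two SEDCMD lines are meant to do to one merged event, here is a rough emulation with Python's `re.sub`. This is only a sketch: the helper name `apply_sedcmds` is made up, the sample event is abbreviated, and Python's regex engine is an assumption standing in for Splunk's, so it says nothing about how Splunk itself treats any character in the replacement string. It also assumes the ipaddr substitution runs before the removeextra one.

```python
import re

def apply_sedcmds(event):
    """Emulate SEDCMD-ipaddr followed by SEDCMD-removeextra on one event."""
    # SEDCMD-ipaddr has no g flag, so only the first match is rewritten.
    # "." does not cross newlines here, so only the first line is affected.
    event = re.sub(r"(\d{8}_\d{6}\.\d{3})_(.*)", r"\1 IP=\2", event, count=1)
    # SEDCMD-removeextra has the g flag: strip every remaining prefix.
    event = re.sub(r"\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+:\s*", "", event)
    return event

event = "\n".join([
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0",
    "20141021_150239.928_128.200.22.13: **** Start Header Dump ****",
    "20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
])

result = apply_sedcmds(event)
print(result)
```

The first line keeps its timestamp and gains `IP=128.200.22.13`, because once the underscore is replaced that line no longer matches the removeextra pattern. If the first substitution does not take effect, the second one strips the prefix from every line, IP included.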

wrangler2x
Motivator

This works brilliantly except for the SEDCMD-ipaddr command. After much head scratching I realized that the equal sign in the replacement string was causing the sed to fail. Then, of course, the SEDCMD-removeextra command stripped every prefix, leaving no IP address at all.

If I use IP: instead, it works fine. Is there some way to include the = sign, though? I tried this in regular sed on Linux and it had no problem with the = sign, so it must be unique to Splunk.
