How to configure timestamp and other data formatting for multiline Exchange Autodiscover logs?

wrangler2x
Motivator

I have been asked to onboard some logs that have a predictable format, but a one-shot test input shows that Splunk does not parse them correctly. Here is a sample log entry, which spans multiple lines:

20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.

Using Splunk > Manager >> Data Inputs >> Files & Directories >> Data Preview, I chose "Specify a pattern or regex to break before" and gave it this regex:

(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.
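The break-before idea can be checked outside Splunk with a small Python sketch. This is only an approximation of the event-breaking behaviour, not Splunk's implementation: the `split_events` helper is hypothetical, and the second request in the sample (the `128.200.22.14` lines) is made up for illustration and does not come from the real log.

```python
import re

# Regex quoted from the post: timestamp, underscore, IPv4 address, then "Request Begin."
BREAK_BEFORE = re.compile(
    r"(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\."
)

def split_events(lines):
    """Group lines into events, opening a new event at each line that matches."""
    events = []
    for line in lines:
        if BREAK_BEFORE.search(line) or not events:
            events.append([line])      # start a new event
        else:
            events[-1].append(line)    # continuation of the current event
    return events

sample = [
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)",
    "20141021_150239.928_128.200.22.13: **** Start Header Dump ****",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
    # Hypothetical second request, invented for this example only.
    "20141021_150240.101_128.200.22.14: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)",
    "20141021_150240.101_128.200.22.14: End Request. Took 12ms.",
]

events = split_events(sample)
for number, event in enumerate(events, start=1):
    print(number, len(event), "lines")
```

Only the "Request Begin." lines open a new event, so every other line stays attached to the request it belongs to.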

This produced the following (first record):

1   10/14/01 4:15:28.000 PM 
20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.

The format of the time/date-stamp & IP before each colon is:

YYYYmmdd_(24hr)(min)(sec).(millisec)_(ipnumber)

or, put another way

YYYYmmdd_HHMMss.mmm_(ipnumber)

So for this example (20141021_150239.928_128.200.22.13) I would expect a timestamp of 10/21/2014 3:02:39.928 PM, but Splunk reports 10/14/01 4:15:28.000 PM instead. It would also be nice to turn the IP into a separate field, dropping the '_' and ending up with IP=128.200.22.13, and to drop the redundant timestamp/IP prefix from the remaining lines of each log entry.
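To make the expected result concrete, here is a minimal Python sketch that pulls the timestamp and IP out of one prefix. It is not Splunk configuration: the `parse_prefix` helper and the `PREFIX` regex are hypothetical names, and Python's `%f` directive stands in for the three millisecond digits.

```python
import re
from datetime import datetime

# Prefix layout from the post: YYYYmmdd_HHMMss.mmm_(ipnumber) followed by a colon.
PREFIX = re.compile(r"^(\d{8}_\d{6}\.\d{3})_(\d{1,3}(?:\.\d{1,3}){3}):")

def parse_prefix(line):
    """Return (timestamp, ip) for a log line, or None if the prefix is absent."""
    match = PREFIX.match(line)
    if match is None:
        return None
    # %f accepts one to six fractional digits, so ".928" becomes 928000 microseconds.
    stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S.%f")
    return stamp, match.group(2)

stamp, ip = parse_prefix("20141021_150239.928_128.200.22.13: End Request. Took 44ms.")
print(stamp.isoformat(), "IP=" + ip)
```

The parsed value is 21 October 2014 at 15:02:39.928, which is the timestamp the event should carry.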

Any ideas?


somesoni2
Revered Legend

Give this a try, either directly in props.conf (under the stanza for your sourcetype) or in Data Preview -> Advanced mode:

BREAK_ONLY_BEFORE=(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
SEDCMD-ipaddr=s/(\d{8}_\d{6}\.\d{3})_(.*)/\1 IP=\2/
SEDCMD-removeextra=s/(\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+\:\s*)//g
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y%m%d_%H%M%S.%3Q_
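
The intent of the two SEDCMD expressions can be sanity-checked outside Splunk by emulating them with Python's re module. This is only a sketch under assumptions: it assumes the substitutions run against the whole event in the order ipaddr, then removeextra, and Python's regex engine is not the one Splunk uses, so behaviour inside Splunk may differ.

```python
import re

# Three lines taken from the sample entry, joined into one multi-line event.
event = "\n".join([
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)",
    "20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
])

# SEDCMD-ipaddr has no "g" flag, so only the first match (the first line) is rewritten.
step1 = re.sub(r"(\d{8}_\d{6}\.\d{3})_(.*)", r"\1 IP=\2", event, count=1)

# SEDCMD-removeextra has the "g" flag, so every remaining timestamp_IP prefix is removed.
step2 = re.sub(r"\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+:\s*", "", step1)

print(step2)
```

After the first step the leading prefix no longer contains the underscore before the IP, so the second expression leaves it alone and strips only the prefixes on the following lines.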

wrangler2x
Motivator

This works brilliantly except for the SEDCMD-ipaddr command. After much head scratching I realized that the equal sign in the replacement string was causing the sed to fail. Then, of course, the SEDCMD-removeextra command stripped every prefix, leaving no IP address at all.

If I use IP: instead, it works fine. Is there some way to include the = sign, though? I tried this in regular sed on Linux and it had no problem with the = sign, so the issue seems specific to Splunk.
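
The failure mode described above can be reproduced outside Splunk with a short Python sketch. It does not explain why Splunk rejects the = sign; it only shows that when the ipaddr step has no effect, the removeextra step strips every prefix, and that a generic regex engine treats "IP:" and "IP=" alike. The `apply_sed` helper is hypothetical and the first sample line is shortened from the real log.

```python
import re

IPADDR = re.compile(r"(\d{8}_\d{6}\.\d{3})_(.*)")
REMOVE_EXTRA = re.compile(r"\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+:\s*")

# Two-line event; the first line is shortened from the sample for readability.
event = (
    "20141021_150239.928_128.200.22.13: Request Begin.\n"
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms."
)

def apply_sed(text, ip_replacement):
    """Run the ipaddr step (skipped when ip_replacement is None), then removeextra."""
    if ip_replacement is not None:
        text = IPADDR.sub(ip_replacement, text, count=1)
    return REMOVE_EXTRA.sub("", text)

failed = apply_sed(event, None)               # ipaddr step did nothing
with_colon = apply_sed(event, r"\1 IP:\2")    # the variant that works in Splunk
with_equals = apply_sed(event, r"\1 IP=\2")   # the variant that is wanted

for label, result in (("failed", failed), ("colon", with_colon), ("equals", with_equals)):
    print(label, "->", result.split("\n")[0])
```

In this emulation the two replacement strings behave identically, which matches the observation that plain sed on Linux has no trouble with the = sign.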
