How to configure timestamp and other data formatting for multiline Exchange Autodiscover logs?

wrangler2x
Motivator

I have been asked to onboard some logs that have a predictable format, but a one-shot test input shows that Splunk does not parse them correctly. Here is a sample log entry, which spans multiple lines:

20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.

Using Splunk > Manager >> Data Inputs >> Files & Directories >> Data Preview, I chose the option to specify a pattern or regex to break before, and this is the regex that I gave it:

(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.
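As a quick sanity check outside Splunk, the pattern above can be tried against a few lines of the sample. This is only a sketch: it uses Python's `re` module rather than Splunk's regex engine, and the sample lines are shortened copies of the entry shown earlier.

```python
import re

# The break-before pattern from the post, copied unchanged.
BREAK_BEFORE = re.compile(
    r"(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\."
)

# Shortened lines taken from the sample entry.
sample_lines = [
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0",
    "20141021_150239.928_128.200.22.13: **** Start Header Dump ****",
    "20141021_150239.928_128.200.22.13:  Cache-Control: no-cache",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
]

# True only for lines that would start a new event.
starts_event = [bool(BREAK_BEFORE.match(line)) for line in sample_lines]
print(starts_event)  # [True, False, False, False]
```

Only the "Request Begin." line matches, so the pattern is a plausible event boundary for this format.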

This produced the following (first record):

1   10/14/01 4:15:28.000 PM 
20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13: XML Message: <?xml version="1.0" encoding="utf-8"?><Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006"><Request><EMailAddress>bvarela@uci.edu</EMailAddress><AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema></Request></Autodiscover>
20141021_150239.928_128.200.22.13: **** Start Header Dump ****
20141021_150239.928_128.200.22.13:  Cache-Control: no-cache
20141021_150239.928_128.200.22.13:  Connection: Keep-Alive
20141021_150239.928_128.200.22.13:  Pragma: no-cache
20141021_150239.928_128.200.22.13:  Content-Length: 348
20141021_150239.928_128.200.22.13:  Content-Type: text/xml
20141021_150239.928_128.200.22.13:  Cookie: OutlookSession="{54AE4359-2E0C-4A13-9486-1DD48DAD6B66}"
20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu
20141021_150239.928_128.200.22.13:  User-Agent: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7128; Pro)
20141021_150239.928_128.200.22.13:  X-User-Identity: bvarela@uci.edu
20141021_150239.928_128.200.22.13:  Depth: 0
20141021_150239.928_128.200.22.13: **** End Header Dump ****
20141021_150239.928_128.200.22.13: Email address "bvarela@uci.edu" retrieved from XML request.
20141021_150239.928_128.200.22.13: Request: bvarela@uci.edu; Redirect: bvarela@exchange.uci.edu
20141021_150239.928_128.200.22.13: End Request. Took 44ms.

The format of the time/date-stamp & IP before each colon is:

YYYYmmdd_(24hr)(min)(sec).(millisec)_(ipnumber)

or, put another way

YYYYmmdd_HHMMss.mmm_(ipnumber)

So for this last example (20141021_150239.928_128.200.22.13) I would expect a timestamp of 10/21/2014 3:02:39.928 PM, but Splunk is not extracting that. It would also be nice to pull the IP out as a separate field, dropping the '_' to get IP=128.200.22.13, and to drop the redundant timestamp/IP prefix from the remaining lines of each log entry.
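To make the expected result concrete, here is a small Python sketch that splits one line into timestamp, IP, and message. It only illustrates the format described above and is not how Splunk parses it; `PREFIX` and `parse_line` are hypothetical names introduced for this example.

```python
import re
from datetime import datetime

# Hypothetical pattern for "YYYYmmdd_HHMMSS.mmm_ip: message".
PREFIX = re.compile(r"^(\d{8}_\d{6}\.\d{3})_((?:\d{1,3}\.){3}\d{1,3}):\s*(.*)$")

def parse_line(line):
    """Return (timestamp, ip, message) for one log line."""
    m = PREFIX.match(line)
    if m is None:
        raise ValueError("line does not start with the expected prefix")
    stamp, ip, message = m.groups()
    # %f accepts 1 to 6 fractional digits, so "928" is read as 928 ms.
    when = datetime.strptime(stamp, "%Y%m%d_%H%M%S.%f")
    return when, ip, message

when, ip, message = parse_line(
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms."
)
print(when.isoformat(timespec="milliseconds"), ip, message)
# 2014-10-21T15:02:39.928 128.200.22.13 End Request. Took 44ms.
```

The parsed time, 15:02:39.928 on 21 October 2014, is the 3:02:39.928 PM value expected above.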

Any ideas?


somesoni2
Revered Legend

Give this a try (either in props.conf directly OR in Data Preview -> Advanced mode)

BREAK_ONLY_BEFORE=(\d{8}_\d{1,6}\.\d{3}_)(\d{1,3}\.){3}\d{1,3}: Request Begin\.
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
SEDCMD-ipaddr=s/(\d{8}_\d{6}\.\d{3})_(.*)/\1 IP=\2/
SEDCMD-removeextra=s/(\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+\:\s*)//g
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y%m%d_%H%M%S.%3Q_
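
The intent of the two SEDCMD lines can be previewed outside Splunk. The sketch below applies the same expressions with Python's `re.sub` to a shortened copy of the sample event. It assumes that Python's regex behaviour is close enough to Splunk's for these patterns, that `.` does not match a newline, and that SEDCMD-ipaddr runs before SEDCMD-removeextra. None of this is confirmed against Splunk itself.

```python
import re

# Shortened copy of the sample event, joined into one multi-line string.
raw = "\n".join([
    "20141021_150239.928_128.200.22.13: Request Begin. User Agent: Microsoft Office/14.0",
    "20141021_150239.928_128.200.22.13: **** Start Header Dump ****",
    "20141021_150239.928_128.200.22.13:  Host: autodiscover.uci.edu",
    "20141021_150239.928_128.200.22.13: End Request. Took 44ms.",
])

# SEDCMD-ipaddr has no g flag, so only the first match is rewritten (count=1).
step1 = re.sub(r"(\d{8}_\d{6}\.\d{3})_(.*)", r"\1 IP=\2", raw, count=1)

# SEDCMD-removeextra has the g flag, so every remaining prefix is removed.
# The first line no longer matches because its "_" before the IP is gone.
step2 = re.sub(r"(\d{8}_\d{6}\.\d{3}_\d+\.\d+\.\d+\.\d+\:\s*)", "", step1)

print(step2)
```

Under those assumptions the first line keeps its timestamp followed by IP=128.200.22.13, and the remaining lines lose their repeated prefix. This shows only what the expressions are meant to do, not how Splunk handles the replacement string.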

wrangler2x
Motivator

This works brilliantly, except for the SEDCMD-ipaddr sed command. After much head scratching I realized that the equals sign in the replacement string was causing the sed to fail. Then, of course, the SEDCMD-removeextra sed command removed every prefix, leaving no IP address at all.

If I use IP: instead, it works fine. Is there some way to include the = sign, though? I tried this in regular sed on Linux and it had no problem with the = sign, so the issue must be specific to Splunk.
