Getting Data In

Splunking multiline logfiles

Path Finder

How do I set up multiline log files in Splunk? Specifically, we have a set of irregular logs: log entries do not begin with a date, but the first line contains the log date, followed by a variable number of lines that pertain to that particular entry.

What is the best practice to follow when splunking such a file?

Example log:

5688: This is xxxxxxxxxxxxxxxxxxx
5688: 
5688: Running multithreaded
5688: 
5688: PROGNUM: 20000091
5688: 
5688: NOT Publishing trade links
WRN:  Fri Jan 22 03:53:04 2010 xxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_NDT\Source\sqlint\sqlConnection.C": 288
---
$CSFPR_SYBASE_TIMEOUT not set - using default timeout of 900 seconds

ERR:  Fri Jan 22 03:53:06 2010 xxxxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_NDT\Source\sqlint\sqlCtlibConnection.C": 398
---
WARNING. Timout setting disabled pending CT-Lib bug-fix. setenv SQLINT_USE_TIMEOUT to enable

5688: Using url tcp://xxxxxx/dtd/tradeserverpubliser/publish for notification
5688: Using tcp://xxxxxxxxx/dtd/tradeservice/translation;tcp://127.0.0.1:9230/dtd/tradeservice/translation for translation
ERR:  Thu Jan 28 05:15:29 2010 xxxxxxxxxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_RPCRW\Source\libRPCRW\RPCinteractorRW.C": 1521
---
RPCinteractor::getDomain - dname env variable not set. Ignoring

4436: 
4436: ################################################
4436: From xxxxxxxxxxxxxxx 28 Jan 2010 05:15:30 (pid=6084,TSRW.50.4,TSserverInteractor)
4436: IN ### PagefileUsage: 19,016 K, PeakPagefileUsage: 21,052 K
4436: Get Owner Groups
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxx for owner group FIRM - ignoring this entry
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxxx for owner group FIRM - ignoring this entry
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxxxfor owner group FIRM - ignoring this entry
4436: Created TSuserPerm <id=xxxx>

TA 🙂

2 Solutions

Super Champion

You will want to set up custom sourcetypes for these log files, then set up your event breakers and, optionally, timestamp parsing logic. This is a big topic, and the docs are the best place to get started.

http://docs.splunk.com/Documentation/Splunk/5.0/Data/WhatSplunkcanmonitor

You may also want to get familiar with the props.conf file.

http://docs.splunk.com/Documentation/Splunk/5.0/Admin/Propsconf

If you update your post to be more readable (with <pre> tags), then I'm sure someone will take a crack at writing you an example props entry to get you started. It can be a little overwhelming at first, but once you get familiar with setting up these kinds of sources, it's pretty easy to get Splunk to index almost any input.

Update:

Try something like:

[my_wacky_log_file]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE_DATE = True
# Breaking events on lines like:  "ERR:  Thu Jan 28 05:15:29 2010"
TIME_PREFIX = ^\w+:\s+
TIME_FORMAT = %a %b %d %T %Y

Additional note:

If your log content is truly difficult for Splunk to handle out of the box, you always have the option of reading the log file with a custom input script that does some pre-processing work first. For example, if you need event breaking when the prefixed number changes (in your example, I see 5688: and 4436:), that would be pretty easy to do programmatically, but it's not something Splunk would do out of the box, at least not at index time. You could even translate your timestamps to a common format, or make different parts of your log file appear to come from different sources (filenames). It's very flexible.
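For what it's worth, here's a minimal sketch of that kind of pre-processing, assuming the numeric thread-ID prefix format in your sample (the function name and the blank-line separator are my own choices, not anything Splunk-specific):

```python
import re

# Matches the numeric thread-ID prefix, e.g. "5688:" or "4436:"
PREFIX = re.compile(r"^(\d+):")

def split_events(lines):
    """Group consecutive lines into events, starting a new event
    whenever the numeric prefix changes."""
    events, current, last_id = [], [], None
    for line in lines:
        m = PREFIX.match(line)
        # Unprefixed lines stay with the current event
        tid = m.group(1) if m else last_id
        if tid != last_id and current:
            events.append("\n".join(current))
            current = []
        last_id = tid
        current.append(line.rstrip("\n"))
    if current:
        events.append("\n".join(current))
    return events

# A scripted input would read the file, call split_events, and print each
# event followed by a blank line, so Splunk can break on the blank line
# instead of guessing.
```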

I'd recommend keeping the scripted input option in reserve, I've rarely found it necessary to take this approach, but it's good to know that you have this option when you truly need it.

Splunk Employee

By default, Splunk will merge multi-line logs and break a new event when it sees a date on a line (i.e., SHOULD_LINEMERGE = true and BREAK_ONLY_BEFORE_DATE = true). However, Splunk is also rather aggressive about interpreting things in lines as dates, because of the many different date formats it recognizes (for example, if you have other dates in the middle of your data, or a number that looks like it might be a date all mashed together). So in a case like yours, where the dates are rare, it would be a good idea to tell Splunk the exact timestamp format and location so it doesn't guess wrong:

TIME_PREFIX = ^\w{3}:\s+
TIME_FORMAT = %a %b %d %H:%M:%S %Y
MAX_TIMESTAMP_LOOKAHEAD = 35

I'm guessing myself here, since I can't exactly tell where you want to break your events, but I'm assuming it's the ERR: and WRN: lines that mark the start of an event.
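As a quick sanity check, that same format string works with Python's strptime (which uses the same conversion specifiers), so you can verify it against a sample line before touching props.conf:

```python
import re
from datetime import datetime

line = 'ERR:  Thu Jan 28 05:15:29 2010 xxxxxxxxxxxxxxxxxx[4488] "..."'

# TIME_PREFIX = ^\w{3}:\s+  -> skip the "ERR:"/"WRN:" prefix
prefix = re.match(r"^\w{3}:\s+", line)
# 24 characters covers "Thu Jan 28 05:15:29 2010", well inside
# MAX_TIMESTAMP_LOOKAHEAD = 35
stamp = line[prefix.end():prefix.end() + 24]
parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
print(parsed)  # 2010-01-28 05:15:29
```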

You might also consider increasing the max number of lines merged from the default of 256 if you have events longer than that:

MAX_EVENTS = 1000


Splunk Employee

Multiple date formats can also be accepted, but it's much harder to configure.

Splunk Employee

There are two things: line breaking can be done using a regex line breaker if there are multiple types of event boundaries, and "prefix" simply means that it has to occur somewhere before the date, not immediately before it, so yes.
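A sketch of what that might look like in props.conf (untested against your data; the alternation is a guess at your two boundary types, the date-stamped ERR:/WRN: lines and the numbered banner lines like "4436: ####..."):

```ini
[my_wacky_log_file]
SHOULD_LINEMERGE = false
# First capture group is consumed as the event boundary; break before
# "ERR:"/"WRN:" date lines or before a numbered "####" banner line
LINE_BREAKER = ([\r\n]+)(?:(?:ERR|WRN):\s+\w{3}\s+\w{3}\s+\d|\d+:\s+#{4,})
TIME_PREFIX = ^\w{3}:\s+
TIME_FORMAT = %a %b %d %H:%M:%S %Y
MAX_TIMESTAMP_LOOKAHEAD = 35
```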

Path Finder

Can this be done if you have more than one type of prefix for a log date? And also more than one date format in the same log file?

You are correct to assume that events should be broken at ERR: and WRN:; however, this log file is very irregular, and events also need to be broken at

4436: ################################################
4436:

Thanks,
Josh

Super Champion

gkanapathy, I just noticed that the docs say that BREAK_ONLY_BEFORE_DATE defaults to false, but $SPLUNK_HOME/etc/system/default/props.conf definitely sets the default to "True". Do you know anyone who can get the docs updated?
