Splunking multiline logfiles

Josh
Path Finder

How do I set up multiline log files in Splunk? Specifically, we have a set of logs that are irregular: log entries do not begin with a date. Instead, the first line of an entry contains the log date, and it is followed by some number of lines that pertain to that particular log entry.

What is the best practice to follow when splunking such a file?

Example log:

5688: This is xxxxxxxxxxxxxxxxxxx
5688: 
5688: Running multithreaded
5688: 
5688: PROGNUM: 20000091
5688: 
5688: NOT Publishing trade links
WRN:  Fri Jan 22 03:53:04 2010 xxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_NDT\Source\sqlint\sqlConnection.C": 288
---
$CSFPR_SYBASE_TIMEOUT not set - using default timeout of 900 seconds

ERR:  Fri Jan 22 03:53:06 2010 xxxxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_NDT\Source\sqlint\sqlCtlibConnection.C": 398
---
WARNING. Timout setting disabled pending CT-Lib bug-fix. setenv SQLINT_USE_TIMEOUT to enable

5688: Using url tcp://xxxxxx/dtd/tradeserverpubliser/publish for notification
5688: Using tcp://xxxxxxxxx/dtd/tradeservice/translation;tcp://127.0.0.1:9230/dtd/tradeservice/translation for translation
ERR:  Thu Jan 28 05:15:29 2010 xxxxxxxxxxxxxxxxxx[4488] "\CC_Views\draha_BookSegregation\coretech_scm\CORETECH_EDG\Source\EDG_RPCRW\Source\libRPCRW\RPCinteractorRW.C": 1521
---
RPCinteractor::getDomain - dname env variable not set. Ignoring

4436: 
4436: ################################################
4436: From xxxxxxxxxxxxxxx 28 Jan 2010 05:15:30 (pid=6084,TSRW.50.4,TSserverInteractor)
4436: IN ### PagefileUsage: 19,016 K, PeakPagefileUsage: 21,052 K
4436: Get Owner Groups
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxx for owner group FIRM - ignoring this entry
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxxx for owner group FIRM - ignoring this entry
4436: *** WARNING *** - table OGM_PERMISSIONS contains reference to unknown user xxxxxfor owner group FIRM - ignoring this entry
4436: Created TSuserPerm <id=xxxx>

TA 🙂

2 Solutions

Lowell
Super Champion

You will want to set up custom sourcetypes for these log files, then set up your event breakers and, optionally, timestamp parsing logic. This is a big topic, and the docs are the best place to get started.

http://docs.splunk.com/Documentation/Splunk/5.0/Data/WhatSplunkcanmonitor

You may also want to get familiar with the props.conf file.

http://docs.splunk.com/Documentation/Splunk/5.0/Admin/Propsconf

If you update your post to be more readable (with <pre> tags), then I'm sure someone will take a crack at an example props.conf entry to get you started. It can be a little overwhelming at first, but once you get familiar with setting up these kinds of sources, it's pretty easy to get Splunk to index almost any input.

Update:

Try something like:

[my_wacky_log_file]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE_DATE = True
# Breaking events on lines like:  "ERR:  Thu Jan 28 05:15:29 2010"
TIME_PREFIX = ^\w+:\s+
TIME_FORMAT = %a %b %d %T %Y

Additional note:

If your log content is truly difficult to get Splunk to handle properly out of the box, you always have the option of reading the log file with a custom input script that does some pre-processing. For example, if you need to break events when the prefixed number changes (in your example, I see 5688: and 4436:), that would be pretty easy to do programmatically, but it is not something Splunk would do out of the box, at least not at index time. You could even translate your timestamps to a common format, or make different parts of your log file appear to come from different sources (filenames). It's very flexible.

Helpful docs links:

I'd recommend keeping the scripted input option in reserve. I've rarely found it necessary to take this approach, but it's good to know that you have the option when you truly need it.
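
To make the pre-processing idea a bit more concrete, here is a minimal Python sketch of the grouping step only. It is an illustration under assumptions, not a ready-made scripted input: the function name group_events and both patterns are hypothetical and were inferred from the sample log in the question. A new event starts when the numeric prefix changes or when an ERR:/WRN: header line appears; everything else is treated as a continuation line.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Hypothetical patterns, inferred from the sample log in the question.
PID_PREFIX = re.compile(r"^(\d+):")          # e.g. "5688: Running multithreaded"
LEVEL_PREFIX = re.compile(r"^(?:ERR|WRN):")  # e.g. "ERR:  Thu Jan 28 05:15:29 2010 ..."


def group_events(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield lists of lines, one list per event.

    A new event starts when the numeric prefix changes or when an
    ERR:/WRN: header line appears.
    """
    current: List[str] = []
    current_pid: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        pid_match = PID_PREFIX.match(line)
        if LEVEL_PREFIX.match(line):
            # Header line: always starts a new event and resets the prefix.
            starts_new, pid = True, None
        elif pid_match:
            pid = pid_match.group(1)
            starts_new = pid != current_pid
        else:
            # Continuation line ("---", message text, blank): stays with the event.
            starts_new, pid = False, current_pid
        if starts_new and current:
            yield current
            current = []
        current.append(line)
        current_pid = pid
    if current:
        yield current
```

A script built on this could read the log file, call group_events on its lines, and write each group out in whatever shape your sourcetype's breaking rules expect. How the output is delimited is a separate decision that this sketch does not cover.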

gkanapathy
Splunk Employee

Default settings in Splunk will merge multi-line logs and break a new event when it sees a date on a line (i.e., SHOULD_LINEMERGE = true and BREAK_ONLY_BEFORE_DATE = true). However, Splunk is also rather aggressive about interpreting things in lines as dates, because of the many different date formats it recognizes (for example, other dates in the middle of your data, or a run of digits that looks like it might be a date all mashed together). So in a case like yours, where the dates are rare, it would be a good idea to tell Splunk the exact timestamp format and location so it doesn't guess wrong:

TIME_PREFIX = ^\w{3}:\s+
TIME_FORMAT = %a %b %d %H:%M:%S %Y
MAX_TIMESTAMP_LOOKAHEAD = 35

I'm guessing myself here, since I can't exactly tell where you want to break your events, but I'm assuming it's the ERR: and WRN: lines that mark the start of an event.

You might also consider increasing the maximum number of lines merged from the default of 256 if you have events longer than that:

MAX_EVENTS = 1000
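
One way to sanity-check the prefix regex and the format string against the sample lines before touching props.conf is a small Python sketch like the one below. It only approximates what Splunk does: it uses Python's own re and strptime, the helper name parse_header is hypothetical, and the fixed 24-character timestamp width is an assumption taken from the sample ("Fri Jan 22 03:53:04 2010").

```python
import re
from datetime import datetime

TIME_PREFIX = re.compile(r"^\w{3}:\s+")      # same regex as the TIME_PREFIX setting
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"         # same string as the TIME_FORMAT setting
LOOKAHEAD = 35                               # mirrors MAX_TIMESTAMP_LOOKAHEAD
STAMP_LEN = len("Fri Jan 22 03:53:04 2010")  # assumption: fixed-width timestamp


def parse_header(line):
    """Return the datetime found on an ERR:/WRN: header line, or None."""
    m = TIME_PREFIX.match(line)
    if m is None:
        return None
    # Only look at a bounded window after the prefix, like the lookahead setting.
    window = line[m.end():m.end() + LOOKAHEAD]
    try:
        return datetime.strptime(window[:STAMP_LEN], TIME_FORMAT)
    except ValueError:
        return None
```

With the sample data, the WRN: and ERR: lines parse and the 5688: and 4436: lines return None. In this Python check, the looser prefix ^\w+:\s+ also matches lines such as "5688: Running multithreaded", which is one reason to prefer the three-character form here.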

gkanapathy
Splunk Employee

Multiple date formats can also be accepted, but it's much harder to configure.

gkanapathy
Splunk Employee

There are two things. Line breaking can be done with a regex line breaker if there are several different ways a new event can start. And the prefix simply means that it has to occur somewhere before the date, not immediately before it, so yes.
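
As a rough illustration of the first point, a single regex with alternation can describe several kinds of event start. The Python sketch below is not Splunk configuration. The pattern and the function name split_events are hypothetical, built from the break points mentioned in this thread (ERR:, WRN:, and the "4436: ####..." banner line), and the "ten or more # characters" threshold is an assumption.

```python
import re

# Hypothetical alternation: an ERR:/WRN: header, or a numeric prefix
# followed by a banner of at least ten '#' characters.
EVENT_START = re.compile(r"^(?:(?:ERR|WRN):\s|\d+: #{10,})")


def split_events(text):
    """Split raw log text into events, breaking before each EVENT_START line."""
    events, current = [], []
    for line in text.splitlines():
        if EVENT_START.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events
```

The same alternation idea could be tried in a regex-based breaking setting in props.conf, but it should be tested against real data first, since Splunk's breaking rules differ from this line-by-line loop.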

Josh
Path Finder

Can this be done if you have more than one type of prefix for a log date, and also more than one date format in the same log file?

You are correct to assume that events should be broken at ERR: and WRN:. However, this log file is very irregular, and events also need to be broken at:

4436: ################################################
4436:

Thanks,
Josh

Lowell
Super Champion

gkanapathy, I just noticed that the docs say that BREAK_ONLY_BEFORE_DATE defaults to false, but $SPLUNK_HOME/etc/system/default/props.conf definitely sets the default to "True". Do you know anyone who can get the docs updated?
