Monitoring file truncates earlier than expected, b...

khevans · ‎07-11-2019

I have an input setup on a universal forwarder where I am monitoring a log file. The monitor on Splunk seems to read the file line-by-line and is truncating log entries way too early. Uploading the file into Splunk works just fine though.

Here is an example of a log entry that I am trying to read:

2019-07-08 22:25:42.314 INFO [MessageHandler.java:91] Processing the following message from Queue 
------------------------------------
<Metadata ready="false">
    <ApplicableStartTime>2019-07-07T23:11:39.000</ApplicableStartTime>
    <ApplicableEndTime>2019-08-04T23:36:29.000</ApplicableEndTime>
    <ActualStartTime>2019-07-07T22:49:51.000</ActualStartTime>
    <ActualEndTime>2019-08-04T23:58:50.000</ActualEndTime>
    <FileName>test.bsp</FileName>
    <FileID>...</FileID>
    <Status>...</Status>
    <Grade>...</Grade>
</Metadata>

------------------------------------

However, when I search for this data, I find an entry that looks like this:

2019-07-11 17:00:27.192 INFO [MessageHandler.java:91] Processing the following message from Queue 
------------------------------------
<Metadata ready="true">

Then, when I search for the time within the <ApplicableStartTime> XML tag, I've found this as a standalone event:

<ApplicableStartTime>2019-07-07T23:11:39.000</ApplicableStartTime>

It seems that the monitor is reading the file line-by-line instead of respecting the line break rules defined in the props.conf, then the date parser takes the incorrect time as the log time. The logger might be flushing the file to disk after each line but that is something that is completely out of my control.

Here is the sourcetype in my props.conf:

[my-sourcetype]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n\s]*)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s[\w+\s\[\w\.]+(\:\d+)?\]
MAX_EVENTS = 2000
MAX_TIMESTAMP_LOOKAHEAD = 23
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TIME_PREFIX = ^
TRUNCATE = 50000
category = Custom
description = This is my source type yo
pulldown_type = true

When I test this out by uploading the file directly, this source type works just fine AND reads the entire log entry as a single one. How can I adjust this so it works for the monitor:// input type as well?

Edit: here is the properties from the upload:

Edit 2: where does the event breaking and fitting to a sourcetype actually occur? I did change the sourcetype on my universal forwarder instance. Do I need to rename the sourcetype on the UF, or will the indexer/searcher update the sourcetype? Or does it not matter?

Edit 3: I've also tried using these parameters in the inputs.conf under the monitor:// stanza:

multiline_event_extra_waittime = true
time_before_close = 15

and it did not seem to affect it.

Edit 4 (6/15):

I had this extra capturing group (\:\d+)? which should've probably been (?:\:\d+)? in the LINE_BREAKER setting. Would that cause an issue like this? I've since removed the optional-ness, so it's just \:\d+. And now: waiting until these messages occur again in our production environment.

Edit 5: Nope, that was not the issue. 😞

Edit 6: Moved it to the indexer + HFs apparently, and it does not work still.

woodcock · ‎07-15-2019

You should be using this in props.conf:

[<your sourcetype>]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n\s]+-+[\r\n\s]+)(?>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s\S+\s\[[^\]]+)
NO_BINARY_CHECK = true
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 23
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRUNCATE = 50000
MAX_EVENTS = 2000
category = Custom
description = This is my source type yo ho ho

If you are doing a sourcetype override/overwrite, you must use the ORIGINAL value, NOT the new value, then you must deploy this to the first full instance(s) of Splunk that handles the events (usually either the HF tier, if you use one, or else your Indexer tier), restart all Splunk instances there, send in new events (old events will stay broken), then test using _index_earliest=-5m to be absolutely certain that you are only examining the newly indexed events.

khevans · ‎07-15-2019

...first full instance(s) of Splunk that handles the events (usually either the HF tier, if you use one, or else your Indexer tier)

I wasn't updating it on the indexer (d'oh), just been trying it on the universal forwarder. I've contacted my company's Splunk team and hopefully they should be able to update the props.conf on the indexer cluster today or tomorrow. Hopefully that'll do it.

woodcock · ‎07-15-2019

Be sure to come back here and click Accept if it works.

khevans · ‎07-31-2019

Didn't seem to work. I will continue working with our Splunk admins and see if they can provide any insight on it.

woodcock · ‎08-01-2019

Did you see my warning paragraph at the end? There are several things to beware when evaluating effectiveness of the change(s).

woodcock · ‎07-15-2019

The Logback code must be carefully setup or you will have these problems; see the conversation here which includes your solution:
https://stackoverflow.com/questions/11829922/logback-file-appender-doesnt-flush-immediately

khevans · ‎07-15-2019

I don't think this is the issue. If I tail -f the file, I can see the entries coming in and it's nearly instantaneous after I trigger an event. I also did watch -c -d -n 0.5 'tail -n 10 ... && date' and it's always under a second delay.

The other multiline entries, which don't include timestamps in the additional lines, work just fine and I'm able to search through them as multiline log entries. I think specifically, something about the timestamps aren't being handled correctly.

woodcock · ‎07-15-2019

If you are using rsyslog, older versions have a well-known bug that causes it to have HUGE pauses while writing (NEVER use rsyslog; always use syslog-ng). If the thing writing is something else, it might have a similar flush to disk problem which would be BEST fixed by taking a clue-by-four to the developer. Such is rarely practical, so the only other option is to tell Splunk to "wait a while before assuming the event is done writing" with this setting in inputs.conf:
https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitorfilesanddirectorieswithinputs.conf

time_before_close = <integer>
The modification time delta required before Splunk Enterprise can close a file on End-of-file (EOF).
Tells the system not to close files that have been updated in the past <integer> seconds.
Defaults to 3.

landen99_gdms · ‎06-27-2023

We are seeing a similar kind of issue with syslog-ng. A partial event combined with a full event merged into one event, periodically.

khevans · ‎07-15-2019

Thanks for the reply! I'm actually using a Logback, a Java logger similar to log4j. As far as I can tell, it flushes to the disk fairly quickly. I actually tried using that setting over the weekend (set at 15 seconds), and unfortunately it did not seem to affect it.

I used these settings:

 multiline_event_extra_waittime = true
 time_before_close = 15

khevans · ‎07-15-2019

FWIW, I've tried using multiline_event_extra_waittime and time_before_close and it does not seem to affect it. Is it possible that this is caused by having the file forwarded?

marycordova · ‎07-11-2019

can you post a screen clip of the props options from the upload?

@marycordova

khevans · ‎07-11-2019

Here is the screenshot from the upload: https://i.imgur.com/2AlJ6bJ.png

I chose the sourcetype defined in the post to upload it.

Monitoring file truncates earlier than expected, but uploading file works just fine.

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

From Data to Insight: Announcing the Winners of the Splunk Dashboard Contest

Splunk Developers: Construct Your Future at the .conf26 Builder Bar

Quick connection discovery mode for forwarders

Join the Conversation