Getting Data In

Monitored file truncates events earlier than expected, but uploading the file works just fine.

khevans
Path Finder

I have an input set up on a universal forwarder where I am monitoring a log file. The monitor on Splunk seems to read the file line by line and is truncating log entries way too early. Uploading the file into Splunk works just fine, though.

Here is an example of a log entry that I am trying to read:

2019-07-08 22:25:42.314 INFO [MessageHandler.java:91] Processing the following message from Queue 
------------------------------------
<Metadata ready="false">
    <ApplicableStartTime>2019-07-07T23:11:39.000</ApplicableStartTime>
    <ApplicableEndTime>2019-08-04T23:36:29.000</ApplicableEndTime>
    <ActualStartTime>2019-07-07T22:49:51.000</ActualStartTime>
    <ActualEndTime>2019-08-04T23:58:50.000</ActualEndTime>
    <FileName>test.bsp</FileName>
    <FileID>...</FileID>
    <Status>...</Status>
    <Grade>...</Grade>
</Metadata>

------------------------------------

However, when I search for this data, I find an entry that looks like this:

2019-07-11 17:00:27.192 INFO [MessageHandler.java:91] Processing the following message from Queue 
------------------------------------
<Metadata ready="true">

Then, when I search for the time within the <ApplicableStartTime> XML tag, I've found this as a standalone event:

<ApplicableStartTime>2019-07-07T23:11:39.000</ApplicableStartTime>

It seems that the monitor is reading the file line by line instead of respecting the line-breaking rules defined in props.conf, and the date parser then takes the wrong timestamp as the event time. The logger might be flushing the file to disk after each line, but that is completely out of my control.

Here is the sourcetype in my props.conf:

[my-sourcetype]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n\s]*)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s[\w+\s\[\w\.]+(\:\d+)?\]
MAX_EVENTS = 2000
MAX_TIMESTAMP_LOOKAHEAD = 23
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TIME_PREFIX = ^
TRUNCATE = 50000
category = Custom
description = This is my source type yo
pulldown_type = true

When I test this out by uploading the file directly, this sourcetype works just fine AND reads the entire log entry as a single event. How can I adjust this so it works for the monitor:// input type as well?
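For reference, the monitor stanza in inputs.conf on the universal forwarder looks roughly like this (the path here is a placeholder, not the real one):

[monitor:///path/to/app.log]
sourcetype = my-sourcetype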


Edit: here are the properties from the upload:

screenshot of file upload props


Edit 2: where do the event breaking and the matching to a sourcetype actually occur? I did change the sourcetype on my universal forwarder instance. Do I need to rename the sourcetype on the UF, or will the indexer/searcher pick up the updated sourcetype? Or does it not matter?


Edit 3: I've also tried using these parameters in the inputs.conf under the monitor:// stanza:

multiline_event_extra_waittime = true
time_before_close = 15

and it did not seem to affect it.
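For clarity, both settings were added directly under the monitor stanza itself, along these lines (the path is again a placeholder):

[monitor:///path/to/app.log]
sourcetype = my-sourcetype
multiline_event_extra_waittime = true
time_before_close = 15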


Edit 4 (6/15):

I had an extra capturing group, (\:\d+)?, in the LINE_BREAKER setting, which probably should have been the non-capturing (?:\:\d+)?. Would that cause an issue like this? I've since removed the optionality, so it's just \:\d+. Now I'm waiting until these messages occur again in our production environment.

Edit 5: Nope, that was not the issue. 😞


Edit 6: The sourcetype has apparently been moved to the indexers and heavy forwarders, and it still does not work.

0 Karma

woodcock
Esteemed Legend

You should be using this in props.conf:

[<your sourcetype>]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n\s]+-+[\r\n\s]+)(?>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s\S+\s\[[^\]]+)
NO_BINARY_CHECK = true
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 23
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TRUNCATE = 50000
MAX_EVENTS = 2000
category = Custom
description = This is my source type yo ho ho

If you are doing a sourcetype override/overwrite, you must use the ORIGINAL value, NOT the new value. You must then deploy this to the first full instance(s) of Splunk that handles the events (usually either the HF tier, if you use one, or else your Indexer tier) and restart all Splunk instances there. Send in new events (old events will stay broken), then test using _index_earliest=-5m to be absolutely certain that you are only examining the newly indexed events.
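For example, a verification search along these lines (index and sourcetype are placeholders for your own values) restricts results to events indexed within the last five minutes:

index=<your_index> sourcetype=<your_sourcetype> _index_earliest=-5m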

0 Karma

khevans
Path Finder

...first full instance(s) of Splunk that handles the events (usually either the HF tier, if you use one, or else your Indexer tier)

I wasn't updating it on the indexer (d'oh); I had just been trying it on the universal forwarder. I've contacted my company's Splunk team, and they should be able to update the props.conf on the indexer cluster today or tomorrow. Hopefully that'll do it.

0 Karma

woodcock
Esteemed Legend

Be sure to come back here and click Accept if it works.

0 Karma

khevans
Path Finder

Didn't seem to work. I will continue working with our Splunk admins and see if they can provide any insight on it.

0 Karma

woodcock
Esteemed Legend

Did you see my warning paragraph at the end? There are several things to beware of when evaluating the effectiveness of the change(s).

0 Karma

woodcock
Esteemed Legend

The Logback code must be carefully set up or you will have these problems; see the conversation here, which includes your solution:
https://stackoverflow.com/questions/11829922/logback-file-appender-doesnt-flush-immediately
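For reference, the Logback setting usually involved here is immediateFlush, which defaults to true. A minimal sketch of a file appender with it set explicitly (the path and pattern are placeholders, not taken from this thread):

<!-- logback.xml sketch; on Logback 1.2+ immediateFlush is set on the appender,
     while older versions expect it inside <encoder> -->
<appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/path/to/app.log</file>
    <immediateFlush>true</immediateFlush>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%file:%line] %msg%n</pattern>
    </encoder>
</appender>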

0 Karma

khevans
Path Finder

I don't think this is the issue. If I tail -f the file, I can see the entries coming in, and it's nearly instantaneous after I trigger an event. I also ran watch -c -d -n 0.5 'tail -n 10 ... && date' and the delay is always under a second.

The other multiline entries, which don't include timestamps in the additional lines, work just fine, and I'm able to search through them as multiline log entries. I think that, specifically, something about the timestamps isn't being handled correctly.

0 Karma

woodcock
Esteemed Legend

If you are using rsyslog, older versions have a well-known bug that causes HUGE pauses while writing (NEVER use rsyslog; always use syslog-ng). If the thing doing the writing is something else, it might have a similar flush-to-disk problem, which would be BEST fixed by taking a clue-by-four to the developer. That is rarely practical, so the only other option is to tell Splunk to "wait a while before assuming the event is done being written" with this setting in inputs.conf:
https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitorfilesanddirectorieswithinputs.conf

time_before_close = <integer>
The modification time delta required before Splunk Enterprise can close a file on End-of-file (EOF).
Tells the system not to close files that have been updated in the past <integer> seconds.
Defaults to 3.
0 Karma

landen99_gdms
Engager

We are seeing a similar kind of issue with syslog-ng: periodically, a partial event gets merged with a full event into a single event.

0 Karma

khevans
Path Finder

Thanks for the reply! I'm actually using Logback, a Java logging framework similar to log4j. As far as I can tell, it flushes to disk fairly quickly. I actually tried that setting over the weekend (set to 15 seconds), and unfortunately it did not seem to have any effect.

I used these settings:

 multiline_event_extra_waittime = true
 time_before_close = 15
0 Karma

khevans
Path Finder

FWIW, I've tried using multiline_event_extra_waittime and time_before_close, and they do not seem to affect it. Is it possible that this is caused by the file being forwarded?

0 Karma

marycordova
SplunkTrust

Can you post a screen clip of the props options from the upload?

@marycordova
0 Karma

khevans
Path Finder

Here is the screenshot from the upload: https://i.imgur.com/2AlJ6bJ.png

I chose the sourcetype defined in the post to upload it.

0 Karma