Events are not properly split

eirc
Engager

Hello,

We have a multiline event log setup and are noticing many events are not properly split.

The setup we're using is this: we monitor some log files with a Splunk forwarder and set the index and sourcetype there. The inputs.conf file on the forwarder looks like this:

[monitor:///path/to/logs/production*.log]
index = index_name
sourcetype = sourcetype_name
time_before_close = 120

Note that we are monitoring multiple log files that are rotated with copy-truncate, so they are always "live" and receiving new data. This does not seem to have caused any trouble, especially with the long time_before_close we have configured. Also, the lines of an event are not written to the files in a single write but in a streaming fashion.

On the Splunk server we have configured the sourcetype to split the events properly, with a props.conf file that looks like this:

[sourcetype_name]
TRUNCATE = 0
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = Regex that begins the first line of events (and only that :)
MAX_EVENTS = 100000000
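
To give an idea of what that last setting looks like (our real pattern is different, this is just a hypothetical example): if every event started with a line like "I, [2013-04-02T10:15:30.123456 #1234]  INFO -- : ...", the pattern could be something along these lines:

# hypothetical example only -- matches only the timestamped first line of an event
BREAK_ONLY_BEFORE = ^[A-Z],\s\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}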

So while "most" events are properly split, some are broken in two or more. After some more digging it looks like it happens in a two minute interval per file so we imagine that it could be related to the time_before_close config.

Finally, if it's of any relevance, we're on Splunk 5.0.2.

vbumgarner
Contributor

That appears to have been added in Splunk 6.3. Good to know it’s an option now.

splunk_kk
Path Finder

Did you try using multiline_event_extra_waittime = true?
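
From what I recall, it goes in the monitor stanza of inputs.conf on the forwarder and tells the tailing processor not to break an event just because it hit EOF, but to wait until the file is closed (per time_before_close) so the rest of a multiline event has a chance to arrive. Something like this, reusing your stanza (the new line is the last one):

[monitor:///path/to/logs/production*.log]
index = index_name
sourcetype = sourcetype_name
time_before_close = 120
# wait until the file closes before sending the final (possibly partial) event
multiline_event_extra_waittime = true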

vbumgarner
Contributor

The problem is "in streaming fashion". If you are writing events in chunks, say 4k chunks, every time the Splunk process hits EOF, it will call that the end of the event. When the next 4k chunk is written, the next event will start where it left off, somewhere in the middle of your event.

I don't know a way around this problem, unfortunately. What Splunk really needs is a setting that says, "don't send the last event until you see this pattern at the beginning of a line", or until a timeout expires.

This wouldn't guarantee anything, since your process could sit on the next chunk until after the timeout, but it would be better than the current behavior.

You also mentioned copy and truncate. If you can get out of that business and simply use dated file names, you will also be better off.

eirc
Engager

Thanks for the detailed answer! But that's a bummer... We actually thought the time_before_close setting would act as the timeout you mentioned.
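
Re-reading the inputs.conf spec, that setting seems to only control how long the forwarder keeps the file open after EOF, not when the pending event is broken, which would explain why it never helped here:

# seconds to wait for new data after EOF before closing the file handle;
# as far as we can tell it does not delay the event break that happens at EOF
time_before_close = 120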

The streaming happens because it's a Rails application, and that is a feature of the Rails logger (to guarantee that even if everything breaks you'll still get at least a partial log, and to avoid buffering the whole request's log in memory).

However, it seems very weird that the indexer cannot "join" events that arrived through two different updates from the forwarder. I mean, that should be the point of defining the event splitting for the sourcetype there. If the forwarder (or whatever writes the logs) needs to know the same event-splitting rules the indexer knows, just to make sure the log is flushed only between events... well, that's not DRY, and it's a very hidden dependency.

vbumgarner
Contributor

It can't join the events because by the time it's parsed the event and sent it off to the indexer, there is nothing to tie them together anymore. That's the only way it can deal with massive data streams efficiently -- forget about what you no longer need to know. 🙂

I've actually been out of the Splunk admin game for a year or so. Maybe someone else will have a different idea...
