Getting Data In

How do we keep Splunk from complaining about an event it's going to drop anyway?

Path Finder

Here's the setup:

We have a sourcetype from which we exclude certain events by routing them to the nullQueue, based on a REGEX in a props/transforms stanza pair. This works as expected; the events we do not want are not indexed.
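For context, the props/transforms pair looks roughly like this (the sourcetype name and regex are placeholders, not our actual config):

```ini
# props.conf -- sourcetype name is a placeholder
[my_sourcetype]
TRANSFORMS-dropjunk = drop_junk_events

# transforms.conf -- regex is a placeholder
[drop_junk_events]
REGEX = JUNK_PATTERN
DEST_KEY = queue
FORMAT = nullQueue
```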

What we've since learned is that some of these lines are more than 10,000 characters long. We have not raised the TRUNCATE setting for these events because we drop them anyway, and to our knowledge no other events in the file exceed this length.

We didn't realize that the LineBreakingProcessor would still complain about truncating these events. If I understand the Splunk data pipeline correctly, nullQueue routing occurs in the typing pipeline, which is downstream from the parsing pipeline where line breaking and truncation happen.

In an effort to clean up our _internal logs and reduce the time Splunk spends in the parsing pipeline, is there any way to catch these events before they hit the parsing queue? Perhaps at the forwarder layer?

If not, what are the performance implications of continually raising the TRUNCATE setting? We've done this for several data sources that we do need and whose events are genuinely long (splunkd_remote_searches, itsi_internal_log, various structured XML/JSON data sources), and I would like to be confident we are not setting ourselves up for more issues. We index multiple terabytes a day.
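For reference, this is the kind of change we've been making (the value is illustrative; setting TRUNCATE = 0 disables truncation entirely but carries its own memory risk):

```ini
# props.conf -- value is illustrative, not our actual setting
[splunkd_remote_searches]
TRUNCATE = 100000
```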

Thank you!


SplunkTrust

The best-performing option is not to send it at all. If you use Universal Forwarders to pick up the files, I would blacklist those files in the monitor stanza and send only the ones you want. Making Splunk process events for the nullQueue at scale can seriously back up your indexing pipelines.
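For example, something like this in inputs.conf on the UF (the path and pattern are hypothetical):

```ini
# inputs.conf on the UF -- path and pattern are hypothetical
[monitor:///var/log/app]
blacklist = junk_.*\.log$
```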


Path Finder

The challenge is that we want most of the events in the log, so blacklisting the file will not work for us. It's only certain junk lines within the file that we want to discard. If there's a way to filter individual events within a log file at the UF layer, please share!

example.log
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP ABSOLUTE JUNK THAT MAKES NO SENSE AND JUST CONFUSES SEARCHES
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant
TIMESTAMP INFO INFO StuffWeWant


SplunkTrust

There is no parsing or filtering at the UF, with the one exception being Windows event codes. If you need to inspect the log stream to decide whether to throw an event away, then you are already doing what you can, and the best you can do is keep increasing TRUNCATE, which can impact memory, etc. The better option is to tune what you log: if this is an application log you control, I would put it on your roadmap to log application events via HEC and get away from trying to ingest noisy Java-style logs. You could also consider increasing your ingestion pipelines so you are less likely to jam inbound traffic for other sources, or send these logs to dedicated inputs on certain systems to segregate them.
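If you go the extra-pipelines route, it's a one-line change in server.conf on the indexer or heavy forwarder (the value is illustrative; each additional pipeline costs CPU and memory):

```ini
# server.conf -- value is illustrative
[general]
parallelIngestionPipelines = 2
```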


Path Finder

I was afraid of that. The events are quite infrequent and we run at pretty low memory utilization, so this may be the only option for now. Thanks!

I just found it odd that Splunk would process the event through the entire pipeline just to throw it away at the end. Seems it would make more sense to pick up that the event was destined for the nullQueue and drop it first before applying additional processing.


SplunkTrust

That is based on my experience. You can always open a support ticket and see what Splunk says.
