
Are there best practices for indexing very large log files while they are being rotated?

romedome
Path Finder

Are there any best practices or recommendations when dealing with very large log files?

I have a 50 GB log file that takes 2-3 hours to rotate, and a lot of events get duplicated while this happens. The current rotation script does a cp followed by a truncate, and I see around 1,500 events on the forwarder saying:

Will begin reading at offset=0 

throughout the rotation period.
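To make the difference between rotation styles concrete, here is a minimal Python sketch contrasting copy-then-truncate (what the current script does) with rename-based rotation. The function names and the rotated file names are hypothetical, and this is not how Splunk or any particular rotation tool is implemented. It only shows why copying a 50 GB file takes hours, and why the original file shrinking to zero means anything tracking its read position in that file has to start again from the beginning.

```python
import os
import shutil


def rotate_copytruncate(path: str, rotated: str) -> None:
    """Copy the live file, then truncate it in place (the approach in the question).

    The copy reads and writes every byte, so its cost grows with file size.
    Lines the application appends after the copy finishes but before the
    truncate are in neither file. A reader tracking its offset in `path`
    sees the file shrink and must restart at offset 0.
    """
    shutil.copyfile(path, rotated)
    with open(path, "r+b") as f:
        f.truncate(0)


def rotate_rename(path: str, rotated: str) -> None:
    """Rename the live file: a single metadata operation on one filesystem.

    Assumes the writing application reopens `path` afterwards (for example
    when signalled); until then it keeps appending to the renamed file.
    os.replace raises OSError if `rotated` is on a different filesystem.
    """
    os.replace(path, rotated)
```

The rename variant moves the burden to the application, which must reopen its log file. That is usually cheaper than copying, but it only works if the application supports it.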

Any thoughts?


pgreer_splunk
Splunk Employee

How are you configured at the moment? Universal forwarder sending straight to an indexer?

You could consider using the props.conf configuration file (together with transforms.conf) to exclude data that is of no value before it is indexed.

Here's a question/answer that might be relevant: https://answers.splunk.com/answers/44865/remove-out-section-of-log.html

Here's a Cisco-specific how-to on excluding low-value events from networking logs. It could also be relevant to what you're trying to do:

http://networkerslog.blogspot.com/2012/01/how-to-filter-unwanted-data-without.html

  • It does require data to be parsed by a full Splunk instance, such as a heavy forwarder (or the indexer itself), which uses regular expressions to exclude data you don't want to index before it is written to the index.
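In Splunk this keep/drop logic is configured rather than coded: a `TRANSFORMS-<class>` setting in props.conf points at transforms.conf stanzas that set `DEST_KEY = queue` with `FORMAT = nullQueue` for events to discard (and `FORMAT = indexQueue` for events to keep). The Python sketch below only illustrates that filtering logic outside Splunk; the `KEEP` pattern is a hypothetical stand-in for whatever actually identifies the relevant events.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern for the events worth indexing; adjust to your data.
KEEP = re.compile(r"\b(ERROR|WARN)\b")


def keep_relevant(lines: Iterable[str], keep: re.Pattern = KEEP) -> Iterator[str]:
    """Yield only lines that match `keep` and silently drop everything else.

    Equivalent in spirit to sending every event to nullQueue and routing
    matching events back to indexQueue.
    """
    for line in lines:
        if keep.search(line):
            yield line
```

Note that filtering of this kind reduces what gets indexed, not what the forwarder has to read from disk.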

Hope that helps!

romedome
Path Finder

Yes, we're using a universal forwarder and doing the filtering on the indexer. Only 5% of the events in the log file are relevant.

The problem is not the filtering, though. When you rotate a 50 GB file, weird things start to happen as contention becomes an issue. The heart of the issue is that letting a single file accumulate 50 GB of data in one day is bad practice. I found a few posts stating that the practical upper limit for a log file is a few GB before a forwarder starts to have issues.

I got together with the application developer and the person who wrote the log rotation script, and they're going to redesign the logging to add more structure (separate files for unrelated events) and change the way the file is rotated.
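As an illustration of that redesign, here is a minimal sketch assuming the application is written in Python (other languages have similar size-based rotating loggers). Each unrelated event category gets its own file, and the standard library's `RotatingFileHandler` rolls files over by renaming them (app.log becomes app.log.1, and so on) before opening a fresh file, so there is no copy or truncate step. The logger names, paths, and size limits are hypothetical; the limits only reflect the "a few GB" guidance mentioned above and are not tested values.

```python
import logging
from logging.handlers import RotatingFileHandler

# Hypothetical limits: keep each file far below the multi-GB range.
MAX_BYTES = 200 * 1024 * 1024
BACKUPS = 5


def make_logger(name: str, path: str, max_bytes: int = MAX_BYTES,
                backups: int = BACKUPS) -> logging.Logger:
    """Return a logger that writes only to `path` and rotates it by size."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass records to ancestor/root handlers
    if not logger.handlers:  # avoid stacking handlers if called twice
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# One file per unrelated event type (names are illustrative only):
# audit = make_logger("app.audit", "/var/log/app/audit.log")
# requests = make_logger("app.requests", "/var/log/app/requests.log")
```

Smaller, per-category files also make it easier to monitor only the files that contain the relevant events instead of filtering them out of one large file.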

Thanks!
