Best way to monitor tens of thousands of GZipped files?

Jason
Motivator

I'm at a client that needs to import files from their centralized log server, which holds tens of thousands of gzipped files. The files are no longer being written to and don't need monitoring once read, but they must not be deleted, which rules out Batch mode.

Is there a mode similar to Batch mode that does not remove the files? Putting a monitor on this many files (even with whitelists) effectively stops every other monitor, including the ones on Splunk's own log files.

amrit
Splunk Employee

A few ideas:

  • If they have the disk space, copy all the files into $SPLUNK_HOME/var/spool/splunk, so that only the copies get deleted after indexing.

  • If they don't have the space, copy them to the spool in batches of, say, 500 files at a time (we can provide a shell script that you can run in a screen session, or something similar).

  • Upgrade to 4.1.x, which will happily chug along with tens of thousands of monitored files.

The last option is preferred. 🙂
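The batched-spool idea could look something like the following Python sketch. It is a hypothetical illustration, not the shell script offered above: the function names, the staging directory, and the 30-second poll interval are assumptions. It relies on Splunk deleting files from the spool once they are indexed, and it assumes the staging directory sits on the same filesystem as the spool so that `os.replace` is an atomic rename (Splunk should never pick up a half-copied file).

```python
import os
import shutil
import time
from pathlib import Path


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def spool_in_batches(source_dir, staging_dir, spool_dir,
                     batch_size=500, poll_seconds=30):
    """Copy *.gz files from source_dir into Splunk's spool directory in
    batches, leaving the originals untouched (hypothetical helper)."""
    source_dir, staging_dir, spool_dir = map(Path, (source_dir, staging_dir, spool_dir))
    files = sorted(source_dir.glob("*.gz"))
    for batch in batches(files, batch_size):
        for path in batch:
            # Copy, never move: the originals must stay on the log server.
            staged = staging_dir / path.name
            shutil.copy2(path, staged)
            # Rename into the spool in one step (atomic only when the
            # staging and spool directories share a filesystem).
            os.replace(staged, spool_dir / path.name)
        # Splunk deletes spooled files once indexed; wait until this
        # batch has drained before queueing the next one.
        while any((spool_dir / p.name).exists() for p in batch):
            time.sleep(poll_seconds)
```

Usage, with hypothetical paths: `spool_in_batches("/data/central-logs", "/opt/splunk/staging", "/opt/splunk/var/spool/splunk")`. Passing a `batch_size` larger than the number of files turns this into the first idea (copy everything at once).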

gkanapathy
Splunk Employee

You might consider writing a script that invokes ./splunk add oneshot /path/to/file.gz for each file. This has Splunk index the specified file once and then forget about it. It isn't exactly the same as batch mode, but it works similarly without removing the file.
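A minimal Python sketch of such a script follows. The function names are illustrative, and it assumes the CLI is already authenticated (for example via `splunk login`), since otherwise `splunk add oneshot` may prompt for credentials. It runs the commands one at a time and collects the files whose command exited with a non-zero status so they can be retried.

```python
import subprocess


def oneshot_commands(splunk_bin, paths):
    """Build one `splunk add oneshot <file>` command per file, in name order."""
    return [[splunk_bin, "add", "oneshot", str(path)] for path in sorted(paths)]


def run_oneshots(commands):
    """Run each command in turn and return the files whose command failed.

    Nothing here moves or deletes the source files.
    """
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd[-1])  # last argument is the file path
    return failed
```

Usage, with hypothetical paths and pathlib: `run_oneshots(oneshot_commands("/opt/splunk/bin/splunk", Path("/data/central-logs").glob("*.gz")))`.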

Jason
Motivator

I ended up setting up a 4.1.x forwarder that sends to the 4.0.x indexer, both on the same box. The forwarder used a monitor input to watch the large directory, with a blacklist to skip files more than a few days old during the initial load.

Dan
Splunk Employee

With this much data, shouldn't you take some care to feed in the files in a time-coherent order?
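One way to get a time-coherent order is to sort the file list oldest-first before handing it to oneshot or the spool. The Python sketch below assumes a hypothetical naming scheme with a YYYY-MM-DD date in each file name; files without a date are placed last. If the names carry no date, sorting on `Path(p).stat().st_mtime` is an alternative, provided copying hasn't reset the modification times.

```python
import re
from pathlib import Path

# Hypothetical naming scheme, e.g. "web01-2010-05-17.log.gz";
# adjust the pattern to the real file names.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def chronological(paths):
    """Sort paths oldest-first by the date in their names; undated files last."""
    def key(path):
        name = Path(path).name
        match = DATE_IN_NAME.search(name)
        # "99999999" sorts after every real YYYYMMDD string.
        date = "".join(match.groups()) if match else "99999999"
        return (date, name)
    return sorted(paths, key=key)
```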
