I have an input set up to monitor a folder where new log files get generated daily. Today, however, a bad process generated a file that is 4GB in size. The process was stuck in an infinite loop, writing the same log output over and over until we noticed and killed it. That used up my Splunk limit for the day in one pass. I have three questions:
Is there a way to delete that one file entirely from the index (and reclaim the space that it used up)?
Is there a way to tell Splunk to automatically ignore such bad files? For example, a limit that tells it to stop indexing any file that grows bigger than a certain size (say 100MB)?
Is there any clever setting that could tell Splunk to watch out for such occurrences (where a rogue process gets stuck in an infinite loop and generates garbage logs ad vitam aeternam)?
First, there is only one way to delete a file from an index and reclaim the space: Use the Splunk clean command to remove everything from the index and then re-index just what you want. This isn't usually a practical solution for production environments.
However, you can use the delete command to make the data from that file unsearchable. The space on disk is not reclaimed, but the data will never be searchable by anyone again. This action cannot be undone.
AFAIK, there is no way to tell Splunk to index (or not to index) inputs based on their size. However, you could write a script that examines file sizes on disk and outputs that info. If you index the output of the script, you could easily write a search that would send an alert based on the file size. How hard this would be depends on your programming skills.
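As a rough illustration of the script idea, here is a minimal Python sketch. It is not a built-in Splunk feature, just one way such a script could look. The names `scan` and `format_record`, the 100MB threshold, and the key=value output layout are all assumptions made for this example. It prints one line per file so that the output could be indexed and searched on fields such as `size_bytes`.

```python
import os
import sys
import time

# Hypothetical threshold (100MB); pick whatever fits your environment.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def scan(directory, limit=SIZE_LIMIT_BYTES):
    """Return one record per regular file found directly in `directory`."""
    records = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        size = os.path.getsize(path)
        records.append({
            "file": path,
            "size_bytes": size,
            "over_limit": size > limit,
        })
    return records


def format_record(record, now=None):
    """Format a record as a timestamped key=value line."""
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return '%s file="%s" size_bytes=%d over_limit=%s' % (
        timestamp,
        record["file"],
        record["size_bytes"],
        str(record["over_limit"]).lower(),
    )


if __name__ == "__main__":
    # The directory to watch is passed as the first argument (assumed layout).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for rec in scan(target):
        print(format_record(rec))
```

If you ran something like this on a schedule and indexed its output, an alert could then be built on a search for events where `over_limit=true`, assuming the key=value pairs are extracted as fields.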
You might also want to look at the Deployment Monitor that is part of Splunk 4.2. The Deployment Monitor has some dashboards and alerts that you can use to notify you if Splunk is indexing "too much" of a particular source or sourcetype, etc. Even if the Deployment Monitor is not exactly what you want, you can look at the searches and alerts it uses, as a starting point for writing your own.