We wanted to index the log file for one of our IIS web servers. Since IIS by default writes a lot of data to its log files (a log entry for every request to every asset), we had to come up with a way to trim the log file before feeding it to Splunk and stay under our daily indexing limit (we average 100 MB on a busy day, and we have other servers we'd like to monitor without upgrading the license).
Fun fact: IIS either logs everything or nothing. Out of the box, you can't tell it to log only certain activities or only requests to certain assets.
Here is what we did: a scheduled task runs Log Parser every X minutes; the query pulls only the entries we care about out of the raw IIS log and writes them to a smaller, filtered log file (replacing the previous one), and Splunk monitors that filtered file instead of the raw log.
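The actual filtering is a Log Parser query; as a rough illustration of the idea, here is the same kind of thing sketched in Python. The paths, field positions, and keep/drop rules are all hypothetical stand-ins, so adjust them to match the #Fields: line and traffic on your own server.

    import os

    # Hypothetical paths; substitute your own.
    SOURCE_LOG = r"C:\inetpub\logs\LogFiles\W3SVC1\current.log"
    FILTERED_LOG = r"C:\SplunkWatch\filtered_iis.log"

    # Illustrative rule: drop the static-asset noise that makes up most
    # of an IIS log, but always keep error responses.
    STATIC_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".png", ".ico")

    def keep(line):
        if line.startswith("#"):            # W3C header lines
            return True
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 11:                # malformed line
            return False
        # Positions assume the default W3C field order; check the
        # #Fields: line in your own log and adjust.
        uri_stem, status = fields[4].lower(), fields[10]
        if status.startswith(("4", "5")):   # always keep errors
            return True
        return not uri_stem.endswith(STATIC_EXTENSIONS)

    def rewrite_filtered_log():
        tmp = FILTERED_LOG + ".tmp"
        with open(SOURCE_LOG, encoding="utf-8", errors="replace") as src, \
                open(tmp, "w", encoding="utf-8") as dst:
            dst.writelines(line for line in src if keep(line))
        os.replace(tmp, FILTERED_LOG)       # replace the file, as Log Parser does

    if __name__ == "__main__":
        rewrite_filtered_log()

The scheduled task just invokes this (or the equivalent Log Parser command) every X minutes.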
On the first day we added the filtered log file to Splunk, we went over our limit: Splunk indexed 200+ MB worth of data out of it. My guess is that Splunk thinks it should consume the file again every time it detects that Log Parser has replaced it.
My question is: why does Splunk keep reindexing the whole file?
I think the main difference between a standard IIS log file and our filtered log file is that IIS appends to its log, whereas we (Log Parser) replace ours, so its CRC/header/footer information gets changed. Splunk then resets its pointer for that file and thinks it needs to go back to the beginning and reindex everything.
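If my theory is right, the check looks something like the toy sketch below. As I understand it, Splunk keys each monitored file on a CRC of its first bytes (256 by default via initCrcLength, stored in its "fishbucket" along with a seek pointer), so a rewritten file with different leading bytes looks brand new to it. The path is hypothetical and the details are from memory:

    import zlib

    CRC_HEAD_BYTES = 256  # Splunk's default initCrcLength, as I recall

    def head_crc(path):
        """CRC of the file's first bytes, roughly how Splunk identifies it."""
        with open(path, "rb") as f:
            return zlib.crc32(f.read(CRC_HEAD_BYTES))

    # When Log Parser rewrites the filtered file, the first 256 bytes come
    # out different (new header date, different first entry), the CRC no
    # longer matches, and Splunk treats it as a file it has never seen,
    # so it indexes the whole thing again from byte 0.
    before = head_crc(r"C:\SplunkWatch\filtered_iis.log")
    # ... scheduled task runs and replaces the file ...
    after = head_crc(r"C:\SplunkWatch\filtered_iis.log")
    print("Looks like a new file to Splunk:", before != after)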
Much obliged for any answers/comments,
Actually, thinking about this for a minute, I will try the following workaround:
Since Splunk will reindex the file every X minutes (X being the interval of our internal scheduled task), why not include ONLY the entries that were added to the original log file in the last X minutes?
Here is what the new process will look like: the scheduled task still runs Log Parser every X minutes, but the query now selects only the entries whose timestamps fall within the last X minutes, so each run replaces the filtered file with just the newest slice of data instead of the whole filtered history.
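A Python sketch of that time-window step (the interval and paths are stand-ins, the parsing assumes IIS's default UTC timestamps, and in practice this is combined with the asset/status filtering shown earlier):

    import os
    from datetime import datetime, timedelta, timezone

    TASK_INTERVAL = timedelta(minutes=10)  # "X": a stand-in for our real interval
    SOURCE_LOG = r"C:\inetpub\logs\LogFiles\W3SVC1\current.log"
    FILTERED_LOG = r"C:\SplunkWatch\filtered_iis.log"

    def entry_time(line):
        """Parse the leading 'date time' of a W3C entry, e.g. '2011-03-15 16:04:59'."""
        try:
            stamp = " ".join(line.split(" ", 2)[:2])
            parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            return parsed.replace(tzinfo=timezone.utc)  # IIS logs in UTC by default
        except ValueError:
            return None

    def write_recent_entries():
        cutoff = datetime.now(timezone.utc) - TASK_INTERVAL
        tmp = FILTERED_LOG + ".tmp"
        with open(SOURCE_LOG, encoding="utf-8", errors="replace") as src, \
                open(tmp, "w", encoding="utf-8") as dst:
            for line in src:
                if line.startswith("#"):   # skip W3C header lines
                    continue
                when = entry_time(line)
                if when is not None and when >= cutoff:
                    dst.write(line)
        os.replace(tmp, FILTERED_LOG)

    if __name__ == "__main__":
        write_recent_entries()

Splunk should still reindex the file on every replacement, but now "everything" is only X minutes of data, so the daily indexed volume converges on the true volume of interesting entries.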
Will try this revision and post the results.
Update:
Have been running the process shown above for the last couple of hours. The filtered log file has reached 1.25 MB in that time (as expected). At that rate, 24 hours would give an index volume of roughly 12 MB for that server, which is about what we would expect. Compare that to 200 MB in the last couple of days!