Hi all,
I have a fairly basic (but confusing) question for you all. Essentially, this is the go...:
For a prod Apache server I manage, I have just created a new index in Splunk. There are numerous logs set up for each vhost under the single Apache instance. I want to pull all of the log files into Splunk for indexing, but I have realized that a few large files exist that have not been rotated...
I pull my data into Splunk by rsync and then have Splunk watch the directory (or file). The directory containing all the Apache logs is not yet monitored, but I want to add it to Splunk after removing the entries in these "large" logs that date back more than 1 month.
That part is fine, but I don't want to restart Apache now, during business hours. Without a restart, the log in its previous "huge" state will simply be synced right back to the Splunk server (after my 5-minute rsync interval).
So after all that, this is my question:
Can I edit the log to remove old events, then add the file to be indexed by Splunk? Once it is indexed, can I start the automatic rsyncs, which copy the large file back to the Splunk server, and not have the older events indexed?
The important part: once the original events (<1 month old) are indexed, and older events (>1 month old) then appear in the same log, will they be indexed as well?
Sorry if this comes across as confusing; I understand it will. I'm just having a hard time getting my situation across in words.
Thanks, Aaron.
Yes, it will work just fine, provided Splunk is reading the timestamps out of the log file, which in the case of Apache logs it usually does. In general, if you are indexing many years or hundreds of gigabytes of archives, you may want to adjust some settings to improve performance, but it will work just fine regardless.
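For the trimming step described in the question (dropping events older than one month before the file is first indexed), a rough sketch in Python might look like the following. It assumes the logs use the standard Apache common/combined timestamp format, e.g. `[10/Oct/2023:13:55:36 +0000]`, and an English/C locale for month names. The function names (`parse_ts`, `keep_recent`, `trim_log`) and the 30-day default are made up for illustration and are not part of Splunk or Apache. It writes to a separate copy rather than touching the file Apache is writing to.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: timestamps look like [10/Oct/2023:13:55:36 +0000]
# (Apache common/combined log format).
TS_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_ts(line):
    """Return the timezone-aware timestamp found in a log line, or None."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def keep_recent(lines, cutoff):
    """Yield lines stamped at or after cutoff.

    Lines without a recognizable timestamp are kept, so nothing is
    dropped silently.
    """
    for line in lines:
        ts = parse_ts(line)
        if ts is None or ts >= cutoff:
            yield line


def trim_log(src_path, dst_path, days=30, now=None):
    """Copy src_path to dst_path, skipping events older than `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(keep_recent(src, cutoff))
```

For example, `trim_log("access.log", "access.trimmed.log")` would write a trimmed copy next to the original (both file names are hypothetical). Whether Splunk then skips the older events once the full file is synced back is a separate matter that this sketch does not address.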
Yes, it seems to be an issue I keep running into while getting my setup together and running.
This is a really good question, and I bet a lot of people have the same one.
Sadly, I just gave it a go, but it simply indexed the whole large file... 😞
Sweet, I thought this would be the case... and my boss was getting me all worried. Haha.
Thanks for your help.