Will Splunk index events older than existing indexed events (same source)?

aaronnicoli
Path Finder

Hi all,

I have a fairly basic (but confusing) question for you all. Essentially, this is the go...:

For a prod Apache server I manage, I have just created a new index in Splunk. There are numerous logs set up for each vhost under the single Apache instance. I basically want to pull all the log files into Splunk for indexing, but I have realized that a few large files exist that have not been rotated...

I pull my data into Splunk by rsync and then monitor the directory (or file). The directory containing all the Apache logs is not yet monitored, but I wish to add it to Splunk after removing the entries in these "large" logs that date back more than 1 month.

That is fine, but I don't want to restart Apache now, during business hours. Not doing so means the log in its previous "huge" state will just be synced right back to the Splunk server (after my 5-minute rsync interval).

So after all that, this is my question:

Can I edit the log to remove old events, then add the file to be indexed by Splunk? Once it is indexed, can I start the automatic rsyncs, copying the large file back to the Splunk server, and not have the older events indexed?

The important part: once the original events (<1 month old) are indexed, and older events (>1 month old) then appear in the same log, will they also be indexed?

Sorry if this comes across as confusing; I understand it will. I'm just having a hard time getting my situation across in words.

Thanks, Aaron.
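
For readers wanting to try the "remove old events" step described above, here is a minimal Python sketch of one way to trim a copy of an Apache log before pointing Splunk at it. It is only an illustration under stated assumptions: the log is assumed to use the standard common/combined format with a bracketed timestamp such as [10/Oct/2000:13:55:36 -0700], "1 month" is approximated as 30 days, and the function names and file paths are hypothetical. It does not address how Splunk treats the file once the full-size version is synced back.

```python
import re
from datetime import datetime, timedelta, timezone

# Bracketed timestamp of Apache common/combined log format,
# e.g. [10/Oct/2000:13:55:36 -0700]. Assumes the default LogFormat.
TIMESTAMP_RE = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def parse_timestamp(line):
    """Return the line's timestamp as an aware datetime, or None if not found."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None


def keep_recent(lines, cutoff):
    """Yield lines stamped at or after cutoff; lines without a timestamp are kept."""
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None or ts >= cutoff:
            yield line


def trim_log(src_path, dst_path, max_age_days=30, now=None):
    """Write a trimmed copy of src_path to dst_path (hypothetical paths).

    The source file is only read, so the live Apache log is left untouched.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(keep_recent(src, cutoff))
```

A call such as trim_log("access.log", "access.trimmed.log") would write the trimmed copy next to the original; both file names are placeholders.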

1 Solution

gkanapathy
Splunk Employee

Yes, it will work just fine, provided Splunk is reading the timestamps out of the log file, which it usually does in the case of Apache logs. In general, if you are indexing many years or hundreds of gigabytes of archives, you may want to adjust some settings to improve performance, but it will work just fine regardless.

aaronnicoli
Path Finder

Yes, it seems to be an issue I am continually running into while getting my setup up and running.

piebob
Splunk Employee

This is a really good question, and I bet a lot of people have the same one.

aaronnicoli
Path Finder

Sadly, I just gave it a go, but it simply indexed the whole large file... 😞

aaronnicoli
Path Finder

Sweet, thought this would be the case... and my boss was getting me all worried. haha

Thanks for your help.
