Getting Data In

Will Splunk index events older than existing indexed events (same source)?

aaronnicoli
Path Finder

Hi all,

I have a fairly basic (but confusing) question for you all. Essentially, this is the go...:

For a prod Apache server I manage, I have just created a new index in Splunk. There are numerous logs set up for each vhost under the single Apache instance. I basically want to drag all the log files into Splunk for indexing, but I have realized that a few large files exist that have not been rotated...

I pull my data into Splunk by rsyncing it across and then monitoring the directory (or file). The directory containing all the Apache logs is not yet monitored, but I want to add it to Splunk after removing the entries in these "large" logs that date back more than 1 month.

Which is fine, but I don't want to restart Apache now, during business hours. Without a restart, the log in its previous "huge" state will just be synced right back to the Splunk server (after my 5 min rsync interval).

So after all that, this is my question:

Can I edit the log and remove old events, then add the file to be indexed by Splunk? Once it is indexed, can I start the automatic rsyncs, which copy the large file back to the Splunk server, without the older events being indexed?

The important part... once the original events (<1 month old) are indexed, and older events (>1 month old) then appear in the same log, will they also be indexed?

Sorry if this comes across as confusing, I understand it will... Just having a bit of a hard time trying to get my situation across in words.

Thanks, Aaron.
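
For readers following along, the trimming step described above (dropping entries older than a month before the file is handed to Splunk) could be sketched roughly as follows. This is only an illustration, not something from the thread: it assumes the logs use the standard Apache common/combined timestamp such as `[10/Oct/2023:13:55:36 -0700]`, and the function name `keep_recent` and the 30-day default are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Bracketed timestamp of the Apache common/combined log format,
# e.g. [10/Oct/2023:13:55:36 -0700]. Assumes an English month abbreviation.
TS_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def keep_recent(lines, now, max_age_days=30):
    """Yield only the lines whose timestamp is within max_age_days of now.

    `now` must be a timezone-aware datetime. The function name and the
    30-day default are illustrative, not taken from the thread.
    """
    cutoff = now - timedelta(days=max_age_days)
    for line in lines:
        match = TS_RE.search(line)
        if match is None:
            # Lines without a recognizable timestamp are kept,
            # so nothing is dropped silently.
            yield line
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if ts >= cutoff:
            yield line


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Sep/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 123\n',
        '1.2.3.4 - - [25/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 123\n',
    ]
    now = datetime(2023, 11, 1, tzinfo=timezone.utc)
    for kept in keep_recent(sample, now):
        print(kept, end="")
```

A filter like this only changes the copy it writes. If the untrimmed file is later synced over the same path, the old entries are back in the file, so the filter would presumably have to run after every sync and write to a separate path that Splunk monitors.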

1 Solution

gkanapathy
Splunk Employee

Yes, it will work just fine, provided Splunk is reading the timestamps out of the log file, which in the case of Apache logs it usually does. In general, if you are indexing many years or hundreds of gigabytes of archives, you may want to adjust some settings to improve performance, but it will work just fine regardless.

aaronnicoli
Path Finder

Yes, it seems to be an issue I am now continually running into in getting my setup together and running.

piebob
Splunk Employee

this is a really good question, and i bet a lot of people have the same one.

aaronnicoli
Path Finder

Sadly, I just gave it a go, but it simply indexed the whole large file... 😞

aaronnicoli
Path Finder

Sweet, thought this would be the case... and my boss was getting me all worried. haha

Thanks for your help.