Large apache log files

mq20123167
New Member

Hello!

I'm new to Splunk and just getting my head around it all.

Our company is already using Splunk and we are considering using it on an Apache server to gather web statistics in a similar fashion to AWStats.

We have enabled log rotation on our server and keep one month's worth of logs. My concern is that once the Apache server deletes logs older than one month, we will no longer be able to search that old information through Splunk.

Ideally I would like 6-12 months' worth of data. We have already racked up 645,000 events in a single month.

If we saved our logs somewhere else and had Splunk review our 6-12 months of data, we would be going over a few million events. Is Splunk the right tool for this job? Can it handle that number of events? Or is it mostly made for short-term log analysis?

Ayn
Legend

First, regarding your concern - your assumption is incorrect, because Splunk doesn't work directly on the source files. What happens when you add a file/directory to be monitored by Splunk is that events are indexed - you could say they're copied to Splunk's index (database). Once that's done, it doesn't matter what happens to the source file. The events are in the index, and will stay there indefinitely (or at least for as long as you've told Splunk to keep events).

There's really no limit to how many events Splunk can handle. Many use it for analysis of huge amounts of data spanning several years. There are Splunk deployments out there indexing several terabytes of data each day. For that kind of deployment you obviously can't rely on a single, modestly specced Splunk indexer, but you can scale your deployment easily by adding more indexers and other Splunk instances as you go.
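
To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It assumes traffic stays flat at the 645,000 events per month quoted in the question, which is only an approximation. It also converts a retention period in days into seconds, since retention on an index is normally set in seconds (via the `frozenTimePeriodInSecs` setting in indexes.conf). The function names are made up for illustration and are not part of Splunk.

```python
# Rough sizing arithmetic; the helper names are hypothetical, not Splunk APIs.

EVENTS_PER_MONTH = 645_000  # figure quoted in the question


def projected_events(months, events_per_month=EVENTS_PER_MONTH):
    """Estimate total events, assuming a constant monthly volume."""
    return months * events_per_month


def retention_seconds(days):
    """Convert a retention period in days to seconds."""
    return days * 24 * 60 * 60


if __name__ == "__main__":
    for months in (6, 12):
        print(f"{months} months: about {projected_events(months):,} events")
    # Value you might use as a retention period for one year of data
    print(f"365 days = {retention_seconds(365)} seconds")
```

Even at 12 months this comes to under eight million events, which is small compared with the deployments mentioned above.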

mq20123167
New Member

Thanks Ayn, appreciate your help with this.
