Getting Data In

Large apache log files

mq20123167
New Member

Hello!

I'm new to Splunk and just getting my head around it all.

Our company is already using Splunk, and we are considering using it on an Apache server to gather web statistics in a similar fashion to AWStats.

We have enabled log rotation on our server and keep one month's worth of logs. My concern is that once the Apache server deletes logs older than one month, I assume we will no longer be able to search that old information through Splunk.

Ideally I would like 6-12 months' worth of data. We have already racked up 645,000 events in a single month.

If we saved our logs somewhere else and had Splunk index our 6-12 months of data, we would be going over a few million events. Is Splunk the right tool for this job? Can it handle that number of events? Or is it mostly made for short-term log analysis?

1 Solution

Ayn
Legend

First, regarding your concern - your assumption is incorrect, because Splunk doesn't work directly on the source files. What happens when you add a file/directory to be monitored by Splunk is that events are indexed - you could say they're copied to Splunk's index (database). Once that's done, it doesn't matter what happens to the source file. The events are in the index, and will be indefinitely (or at least for as long as you've told Splunk to keep events).
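As a concrete sketch of how this looks in configuration (the index name, log path, and retention value below are illustrative assumptions, not details from this thread): monitoring a rotated Apache log is a `monitor` stanza in inputs.conf, and how long indexed events stay searchable is governed by `frozenTimePeriodInSecs` in indexes.conf:

```ini
# inputs.conf -- monitor the rotated Apache access logs
# (the path and index name are assumptions; adjust to your environment)
[monitor:///var/log/apache2/access.log*]
sourcetype = access_combined
index = web

# indexes.conf -- define the "web" index and keep events ~12 months
[web]
homePath   = $SPLUNK_DB/web/db
coldPath   = $SPLUNK_DB/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
# Events older than this many seconds (365 days here) are rolled to
# frozen, which by default means they are deleted from the index.
frozenTimePeriodInSecs = 31536000
```

With a setup like this, Apache can rotate and delete its files on whatever schedule it likes; the events remain searchable in Splunk until the retention period you configured expires.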

There's really no limit to how many events Splunk can handle. Many use it for analysis of huge amounts of data spanning several years. There are Splunk deployments out there indexing several terabytes of data each day. For that kind of deployment you obviously can't get by with a single modestly specced Splunk indexer, but you can scale your deployment easily by adding more indexers and other Splunk instances as you go.

mq20123167
New Member

Thanks Ayn, appreciate your help with this.
