Getting Data In

Large apache log files

mq20123167
New Member

Hello!

I'm new to Splunk and just getting my head around it all.

Our company is already using Splunk, and we are considering using it on an Apache server to gather web statistics in a similar fashion to AWStats.

We have enabled log rotation on our server and keep one month's worth of rotated logs. My concern is that once the Apache server deletes logs older than one month, I assume we will no longer be able to search that old information through Splunk.

Ideally I would like 6-12 months' worth of data. We have already racked up 645,000 events in a single month.

If we saved our logs somewhere else and got Splunk to review our 6-12 months of data, we would be going over a few million events. Is Splunk the right tool for this job? Can it handle that number of events? Or is it mostly made for short-term log analysis?


Ayn
Legend

First, regarding your concern: your assumption is incorrect, because Splunk doesn't work directly on the source files. When you add a file or directory to be monitored by Splunk, its events are indexed; you could say they're copied into Splunk's index (database). Once that's done, it doesn't matter what happens to the source file. The events are in the index and stay there for as long as you've told Splunk to keep them through the index's retention settings.
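How long events are kept is configured per index. As a rough sketch: age-based retention is set in indexes.conf through `frozenTimePeriodInSecs`, which is given in seconds, so a 6-12 month target has to be converted. The helper name `retention_seconds` and the choice to round every month up to 31 days are assumptions made for illustration, not anything Splunk prescribes.

```python
# Convert a retention target in months to seconds, the unit used by
# the frozenTimePeriodInSecs setting in indexes.conf.
SECONDS_PER_DAY = 24 * 60 * 60


def retention_seconds(months: int, days_per_month: int = 31) -> int:
    """Return a retention period in seconds.

    Each month is rounded up to 31 days (an assumption) so the index
    keeps slightly more than the target rather than slightly less.
    """
    return months * days_per_month * SECONDS_PER_DAY


if __name__ == "__main__":
    for months in (6, 12):
        print(f"{months} months -> frozenTimePeriodInSecs = {retention_seconds(months)}")
```

An index can also roll data out earlier when it reaches its size limit, so it is worth checking both the age and the size settings for the index that holds the Apache events.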

There's really no practical limit to how many events Splunk can handle. Many people use it to analyze huge amounts of data spanning several years, and there are Splunk deployments out there indexing several terabytes of data each day. A deployment of that size obviously can't run on one so-so specced indexer, but you can scale easily by adding more indexers and other Splunk instances as you go.
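For a sense of scale, here is a back-of-the-envelope projection using the 645,000 events per month quoted in the question. It assumes traffic stays flat from month to month, which is a simplification, and the function name `projected_events` is made up for this sketch.

```python
# Project total event counts from the monthly figure in the question.
EVENTS_PER_MONTH = 645_000  # one month of Apache events, as quoted above


def projected_events(months: int, events_per_month: int = EVENTS_PER_MONTH) -> int:
    """Return the expected event count, assuming constant monthly traffic."""
    return months * events_per_month


if __name__ == "__main__":
    for months in (6, 12):
        print(f"{months} months: {projected_events(months):,} events")
```

That works out to roughly 3.9 million events for 6 months and 7.7 million for 12, which is small next to the multi-terabyte-per-day deployments mentioned above.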

mq20123167
New Member

Thanks Ayn, appreciate your help with this.
