Splunk in really BIG environment

Katsche — Fri, 09 Sep 2011 07:33:08 GMT

Hi all,

I was told to evaluate Splunk to run in a really BIG company. We are talking about a big amount of log files each day. My boss asked me to clarify some of his concerns and I thought this board could be a starting point.

I tested Splunk (Free) on my local computer and found that 1GB of our log files results in 250MB of index. Is this normal for Splunk? If so, we would reach the capacity of any reasonable HDD setup very soon. So I am asking myself:

Is there a possibility to minimize Splunk's index size?
Are there special options to index a big amount of log files very fast? Or is Splunk just as fast as the hardware it is given?
Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real-time operation? There will be a lot of traffic to get all the data to the indexer. Can this be done by forwarders? I haven't had the time to read all the docs yet.

I think there are more questions to be asked when running Splunk "BIG", but for now these are the most important questions for us.

Thank you in advance. Kind regards,
Katsche

Re: Splunk in really BIG environment

Ayn — Fri, 09 Sep 2011 08:13:50 GMT

I really think the best thing to do is to contact sales@splunk.com and book a meeting where you can discuss these questions more thoroughly. That said, there are good sections in the docs that you should read (like http://docs.splunk.com/Documentation/Splunk/latest/Installation/CapacityplanningforalargerSplunkdeployment and http://docs.splunk.com/Documentation/Splunk/latest/Installation/HowHowmuchspaceyouwillneed ).

Re: Splunk in really BIG environment

Katsche — Fri, 09 Sep 2011 09:10:37 GMT

I will check your links. Thank you very much. Scheduling an appointment is a good idea, too.

Re: Splunk in really BIG environment

yannK — Fri, 09 Sep 2011 16:49:53 GMT

Quick answer:

1 - Is there a possibility to minimize Splunk's index size?

Not really; the indexes are already as compact as possible. However, you can improve things a bit if you have recurrent patterns, see http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation

However, if your question is "what should be my disk strategy to store a large amount of data", you should look at:

filtering data at index time (in order to drop useless events before they are indexed)
a data retention policy: store your data in different indexes with different life cycles (you can specify a maximum size and a maximum retention period per index), see http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention
secondary storage (the homePath and coldPath for the buckets of each index can be located on separate file systems)
a cluster of indexers sharing the same license volume (more servers, more storage capacity, and better performance)
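To make the retention point concrete, here is a rough sketch of the sizing arithmetic. The 0.25 ratio is the figure from the test reported earlier in this thread (1GB of logs giving 250MB of index); the index names, daily volumes, and retention periods are hypothetical, and the function name is mine, not anything from Splunk.

```python
def index_disk_gb(daily_raw_gb, index_ratio, retention_days):
    """Rough disk footprint of one index once its retention window is full."""
    return daily_raw_gb * index_ratio * retention_days


# Ratio observed in the test from this thread: 1 GB raw -> 250 MB of index.
RATIO = 0.25

# Hypothetical plan: (raw GB per day, days kept). Purely illustrative numbers.
plans = {
    "security": (100, 365),
    "debug": (400, 14),
}

for name, (daily_gb, days) in plans.items():
    print(name, index_disk_gb(daily_gb, RATIO, days), "GB")
```

The point of the sketch is that the noisy index with a short life cycle can end up much smaller on disk than a quiet index that is kept for a year, which is why splitting data across indexes with different retention periods matters.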

2 - Are there special options to index a big amount of log files very fast? Or is Splunk just as fast as the hardware it is given?

You can improve your indexing speed with better hardware (e.g. no VM, faster CPU and memory, SSD drives for the hot buckets) or by clustering (adding more indexers).
And if one particular input is critical, you can forward it to dedicated indexers.
In the end you can search over all your indexers (search-head + X*search-peers).

3 - Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real-time operation? There will be a lot of traffic to get all the data to the indexer. Can this be done by forwarders?

FYI, the Universal Forwarder and the Lightweight Forwarder have a default limit of 256KBps on network traffic to keep a low profile, but this can be removed easily.

You can rely on the Deployment-monitor app to detect whether a forwarder is not sending data because there is no data to send, or because something is wrong (down, blocked, queuing...).
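To see what that default means in practice, here is a small sketch converting the 256KBps cap into a daily volume per forwarder. It assumes 1 KB = 1024 bytes and 1 GB = 1024 * 1024 KB; the function names and the 50 GB/day example host are hypothetical. I believe the cap itself is the maxKBps setting in the [thruput] stanza of limits.conf, but check the docs for your version.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def forwarder_gb_per_day(max_kbps=256):
    """Upper bound on what one forwarder can ship per day at a given cap."""
    return max_kbps * SECONDS_PER_DAY / (1024 * 1024)


def fits_under_cap(daily_gb, max_kbps=256):
    """True if a host's daily log volume can be sent without hitting the cap."""
    return daily_gb <= forwarder_gb_per_day(max_kbps)


print(forwarder_gb_per_day())  # about 21 GB/day at the default cap
print(fits_under_cap(50))      # a hypothetical 50 GB/day host would fall behind
```

A host that logs more than this per day will queue up behind the cap, so heavy sources are the ones where raising or removing the limit is worth considering.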

But the other approach is to set up monitoring/alerting on:

the incoming traffic on the indexers (metrics.log or license_usage.log)
the latency of the events (index time vs event timestamp)
or even easier, the events themselves. If you know that serverA sends a particular event B every minute, you can set up alerting on that event. The difficulty is defining what counts as an anomaly.
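The last two checks can be sketched outside Splunk as plain logic. This is only an illustration of the idea, not how Splunk implements alerting: the function names, the one-minute interval, and the tolerance of three missed intervals are all assumptions, and the tolerance is exactly the "what is an anomaly" knob you would have to tune. In Splunk terms, I believe the latency is the difference between the _indextime and _time fields.

```python
from datetime import datetime, timedelta


def latency_seconds(event_time, index_time):
    """Delay between when an event happened and when it was indexed."""
    return (index_time - event_time).total_seconds()


def heartbeat_missing(last_seen, now, expected_every=timedelta(minutes=1), tolerance=3):
    """True if the expected event has been absent for more than `tolerance` intervals."""
    return now - last_seen > expected_every * tolerance


# Hypothetical timestamps for illustration.
event_time = datetime(2011, 9, 9, 16, 0, 0)
index_time = datetime(2011, 9, 9, 16, 0, 45)
now = datetime(2011, 9, 9, 16, 5, 0)

print(latency_seconds(event_time, index_time))  # 45.0 seconds behind
print(heartbeat_missing(event_time, now))       # last seen 5 minutes ago
```

With a tolerance of three intervals, a single late event does not raise an alert, but five minutes of silence from serverA does.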