I was told to evaluate Splunk to run in a really BIG company. We are talking about a big amount of log files each day. My boss asked me to clarify some of his concerns and I thought this board could be a starting point.
I tested Splunk (Free) on my local computer and found that 1GB of our log files results in 250MB of index. Is this normal for Splunk? If so, we would reach the capacity of any reasonable HDD setup very soon. So I am asking myself:
I think there are more questions to be asked when running Splunk "BIG", but for now these are the most important ones for us.
Thank you in advance. Kind regards,
I really think the best thing to do is to contact email@example.com and book a meeting where you can discuss these questions more thoroughly. That said, there are good sections in the docs that you should read (like http://docs.splunk.com/Documentation/Splunk/latest/Installation/CapacityplanningforalargerSplunkdepl... and http://docs.splunk.com/Documentation/Splunk/latest/Installation/HowHowmuchspaceyouwillneed ).
Quick answers:
1 - Is there a possibility to minimize splunk's Index size?
Not really, the indexes are already as compact as possible. However, you can improve things a bit if your data has recurring patterns, see http://docs.splunk.com/Documentation/Splunk/4.2.3/Data/Improvedatacompressionwithsegmentation
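To get a feel for the disk space involved, here is a rough back-of-the-envelope sketch in Python. It assumes the 1GB to 250MB ratio from your test holds for all of your data, which it may not, since the ratio depends on the kind of logs. The 100GB/day volume and 90-day retention are made-up numbers for illustration only, and the function name is hypothetical, not anything from Splunk.

```python
def estimate_index_disk_gb(daily_raw_gb, retention_days, index_ratio=0.25):
    """Rough disk space (GB) needed to keep `retention_days` of indexed data.

    index_ratio is index size / raw size. The default of 0.25 is the
    1GB -> 250MB figure observed in the question; measure your own data.
    """
    return daily_raw_gb * index_ratio * retention_days


# Hypothetical example: 100GB of raw logs per day, kept for 90 days.
print(estimate_index_disk_gb(100, 90))
```

Plug in your own daily volume and retention period; this ignores any headroom you would want on top.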
However, if your question is "what should be my disk strategy to store a large amount of data", you should look at :
2 - Are there special options to index a big amount of log files very fast? Or is splunk just as fast as the hardware it is given?
You can improve your indexing speed with better hardware (for example: no VM, faster CPU, more memory, SSD drives for the hot buckets) or by clustering (adding more indexers).
And if one particular input is critical, you can forward it to dedicated indexers.
In the end, you can search across all your indexers (search-head + X*search-peers).
3 - Is there a possibility to keep an eye on the network traffic so that there are no disruptions in real time operation? There will be lot of traffic to get all the data to the indexer. Can this be done by forwarders?
FYI, the Universal Forwarders and Light Weight Forwarders have a default limit of 256KBps on network traffic to keep a low profile, but this can be removed easily.
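To judge whether that default limit matters for a given host, a quick calculation helps. The sketch below assumes 1KB = 1024 bytes and 1GB = 1024^3 bytes, and that the forwarder sends at the limit around the clock. The function names are hypothetical and only for illustration.

```python
DEFAULT_LIMIT_KBPS = 256  # default forwarder limit mentioned above


def max_daily_gb(limit_kbps=DEFAULT_LIMIT_KBPS):
    """Most data (GB) a forwarder can send per day at a given KB/s limit."""
    seconds_per_day = 24 * 60 * 60
    return limit_kbps * seconds_per_day / (1024 * 1024)


def forwarder_keeps_up(daily_log_gb, limit_kbps=DEFAULT_LIMIT_KBPS):
    """True if a host producing daily_log_gb per day fits under the limit."""
    return daily_log_gb <= max_daily_gb(limit_kbps)


print(max_daily_gb())          # roughly 21GB per day at 256KBps
print(forwarder_keeps_up(50))  # a host logging 50GB/day would fall behind
```

So a host producing more than about 21GB of logs per day would fall behind with the default limit, and that is where you would want to raise or remove it.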
You can rely on the Deployment-monitor app to detect whether a forwarder is not sending data because there is no data to send, or because something is wrong (down, blocked, queuing...)
But the other approach is to setup monitoring/alerting on :