I came across a page that answered this once but I can't seem to find it again...
As a best practice, what's a good rule of thumb for deciding when to add an indexer? How about a forwarder?
I think I heard that performance starts to suffer once an indexer ingests more than 50 GB a day, and that a new indexer should be added at that point. This assumes the indexer was built to the standard performance baseline.
How do you troubleshoot the forwarder buffers/queues to see if they're getting backed up?
Thanks for any assistance!
For hardware capacity planning there's this: http://docs.splunk.com/Documentation/Splunk/6.1.1/Installation/CapacityplanningforalargerSplunkdeployment and this: http://docs.splunk.com/Documentation/Splunk/6.1.1/Deploy/HardwarecapacityplanningforadistributedSplu... which includes a neat table of rule-of-thumb numbers:
Daily Volume          Number of Search Users   Recommended Indexers   Recommended Search Heads
< 2 GB/day            < 2                      1, shared              N/A
2 to 250 GB/day       up to 4                  1, dedicated           N/A
100 to 250 GB/day     up to 8                  2                      1
200 to 300 GB/day     up to 12                 3                      1
300 to 400 GB/day     up to 8                  4                      1
400 to 500 GB/day     up to 16                 5                      2
500 GB to 1 TB/day    up to 24                 10                     2
1 TB to 20 TB/day     up to 100                100                    24
20 TB to 60 TB/day    up to 100                300                    32
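If you want to check a given daily volume against those numbers, here's a rough Python sketch of the lookup. The rows are copied from the table as quoted here, overlapping ranges included, so one volume can match more than one row. `SIZING_TABLE` and `matching_rows` are names I made up, and treating 1 TB as 1000 GB is my assumption.

```python
# Rows copied from the rule-of-thumb table as quoted in this thread:
# (low GB/day, high GB/day, search users, indexers, search heads)
# Assumption: 1 TB is treated as 1000 GB.
SIZING_TABLE = [
    (0, 2, "< 2", "1, shared", "N/A"),
    (2, 250, "up to 4", "1, dedicated", "N/A"),
    (100, 250, "up to 8", "2", "1"),
    (200, 300, "up to 12", "3", "1"),
    (300, 400, "up to 8", "4", "1"),
    (400, 500, "up to 16", "5", "2"),
    (500, 1000, "up to 24", "10", "2"),
    (1000, 20000, "up to 100", "100", "24"),
    (20000, 60000, "up to 100", "300", "32"),
]


def matching_rows(daily_gb):
    """Return every table row whose volume range contains daily_gb.

    Range ends are treated as inclusive, so a boundary value (or a value
    inside two overlapping ranges) returns more than one row.
    """
    return [row for row in SIZING_TABLE if row[0] <= daily_gb <= row[1]]


if __name__ == "__main__":
    for low, high, users, indexers, heads in matching_rows(220):
        print(f"{low}-{high} GB/day: {users} users, "
              f"{indexers} indexer(s), {heads} search head(s)")
```

When several rows match, the number of search users is what decides between them, so treat the output as a list of candidates and not as a single answer.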
There's also info in there on how to adapt to virtualized environments.
For your specific environment, grab the SoS app and look at your indexer's queues and processors. http://apps.splunk.com/app/748/
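If you'd rather look at the raw numbers yourself, the queue lines in metrics.log can be checked with a few lines of Python. This is only a sketch: it assumes the usual `group=queue` key=value layout with `max_size_kb`, `current_size_kb` and an optional `blocked=true`, as far as I remember the format. The sample lines are made up for illustration, and the 0.8 threshold is an arbitrary choice of mine.

```python
import re

# Matches key=value pairs such as "name=parsingqueue" or "max_size_kb=512".
KV = re.compile(r"(\w+)=([^,\s]+)")


def queue_stats(line):
    """Parse one metrics.log line; return None unless it is a group=queue line."""
    fields = dict(KV.findall(line))
    if fields.get("group") != "queue":
        return None
    max_kb = float(fields.get("max_size_kb", 0))
    cur_kb = float(fields.get("current_size_kb", 0))
    return {
        "name": fields.get("name"),
        "fill": cur_kb / max_kb if max_kb else 0.0,
        "blocked": fields.get("blocked") == "true",
    }


def backed_up(lines, threshold=0.8):
    """Return sorted names of queues that are blocked or at least `threshold` full."""
    names = set()
    for line in lines:
        stats = queue_stats(line)
        if stats and (stats["blocked"] or stats["fill"] >= threshold):
            names.add(stats["name"])
    return sorted(names)


if __name__ == "__main__":
    # Illustrative lines only, not taken from a real system.
    sample = [
        "INFO Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=0",
        "INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    ]
    print(backed_up(sample))
```

A queue that shows up here only now and then is normal. One that shows up in most samples is the one to look at.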
Concerning forwarders, look for log messages stating that the forwarder has hit its thruput limit. If that happens frequently, consider raising the limit in limits.conf.
Adding forwarders usually goes hand in hand with adding hosts that produce input data. Cases where you ingest one source with multiple forwarders are rare.
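To get a feel for how often a forwarder hits the limit, you could count the matching lines per day in its splunkd.log. A minimal sketch, assuming the throttle message mentions the setting name `maxKBps` (the setting under `[thruput]` in limits.conf) and that each line starts with a date. Both are assumptions about the log format, so adjust `marker` to whatever your logs actually say. `throttle_hits_per_day` is a name I made up.

```python
from collections import Counter


def throttle_hits_per_day(lines, marker="maxKBps"):
    """Count lines containing `marker`, grouped by the first token of the line.

    Assumes the first whitespace-separated token is the date, e.g. "01-01-2014".
    """
    counts = Counter()
    for line in lines:
        if marker in line:
            day = line.split(maxsplit=1)[0]
            counts[day] += 1
    return dict(counts)


if __name__ == "__main__":
    # Placeholder path; point it at the forwarder's own splunkd.log.
    with open("splunkd.log", encoding="utf-8", errors="replace") as handle:
        for day, hits in sorted(throttle_hits_per_day(handle).items()):
            print(day, hits)
```

A handful of hits during a burst is nothing to worry about. Hits all day, every day, suggest the limit is too low for that host.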
Thanks Martin. That's excellent info!
So is it not a good practice to have a single universal forwarder consuming multiple sources of input? For example, I have a single universal forwarder forwarding data from multiple firewalls (approx 25).
That depends on the type of input. In your case I'm guessing syslog? There's nothing wrong with having multiple syslog sources handled by one UF.
For robustness it's good practice to have syslog-ng or a similar daemon receive the data and write it to log files, then let the UF read those files.