Getting Data In

When to add indexer/forwarder

lbogle
Contributor

Hello Splunkers,
I came across a page that answered this once but I can't seem to find it again...
For best-practice purposes, what is a good rule of thumb to follow when deciding to add an indexer? How about a forwarder?
I think I've heard that performance starts being impacted once an indexer consumes more than 50 GB a day, and that a new indexer should be added at that point. This assumes the indexer was built to the standard performance baseline.
Also, how do you troubleshoot the forwarder buffers/queues to see whether they're getting backed up?
Thanks for any assistance!

0 Karma
1 Solution

martin_mueller
SplunkTrust

For hardware capacity planning there's docs.splunk.com/Documentation/Splunk/6.1.1/Installation/CapacityplanningforalargerSplunkdeployment and http://docs.splunk.com/Documentation/Splunk/6.1.1/Deploy/HardwarecapacityplanningforadistributedSplu..., which includes a neat table of rule-of-thumb numbers:

Daily Volume         Number of Search Users   Recommended Indexers   Recommended Search Heads
< 2 GB/day           < 2                      1, shared              N/A
2 to 250 GB/day      up to 4                  1, dedicated           N/A
100 to 250 GB/day    up to 8                  2                      1
200 to 300 GB/day    up to 12                 3                      1
300 to 400 GB/day    up to 8                  4                      1
400 to 500 GB/day    up to 16                 5                      2
500 GB to 1 TB/day   up to 24                 10                     2
1 TB to 20 TB/day    up to 100                100                    24
20 TB to 60 TB/day   up to 100                300                    32

There's also info in there on how to adapt to virtualized environments.
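
If you're not sure which row applies to you, you can measure your actual daily indexing volume from the license usage logs. A minimal search sketch, assuming _internal is searchable and license_usage.log is being collected with its usual type=Usage events and b (bytes) field:

    index=_internal source=*license_usage.log* type=Usage
    | eval GB = b / 1024 / 1024 / 1024
    | timechart span=1d sum(GB) AS daily_GB_indexed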

For your specific environment, grab the SoS (Splunk on Splunk) app and look at your indexers' queues and processors: http://apps.splunk.com/app/748/
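
If you'd rather inspect the queues directly, indexers log queue metrics to metrics.log every 30 seconds or so. A sketch, assuming the usual current_size_kb/max_size_kb fields on group=queue events:

    index=_internal source=*metrics.log* group=queue
    | eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
    | timechart avg(fill_pct) by name

Queues that sit near 100% (e.g. parsingqueue, aggqueue, typingqueue, indexqueue) point at the bottleneck: the queue in front of the slow pipeline stage fills up first.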

Concerning forwarders, look for logs stating that the forwarder has hit its thruput limit... and if it hits that frequently, consider increasing maxKBps in limits.conf.
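
For example, you could count the throttle messages in the forwarder's internal logs and, if they appear constantly, raise the cap. A sketch: the search phrase matches the usual splunkd.log wording, and 512 is just an illustrative value (the UF's default maxKBps is 256), not a recommendation:

    index=_internal source=*splunkd.log* "current data throughput"
    | stats count by host

    # limits.conf on the forwarder
    [thruput]
    maxKBps = 512
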
Adding forwarders usually just means adding hosts that produce input data; cases where you ingest one source with multiple forwarders are rare.

martin_mueller
SplunkTrust

That depends on the type of input. In that case I'm guessing syslog? There's nothing wrong with having multiple syslog sources handled by one UF.
For robustness it's good practice to have syslog-ng or a similar daemon receive the data and write it to disk, and let the UF read those log files (see the sketch below).
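
A minimal sketch of that pattern; the port, paths, and sourcetype below are illustrative assumptions, not values from this thread. syslog-ng writes each firewall's data into its own file, and the UF monitors the directory:

    # syslog-ng.conf: receive syslog and write one file per sending host
    source s_firewalls { udp(ip(0.0.0.0) port(514)); };
    destination d_per_host { file("/var/log/remote/$HOST/firewall.log"); };
    log { source(s_firewalls); destination(d_per_host); };

    # inputs.conf on the UF; host_segment=4 takes the Splunk host field
    # from the fourth path segment (/var/log/remote/<host>/firewall.log)
    [monitor:///var/log/remote]
    sourcetype = syslog
    host_segment = 4
    disabled = false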

0 Karma

lbogle
Contributor

Thanks Martin. That's excellent info!
So is it not good practice to have a single universal forwarder consuming multiple sources of input? For example, I have a single universal forwarder forwarding data from about 25 firewalls.

0 Karma