Is there a maximum number of forwarders that a single indexer can support, or is the limiting factor on the indexer just the amount of data sent by the forwarders? If there is a maximum # of forwarders per indexer, how does that scale in an auto-lb environment with multiple indexers?
I don't know of a theoretical limit. The highest I've seen is 6,000 forwarders reporting to a single indexer. It was a subset of WinEventLogs, so in aggregate the volume was low - about 150GB/day. Splunk does a good job of managing connections, and the overhead is low so long as you don't require SSL. You will have to increase the file descriptor limit (ulimit -n) to allow for all those sockets to be referenced.
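To illustrate the file-descriptor point, here is a minimal sketch of checking and raising the limit on a Linux indexer. The `65536` value and the `splunk` user name are assumptions for illustration; adjust to your environment:

```shell
# Show the current per-process open-file limit for this shell
ulimit -n

# Raise the soft limit for the current session
# (ignored here if the hard limit is lower)
ulimit -n 65536 2>/dev/null || true

# To make the change persistent, add lines like these to
# /etc/security/limits.conf, assuming Splunk runs as user "splunk":
#   splunk soft nofile 65536
#   splunk hard nofile 65536
```

Remember that the limit must cover every forwarder socket plus the files Splunk itself holds open, so leave generous headroom above the raw forwarder count.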
Assuming there is a limit somewhere, you make a good point that auto-lb would not help. I'd recommend an intermediate forwarding tier. You would explicitly assign each client forwarder to one of multiple intermediate forwarders, and then have the intermediate forwarders LB amongst the indexing tier. Keep in mind that a forwarder will only be able to LB about 200GB/day of data. You'll also have to be careful which forwarders execute which parts of the parsing pipeline, which can get tricky.
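As a sketch of what that tier could look like in `outputs.conf` (hostnames and ports are placeholders, and the exact stanza options can vary by Splunk version, so treat this as an outline rather than a drop-in config):

```ini
# outputs.conf on a client forwarder -- pinned to one intermediate
[tcpout]
defaultGroup = intermediate1

[tcpout:intermediate1]
server = intfwd1.example.com:9997

# outputs.conf on an intermediate forwarder -- auto-LB across the indexers
[tcpout]
defaultGroup = indexers

[tcpout:indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997, idx4.example.com:9997
autoLB = true
```

Each client forwarder is statically assigned to an intermediate, and only the intermediates spread load across the indexing tier.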
There will be a practical limit imposed by the TCP/IP network stack implementation on the indexer. Note, though, that the roughly 65,535 port ceiling applies per client, not to the indexer as a whole: the indexer listens on a single port and distinguishes connections by the (client IP, client port) pair. The lower, real ceilings on the indexer side are the reserved port ranges on each client, the number of file descriptors the indexing process may hold, and the operating system's capacity to track that many simultaneous connections.
I see. I would only split the forwarders if I had to. It sounds like, between what you and gkanapathy have said, there isn't a hard limit beyond what the TCP/IP stack and the OS will allow.
You might be sacrificing some performance at search time. The optimum state for distributed search is to have the data evenly dispersed amongst the indexers. If you can split the forwarders in a way that still disperses the data evenly, I would say go for it.
Thanks. It would be helpful to know if anyone has seen more than 6k running. We're potentially going to have over 20k forwarders sending to an auto-lb cluster (4 indexers running Linux). We could split the cluster in half if necessary to reduce the total number of forwarders per auto-lb cluster. That would seem easier than an intermediate forwarding tier, right?