Deployment Architecture

Deployment Server: Best practices for scaling

Contributor

We've currently got just one DS in our environment handling about 1100 forwarders. Performance has been pretty stable at this level, but I'm wondering what the cap is. We may be tasked with adding another 5000-6000 UFs to capture logs from our workstations (thanks, Splunk, for removing Win7 support in 6.5+ by the way /s). I'm wondering how other admins balance their clients across multiple (if necessary) deployment servers.

I'm wondering if it may be more feasible to configure Windows Event Collector and stand up a couple 2012 boxes, then collect using the UF/HF from there for this new set of clients.

Super Champion

We have tested with 5000 clients and a 30-minute polling interval, and it works smoothly.
The only problem is the time it takes to deploy app and config changes to all the clients (which may be OK for most customers), but you need to plan for it. A good link is this. For example, with 5000 clients and 50 MB of apps, it takes about 50 minutes to apply a change.
Also, as you scale out, serverclass.conf may become unmanageable. We generate it with a script from CSV files rather than using wildcards, to prevent conflicts.
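As a rough illustration of scripting serverclass.conf from CSV, here is a minimal Python sketch. The CSV columns (`serverclass`, `app`, `host`), the sample host and app names, and the choice to set `restartSplunkd = true` on every app are all assumptions made for the example, not the poster's actual script. It writes one explicit `whitelist.N` entry per host instead of a wildcard.

```python
import csv
import io
from collections import defaultdict


def build_serverclass_conf(csv_text):
    """Render serverclass.conf text from CSV rows of (serverclass, app, host)."""
    hosts = defaultdict(list)   # server class -> explicit host names
    apps = defaultdict(list)    # server class -> apps to deploy
    for row in csv.DictReader(io.StringIO(csv_text)):
        sc, app, host = row["serverclass"], row["app"], row["host"]
        if host not in hosts[sc]:
            hosts[sc].append(host)
        if app not in apps[sc]:
            apps[sc].append(app)

    lines = []
    for sc in sorted(hosts):
        lines.append(f"[serverClass:{sc}]")
        # One numbered whitelist entry per host, no wildcards.
        for i, host in enumerate(hosts[sc]):
            lines.append(f"whitelist.{i} = {host}")
        lines.append("")
        for app in apps[sc]:
            lines.append(f"[serverClass:{sc}:app:{app}]")
            # Assumption for this sketch: every app restarts splunkd.
            lines.append("restartSplunkd = true")
            lines.append("")
    return "\n".join(lines)


# Hypothetical sample data; real inventories would come from a file.
SAMPLE_CSV = """serverclass,app,host
workstations,uf_windows_inputs,ws-0001
workstations,uf_windows_inputs,ws-0002
dmz_forwarders,hf_outputs,dmz-hf-01
"""

if __name__ == "__main__":
    print(build_serverclass_conf(SAMPLE_CSV))
```

Keeping the host list in CSV means the inventory can be reviewed and diffed separately from the generated config.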

Path Finder

If you use intermediate Heavy Forwarders, you can also use them as intermediate Deployment Servers.
See the picture below for a high-level overview:

[Image: high-level overview of intermediate Heavy Forwarders acting as intermediate Deployment Servers]
If you need additional details let me know.

New Member

Can you please explain a bit how to use an intermediate deployment server?

  • Which directories on the heavy forwarder are used? (only $SPLUNK_HOME/etc/apps for both, receiving and pushing apps?)
  • Usage of repositoryLocation and/or targetRepositoryLocation setting (only repositoryLocation = $SPLUNK_HOME/etc/apps?)
  • Can targetRepositoryLocation be used on serverClass level? According to documentation, this is not possible. But needed, if connecting heavy and universal forwarders to the central deployment server (so, only repositoryLocation can be used in this case?)
  • stateOnClient is used to enable the app contents only on the target universal forwarder?
  • The serverclass.conf for the heavy forwarder is deployed by a separate app?

Contributor

This is a good point, and this is how we forward logs out of the DMZ, but I hadn't considered using them as Deployment Servers in our internal network (duh). Thanks for the additional info!


Splunk Employee

Off-topic side note: If you do use intermediary forwarders, make sure you have at least twice as many forwarders as indexers, to avoid uneven event distribution across your indexers. Intermediary forwarding tiers often introduce such issues if they are not properly architected.
You can employ parallel pipelines on your forwarders to make each one behave like multiple instances.
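To make the 2x rule of thumb concrete, here is a small Python sketch that works out how many pipelines each intermediate forwarder would need so that the total pipeline count is at least twice the indexer count. The function name and the 6-indexer, 2-forwarder layout are hypothetical. It also assumes each pipeline counts as one forwarder for this purpose, which simplifies the statement above. On the Splunk side the pipeline count is set with `parallelIngestionPipelines` in server.conf, and whether a given host can sustain the resulting number is a separate sizing question.

```python
import math


def pipelines_per_forwarder(indexers, forwarders, ratio=2):
    """Pipelines each intermediate forwarder needs so that the total
    number of pipelines is at least ratio * indexers."""
    if indexers < 1 or forwarders < 1:
        raise ValueError("indexers and forwarders must be positive")
    return max(1, math.ceil(ratio * indexers / forwarders))


if __name__ == "__main__":
    # Hypothetical layout: 6 indexers behind 2 intermediate forwarders.
    print(pipelines_per_forwarder(indexers=6, forwarders=2))
```

If the result is higher than a forwarder can reasonably run, adding intermediate forwarders is the other way to reach the same ratio.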


Contributor

Would you still experience the uneven distribution of events even with forceTimebasedAutoLB enabled in outputs.conf on the intermediate forwarder?

We're rolling 6 indexers and have one IF in the DMZ for ACL reasons, and one internally as a syslog relay (with rsyslog). It doesn't appear to be an issue at this point (ingesting 250GB-300GB/day) but that doesn't necessarily mean it won't cause us issues as we expand.


Splunk Employee

Hello coltwanger,
we have tested DS performance on a 2 CPU box with 10000 deployment clients (10 Apps, 10 server classes). You should be fine, but a lot depends on how many apps and serverclasses you have and what your expectations are with respect to deployment times.
Many larger customers also raise their phoneHomeInterval (the phoneHomeIntervalInSecs setting in deploymentclient.conf) to reduce deployment server load.
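To see why a longer phone-home interval helps, here is a back-of-the-envelope Python sketch of the average check-in rate the DS would see. It assumes check-ins are spread evenly over the interval, which real clients will only approximate. The client counts echo the numbers in this thread, and the interval values are illustrative.

```python
def phone_home_rate(clients, interval_secs):
    """Average deployment-client check-ins per second hitting the DS,
    assuming check-ins are spread evenly over the interval."""
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return clients / interval_secs


if __name__ == "__main__":
    for clients in (1100, 5000, 10000):
        for interval in (60, 600, 1800):
            rate = phone_home_rate(clients, interval)
            print(f"{clients:>6} clients @ {interval:>4}s -> {rate:6.2f} check-ins/s")
```

The trade-off is that a longer interval also delays how quickly clients pick up new or changed apps.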

We don't generally recommend going the WEC server route, because it requires custom processing to preserve the source hostname and causes some of our TAs for Windows-related apps to not work properly. I would stay away from that if you can.

HTH!