Deployment Architecture

How to size and grow a Splunk deployment in a small shop?

Explorer

Hello,

I've been using Splunk for less than a year and I'm looking for real-world insight on how to size and grow a Splunk deployment. I've read the Splunk Capacity Planning manual and the admin guides but would like to hear from people who have done it.

My department has a Splunk Enterprise server on a small isolated LAN of about 40 Windows clients running the Universal Forwarder. The server indexes about 4 GB a day. We get about 10 daily reports from it each morning. Other than that, I log on to it a few times a week and run a few searches, so it's lightly used in terms of search. It works fine.

We have another isolated LAN with about 400 clients (200 Windows and 200 Linux) where we are going to deploy Splunk. We plan to purchase one server for it with:

  • 2 processors
  • 10 cores each
  • 32 GB of RAM
  • disks capable of meeting the 800 IOPS requirement.

I'm estimating the server will index about 20-40 GB a day. We will have about 10 daily reports emailed to us. Other than that, it will be lightly used for search by a few people.

If we do buy one server and find it isn't sufficient, how do you recommend we properly add another server to handle the load? Do we cluster them, or run separate servers, one for Windows and one for Linux? Actually, the Linux clients will send to a syslog server, which will then forward to Splunk. I'm still looking into how that works.

Again, I'd appreciate any recommendations.

Thanks in advance,

Greg

1 Solution

Splunk Employee

Hi Greg,
a server with the reference specs (12 cores / 12 GB RAM / 800 IOPS) is able to index the amount of data you are processing with ease. Unless you are concerned about availability, I see no need for you to worry about scaling out from a data-ingest perspective.
Your intended server has 20 cores, of which I estimate indexing will use maybe two or three at most (if that). The remaining cores can be used for search, giving you somewhere between 22 and 24 concurrent searches. Your described search workload does not come close to that either.

So, unless you are getting to a daily indexing volume of upwards of 200GB/day, you will be fine with the hardware you have in place.
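As a rough sanity check, the arithmetic behind these numbers can be sketched out. This assumes Splunk's default limits.conf values (base_max_searches = 6, max_searches_per_cpu = 1) and a common rule of thumb that raw data compresses to roughly 50% of its size on disk; the retention period is an illustrative assumption, not a recommendation:

```python
# Back-of-the-envelope sizing for a single Splunk indexer.
# Assumes limits.conf defaults: concurrent historical searches
#   = max_searches_per_cpu * search_cores + base_max_searches

total_cores = 20            # 2 sockets x 10 cores
cores_for_indexing = 3      # generous estimate for 20-40 GB/day ingest
max_searches_per_cpu = 1    # limits.conf default (assumed)
base_max_searches = 6       # limits.conf default (assumed)

search_cores = total_cores - cores_for_indexing
search_concurrency = max_searches_per_cpu * search_cores + base_max_searches

# Rough storage estimate: indexed data often lands at ~50% of raw size
daily_gb = 40               # high end of the estimated ingest
retention_days = 90         # illustrative retention period
disk_gb = daily_gb * 0.5 * retention_days

print(search_cores, search_concurrency, disk_gb)  # -> 17 23 1800.0
```

With two to three cores reserved for indexing, that lands in the 23-24 concurrent-search range quoted above, far beyond a workload of ten scheduled reports and occasional ad hoc searches.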

Your approach of handling syslog data streams via a syslog server is best practice. Configure your syslog server to break out various sources into separate files/directories and monitor those directories using a universal forwarder. This allows you to assign meaningful sourcetypes to the syslog data sources and gives you the other benefits of using a forwarder.
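A minimal sketch of what that could look like, with rsyslog on the syslog server splitting streams into per-host files and a universal forwarder monitoring them. The file paths, index name, and sourcetype here are illustrative assumptions, not a prescribed layout:

```
# rsyslog.conf on the syslog server: write each remote host's
# traffic to its own file under /var/log/remote/ (legacy syntax)
$template PerHostFile,"/var/log/remote/%HOSTNAME%/syslog.log"
if $fromhost-ip != '127.0.0.1' then ?PerHostFile

# inputs.conf on the universal forwarder monitoring those directories
[monitor:///var/log/remote/*/syslog.log]
sourcetype = syslog
index = linux_syslog
host_segment = 4
```

The `host_segment = 4` setting tells the forwarder to take the host value from the fourth path segment (the per-host directory), so events keep the originating hostname rather than the syslog server's.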

Hope this helps!


SplunkTrust

Hi Greg,

consider planning for an indexer cluster consisting of 2 indexers with a replication factor of 2 and a search factor of 2.
Use one additional server (not an indexer) in the role of a dedicated search head. You don't seem to have many searches running or many users searching, so one search head should suffice.
In an indexer cluster deployment you should also dedicate one node to the master functionality.

Indexer clusters have the advantage of data redundancy, as well as search load balancing and the ability to scale your environment easily as your needs grow.

Your environment would look like the following:
One standalone master instance that manages the indexer cluster.
Two indexers, clustered with RF 2 and SF 2.
One standalone search head that uses distributed search to search across the indexer cluster.
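Under that layout, the server.conf clustering stanzas would look roughly like this. The port numbers are Splunk defaults, and the pass4SymmKey and hostname are placeholders you would fill in yourself; the mode names follow Splunk's clustering configuration of that era:

```
# server.conf on the master (cluster manager):
[clustering]
mode = master
replication_factor = 2
search_factor = 2
pass4SymmKey = <your_cluster_secret>

# server.conf on each of the two indexers (peer nodes):
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://<master_host>:8089
pass4SymmKey = <your_cluster_secret>

# server.conf on the search head:
[clustering]
mode = searchhead
master_uri = https://<master_host>:8089
pass4SymmKey = <your_cluster_secret>
```

All four nodes must share the same pass4SymmKey, and the master_uri points at the master's management port (8089 by default).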

Buy one server, install a hypervisor on it, and host 4 VMs.
If you ask me about my favorite OS for Splunk servers, I would prefer Linux.

Hope this helps you further.
If you have any more questions, don't hesitate to ask.

Best regards,
pyro_wood


Explorer

Thanks for the reply. That is a good recommendation, and I'm going to consider it. There are two obstacles: the budget for the server and my limited experience with Splunk, not to mention the challenge of setting up a multi-server environment with a deadline closing in on me.



Explorer

Thanks for the reply. I appreciate it. It's good to know that the server can handle the load.