Designing a Splunk Refresh for very non-standard environment

aubine · ‎08-27-2020

Hello all,

I've been using Splunk for the past four years and am loving it. I would like to know from the Splunk Community what you all think of the following configuration setups.

I am currently running a single-instance Splunk server with 16 cores, 64 GB of RAM, and 7.2k RPM HDDs. I'm monitoring about 50-100 servers (90%+ are different flavors of Linux) with very low indexing volume: all servers together amount to about 150-200 MB/day. But I also have about 75 users who need access to the dashboards. I inherited the server about four years ago; it was initially designed as a PoC and eventually got shipped into production (with no changes). The server is old and needs to be evergreened.

My question is this: if you could design a new system, what would you go with? I have the possibility of using VMs to create a more distributed environment instead of a single instance, but is it worth it? My idea was to use one of AMD's new 64-core EPYC chips with a similar 64 GB of RAM, a 500 GB SATA SSD, and 4 TB of 7.2k RPM drives for historical searches.

I'm curious to see what the Splunk Trust & Community has to say, because I talked with several people at .conf19 and most of them had not seen this kind of environment before (low data volume and a high number of users). Any ideas or suggestions would be appreciated.

Thanks!

Erick

gcusello · ‎08-28-2020

Hi @aubine,

Before redesigning your architecture, in my opinion, you should run some checks on your existing infrastructure and then design the new architecture by defining the requirements (see below).

By analyzing your current infrastructure with the Monitoring Console, you can find out how many searches run per day and at peak times.

With this information you can work out how many CPUs you need.
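As a rough illustration of turning search activity into a peak concurrency figure, here is a minimal sketch. It assumes you have exported each search's start time and run duration (for example from the Monitoring Console or the audit logs) into a list of pairs; the function name and the sample data are hypothetical.

```python
from typing import Iterable, Tuple


def peak_concurrency(searches: Iterable[Tuple[float, float]]) -> int:
    """Return the largest number of searches running at the same instant.

    Each item is (start_time_seconds, duration_seconds).
    """
    events = []
    for start, duration in searches:
        events.append((start, 1))              # a search begins
        events.append((start + duration, -1))  # a search ends
    # Sort by time; at equal timestamps, ends (-1) sort before starts (+1),
    # so back-to-back searches are not counted as overlapping.
    events.sort()
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical sample: (start, duration) in seconds
sample = [(0, 30), (10, 30), (20, 5), (40, 10), (100, 60)]
print(peak_concurrency(sample))  # 3
```

The peak value, not the number of users, is what drives the core count, and it should include scheduled searches as well as ad hoc ones.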

In addition, you should decide whether you have HA requirements. If you do, you should migrate from a standalone instance to a distributed deployment with Indexer and/or Search Head Clusters.

Another parameter to analyze is the IOPS of your storage: Splunk recommends at least 800 IOPS, which means at least eight 15k SAS disks; to reach this you could spread the storage across more Indexers.
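The disk arithmetic behind that figure can be sketched as follows. The 100 IOPS per disk used here is only the value implied by "800 IOPS from eight disks" in this post, not a measured or vendor number, and the sketch ignores RAID overhead and caching; `disks_needed` is a hypothetical helper.

```python
import math


def disks_needed(target_iops: int, iops_per_disk: int) -> int:
    """Smallest number of disks whose combined raw IOPS meets the target."""
    if target_iops <= 0 or iops_per_disk <= 0:
        raise ValueError("both values must be positive")
    return math.ceil(target_iops / iops_per_disk)


# Assumption: about 100 IOPS per disk, as implied by the post above.
print(disks_needed(800, 100))  # 8
```

Measure the real IOPS of your own storage before relying on a number like this, since SSDs and RAID layouts change the result considerably.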

In addition, you could also analyze if some of your searches can be optimized:

Post Process Searches,
replacing Real Time searches with scheduled reports,
accelerated searches,
Data Models,
etc...

As I said, the main parameters for designing your architecture (there could be others) are:

HA: yes or no,
if yes, full HA (SH and IX) or Indexers only,
number of active users,
number of searches in peak period,
average daily number of searches,
daily volume of data,
retention period,
Real Time Searches (if possible to avoid),
scheduled searches,
storage throughput.

In other words, I don't think this is a question for the Community; it needs at least a Splunk Architect or (better) Splunk Professional Services (PS).

Ciao.

Giuseppe

isoutamo · ‎08-28-2020

If you want to look at the Splunk Validated Architectures document (https://www.splunk.com/pdfs/technical-briefs/splunk-validated-architectures.pdf), it describes your options as @gcusello said, but it doesn't give you any exact hardware combinations to deploy.

I totally agree with @richgalloway and @gcusello that you must first understand and document your current usage and future needs. Based on that you can start to define your target architecture.

Remember that if you move to a distributed architecture, you need more cores in the indexer layer too. It makes no sense to have lots of cores and memory only on the SH(C) layer if the indexers don't have enough resources to serve it!

r. Ismo

thambisetty · ‎08-27-2020

You mentioned 75 users, and you need to allocate 1 CPU core per search, i.e. one per user at any given time.
Having 75 CPU cores in one box may cost you more, so I would suggest a search head cluster to balance resources effectively.

A search head cluster with 3 members and one load balancer.
Each member should have at least 24 CPU cores. The reason for the extra cores is that we don't know how many concurrent searches the members will receive.

There should not be any delay when serving users.


richgalloway · ‎08-27-2020

75 CPUs for 75 users is what I'd call a worst-case scenario. It's unlikely all users will be running searches at the same time. That's why it's important to know how many searches the system needs to be able to run at a time.
BTW, searches include those Splunk runs in the "background", such as alerts and data model and report accelerations.
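For a sense of scale, here is a small sketch of how Splunk's search concurrency ceiling relates to core count. It assumes the documented `limits.conf` defaults of `max_searches_per_cpu = 1` and `base_max_searches = 6`; check the values for your version and any local overrides. The function name is hypothetical.

```python
def max_concurrent_historical_searches(
    cpu_cores: int,
    max_searches_per_cpu: int = 1,  # assumed limits.conf default
    base_max_searches: int = 6,     # assumed limits.conf default
) -> int:
    """Ceiling on concurrent historical searches for one Splunk instance."""
    return max_searches_per_cpu * cpu_cores + base_max_searches


# Core counts mentioned in this thread: current box, proposed SHC member, EPYC
for cores in (16, 24, 64):
    print(cores, max_concurrent_historical_searches(cores))
# 16 22
# 24 30
# 64 70
```

This is only the point at which Splunk starts queuing searches, not a promise that the hardware can run that many quickly, which is why the measured peak concurrency matters more than the user count.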


richgalloway · ‎08-27-2020

A key metric you haven't shared is the number of concurrent searches the system runs. That will factor into the architecture.

