topic Re: indexer sizing and virtualization in Getting Data In

indexer sizing and virtualization

Steve_Litras — Wed, 31 Aug 2011 19:27:15 GMT

Hi -

I'm embarking on a re-organization in my splunk environment. I've come into possession of a couple big x86 boxes (4 socket, 8 core, 256GB RAM), and given that I heard multiple times that indexing is better distributed horizontally across smaller boxes, it leads me to wonder this: If I replace two of my 3 indexers with these boxes, can splunk take advantageof the hardware? Or am I better off partitioning these boxes via virtualization into a number of smaller boxes (but still using local disk resources, etc.)?

Thanks
Steve

Re: indexer sizing and virtualization

RicoSuave — Thu, 01 Sep 2011 12:38:44 GMT

You might want to check out this link http://docs.splunk.com/Documentation/Splunk/4.2.3/Installation/CapacityplanningforalargerSplunkdeployment

The real question is, how much data are you indexing and how many users do you have? I would probably just set up the "big" boxes up as dedicated indexers and the lesser box as a dedicated search head. The biggest consideration is Disk I/O performance. If you have a lot of users, it might make more sense to make one of the big boxes a dedicated search head instead to accommodate more concurrent searches.

Re: indexer sizing and virtualization

lguinn2 — Sat, 03 Sep 2011 17:33:32 GMT

Splunk can definitely use all the resources of the box. It is high-performance multi-threaded code, I would not virtualize the box; this will not help Splunk use it better. The overhead of virtualization will not be offset by any performance improvements. (I am a VMware-certified VCP4, if that helps my credibility on this one.)

Think of the Splunk advice as "Spend your money on lots of average-sized servers, rather than a few giant servers." Why?

average-sized "commodity" servers are relatively cheap. When you start buying "extra large" memory, etc., it tends to come at a premium price. So "commodity" servers are the most effective way to spend the money. (But your servers were free!)
more boxes gives you more concurrent IO - for Splunk indexers, IO is usually the bottleneck
Splunk is designed to be distributed; adding more indexers increases search performance nearly linearly

Lucky you - I wish someone would give me some physical servers (that weren't ready for the junk heap)! I would probably make them all indexers, unless my search head was overloaded. Adding indexers = reduced search time = more searches per minute = can serve more users effectively.