
How to take advantage of a multi-core indexer?

muebel
SplunkTrust

My indexer has an Intel Xeon X5570, which has four cores.

http://ark.intel.com/Product.aspx?id=37111

How can I make sure that Splunk is using this multicore capability to its full advantage?


gkanapathy
Splunk Employee

You don't have to do anything with just four cores, other than to ensure the disk is fast enough. (And, of course, that you are actually providing enough input to keep it busy.) Indexing is performed in a multi-stage parallel pipeline that typically uses up to 2 cores, but can use more. Index optimization also runs every few seconds whenever data has been written, and it consumes another thread. A slow disk will leave the CPU underutilized for both indexing and optimization, because both can only proceed as fast as data can be written.
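If you want a rough first look at whether the disk is "fast enough", a crude sequential-write test on the volume that holds your indexes can help. The sketch below is not a Splunk tool: the function name `sequential_write_mb_per_s` is hypothetical, and a single buffered sequential write followed by one `fsync` is only an approximation of Splunk's real indexing I/O pattern.

```python
import os
import tempfile
import time
from typing import Optional


def sequential_write_mb_per_s(total_mb: int = 256, block_kb: int = 1024,
                              directory: Optional[str] = None) -> float:
    """Write total_mb of data in block_kb chunks, fsync, and return MB/s.

    Hypothetical helper: a crude proxy that measures sequential writes
    plus one fsync, not Splunk's actual indexing workload.
    """
    block = os.urandom(block_kb * 1024)  # random, incompressible payload
    blocks = (total_mb * 1024) // block_kb
    # The temporary file lives in `directory` (e.g. the index volume)
    # and is deleted automatically when the with-block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the device
        elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024) / elapsed
```

For example, `sequential_write_mb_per_s(directory="/opt/splunk/var/lib/splunk")` would test the default index location of a typical Linux install; adjust the path to wherever your indexes actually live.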

For most people, searches rather than indexing dictate CPU usage and requirements. Searches, both background and interactive, typically consume an entire CPU core while they are running. Thus you will probably need one core for every two or three concurrently logged-in users, depending on their usage, plus one, two, or more for scheduled searches and summarization.
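The rule of thumb above can be turned into a back-of-the-envelope calculation. This is a minimal sketch: `estimate_cores` is a hypothetical name, the defaults are picked from the ranges given in this answer, and adding the indexing pipeline's cores to the same total is an assumption of the sketch rather than something stated here.

```python
import math


def estimate_cores(concurrent_users: int,
                   users_per_core: float = 2.5,
                   scheduled_search_cores: int = 2,
                   indexing_cores: int = 2) -> int:
    """Rough core estimate based on this answer's rule of thumb.

    users_per_core: two or three concurrently logged-in users per core.
    scheduled_search_cores: "one or two or more" for scheduled searches
        and summarization.
    indexing_cores: the indexing pipeline typically uses up to 2 cores
        (summing it with search cores is an assumption of this sketch).
    """
    if concurrent_users < 0 or users_per_core <= 0:
        raise ValueError("users must be >= 0 and users_per_core > 0")
    # Each running search tends to occupy a full core.
    search_cores = math.ceil(concurrent_users / users_per_core)
    return search_cores + scheduled_search_cores + indexing_cores
```

With six concurrent users at three users per core, `estimate_cores(6, users_per_core=3)` gives 2 + 2 + 2 = 6 cores, already more than the X5570's four.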

You will probably find that four cores are not enough for more than 5 to 10 GB/day of data plus the typical accompanying search load.

To scale, we usually recommend more, smaller servers (with 8 or 12 cores) rather than fewer, larger ones (16 or 32 cores). If you have many cores (say, more than 16), a very high input volume, and an unusually low search load on the data, you may run multiple instances of Splunk on the server (listening on different ports) to use more cores, since each instance creates a full new indexing pipeline. Each additional instance would be treated like a distributed instance on another server, though because the instances share a single server, you may find it becomes I/O or disk bound.
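Running several instances on one host means every instance needs its own set of listening ports. The sketch below plans non-colliding ports by shifting common defaults (Splunk Web on 8000, the splunkd management port on 8089, and 9997, the port conventionally used for receiving from forwarders). The helper `plan_ports`, the `splunkN` naming, and the fixed offset `step` are hypothetical illustrations; you would still need to set the resulting ports in each instance's own configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstancePorts:
    name: str
    web: int         # Splunk Web (8000 for the first instance)
    management: int  # splunkd management port (8089 for the first)
    receiving: int   # forwarder receiving port (9997 by convention)


def plan_ports(instances: int, step: int = 100) -> List[InstancePorts]:
    """Assign each co-located instance its own ports by offsetting defaults.

    The naming scheme and fixed step are illustrative assumptions.
    """
    if instances < 1 or step < 1:
        raise ValueError("need at least one instance and a positive step")
    plan = []
    for i in range(instances):
        off = i * step
        plan.append(InstancePorts(name=f"splunk{i + 1}",
                                  web=8000 + off,
                                  management=8089 + off,
                                  receiving=9997 + off))
    # Every port across every instance must be unique and valid.
    ports = [p for inst in plan
             for p in (inst.web, inst.management, inst.receiving)]
    if len(ports) != len(set(ports)):
        raise ValueError("step too small: ports collide")
    if max(ports) > 65535:
        raise ValueError("too many instances: port out of range")
    return plan
```

For example, `plan_ports(2)` yields web ports 8000 and 8100 and management ports 8089 and 8189, while `plan_ports(2, step=89)` is rejected because the second instance's web port would land on the first instance's management port.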
