Converting CherryPy To A Forking Only Web Server

ngift
Engager

Hi,

I am in the unusual situation of having a 24-core box with 64 GB of RAM as a Splunk search head. Given how Python's threading works [1], [2], has anyone converted the default CherryPy configuration to use processes instead of threads? In the past, with other Python web frameworks, I have used mod_wsgi with Apache's prefork or another MPM to accomplish this.
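To make the threads-versus-processes concern concrete, here is a minimal sketch using only the standard library. It is not Splunk or CherryPy code; `burn_cpu` and `timed_run` are hypothetical names made up for illustration. It runs the same pure-Python, CPU-bound function on either a thread pool or a process pool and reports the wall-clock time.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn_cpu(n):
    """Pure-Python CPU-bound work; the running thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, jobs=4, n=2_000_000):
    """Run `jobs` copies of burn_cpu on the given executor class.

    Returns (results, elapsed_seconds). `executor_cls` is expected to be
    ThreadPoolExecutor or ProcessPoolExecutor.
    """
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        results = list(pool.map(burn_cpu, [n] * jobs))
    return results, time.perf_counter() - start
```

Calling `timed_run(ThreadPoolExecutor)` and `timed_run(ProcessPoolExecutor)` from a script, under an `if __name__ == "__main__":` guard, should show whether extra cores help for this kind of work. For CPU-bound pure Python the process pool would typically be expected to finish sooner on a multi-core box, but the actual numbers depend on the machine and the workload, and I/O-bound work may show little difference.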

From looking through the CherryPy documentation, it seems you can't disable threading entirely, but it appears reasonable to fork on every request, or to use the multiprocessing module to create a process pool that requests are handed to. Of course, another way to verify what is really going on is to use a mod_wsgi monitoring middleware to time the whole request and response cycle. Having read through the Python core bug report on threading, I am somewhat skeptical that threading behavior is always obvious, even with only I/O-bound requests.
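As a rough illustration of the timing idea, a generic WSGI middleware can wrap any WSGI application and record how long each request takes, including the time spent streaming the response body. This is only a sketch under stated assumptions: `TimingMiddleware`, its `record` callback and `demo_app` are hypothetical names, and how such a wrapper would be wired into Splunk Web's CherryPy setup is not covered here.

```python
import time
from wsgiref.util import setup_testing_defaults


class TimingMiddleware:
    """Wrap a WSGI app and report wall-clock time per request."""

    def __init__(self, app, record):
        self.app = app
        # record(path, seconds) is a hypothetical callback, e.g. a logger.
        self.record = record

    def __call__(self, environ, start_response):
        # This is a generator, so the clock starts when the server
        # begins iterating over the response.
        start = time.perf_counter()
        result = self.app(environ, start_response)
        try:
            for chunk in result:
                yield chunk
        finally:
            # Honor the WSGI close() contract, then record the timing.
            if hasattr(result, "close"):
                result.close()
            elapsed = time.perf_counter() - start
            self.record(environ.get("PATH_INFO", ""), elapsed)


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


timings = []
wrapped = TimingMiddleware(demo_app, lambda path, secs: timings.append((path, secs)))

environ = {}
setup_testing_defaults(environ)  # fills in a minimal WSGI environ
body = b"".join(wrapped(environ, lambda status, headers: None))
```

Collecting these timings per path under realistic load is one way to check whether the web tier is spending its time waiting on I/O or contending for the interpreter lock.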

My main

  1. http://www.dabeaz.com/python/GIL.pdf
  2. http://bugs.python.org/issue7946

noahgift
Explorer

If that is the case, then I wonder if it makes sense for customers to run virtual machines to make use of the extra processors. I can't get that machine to use more than, say, 5 processors.

noahgift
Explorer

I see. In this case only the search head has 24 cores, and from reading through your Splunk presentation I am assuming the indexer is doing most of the work anyway. So really this box is waiting on network I/O, because it forks a few splunkd instances, which then make REST calls to the indexer.

Because of Splunk's horizontal scaling architecture, it is hard to figure out exactly how to speed things up. In our case, what we really want to speed up is the number of events per second piped into a timechart. I will create another question about this.

gkanapathy
Splunk Employee

Each search runs as a separate process and will consume one core. If you are unable to use up all CPUs with multiple searches running in parallel, then your bottleneck is probably disk I/O, which won't be improved by running more instances.

We generally recommend horizontal scaling using 8-core servers each with independent disk I/O subsystems for this reason.

gkanapathy
Splunk Employee

I'd say this is mostly a waste of time. The Splunk Web interface (SplunkWeb/CherryPy) consumes an insignificant amount of resources compared to the splunkd process(es). Any kind of load on the machine will use up machine resources running searches (via multiple forked splunkd processes, one per search) well before the web interface becomes a bottleneck.
