I'm seeing some odd delays in the web user interface, and I'm trying to figure out whether this is a browser issue or something in splunkweb. Is there any way to measure how long requests are taking in splunkweb?
It seems like splunkweb freezes periodically, and this is happening more frequently over time; I'm looking for some way to quantify this.
Notes and observations:
- Request durations are recorded in splunkd_access.log, but there doesn't appear to be anything similar in web_access.log.
- There is a PERF log message that shows view vs. template time, but that hasn't been helpful in tracking the issue. The problem doesn't seem to be the initial page load when first opening a view; instead, the unresponsiveness must be in one of the background async HTTP calls.
I'm running Splunk 4.1.8 on Ubuntu 8.04 (32 bit).
This is a hack to log the duration of requests in splunkweb. It lets you (1) log any request that takes longer than a specified number of milliseconds, and (2) record the request processing time for every request in the web_access.log access log.
If you are not at all familiar with Python, or don't feel comfortable poking around in the splunkweb (CherryPy) code, then you should avoid this hack. You have been warned.
NOTE: (1) Back up all files before making any modifications. (2) Be aware that any Splunk upgrade will overwrite your changes. (3) This is intended for temporarily monitoring request processing times; it hasn't been tested at large scale or for long-term production use. Your mileage may vary. (4) Please read all the instructions before beginning.
Okay, enough disclaimers. On to the fun stuff...
Basically, this hack involves the following:
1. Create a custom CherryPy tool module (requesttime.py) that tracks request duration.
2. Update the root CherryPy script (root.py) to import the custom request logging module.
3. Update web.conf to enable our custom CherryPy tool.
4. (Optional) Update customlogmanager.py to add the request processing time to each request in the web_access.log access log.
Step 1: Create a new file, $SPLUNK_HOME/lib/python2.6/site-packages/requesttime.py, with the following contents:
# The idea and code borrowed from here:
# http://groups.google.com/group/cherrypy-users/browse_thread/thread/cb520cfa6588b550

import cherrypy, logging, time

class RequestDurationTool(cherrypy.Tool):
    """Tool to provide logging if a request exceeds a configured duration
    (in ms) set by the threshold configuration parameter"""

    CFG_THRESHOLD = "tools.request_duration.threshold"

    def __init__(self):
        cherrypy.Tool.__init__(self, 'before_request_body', self._start_request)
        self.logger = logging.getLogger('splunk')

    def _setup(self):
        cherrypy.Tool._setup(self)
        # Attach extra hooks to compute timing after both raw processing and I/O
        cherrypy.request.hooks.attach('on_end_resource', self._end_resource)
        cherrypy.request.hooks.attach('on_end_request', self._end_request)

    def _start_request(self, **kwargs):
        cherrypy.request.request_start = time.time()

    def _end_resource(self):
        cherrypy.request.request_stop = time.time()

    def _end_request(self):
        # Elapsed time and threshold are in ms
        elapsed = round((time.time() - cherrypy.request.request_start) * 1000)
        limit = cherrypy.request.config.get(self.CFG_THRESHOLD, None)
        if limit is not None and elapsed > limit:
            rsc_elapsed = round((cherrypy.request.request_stop -
                                 cherrypy.request.request_start) * 1000)
            self.logger.log(logging.WARNING,
                            'Request duration: %d [%d] ms (%s)'
                            % (elapsed, rsc_elapsed, cherrypy.request.request_line))

cherrypy.tools.request_duration = RequestDurationTool()
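To see what this tool does without touching a Splunk install, here is a minimal standalone sketch of the same threshold logic, with the CherryPy hook points replaced by plain function calls. The names `THRESHOLD_MS` and `simulate_request` are illustrative only; they are not Splunk or CherryPy APIs.

```python
import logging, time

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger('splunk')

THRESHOLD_MS = 50  # analogous to tools.request_duration.threshold

def simulate_request(work_seconds, request_line):
    request_start = time.time()   # what the before_request_body hook records
    time.sleep(work_seconds)      # the handler doing its work
    request_stop = time.time()    # what the on_end_resource hook records
    # What the on_end_request hook does: log only if over the threshold
    elapsed = round((time.time() - request_start) * 1000)
    if elapsed > THRESHOLD_MS:
        rsc_elapsed = round((request_stop - request_start) * 1000)
        logger.warning('Request duration: %d [%d] ms (%s)',
                       elapsed, rsc_elapsed, request_line)
    return elapsed

simulate_request(0.08, 'GET /en-US/app/search HTTP/1.1')      # logged (over 50 ms)
simulate_request(0.01, 'GET /static/img/logo.png HTTP/1.1')   # silent
```

The two elapsed values differ in the real tool because `on_end_request` fires after the response I/O completes, while `on_end_resource` fires as soon as the handler returns.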
Step 2: The root CherryPy script must be updated to import our "requesttime" module. Locate the root.py file in your splunkweb installation, and make a backup of the original before starting. Add the following Python block to the top of the root.py script. When you are done, the first few lines should look like this:
#!/usr/bin/python

# CUSTOM HACK: Import the "requesttime" cherrypy tool
try:
    import requesttime
except Exception, e:
    print "Failed to import requesttime cherrypy request time tracker!!"
    print e
# END OF CUSTOM HACK
Step 3: In $SPLUNK_HOME/etc/system/local/web.conf, add the following to the [settings] stanza:
# Enable custom cherrypy toolbox for debugging request processing time
# NOTE: Comment out these two lines if you experience problems starting splunkweb!!
tools.request_duration.on = True
tools.request_duration.threshold = 5000
The threshold line is optional. If you choose to perform step 4 and you do not want to log long requests separately, you can exclude this entry.
Note: After a Splunk upgrade, the root.py script will be replaced, and the above configuration entries could then prevent splunkweb from starting; you will see the resulting errors in web_service.log. You will need to redo step 2 after the upgrade, or comment out these entries in web.conf.
Step 4 (optional): If you only want to log really long requests, you can skip this step. If you want to do any trending, you'll probably want to capture the timing of all requests.
We will need to update the customlogmanager.py Python module. Towards the bottom of this module, you'll find this line:
self.access_log.log(logging.INFO, self.access_log_format % atoms + ' - ' + get_request_id())
Replace that line with the following block of code:
try:
    timing = "%dms" % (round((cherrypy.request.request_stop -
                              cherrypy.request.request_start) * 1000))
except Exception:
    timing = "-"
self.access_log.log(logging.INFO,
                    self.access_log_format % atoms + ' - '
                    + get_request_id() + ' ' + timing)
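For reference, this is what the appended field evaluates to. The two timestamp values below are made up purely for illustration; in the real code they come from the request_start/request_stop attributes set by the requesttime tool.

```python
# Hypothetical timestamps (illustration only)
request_start = 1000.0000
request_stop = 1000.2347

# Same expression as the hack above: elapsed seconds -> whole milliseconds
timing = "%dms" % round((request_stop - request_start) * 1000)
print(timing)  # → 235ms
```

The `except` fallback to `"-"` covers requests that never passed through the tool's hooks (for example, if the tool is disabled in web.conf), so the access log column stays aligned.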
You will need to make sure you have the indentation correct, or the python module will have issues; this could prevent splunkweb from even starting.
You may also want to set up a field extraction for the request duration, which can be done by adding the following to your $SPLUNK_HOME/etc/system/local/props.conf config file:
[splunk_web_access]
EXTRACT-request-time = \s(?<req_duration>\d+)ms$
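You can sanity-check the extraction pattern outside Splunk with Python's `re` module. Note that Python spells named groups `(?P<name>...)` where the props.conf extraction uses `(?<name>...)`; the pattern is otherwise identical. The sample log line below is fabricated for illustration, not real splunkweb output.

```python
import re

# props.conf pattern, translated to Python's named-group syntax
pattern = re.compile(r'\s(?P<req_duration>\d+)ms$')

line = ('10.1.1.5 - admin [07/Apr/2011:10:15:01.456 -0700] '
        '"GET /en-US/app/search HTTP/1.1" 200 512 - 4ff1a2b3 235ms')
match = pattern.search(line)
print(match.group('req_duration'))  # → 235
```

The anchored `$` and leading `\s` ensure only the trailing duration token matches, so other numeric fields in the line (status, bytes) are never picked up.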
Another option: upgrade to Splunk 4.2.3. The web access log now includes a field called spent that contains essentially the same information I collected manually above. (The time is calculated slightly differently, but Splunk's implementation probably involves less overhead.)