
How do I track request time for splunkweb?

Lowell
Super Champion

I'm seeing some odd delays in the web user interface, and I'm trying to figure out whether this is a browser issue or something in splunkweb. Is there any way to determine how long requests are taking in splunkweb?

It seems like splunkweb is freezing periodically, and this is happening more frequently over time, but I'm looking for some way to quantify this.


Notes and observations:

  1. Using browser-side analysis, I can tell that some HTTP requests are taking many seconds to return, but I'm not 100% sure whether this is a client or server issue. (See the timing sketch after this list.)
  2. I found request time information in splunkd_access.log, but there doesn't appear to be anything similar in web_access.log.
  3. When the web interface is slow, it hangs across multiple clients all at the same time, so it doesn't seem to be a session-related problem.
  4. I've also seen a very significant CPU spike for the "splunkweb" (python) process whenever this problem occurs. Often the CPU usage goes to 100% and drops sharply within a few seconds of the web interface becoming responsive again.
  5. There is a PERF log message that shows view vs. template time, but that hasn't been helpful in tracking this issue. The problem doesn't appear to be the initial page load when first opening a view; the unresponsiveness must be in one of the background async HTTP calls.
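
To take the browser out of the equation for item 1, I've been timing individual requests from outside the browser. Here's a minimal sketch, assuming splunkweb is listening on the default port 8000; the URL is just an example, point it at whatever page is slow for you:

# Rough server-side timing of a single splunkweb request (Python 2)
import time, urllib2

url = 'http://localhost:8000/en-US/account/login'  # example URL; adjust to taste

start = time.time()
urllib2.urlopen(url).read()
print "Request took %.1f ms" % ((time.time() - start) * 1000)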

I'm running Splunk 4.1.8 on Ubuntu 8.04 (32 bit).

1 Solution

Lowell
Super Champion

Another option: upgrade to Splunk 4.2.3. web_access.log now includes a field called spent that contains basically the same information that I collected manually below. (The time is calculated slightly differently, but Splunk's implementation probably involves less overhead.)


Lowell
Super Champion

This is a hack to log the duration of requests in splunkweb. It allows you to (1) log any request that takes longer than a specified number of milliseconds, and (2) log the request processing time for each request in web_access.log.

If you are not at all familiar with Python, and/or don't feel comfortable poking around in the splunkweb (CherryPy) code, then you should avoid this hack. You have been warned.

NOTE:

  1. Back up all files before making any modifications.
  2. Be aware that any Splunk upgrade will overwrite your changes.
  3. This is intended for temporarily monitoring request processing times; it hasn't been tested at scale or for long-term production use. Your mileage may vary.
  4. Please read all the instructions before beginning.

Okay, enough disclaimers. On to the fun stuff...

Overview

Basically, this hack involves the following:

  1. Install a simple CherryPy custom tool that captures the start/end times of your request. (Basically you just copy-n-paste into a file.)
  2. Update the core splunkweb startup script (root.py) to import the CherryPy custom request logging module.
  3. Add new entries to web.conf to enable our custom CherryPy tool.
  4. If desired, modify customlogmanager.py to add the request processing time to each entry in web_access.log.

Steps

Step 1: Create a new 'requesttime' Python module.

Create a new file $SPLUNK_HOME/lib/python2.6/site-packages/requesttime.py, with the following contents:

# The idea and code borrowed from here:
# http://groups.google.com/group/cherrypy-users/browse_thread/thread/cb520cfa6588b550
import cherrypy, logging, time

class RequestDurationTool(cherrypy.Tool):
    """Tool to provide logging if a request exceeds a configured duration
    (in ms) set by the threshold configuration parameter"""
    CFG_THRESHOLD="tools.request_duration.threshold"
    def __init__(self):
        cherrypy.Tool.__init__(self, 'before_request_body', self._start_request)
        self.logger = logging.getLogger('splunk')
    def _setup(self):
        cherrypy.Tool._setup(self)
        # Attach extra hook to compute timing after both raw processing and I/O
        cherrypy.request.hooks.attach('on_end_resource', self._end_resource)
        cherrypy.request.hooks.attach('on_end_request', self._end_request)
    def _start_request(self, **kwargs):
        cherrypy.request.request_start = time.time()
    def _end_resource(self):
        cherrypy.request.request_stop = time.time()
    def _end_request(self):
        # Elapsed time and threshold are both in ms
        start = getattr(cherrypy.request, 'request_start', None)
        if start is None:
            # The start hook never fired for this request (e.g. an error
            # was raised early), so there is nothing to measure
            return
        elapsed = round((time.time() - start) * 1000)
        limit = cherrypy.request.config.get(self.CFG_THRESHOLD, None)
        if limit is not None and elapsed > limit:
            rsc_elapsed = round((getattr(cherrypy.request, 'request_stop',
                                         time.time()) - start) * 1000)
            self.logger.log(logging.WARNING,
                            'Request duration: %d [%d] ms (%s)' %
                            (elapsed, rsc_elapsed,
                             cherrypy.request.request_line))

cherrypy.tools.request_duration = RequestDurationTool()
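
As a side note: if you want to sanity-check the tool outside of Splunk first, here is a minimal standalone sketch (my own test harness, not part of Splunk). It assumes cherrypy is importable and that requesttime.py is on your Python path:

# Minimal standalone harness for the requesttime tool (not part of Splunk)
import logging, time
import cherrypy
import requesttime  # registers cherrypy.tools.request_duration

# The tool logs through the 'splunk' logger; give it a console handler here
logging.basicConfig(level=logging.WARNING)

class Hello(object):
    @cherrypy.expose
    def slow(self):
        time.sleep(0.2)  # deliberately exceed the threshold set below
        return "done"

cherrypy.config.update({
    'tools.request_duration.on': True,
    'tools.request_duration.threshold': 100,  # ms
})
cherrypy.quickstart(Hello())
# Fetching http://localhost:8080/slow should now log a
# "Request duration: ..." warning to the console.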

Step 2: Update your root.py main application

The root CherryPy script must be updated to import our "requesttime" module. (Make a backup of the original root.py file before starting.) This file is located here:

$SPLUNK_HOME/lib/python2.6/site-packages/splunk/appserver/mrsparkle/root.py

Add the following Python block to the top of the root.py script. When you are done, the first few lines should look like this:

#!/usr/bin/python

# CUSTOM HACK:   Import the "requesttime" cherrypy tool
try:
    import requesttime
except Exception, e:
    print "Failed to import requesttime cherrypy request time tracker!!"
    print e
# END OF CUSTOM HACK

Step 3: Enable the request_duration tool in web.conf

In $SPLUNK_HOME/etc/system/local/web.conf, add the following to the [settings] section:

# Enable custom cherrypy toolbox for debugging request processing time
# NOTE:  Comment out these two lines if you experience problems starting splunkweb!!
tools.request_duration.on = True
tools.request_duration.threshold = 5000

The threshold line is optional; if you perform step 4 and do not need long requests logged separately, you can leave it out.

Note: After a Splunk upgrade, the root.py script will be replaced, and the above configuration entries could then prevent splunkweb from starting; you will see errors in web_service.log. You will need to redo step 2 after the upgrade, or comment out these entries in web.conf.

Step 4: Enable request duration logging in web_access.log

This step is optional: if you only want to log very long requests, you can skip it. If you want to do any trending, though, you'll probably want to capture the timing of all requests.

We will need to update the following python module:

$SPLUNK_HOME/lib/python2.6/site-packages/splunk/appserver/mrsparkle/lib/customlogmanager.py

Towards the bottom of this module, you'll find this line:

self.access_log.log(logging.INFO, self.access_log_format % atoms + ' - ' + get_request_id())

Replace that line with the following block of code:

        try:
            # request_start/request_stop are set by the requesttime tool hooks
            timing = "%dms" % round((cherrypy.request.request_stop - cherrypy.request.request_start) * 1000)
        except Exception:
            # Tool disabled, or the hooks never fired for this request
            timing = "-"
        self.access_log.log(logging.INFO, self.access_log_format % atoms + ' - ' + get_request_id() + ' ' + timing)

You will need to make sure the indentation is correct, or the Python module will fail to load; this could prevent splunkweb from even starting.

You may also want to set up a field extraction for the request duration, which can be done by adding the following to your $SPLUNK_HOME/etc/system/local/props.conf config file:

[splunk_web_access]
EXTRACT-request-time = \s(?<req_duration>\d+)ms$
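
Once that extraction is in place, you can report on req_duration in your searches. If you'd rather check the raw log file directly, here's a quick offline sketch using the same trailing "NNNms" pattern as the extraction above (the log path assumes a default installation under /opt/splunk; adjust to your $SPLUNK_HOME):

# Report the ten slowest splunkweb requests from web_access.log
import re

REQ_TIME = re.compile(r'\s(\d+)ms$')
durations = []
for line in open('/opt/splunk/var/log/splunk/web_access.log'):
    m = REQ_TIME.search(line.rstrip())
    if m:
        durations.append((int(m.group(1)), line.strip()))

for ms, line in sorted(durations, reverse=True)[:10]:
    print "%6dms  %s" % (ms, line[:120])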