Dashboards & Visualizations

pdfserver 'The read operation timed out' on dashboard rendering

mtnnid
Engager

Hi

I have set-up a scheduled pdf report to be emailed of a complex dashboard with several graphs, unfortunately I keep getting a timeout error
I have increased the timeout on the pdfserver in the pdf_server.conf file to 600s which hasn't helped.
firefox_timeout = 600

The dashboard takes about 2 minutes to display. A simple dashboard with a single graph works perfectly so I'm sure the rest of the config is okay.

What I did notice however is that I receive the email, with the following error about a minute after the report was scheduled:
Scheduled view delivery.
A PDF snapshot has been generated for the view: splunk_license_usage.

An error occurred while generating a PDF of this report:
Failed to generate PDF: Appserver failed to dispatch report request to https://10.217.225.93:8089/services/pdfserver/renderpdf: Splunkd daemon is not responding: ('The read operation timed out',)

It would seem there's another timeout occurring between the application server and the pdfserver?
Any ideas where that needs to be set?

Here's the log entry for the report creation, from python.log on the pdfserver:

2012-01-31 11:15:16,343 INFO    pdfhandler:646 - Starting PDF App Renderer Version 1.3
2012-01-31 11:15:16,343 INFO Starting PDF App Renderer Version 1.3
2012-01-31 11:15:16,362 INFO    xvfb:45 - Started new X server. Now 1 active
2012-01-31 11:15:16,362 INFO Started new X server. Now 1 active
2012-01-31 11:15:16,363 INFO    pdfhandler:558 - Executing /app/splunk/etc/apps/pdfserver/bin/firefox-x86_64/firefox -print http://nidsplunk01:8000/app/splunk_license_usage/splunk_license_usage -printmode pdf -printfile /tmp/tmpONGvN9 -title Splunk Alert: splunk_license_usage -orientation portrait -paperwidth 219 -paperheight 297 -mode splunk -timeout 600 -windowsize 793x768 -footer_right Generated by Splunk at &D -cookie session_id_8000=9c5513f742a59d0661e61216c98950ce1a238531
2012-01-31 11:15:16,363 INFO Executing /app/splunk/etc/apps/pdfserver/bin/firefox-x86_64/firefox -print http://nidsplunk01:8000/app/splunk_license_usage/splunk_license_usage -printmode pdf -printfile /tmp/tmpONGvN9 -title Splunk Alert: splunk_license_usage -orientation portrait -paperwidth 219 -paperheight 297 -mode splunk -timeout 600 -windowsize 793x768 -footer_right Generated by Splunk at &D -cookie session_id_8000=9c5513f742a59d0661e61216c98950ce1a238531
2012-01-31 11:15:16,383 INFO    xvfb:205 - Assigned DISPLAY :8
2012-01-31 11:15:16,383 INFO Assigned DISPLAY :8
2012-01-31 11:15:16,384 INFO    xvfb:115 - Starting X Server: ['/usr/bin/Xvfb', ':8', '-screen', '0', '3000x768x24', '-nolisten', 'tcp']
2012-01-31 11:15:16,384 INFO Starting X Server: ['/usr/bin/Xvfb', ':8', '-screen', '0', '3000x768x24', '-nolisten', 'tcp']
2012-01-31 11:15:16,384 INFO    xvfb:116 - Starting X Server env: {'XAUTHORITY': '/app/splunk/var/run/splunk/xvfb/xauth-8/Xauthority'}
2012-01-31 11:15:16,384 INFO Starting X Server env: {'XAUTHORITY': '/app/splunk/var/run/splunk/xvfb/xauth-8/Xauthority'}
2012-01-31 11:15:18,387 INFO    xvfb:122 - X Started
2012-01-31 11:15:18,387 INFO X Started
2012-01-31 11:25:24,490 INFO    pdfhandler:628 - Generated pdf file.  bytes=57244 time=605.94
2012-01-31 11:25:24,490 INFO Generated pdf file.  bytes=57244 time=605.94
2012-01-31 11:25:24,754 WARNING pdfhandler:749 - Firefox timed out while waiting for the report to be rendered - Incomplete snapshot generated
2012-01-31 11:25:24,754 WARNING Firefox timed out while waiting for the report to be rendered - Incomplete snapshot generated
2012-01-31 11:25:24,771 WARNING xvfb:132 - Stopping X Server 8
2012-01-31 11:25:24,771 WARNING Stopping X Server 8

oestreicher
Explorer

Thank you sgarvin55 for the details about this workaround.

It still seems to be an issue with current versions. Does anyone has seen this behaviour for scheduled reports based on complex dashboards which obviously make use of splunkweb as client as well? These reports apparently make GET requests instead of POST as I found them clearly in the logfiles at scheduled time:

127.0.0.1 - splunk-system-user [26/Oct/2013:16:00:10.070 +0200] "GET /services/pdfgen/render?now=1382796000&owner=admin&input-dashboard=my_report_name&namespace=my_app_name&paper-size=a4 HTTP/1.0" 200 131705 - - - 3156421ms

As we can see they have a long execution time for almost 1h. I'm also playing with the SPLUNKD_CONNECTION_TIMEOUT setting right now... The odd thing is that the same reports sometimes run fine and sometimes just timeout with an HTTP 400 bad request:

127.0.0.1 - splunk-system-user [29/Oct/2013:16:04:14.279 +0100] "GET /services/pdfgen/render?owner=admin&input-dashboard=my_report_name&now=1383058800&namespace=my_app_name&paper-size=a4 HTTP/1.0" 400 250 - - - 127281ms

btw: you should divide the spent time in your search by 1000 to get the correct amount of seconds 😛

regards
alex

0 Karma

sgarvin55
Splunk Employee
Splunk Employee

--- Update ---

In Splunk 6 you no longer have to edit $SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest/__init__.py you can set SplunkdConnectionTimeout in web.conf. -> web.conf documentation

--- Update end ---

Upon upgrading from version 4.2 to 4.3, the default time out of 30 seconds for HTTP client connections initiated by Splunkweb isn’t enough for splunkd's renderpdf REST endpoint to render a complete report for some dashboards or scheduled searches.

When this timeout occurs you will get a message in python.log similar to :

2012-02-19 07:01:00,444 ERROR An error occurred while generating a PDF of this report: Failed to generate PDF: Appserver failed to dispatch report request to /services/pdfserver/renderpdf: Splunkd daemon is not responding: ('The read operation timed out',)

Steps to check for PDF timeouts:

Here are a few steps required to verify that you are encountering this issue.

  • Search for accesses to the PDF rendering REST endpoint :

    index=_internal source=*splunkd_access.log "POST /services/pdfserver/renderpdf"

(Alternatively, you can grep splunkd_access.log for POST.*renderpdf)

If you find requests with a status code of 200 and a response time of more than 30secs, then you are in all likelihood a victim of this issue. Example of a matching request:

POST /services/pdfserver/renderpdf HTTP/1.1" 200 302 - - - 366747ms

Note that you are in this situation only if you see a status code of 200, which means the request was fulfilled. A status code of 500 means there was an internal error or unexpected condition and is not a symptom of this particular issue.

  • Make sure that the report generation request to splunkweb is timing out due to splunkd's renderpdf REST endpoint response time, by running the following search:

    index=_internal source=*web_access.log "POST /*/report/"

Steps to work around the issue:

The following steps ensure that there will be adequate time for the renderpdf REST endpoint to generate a PDF report before the HTTP connection between Splunkweb (client) and splunkd (server) times out.

IMPORTANT: Do not attempt to deploy this work-around unless you have thoroughly verified that your symptoms match those described just above. Failing to follow this directive may cause further issues with your PDF server. If you are not sure that this is the problem you are hitting, or uncomfortable with the work-around steps, contact Splunk Support.

  • See if there are requests to REST endpoints other than renderpdf which take more than 30 seconds :

    index=_internal source=*splunkd_access.log spent>30000

You should find that only requests to the rtstearch or renderpdf REST endpoints take longer than 30000ms.

  • Measure the duration of the longest renderpdf request with the following search :

    index=_internal source=*splunkd_access.log "POST /services/pdfserver/renderpdf" | stats max(eval(spent/1024)) AS "Max request time (seconds)"

  • In order to work around this issue, we have to increase the time out used by Splunkweb as an HTTP client in the file `$SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest/__init__.py. This connection timeout defaults to 30 seconds. You will need to change this value to accommodate the longest duration needed to complete the appserver render process.

  1. Stop Splunkweb
  2. Make a copy of __init__.py, under path $SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest
  3. Download the patched __init__.py and use it to replace $SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest/__init__.py
  4. Restart Splunkweb

I had one customer that changed this value to 100 which worked for them, but the needed value will differ from installation to installation.

Kudos to Becky, Mathew and everyone on the Splunk team who worked on this issue

chris
Motivator

I updated the answer for Splunk 6. Hope thats ok for ypu sgarvin55

0 Karma

kallu
Communicator

Are you sure you have followed instructions in the answer above and not just bumped your SPLUNKD_CONNECTION_TIMEOUT to 300? 300 seconds is just an example (that should be fine for most cases) but in your logs there seems to be some very long response times >300000ms.

mukherjee_mk
Explorer

Team,

I am still struggling with the pdf reports generated out of the Splunk WAS Plugin. I am getting the emails and I see the report placeholders, but they are empty 😞

alt text

The python.log says -

2012-07-25 14:35:59,270 INFO    pdfhandler:628 - Generated pdf file.  bytes=103335 time=24.04
2012-07-25 14:35:59,270 INFO Generated pdf file.  bytes=103335 time=24.04
2012-07-25 14:35:59,276 WARNING xvfb:132 - Stopping X Server 7
2012-07-25 14:35:59,276 WARNING Stopping X Server 7
2012-07-25 14:36:03,317 INFO Generated PDF for email
2012-07-25 14:36:03,317 DEBUG simpleRequest > GET https://127.0.0.1:8089/services/search/jobs/scheduler__admin_c3BsdW5rX2FwcF93YXM_X1NjaGVkdWxlZFZpZXdfX2Vycm9ycw_at_1343244900_cba568bb194d124c?message_level=warn [] sessionSource=direct
2012-07-25 14:36:03,324 DEBUG simpleRequest < server responded status=200 responseTime=0.0067s
2012-07-25 14:36:03,324 DEBUG getStatus - elapsed=0.00692391395569 nextRetry=0.0500000026555
2012-07-25 14:36:03,537 INFO Sending email. subject="Splunk Alert: errors", results_link="http://d-csdttk1.target.com:9000/app/splunk_app_was/@go?sid=scheduler__admin_c3BsdW5rX2FwcF93YXM_X1NjaGVkdWxlZFZpZXdfX2Vycm9ycw_at_1343244900_cba568bb194d124c", recepients="['xxx.xxx@xxx.xxx]"
2012-07-25 14:36:29,346 INFO    pdfhandler:628 - Generated pdf file.  bytes=81621 time=360.36
2012-07-25 14:36:29,346 INFO Generated pdf file.  bytes=81621 time=360.36
2012-07-25 14:36:29,352 WARNING pdfhandler:749 - Firefox timed out while waiting for the report to be rendered - Incomplete snapshot generated
2012-07-25 14:36:29,352 WARNING Firefox timed out while waiting for the report to be rendered - Incomplete snapshot generated
2012-07-25 14:36:29,353 WARNING xvfb:132 - Stopping X Server 5
2012-07-25 14:36:29,353 WARNING Stopping X Server 5

Some more checks -
grep POST.*renderpdf splunkd_access.log

    127.0.0.1 - - [25/Jul/2012:14:30:26.719 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 103648 - - - 31290ms
127.0.0.1 - - [25/Jul/2012:14:35:32.952 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 103648 - - - 30361ms
127.0.0.1 - - [25/Jul/2012:14:30:26.704 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 81934 - - - 366687ms
127.0.0.1 - - [25/Jul/2012:14:40:33.007 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 103935 - - - 29943ms
127.0.0.1 - - [25/Jul/2012:14:35:32.932 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 81572 - - - 366620ms
127.0.0.1 - - [25/Jul/2012:14:45:15.297 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 103465 - - - 30515ms
127.0.0.1 - - [25/Jul/2012:14:40:09.085 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 81765 - - - 366597ms
127.0.0.1 - - [25/Jul/2012:14:50:21.510 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 103465 - - - 30488ms
127.0.0.1 - - [25/Jul/2012:14:45:15.289 -0500] "POST /services/pdfserver/renderpdf HTTP/1.1" 200 81799 - - - 366623ms 

I tried the following to get things moving -

  1. Navigated to '$SPLUNK_HOME/lib/python2.7/site-packages/splunk/appserver/mrsparkle/controllers/debug.py' and changed the value of 'timeout' to 200.
  2. Navigated to $SPLUNK_HOME/etc/system/default/pdf_server.conf and added the following lines to the end of the file

    [settings]
    screenshot_enabled = True
    restrict_to_splunkweb = False
    firefox_timeout = 340

  3. Edited the $SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest/init.py file and updated the following -

    SPLUNKD_CONNECTION_TIMEOUT = 300

  4. Restarted the Splunk Server

Any other tips/tricks to get this working?

0 Karma

zjerrybrennock
Engager

The answer from sgarvin55 above solved it for me. I changed /opt/splunk/lib/python2.7/site-packages/splunk/rest/init.py to increase SPLUNKD_CONNECTION_TIMEOUT from 30 from 300.

0 Karma

gradycraig
Explorer

Same issue here after the 4.2 to 4.3 upgrade, sometimes PDF printing works and sometimes it times out. Also seeing
"Your network connection was either restored or Splunk web was restarted."
"Your network connection may have been lost or Splunk web may be down." even though neither is true. Working with Splunk support without any luck yet, looks like it could be a wide spread issue.

0 Karma

ytamura
Path Finder

yes, after trying various things, we are awaiting splunk support to get a resolution from their developers... will update if anything useful comes back.

0 Karma

zjerrybrennock
Engager

My dashboards also started timing out with the upgrade to 4.3, timeouts are not consistent. Has anyone entered a support ticket on this yet?

0 Karma

ytamura
Path Finder

Exact same problem here upon upgrading to 4.3, and also PDF Server 1.3. When I reduced the PDF generation time to <20 seconds by using scheduling saved searches and making the dashboard use the cache, it worked, however not 100% of the time. For this workaround, I referred to: http://splunk-base.splunk.com/answers/862/can-i-make-my-dashboards-load-faster-by-scheduling-the-sea...

Firefox timeout seems to have nothing to do with it, nor do some other hard-coded timeouts, including SPLUNKD_CONNECTION_TIMEOUT=30 in /python2.7/site-packages/splunk/rest/__init__.py. Got sendemail.py to print out the stack trace, and investigated other scripts but to no avail:

2012-02-22 15:10:32,715 ERROR An error occurred while generating a PDF of this report: Failed to generate PDF: Appserver failed to dispatch report request to /services/pdfserver/renderpdf: Splunkd daemon is not responding: ('The read operation timed out',)
Internal Server Error
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/search/bin/sendemail.py", line 557, in generatePDF
    }), TIMEOUT)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/opt/splunk/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
0 Karma

domteich
Explorer

Hi.

Same problem here:

2012-02-20 16:55:47,826 ERROR An error occurred while generating a PDF of this report: Failed to generate PDF: Appserver failed to dispatch report request to https://<IP>:8089/services/pdfserver/renderpdf: Splunkd daemon is not responding: ('The read operation timed out',)
2012-02-20 16:55:47,827 DEBUG simpleRequest > GET https://127.0.0.1:8089/services/search/jobs/scheduler_YWRtLWR0ZWljaG1hbm4__hscon_X1NjaGVkdWxlZFZpZXd... [] sessionSource=direct

Is a solution available?

Thanks

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...