
Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Path Finder

I get the error:

2017-12-13 20:51:38,141 ERROR An exception occurred when attempting to retrieve information from the web-page, stanza=web_input://www_mos_eisley_dk
Traceback (most recent call last):
  File "/splunk/etc/apps/website_input/bin/web_input.py", line 349, in run
    https_only=self.is_on_cloud(input_config.session_key))
  File "/splunk/etc/apps/website_input/bin/website_input_app/web_scraper.py", line 710, in scrape_page
    additional_fields=additional_fields, **kw)
  File "/splunk/etc/apps/website_input/bin/website_input_app/web_scraper.py", line 446, in get_result_single
    content_decoded = content.decode(encoding=encoding, errors='replace')
LookupError: unknown encoding: 3Dutf-8=

Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Champion

Could you share the URL that you are using if it is a publicly available one? I would like to reproduce this myself. It looks like the website is providing an invalid encoding and the Website Input app doesn't handle that yet. I want to update the app to handle it more gracefully.
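
For anyone following along, the failure in the traceback can be reproduced in isolation. The snippet below is only a sketch, not the app's code: `raw` and `try_decode` are made-up names, and it is written for Python 3. It shows that `errors='replace'` does not help here, because the codec lookup fails before any bytes are decoded. The last two lines illustrate a guess about where the odd name comes from: `=3D` is the quoted-printable escape for `=`, so `3Dutf-8=` may be what remains of a quoted-printable `charset=utf-8` declaration. That origin is an assumption, not something confirmed in this thread.

```python
import quopri

# Hypothetical page content; any byte string triggers the same failure.
raw = b"<html><body>caf\xc3\xa9</body></html>"


def try_decode(content, encoding):
    """Mimic the failing call from the traceback and report the outcome."""
    try:
        return content.decode(encoding=encoding, errors="replace")
    except LookupError as exc:
        # errors='replace' only covers undecodable bytes, not unknown codec names.
        return "LookupError: %s" % exc


print(try_decode(raw, "3Dutf-8="))
print(try_decode(raw, "utf-8"))

# Assumption: the header value was quoted-printable encoded somewhere upstream.
header = b"text/html; charset=3Dutf-8="
print(quopri.decodestring(header))
```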


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Path Finder

It's http://www.mos-eisley.dk - feel free 🙂

Splunk 7.0.1

And feel free to ask for further info!


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Path Finder

BTW, it's Confluence from Atlassian.


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Champion

This is a confirmed bug. I was able to reproduce this using the unit test framework which simulates a web-server providing an encoding that is invalid. See the bug report here: https://lukemurphey.net/issues/2190.

I have updated the app to now be forgiving if it sees an encoding it doesn't recognize. This is currently working. This fix will go out in version 4.5.2 (ETA: early next week).
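
To give an idea of what "forgiving" means here, this is a minimal sketch of the approach, assuming Python 3. It is not the actual code from the app: `decode_forgiving` and its `fallback` parameter are names invented for this illustration. The idea is to check the encoding name with `codecs.lookup` first and fall back to a known encoding when the name is not recognized.

```python
import codecs


def decode_forgiving(content, encoding, fallback="utf-8"):
    """Decode bytes, using a fallback codec when the encoding name is unknown.

    Returns the decoded text and the name of the encoding actually used.
    """
    try:
        # codecs.lookup raises LookupError for unknown names such as "3Dutf-8="
        # and TypeError when the encoding is None.
        codecs.lookup(encoding)
    except (LookupError, TypeError):
        encoding = fallback

    # Undecodable bytes are replaced rather than raising an exception.
    return content.decode(encoding=encoding, errors="replace"), encoding


text, used = decode_forgiving(b"caf\xc3\xa9", "3Dutf-8=")
print(text, used)
```

With the fallback the content may still be decoded with the wrong encoding, so logging a warning at that point is worthwhile.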


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Champion

@moseisleydk: thanks for the report.

Incidentally, I was unable to reproduce this on http://www.mos-eisley.dk today. Not sure if something changed.

This was still a valid bug report, though, as I was able to reproduce it by recreating the scenario from the stack trace you provided.


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Path Finder

Excellent - looking forward to it. I still get the error on 4.5.1:

2018-01-27 07:29:58,776 ERROR An exception occurred when attempting to retrieve information from the web-page, stanza=web_input://www_mos_eisley_dk
Traceback (most recent call last):
  File "/splunk/etc/apps/website_input/bin/web_input.py", line 349, in run
    https_only=self.is_on_cloud(input_config.session_key))
  File "/splunk/etc/apps/website_input/bin/website_input_app/web_scraper.py", line 710, in scrape_page
    additional_fields=additional_fields, **kw)
  File "/splunk/etc/apps/website_input/bin/website_input_app/web_scraper.py", line 446, in get_result_single
    content_decoded = content.decode(encoding=encoding, errors='replace')
LookupError: unknown encoding: 3Dutf-8=


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Champion

@moseisleydk: Would you mind testing 4.5.2? You can get the app here: https://github.com/LukeMurphey/splunk-web-input/releases/tag/4.5.2-rc.1

I want to make sure that this fixes the issue since I wasn't able to reproduce the issue on 4.5.1 with your website.


Re: Website input app: Python error "LookupError: unknown encoding: 3Dutf-8="

Path Finder

Logs:

02/07/2018 21:14:00.529 INFO    The content could not be parsed, it doesn't appear to be valid HTML, url="http://www.mos-eisley.dk/dashboard/\"
02/07/2018 21:14:00.529 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/dashboard/\", encoding="cp1252"
02/07/2018 21:12:12.258 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/feeds/network.action?username=bnp&max=40&publicFeed=false&os_authType=basic&rssType=atom", encoding="UTF-8"
02/07/2018 21:12:08.922 ERROR   An exception occurred when attempting to retrieve information from the web-page, stanza=web_input://www_mos_eisley_dk
02/07/2018 21:12:08.922 ERROR   A general exception was thrown when executing a web request
02/07/2018 21:12:08.921 ERROR   A general exception was thrown when executing a web request
02/07/2018 21:11:28.858 INFO    The content could not be parsed, it doesn't appear to be valid HTML, url="http://www.mos-eisley.dk/plugins/inlinetasks/\"
02/07/2018 21:11:28.858 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/plugins/inlinetasks/\", encoding="cp1252"
02/07/2018 21:11:27.651 INFO    The content could not be parsed, it doesn't appear to be valid HTML, url="http://www.mos-eisley.dk/users/\"
02/07/2018 21:11:27.651 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/users/\", encoding="cp1252"
02/07/2018 21:11:25.590 INFO    The content could not be parsed, it doesn't appear to be valid HTML, url="http://www.mos-eisley.dk/spaces/\"
02/07/2018 21:11:25.590 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/spaces/\", encoding="cp1252"
02/07/2018 21:11:12.047 WARNING Detected encoding was not recognized and the content will be evaluated (possibly with the wrong encoding), encoding_detected="Shift_JIS"
02/07/2018 21:11:08.724 WARNING Detected encoding was not recognized and the content will be evaluated (possibly with the wrong encoding), encoding_detected="3Dutf-8="
02/07/2018 21:10:59.968 INFO    The content could not be parsed, it doesn't appear to be valid HTML, url="http://www.mos-eisley.dk/\"
02/07/2018 21:10:59.968 INFO    The content is going to be parsed without decoding because the parser refused to parse it with the detected encoding (http://goo.gl/4GRjJF), url="http://www.mos-eisley.dk/\", encoding="cp1252"
02/07/2018 21:10:59.117 INFO    Running web input, url="http://www.mos-eisley.dk"