I've just run across an interesting issue with the use of urldecode : if the attempt to decode fails, the function returns an empty string ( "" ). My application logs to Splunk, via HTTP from all around the world, and so all my logs come in encoded. What I've been seeing is akin to the following:
I receive a line like "message=URLError%3A+%3Curlopen+error+%5BErrno+10061%5D+%CF%EE%E4%EA%EB%FE%F7%E5%ED%E8%E5+%ED%E5+%F3%F1%F2%E0%ED%EE%E2%EB%E5%ED%EE%2C%3E%0A" which I then usually decode with eval line=urldecode(message) | table line
This normally gives me a table of the logs I'm receiving.
However, the above message ( URLError%3A+%3Curlopen+error+%5BErrno+10061%5D+%CF%EE%E4%EA%EB%FE%F7%E5%ED%E8%E5+%ED%E5+%F3%F1%F2%E0%ED%EE%E2%EB%E5%ED%EE%2C%3E%0A ) fails to be decoded by the urldecode function.
If you trim the line you can see that it decodes fine until you pass 10061%5D . This decodes as 10061] . Unfortunately, after this point the decoding fails, and the function returns the entire thing as an empty string.
If you visit http://www.url-encode-decode.com/urldecode and enter the string above, you will see that it decodes the bulk of the message but fails on some of the values. Instead of bombing out completely, it returns them as question marks, similar to how the 'replace' error handler works in Python's Unicode encoding ( https://docs.python.org/2/howto/unicode.html?highlight=replace#the-unicode-type ):
>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('ascii', 'replace')
'?abcd?'
This is what I expected to happen, since it means that I can actually use some of the logged information rather than just dropping it.
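To make the behaviour I'm after concrete, here is a minimal sketch in Python 3 (not Splunk, and not how urldecode is implemented) using the standard library's urllib.parse.unquote_plus on the same message. The guess that the undecodable bytes are Windows-1251 (cp1251) is my assumption, based on the bytes turning into readable Russian text under that code page; the variable names are just for illustration.

```python
from urllib.parse import unquote_plus

# The exact message value from the log line above.
raw = (
    "URLError%3A+%3Curlopen+error+%5BErrno+10061%5D+"
    "%CF%EE%E4%EA%EB%FE%F7%E5%ED%E8%E5+%ED%E5+"
    "%F3%F1%F2%E0%ED%EE%E2%EB%E5%ED%EE%2C%3E%0A"
)

# Option (b): decode as UTF-8, but substitute U+FFFD for any byte
# sequence that is not valid UTF-8 instead of giving up on the line.
lenient = unquote_plus(raw, encoding="utf-8", errors="replace")

# Option (a): decode with a different encoding. cp1251 is an assumption
# about what the sending machine used, not something the log states.
cyrillic = unquote_plus(raw, encoding="cp1251")

print(repr(lenient))
print(repr(cyrillic))
```

Either way the ASCII part of the message ( URLError: <urlopen error [Errno 10061] ) survives, which is the part I most need in the logs.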
Does anyone know of a way I can resolve this? Or tell the urldecode function to either a) use a different encoding, or b) to use something akin to the 'replace' functionality?
Thank you for your time.