
urlopen raises an encoding error on URLs containing Chinese characters in Python 3

Posted: 2015-12-16 17:09:31

  Here is the result of a test in IPython:

In [24]: x
Out[24]: 'http://127.0.0.1:8000/xxx/?id=a45ex0bad3c9&game=五子棋'

In [25]: urlopen(x)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-25-441e6d61f53c> in <module>()
----> 1 urlopen(x)

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault)
    154     else:
    155         opener = _opener
--> 156     return opener.open(url, data, timeout)
    157
    158 def install_opener(opener):

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in open(self, fullurl, data, timeout)
    467             req = meth(req)
    468
--> 469         response = self._open(req, data)
    470
    471         # post-process response

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in _open(self, req, data)
    485         protocol = req.type
    486         result = self._call_chain(self.handle_open, protocol, protocol +
--> 487                                   '_open', req)
    488         if result:
    489             return result

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    445         for handler in handlers:
    446             func = getattr(handler, meth_name)
--> 447             result = func(*args)
    448             if result is not None:
    449                 return result

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in http_open(self, req)
   1272
   1273     def http_open(self, req):
-> 1274         return self.do_open(http.client.HTTPConnection, req)
   1275
   1276     http_request = AbstractHTTPHandler.do_request_

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1246
   1247         try:
-> 1248             h.request(req.get_method(), req.selector, req.data, headers)
   1249         except socket.error as err: # timeout error
   1250             h.close()

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in request(self, method, url, body, headers)
   1063     def request(self, method, url, body=None, headers={}):
   1064         """Send a complete request to the server."""
-> 1065         self._send_request(method, url, body, headers)
   1066
   1067     def _set_content_length(self, body):

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in _send_request(self, method, url, body, headers)
   1091             skips['skip_accept_encoding'] = 1
   1092
-> 1093         self.putrequest(method, url, **skips)
   1094
   1095         if body is not None and ('content-length' not in header_names):

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in putrequest(self, method, url, skip_host, skip_accept_encoding)
    955
    956         # Non-ASCII characters should have been eliminated earlier
--> 957         self._output(request.encode('ascii'))
    958
    959         if self._http_vsn == 11:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 46-48: ordinal not in range(128)

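The cause is that the Chinese characters in the URL cannot be encoded as ASCII, which is what http.client tries to do with the request line before sending it.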
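
A minimal sketch of the failure, reproducible without a server. It assumes, based on the traceback above, that the request line is a text string encoded with the ASCII codec. The names `request_line` and `failed` are illustrative, and the path is the placeholder URL from this post, so the reported positions need not match the traceback.

```python
# Assumed shape of the HTTP request line built from the URL in this post.
request_line = "GET /xxx/?id=a45ex0bad3c9&game=五子棋 HTTP/1.1"

failed = None
try:
    # The same kind of call that fails at the bottom of the traceback.
    request_line.encode("ascii")
except UnicodeEncodeError as err:
    # err.start and err.end delimit the run of characters ASCII cannot represent.
    failed = request_line[err.start:err.end]
    print("cannot encode", repr(failed))
```

Only the Chinese part of the query string is rejected; the rest of the request line is plain ASCII.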

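Looking at how a URL containing Chinese is actually submitted to the backend, it ends up being converted to percent-encoding before it is sent. So we only need to percent-encode the Chinese part, after which the URL can be encoded as ASCII.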
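
To make the idea concrete, here is a rough sketch of what percent-encoding does to non-ASCII text: each UTF-8 byte is written as `%` followed by two hex digits. The helper `percent_encode_utf8` is hypothetical and written only for illustration. It escapes every byte, which real URL quoting does not do for unreserved ASCII characters.

```python
def percent_encode_utf8(text):
    """Hypothetical helper: write every UTF-8 byte of text as %XX."""
    return "".join("%{:02X}".format(byte) for byte in text.encode("utf-8"))


encoded = percent_encode_utf8("你好")
print(encoded)

# The result contains only ASCII characters, so this no longer raises.
ascii_bytes = encoded.encode("ascii")
```

In practice the standard library function shown next should be preferred over a hand-written helper like this one.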

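Python 3 provides a function for this encoding (quote is documented in urllib.parse; urllib.request also exposes it because it imports the name):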

In [36]: from urllib.request import quote

In [37]: quote('你好')
Out[37]: '%E4%BD%A0%E5%A5%BD'
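
The post stops at quoting a short string, so here is one possible way to apply it to the URL from the traceback. This is a sketch, not the author's code: the variable names are illustrative, and the choice of `safe` characters assumes the URL has only a scheme, host, path and a simple query string.

```python
from urllib.parse import quote, urlencode

url = "http://127.0.0.1:8000/xxx/?id=a45ex0bad3c9&game=五子棋"

# Option 1: quote the whole URL, leaving the URL delimiters untouched.
safe_url = quote(url, safe=":/?&=")

# Option 2: build the query string from its parameters; urlencode
# percent-encodes each key and value (UTF-8 by default).
base = "http://127.0.0.1:8000/xxx/"
params = {"id": "a45ex0bad3c9", "game": "五子棋"}
built_url = base + "?" + urlencode(params)

print(safe_url)
print(built_url)

# Either string is pure ASCII, so passing it to urllib.request.urlopen
# should get past the request.encode('ascii') step that failed above.
```

Option 2 is usually the safer choice when parameter values may themselves contain characters such as `&` or `=`, since each value is encoded separately.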

 


Original: http://www.cnblogs.com/Blaxon/p/5051514.html
