[Open question] A decode problem while learning web scraping

Author: S

September 23, 2022
Shanghai

import urllib.request
from urllib import request

# Common variables
# url = "http://www.baidu.com/"
url = "https://www.baidu.com/"
headers = {
    # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # Enabling the next line makes decode() fail on the response
    # 'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}

# Create the request object (wrap the request)
req = urllib.request.Request(url=url, headers=headers)
# Send the request and get the response object - urlopen
res = request.urlopen(req)
# Read the content - read...
code = res.getcode()
html = res.read().decode('utf-8')
print(code)
print(html)
# filename = "wen.html"
# with open(filename, "w", encoding="utf-8") as f:
#     f.write(html)


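The code is shown above. While experimenting, I found that uncommenting the `Accept-Encoding` header (line 10 of the script) raises the error below, while leaving it commented out works fine. I vaguely understand that this is an encoding problem, but I don't understand why it actually happens. The browser sends this same header by default, so why does adding it here cause an error? Which detail is different?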

Traceback (most recent call last):
  File "C:\pythonProject\Request.py", line 23, in <module>
    html = res.read().decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
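One plausible reading of this traceback: every gzip stream starts with the two magic bytes 0x1f 0x8b, so a byte 0x8b at position 1 suggests the server honored `Accept-Encoding` and returned a gzip-compressed body, which `urllib` hands back without decompressing. The sketch below reproduces the same error offline. The page text is a made-up stand-in, not the real response from the server.

```python
import gzip

# Hypothetical stand-in for the page body (not the real server response).
page = "<html>hello</html>"
compressed = gzip.compress(page.encode("utf-8"))

# Every gzip stream begins with the magic bytes 0x1f 0x8b.
magic = compressed[:2]

# Decoding the compressed bytes directly as UTF-8 fails at the second byte.
try:
    compressed.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(magic.hex())
print(error)
```

If this reading is right, the script itself is fine and the failure comes from treating compressed bytes as text.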


If anyone more experienced happens to read this, I would be grateful for an explanation.
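A possible workaround, assuming the cause is a compressed response body: a browser decompresses the body according to the `Content-Encoding` response header before decoding it as text, whereas `urllib.request` leaves that step to the caller. The helper below is a minimal sketch of that step. `decode_body` is a hypothetical name introduced for illustration, and it only covers gzip and deflate, because the Python standard library has no Brotli (`br`) decoder.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str = "", charset: str = "utf-8") -> str:
    """Undo the Content-Encoding compression, then decode the bytes as text."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            raw = zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate stream
    elif encoding not in ("", "identity"):
        # For example "br" (Brotli), which the standard library cannot decode.
        raise ValueError(f"unsupported Content-Encoding: {encoding!r}")
    return raw.decode(charset)
```

In the script above, this would replace the failing line with something like `html = decode_body(res.read(), res.headers.get("Content-Encoding", ""))`. Sending `'Accept-Encoding': 'gzip, deflate'` without `br` would keep the server from choosing an encoding this helper cannot handle.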
