Understanding Maoyan Movies' anti-scraping measures (this approach seems to apply only to Maoyan)

Date: 2019-06-29 21:31:54

import re
import requests
from fontTools.ttLib import TTFont
from lxml import etree

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36 "}

index_url = 'https://maoyan.com/board/1'
# Fetch the HTML of the board page
response_index = requests.get(index_url, headers=headers).text

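# Extract the URL of the font file the page currently serves.
# [^']+ keeps the match inside a single url('...') so it cannot span two URLs.
woff_ = re.search(r"url\('([^']+\.woff)'\)", response_index).group(1)
woff_url = 'http:' + woff_
response_woff = requests.get(woff_url, headers=headers).content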

with open('fonts.woff', 'wb') as f:
    f.write(response_woff)

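# Download one font file by hand to use as the baseline. This only works if every
# served font file encodes the same glyphs; if the glyphs themselves have been altered, it fails.
# It is worth trying when only a few characters are replaced, but not when there are many,
# e.g. Autohome (汽车之家) forum posts, where Chinese characters are obfuscated as well.
# base_nums and base_fonts must be mapped by hand and must match the baseline .woff file.
# Load the manually downloaded file into memory.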

baseFonts = TTFont('309b80902447ba44c30dff21dcb11a012076.woff')
base_nums = ['4', '6', '3', '5', '9', '2', '0', '8', '1', '7']
base_fonts = ['uniF83F', 'uniF045', 'uniEA3E', 'uniE5DE', 'uniE4FC', 'uniF066', 'uniE380', 'uniEB23', 'uniE6B8', 'uniF128']

# Load the freshly downloaded font file into memory
onlineFonts = TTFont('fonts.woff')
# Drop the first and last glyph names; they are not among the encoded digits
uni_list = onlineFonts.getGlyphNames()[1:-1]
temp = {}
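# Match glyphs between the two fonts; assumes exactly the 10 digits 0-9
for i in range(10):
    onlineGlyph = onlineFonts['glyf'][uni_list[i]]
    for j in range(10):
        baseGlyph = baseFonts['glyf'][base_fonts[j]]
        # Identical outlines mean the online glyph stands for the same digit
        if onlineGlyph == baseGlyph:
            # 'uniF83F' -> '&#xf83f;', the form in which it appears in the HTML
            temp["&#x" + uni_list[i][3:].lower() + ';'] = base_nums[j]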

# Replace the obfuscated entities with the real digits
pat = '(' + '|'.join(temp.keys()) + ')'
response_index = re.sub(pat, lambda x: temp[x.group()], response_index)

print(response_index)
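
The three pure-string steps of the script (finding the font URL, turning a glyph name into its HTML entity, and substituting digits) can be tried without network access or font files. The following is only a minimal sketch: SAMPLE_HTML, the example.invalid URL and the helper function names are made up for illustration, and the mapping simply reuses the base_fonts/base_nums lists above as if the online font used the same glyph names.

```python
import re

# Hypothetical page fragment (not real Maoyan output): a @font-face rule
# plus digits written as private-use HTML entities.
SAMPLE_HTML = (
    "<style>@font-face{src:url('//example.invalid/fonts/abc123.woff') "
    "format('woff');}</style>"
    '<span class="stonefont">&#xf83f;&#xe380;.&#xe6b8;</span>'
)

# Baseline mapping copied from the script above.
BASE_NUMS = ['4', '6', '3', '5', '9', '2', '0', '8', '1', '7']
BASE_FONTS = ['uniF83F', 'uniF045', 'uniEA3E', 'uniE5DE', 'uniE4FC',
              'uniF066', 'uniE380', 'uniEB23', 'uniE6B8', 'uniF128']


def find_woff_url(html):
    """Return the first .woff URL in html with 'http:' prepended, or None."""
    match = re.search(r"url\('([^']+\.woff)'\)", html)
    return 'http:' + match.group(1) if match else None


def glyph_name_to_entity(name):
    """Turn a glyph name such as 'uniF83F' into the entity '&#xf83f;'."""
    return '&#x' + name[3:].lower() + ';'


def decode_entities(html, mapping):
    """Replace every entity found in mapping with its digit."""
    if not mapping:
        return html
    pattern = re.compile('|'.join(re.escape(key) for key in mapping))
    return pattern.sub(lambda m: mapping[m.group()], html)


# Assumption for this demo only: the online font reuses the baseline names.
mapping = {glyph_name_to_entity(n): d for n, d in zip(BASE_FONTS, BASE_NUMS)}

print(find_woff_url(SAMPLE_HTML))
print(decode_entities(SAMPLE_HTML, mapping))
```

Returning None when no font URL is found lets the caller handle a changed page layout explicitly, rather than hitting an AttributeError on `.group(1)`.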

Original: https://www.cnblogs.com/zengxm/p/11107660.html