1. Deduplication scenarios
URL deduplication: avoid sending duplicate requests
Data/text deduplication: avoid storing duplicate data
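As a minimal sketch of the URL case, one common approach is to keep a set of fingerprints for URLs already requested and skip any URL whose fingerprint is in that set. The helper names `url_fingerprint` and `should_request` are hypothetical, and hashing with SHA-1 after stripping whitespace is an assumption, not something the original post specifies.

```python
import hashlib

def url_fingerprint(url: str) -> str:
    """Return a fixed-length SHA-1 hex digest for a URL (hypothetical helper)."""
    # Assumption: stripping surrounding whitespace is the only normalization needed.
    return hashlib.sha1(url.strip().encode("utf-8")).hexdigest()

# Container holding the fingerprints of every URL seen so far
seen_fingerprints = set()

def should_request(url: str) -> bool:
    """Return True the first time a URL is seen, False on every repeat."""
    fp = url_fingerprint(url)
    if fp in seen_fingerprints:
        return False
    seen_fingerprints.add(fp)
    return True

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/a"]
to_fetch = [u for u in urls if should_request(u)]
print(to_fetch)  # ['https://example.com/a', 'https://example.com/b']
```

Storing fingerprints rather than full URLs keeps each entry the same size, which may matter when the number of URLs is large.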
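2. How data deduplication works
What type of data is it?
What is the criterion for calling two items duplicates?
Example: data1 = ["123",123,"456","qwe","qwe"]
Ways to deduplicate a list: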
# Method 1: set, does not preserve order
data = ["123", 123, "qwe", "qwe", "456", "123"]
ret = list(set(data))
print(ret)

# Method 2: dict keys, preserves order (Python 3.7+)
data = ["123", 123, "qwe", "qwe", "456", "123"]
# {'123': None, 123: None, 'qwe': None, '456': None}
ret_dict = {}.fromkeys(data)
# dict_keys(['123', 123, 'qwe', '456'])
ret_list = ret_dict.keys()
# ['123', 123, 'qwe', '456']
print(list(ret_list))

# Method 3: loop with a membership check, preserves order
demo_list = list()
for i in data:
    # Check against the result list, not the source list
    if i not in demo_list:
        demo_list.append(i)
# ['123', 123, 'qwe', '456']
print(demo_list)
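Example: data1 = ["123",123,"456","qwe","qwe"]
Constraint: "123" and 123 count as duplicates and must be deduplicated
data = ["123", 123, "qwe", "qwe", "456", "123"]
# Convert every item to str first so that "123" and 123 collapse into one value
ret_list = list(set(str(i) for i in data))
print(ret_list)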
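The set-based version turns every item into a string and loses the original order. If the first original value should be kept and order matters, a possible variant is to use the string form only as a lookup key. This is a sketch, assuming that `str()` is the right normalization for the data at hand; the name `first_seen` is made up for illustration.

```python
data = ["123", 123, "qwe", "qwe", "456", "123"]

# Maps the normalized (string) form to the first original item with that form
first_seen = {}
for item in data:
    # setdefault stores the item only if its string form is not a key yet
    first_seen.setdefault(str(item), item)

ret_list = list(first_seen.values())
print(ret_list)  # ['123', 'qwe', '456']
```

Because dicts keep insertion order in Python 3.7+, the result follows the order of first appearance.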
Example: deduplicating objects
class Test(object):
    def __init__(self, v):
        self.v = v

t1 = Test(100)
t2 = Test(100)
t3 = Test(200)
t4 = t1
data = [t1, t2, t3, t4]
# By default objects are hashed by identity, so only t4 (the same object as t1) is removed
# [<__main__.Test object at 0x000000000227E208>, <__main__.Test object at 0x000000000227E2B0>, <__main__.Test object at 0x00000000026FE0F0>]
print(list(set(data)))
Requirement: remove duplicates, where Test objects with the same v count as duplicates
ret_list = list()  # v values already seen
ret_set = set()    # objects kept
for item in data:
    if item.v not in ret_list:
        ret_list.append(item.v)
        ret_set.add(item)
# {<__main__.Test object at 0x00000000004E9320>, <__main__.Test object at 0x00000000004E9E48>}
print(ret_set)
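Another way to express "same v means duplicate" is to define equality and hashing on the class itself, so that `set()` and `dict.fromkeys()` apply the rule directly. The sketch below is a variant of the `Test` class, not the original author's code, and it assumes that `v` is hashable and is not changed after the object has been placed in a set or dict.

```python
class Test(object):
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        # Two Test objects are equal when their v values are equal
        if not isinstance(other, Test):
            return NotImplemented
        return self.v == other.v

    def __hash__(self):
        # Equal objects must have equal hashes, so hash on v as well
        return hash(self.v)

t1 = Test(100)
t2 = Test(100)
t3 = Test(200)
t4 = t1
data = [t1, t2, t3, t4]

# dict.fromkeys keeps the first occurrence of each key, in order
unique = list(dict.fromkeys(data))
print([t.v for t in unique])  # [100, 200]
```

The trade-off is that the rule now applies everywhere these objects are compared, which may or may not be wanted outside the deduplication step.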
Requirement: remove duplicates, where objects of the same class count as duplicates
ret_list = list()  # classes already seen
ret_set = set()    # objects kept
for item in data:
    if item.__class__ not in ret_list:
        ret_list.append(item.__class__)
        ret_set.add(item)
# {<__main__.Test object at 0x00000000026A9320>}
print(ret_set)
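Deduplicating data as it is produced: container-based deduplication (the container stores the criterion used to detect duplicates)
data2 = ["123", 123, "456", "qwe", "qwe"]
ret_list = []
for data in data2:
    # Keep the item only if the container has not seen it yet
    if data not in ret_list:
        ret_list.append(data)
print(ret_list)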
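When items arrive one at a time, the same container idea can be written as a generator. This is a sketch under the assumption that every item is hashable; `unique_stream` is a hypothetical name. It swaps the list for a set, because a membership check on a list scans every element while a set lookup takes roughly constant time on average.

```python
def unique_stream(items):
    """Yield each item the first time it appears, skipping later repeats."""
    seen = set()  # the container that stores what has already been produced
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

data2 = ["123", 123, "456", "qwe", "qwe"]
print(list(unique_stream(data2)))  # ['123', 123, '456', 'qwe']
```

The generator does not need the whole input up front, so it can wrap any iterable that produces data incrementally.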
Original: https://www.cnblogs.com/meloncodezhang/p/11483748.html