To remove duplicate rows from a pandas DataFrame, use the following method:
drop_duplicates
A concrete example follows:
import pandas as pd

# Build a sample DataFrame
df = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4, 'k2': [1, 1, 2, 3, 3, 4, 4]})
df['v1'] = range(7)
df
# Result:
#     k1  k2  v1
# 0  one   1   0
# 1  one   1   1
# 2  one   2   2
# 3  two   3   3
# 4  two   3   4
# 5  two   4   5
# 6  two   4   6
# No two rows are identical across all columns, so the result matches the original data
df.drop_duplicates()
# Result:
#     k1  k2  v1
# 0  one   1   0
# 1  one   1   1
# 2  one   2   2
# 3  two   3   3
# 4  two   3   4
# 5  two   4   5
# 6  two   4   6
# Deduplicate on column k1 only, keeping the first row for each k1 value
df.drop_duplicates('k1', keep='first')
# Result:
#     k1  k2  v1
# 0  one   1   0
# 3  two   3   3
# Deduplicate on the combination of k2 and k1, keeping the first row for each pair
df.drop_duplicates(['k2', 'k1'], keep='first')
# Result:
#     k1  k2  v1
# 0  one   1   0
# 2  one   2   2
# 3  two   3   3
# 5  two   4   5
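The calls above pass the columns positionally and display the result without storing it. As a small sketch of what that implies, the same columns can be given through the `subset` keyword, and the result is a new DataFrame while `df` is left unchanged. The variable names `deduped` and `renumbered` are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                   'k2': [1, 1, 2, 3, 3, 4, 4]})
df['v1'] = range(7)

# Keyword form of the positional call; column order in subset does not matter
deduped = df.drop_duplicates(subset=['k1', 'k2'], keep='first')

# drop_duplicates returns a new DataFrame, so df still has all 7 rows
print(len(df), len(deduped))

# Surviving rows keep their original index labels (0, 2, 3, 5);
# reset_index(drop=True) renumbers them from 0
renumbered = deduped.reset_index(drop=True)
print(renumbered)
```

To change `df` itself, assign the result back, for example `df = df.drop_duplicates(...)`.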
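keep: {'first', 'last', False}, default 'first'. 'first' keeps the first occurrence in each group of duplicates, 'last' keeps the last occurrence, and False drops every row that has a duplicate.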
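The examples so far only use keep='first'. A minimal sketch of the other two settings on the same sample data might look like this. The variable names are illustrative, and the index values in the comments follow from the seven rows defined here.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                   'k2': [1, 1, 2, 3, 3, 4, 4]})
df['v1'] = range(7)

# keep='last': the last row of each k1 group survives (index 2 and 6)
last_rows = df.drop_duplicates('k1', keep='last')
print(last_rows)

# keep=False: every row whose k1 value is repeated is dropped;
# both k1 values repeat here, so the result is empty
no_repeats = df.drop_duplicates('k1', keep=False)
print(no_repeats)

# keep=False on the (k1, k2) pair: only ('one', 2) occurs exactly once (index 2)
unique_pairs = df.drop_duplicates(['k1', 'k2'], keep=False)
print(unique_pairs)
```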
Reference: https://www.jianshu.com/p/cb217042aca9
Original post: https://www.cnblogs.com/leoych/p/14286635.html