
Data Aggregation and Group Operations: Pivot Tables

Posted: 2020-03-16 23:32:29


import numpy as np
import pandas as pd

# A small 20-row toy dataset; tip_pct is random, so its values differ on every run
tips = pd.DataFrame({'total_bill': np.arange(50, 70),
                     'tip': np.arange(20, 40),
                     'smoker': ['Yes', 'No', 'Yes'] * 6 + ['No', 'No'],
                     'day': ['Fri', 'Sun', 'Thu', 'Sat'] * 5,
                     'time': ['Lunch', 'Dinner'] * 10,
                     'size': np.arange(1, 21),
                     'tip_pct': np.random.rand(20)})
tips

    day  size  smoker    time  tip   tip_pct  total_bill
0   Fri     1     Yes   Lunch   20  0.153093          50
1   Sun     2      No  Dinner   21  0.424923          51
2   Thu     3     Yes   Lunch   22  0.014628          52
3   Sat     4     Yes  Dinner   23  0.556225          53
4   Fri     5      No   Lunch   24  0.533683          54
5   Sun     6     Yes  Dinner   25  0.988339          55
6   Thu     7     Yes   Lunch   26  0.755538          56
7   Sat     8      No  Dinner   27  0.202253          57
8   Fri     9     Yes   Lunch   28  0.506493          58
9   Sun    10     Yes  Dinner   29  0.717377          59
10  Thu    11      No   Lunch   30  0.975467          60
11  Sat    12     Yes  Dinner   31  0.268118          61
12  Fri    13     Yes   Lunch   32  0.860609          62
13  Sun    14      No  Dinner   33  0.287332          63
14  Thu    15     Yes   Lunch   34  0.782520          64
15  Sat    16     Yes  Dinner   35  0.673002          65
16  Fri    17      No   Lunch   36  0.126785          66
17  Sun    18     Yes  Dinner   37  0.469738          67
18  Thu    19      No   Lunch   38  0.677847          68
19  Sat    20      No  Dinner   39  0.610110          69
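Because tip_pct comes from np.random.rand, every tip_pct figure in this post changes from run to run. Below is a minimal sketch of a reproducible variant. It assumes NumPy's Generator API (np.random.default_rng) and an arbitrary seed of 0; neither comes from the original post. The other columns are unchanged. It also prints which meal times occur on each day, a property of this toy data that matters for the NaN cells further down.

```python
import numpy as np
import pandas as pd

# Arbitrary fixed seed (an assumption, not from the original post) so tip_pct is repeatable
rng = np.random.default_rng(0)

tips = pd.DataFrame({'total_bill': np.arange(50, 70),
                     'tip': np.arange(20, 40),
                     'smoker': ['Yes', 'No', 'Yes'] * 6 + ['No', 'No'],
                     'day': ['Fri', 'Sun', 'Thu', 'Sat'] * 5,
                     'time': ['Lunch', 'Dinner'] * 10,
                     'size': np.arange(1, 21),
                     'tip_pct': rng.random(20)})  # uniform floats in [0, 1)

# Fri/Thu sit on even rows (always Lunch), Sat/Sun on odd rows (always Dinner)
print(tips.groupby('day')['time'].unique())
```

Any other seed works equally well. Only the tip_pct values change.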

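# pivot_table aggregates with the mean by default. Older pandas silently dropped the
# non-numeric 'time' column; pandas 2.x raises a TypeError instead, so the numeric
# columns are passed explicitly via values=
tips.pivot_table(index=['day', 'smoker'], values=['size', 'tip', 'tip_pct', 'total_bill'])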

                 size        tip   tip_pct  total_bill
day smoker
Fri No      11.000000  30.000000  0.330234   60.000000
    Yes      7.666667  26.666667  0.506731   56.666667
Sat No      14.000000  33.000000  0.406182   63.000000
    Yes     10.666667  29.666667  0.499115   59.666667
Sun No       8.000000  27.000000  0.356128   57.000000
    Yes     11.333333  30.333333  0.725151   60.333333
Thu No      15.000000  34.000000  0.826657   64.000000
    Yes      8.333333  27.333333  0.517562   57.333333

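To see that each cell really is a plain mean, you can recompute one group by hand with a boolean mask and compare it with the pivot table row. This sketch leaves out tip_pct (it is random) and time (it is not numeric). Only the deterministic columns are kept.

```python
import numpy as np
import pandas as pd

tips = pd.DataFrame({'total_bill': np.arange(50, 70),
                     'tip': np.arange(20, 40),
                     'smoker': ['Yes', 'No', 'Yes'] * 6 + ['No', 'No'],
                     'day': ['Fri', 'Sun', 'Thu', 'Sat'] * 5,
                     'size': np.arange(1, 21)})

table = tips.pivot_table(index=['day', 'smoker'], values=['size', 'tip', 'total_bill'])

# Recompute the (Fri, No) cell by hand: select the matching rows, then average them
mask = (tips['day'] == 'Fri') & (tips['smoker'] == 'No')
manual = tips.loc[mask, ['size', 'tip', 'total_bill']].mean()

print(table.loc[('Fri', 'No')])
print(manual)
```

The (Fri, No) group contains rows 4 and 16. Their tips are 24 and 36, so the cell is 30, matching the table above.
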
tips.groupby(['day', 'smoker']).mean(numeric_only=True)  # same result as the pivot_table call above

                 size        tip   tip_pct  total_bill
day smoker
Fri No      11.000000  30.000000  0.330234   60.000000
    Yes      7.666667  26.666667  0.506731   56.666667
Sat No      14.000000  33.000000  0.406182   63.000000
    Yes     10.666667  29.666667  0.499115   59.666667
Sun No       8.000000  27.000000  0.356128   57.000000
    Yes     11.333333  30.333333  0.725151   60.333333
Thu No      15.000000  34.000000  0.826657   64.000000
    Yes      8.333333  27.333333  0.517562   57.333333

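Comparing two printouts by eye is error-prone. The equivalence can instead be checked with pd.testing.assert_frame_equal, as in the sketch below. It makes three assumptions: a seeded generator replaces np.random.rand so the run is repeatable, the numeric columns are listed explicitly, and dtypes are not compared. The last point matters because pivot_table may downcast integer-like means.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, only for a repeatable run
tips = pd.DataFrame({'total_bill': np.arange(50, 70),
                     'tip': np.arange(20, 40),
                     'smoker': ['Yes', 'No', 'Yes'] * 6 + ['No', 'No'],
                     'day': ['Fri', 'Sun', 'Thu', 'Sat'] * 5,
                     'time': ['Lunch', 'Dinner'] * 10,
                     'size': np.arange(1, 21),
                     'tip_pct': rng.random(20)})

numeric_cols = ['size', 'tip', 'tip_pct', 'total_bill']

by_pivot = tips.pivot_table(index=['day', 'smoker'], values=numeric_cols)  # default aggfunc: mean
by_groupby = tips.groupby(['day', 'smoker'])[numeric_cols].mean()

# Align column order, skip the dtype check, compare values within float tolerance;
# raises AssertionError if the two results differ
pd.testing.assert_frame_equal(by_pivot[numeric_cols], by_groupby, check_dtype=False)
print('pivot_table and groupby().mean() agree')
```
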
# Count the rows in each (time, smoker) x day cell; margins=True adds 'All' totals
tips.pivot_table('tip_pct', index=['time', 'smoker'], columns='day',
                 aggfunc=len, margins=True)

day            Fri  Sat  Sun  Thu   All
time   smoker
Dinner No      NaN  2.0  2.0  NaN   4.0
       Yes     NaN  3.0  3.0  NaN   6.0
Lunch  No      2.0  NaN  NaN  2.0   4.0
       Yes     3.0  NaN  NaN  3.0   6.0
All            5.0  5.0  5.0  5.0  20.0

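The NaN cells are not missing data. In this toy frame, Fri and Thu rows are always Lunch and Sat and Sun rows are always Dinner, so those (time, day) combinations have no rows at all. pivot_table's fill_value parameter fills such empty cells, as the sketch below shows. It assumes fill_value=0 and a seeded generator for tip_pct; the counts do not depend on tip_pct's values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed; the counts are the same for any tip_pct values
tips = pd.DataFrame({'smoker': ['Yes', 'No', 'Yes'] * 6 + ['No', 'No'],
                     'day': ['Fri', 'Sun', 'Thu', 'Sat'] * 5,
                     'time': ['Lunch', 'Dinner'] * 10,
                     'tip_pct': rng.random(20)})

# Same count table as above, but empty (time, smoker, day) cells become 0 instead of NaN
counts = tips.pivot_table('tip_pct', index=['time', 'smoker'], columns='day',
                          aggfunc=len, margins=True, fill_value=0)
print(counts)
```

The bottom-right 'All' cell is the grand total. It is 20, one per row of the frame.
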
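# Passing the aggregation function as the string 'count' gives the same result here:
# len counts every row, 'count' counts only non-null values, and tip_pct has no NaN
tips.pivot_table('tip_pct', index=['time', 'smoker'], columns='day',
                 aggfunc='count', margins=True)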

day            Fri  Sat  Sun  Thu   All
time   smoker
Dinner No      NaN  2.0  2.0  NaN   4.0
       Yes     NaN  3.0  3.0  NaN   6.0
Lunch  No      2.0  NaN  NaN  2.0   4.0
       Yes     3.0  NaN  NaN  3.0   6.0
All            5.0  5.0  5.0  5.0  20.0
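len and 'count' agree in the two calls above only because tip_pct has no missing values. A minimal sketch on a hypothetical three-row frame with one NaN shows where they diverge. The frame df is made up for illustration and is not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the Fri group has one missing tip_pct
df = pd.DataFrame({'day': ['Fri', 'Fri', 'Sat'],
                   'tip_pct': [0.1, np.nan, 0.2]})

by_len = df.pivot_table('tip_pct', index='day', aggfunc=len)        # counts rows, NaN included
by_count = df.pivot_table('tip_pct', index='day', aggfunc='count')  # counts non-null values only
print(by_len)
print(by_count)
```

For Fri, len reports 2 rows while 'count' reports 1 value. Pick 'count' when missing values should not be counted.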

Original post: https://www.cnblogs.com/djlbolgs/p/12507332.html
