1. Read the CSV file
import pandas as pd

# Raw string so the backslash in the Windows path is taken literally
df = pd.read_csv(r'F:\catering_sale.csv')
print(df)

Output (日期 is the date column, 销量 is the sales column):

             日期      销量
0    2015-03-01    51.0
1    2015-02-28  2618.2
2    2015-02-27  2608.4
3    2015-02-26  2651.9
4    2015-02-25  3442.1
5    2015-02-24  3393.1
6    2015-02-23  3136.6
7    2015-02-22  3744.1
8    2015-02-21  6607.4
9    2015-02-20  4060.3
10   2015-02-19  3614.7
11   2015-02-18  3295.5
12   2015-02-16  2332.1
13   2015-02-15  2699.3
14   2015-02-14     NaN
15   2015-02-13  3036.8
16   2015-02-12   865.0
17   2015-02-11  3014.3
18   2015-02-10  2742.8
19   2015-02-09  2173.5
20   2015-02-08  3161.8
21   2015-02-07  3023.8
22   2015-02-06  2998.1
23   2015-02-05  2805.9
24   2015-02-04  2383.4
25   2015-02-03  2620.2
26   2015-02-02  2600.0
27   2015-02-01  2358.6
28   2015-01-31  2682.2
29   2015-01-30  2766.8
..          ...     ...
171  2014-08-31  3494.7
172  2014-08-30  3691.9
173  2014-08-29  2929.5
174  2014-08-28  2760.6
175  2014-08-27  2593.7
176  2014-08-26  2884.4
177  2014-08-25  2591.3
178  2014-08-24  3022.6
179  2014-08-23  3052.1
180  2014-08-22  2789.2
181  2014-08-21  2909.8
182  2014-08-20  2326.8
183  2014-08-19  2453.1
184  2014-08-18  2351.2
185  2014-08-17  3279.1
186  2014-08-16  3381.9
187  2014-08-15  2988.1
188  2014-08-14  2577.7
189  2014-08-13  2332.3
190  2014-08-12  2518.6
191  2014-08-11  2697.5
192  2014-08-10  3244.7
193  2014-08-09  3346.7
194  2014-08-08  2900.6
195  2014-08-07  2759.1
196  2014-08-06  2915.8
197  2014-08-05  2618.1
198  2014-08-04  2993.0
199  2014-08-03  3436.4
200  2014-08-02  2261.7

[201 rows x 2 columns]
2. Find the maximum, minimum, and mean
# Wrap the 销量 (sales) column in a DataFrame so each result prints as a labeled Series.
# NaN values are skipped by default.
max_number = pd.DataFrame(df["销量"]).max()
min_number = pd.DataFrame(df["销量"]).min()
average = pd.DataFrame(df["销量"]).mean()

print("max:")
print(max_number)
print("min:")
print(min_number)
print("ave:")
print(average)

Output:

max:
销量    9106.44
dtype: float64
min:
销量    22.0
dtype: float64
ave:
销量    2755.2147
dtype: float64
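The same three statistics can also be computed directly on the column, without the pd.DataFrame wrapper. The following is a minimal sketch, assuming a small in-memory frame built from a few rows of the printed output in section 1 instead of the full catering_sale.csv; the names sales and summary are illustrative only.

```python
import pandas as pd

# Hypothetical sample: five rows copied from the printed output in section 1,
# standing in for the full catering_sale.csv file.
df = pd.DataFrame({
    "日期": ["2015-03-01", "2015-02-28", "2015-02-27", "2015-02-14", "2015-02-12"],
    "销量": [51.0, 2618.2, 2608.4, float("nan"), 865.0],
})

sales = df["销量"]

# agg with a list of function names returns one Series indexed by those names.
# Missing values (NaN) are skipped by default.
summary = sales.agg(["max", "min", "mean"])
print(summary)
```

On the real file, replacing the sample frame with the pd.read_csv call from section 1 should give the same values as the three separate calls above, as plain floats in a single Series.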
3. Count the missing values and the samples
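# shape[0] is the total number of rows; count() counts only the non-NaN values
missing_value = df['销量'].shape[0] - df['销量'].count()
print("Missing values:", missing_value)
print("Samples:", df['销量'].shape[0])

Output:

Missing values: 1
Samples: 201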
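Another way to get the same count is to flag missing entries with isna() and sum the flags. This is a minimal sketch, assuming a four-row sample taken from rows 13 to 16 of the printed output in section 1; missing_count and sample_count are illustrative names.

```python
import pandas as pd

# Hypothetical sample: rows 13-16 of the printed output in section 1,
# which include the single NaN entry.
df = pd.DataFrame({"销量": [2699.3, float("nan"), 3036.8, 865.0]})

# isna() marks each missing value as True; summing the booleans counts them.
missing_count = int(df["销量"].isna().sum())

# len() of the DataFrame is the number of rows, missing or not.
sample_count = len(df)

print("Missing values:", missing_count)
print("Samples:", sample_count)
```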
4. Draw a box plot to identify outliers
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(r'F:\catering_sale.csv')

# Make Chinese labels display correctly
plt.rcParams['font.sans-serif'] = ['KaiTi']   # set the default font
plt.rcParams['axes.unicode_minus'] = False    # keep the minus sign '-' from rendering as a box in saved figures

fig, axes = plt.subplots()
# column: the data to draw as box plots, one column or several
# by: the column to group by (not used here)
df.boxplot(column='销量', ax=axes)
axes.set_ylabel('values of data')
fig.savefig(r'F:\demo1.png')
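The points a box plot draws beyond its whiskers can also be listed numerically. The sketch below assumes the common 1.5 × IQR rule, which is matplotlib's default whisker setting, and uses the first ten rows of the printed output in section 1 as a stand-in for the full file; q1, q3, lower, upper, and outliers are illustrative names.

```python
import pandas as pd

# Hypothetical sample: rows 0-9 of the printed output in section 1,
# not the full catering_sale.csv file.
sales = pd.Series(
    [51.0, 2618.2, 2608.4, 2651.9, 3442.1, 3393.1, 3136.6, 3744.1, 6607.4, 4060.3],
    name="销量",
)

# Quartiles and interquartile range, ignoring missing values
clean = sales.dropna()
q1 = clean.quantile(0.25)
q3 = clean.quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are treated as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = clean[(clean < lower) | (clean > upper)]

print("bounds:", lower, upper)
print(outliers)
```

Whether a flagged value is a real anomaly or a valid extreme still needs a judgment based on the business context; the rule only marks candidates.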
Source: https://www.cnblogs.com/wenyan1123/p/14524182.html