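Suppose we have the following two Series:
import pandas as pd
s1 = pd.Series([.2, .0, .6, .2])
s2 = pd.Series([.3, .6, .0, .1])
Compute the Pearson correlation coefficient of the two Series.
The solution is to treat each Series as a vector and call corr: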
s1.corr(s2)
Output:
-0.85106449634699
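To make the "Series as a vector" idea concrete, here is a minimal sketch that recomputes the same coefficient by hand: centre both vectors on their means, then divide the sum of the products by the product of the norms. The names d1, d2 and manual_r are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([.2, .0, .6, .2])
s2 = pd.Series([.3, .6, .0, .1])

# Centre each vector on its own mean.
d1 = s1 - s1.mean()
d2 = s2 - s2.mean()

# Pearson r: sum of products of deviations over the product of their norms.
manual_r = (d1 * d2).sum() / np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())

print(manual_r)
print(np.isclose(manual_r, s1.corr(s2)))  # the manual value should agree with corr
```

The manual value should match s1.corr(s2) up to floating-point rounding.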
Suppose we have the following data:
import pandas as pd
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'])
didx = pd.date_range(start='2014-08-01 10:00', freq='W', periods=6, tz='Europe/Berlin')
sr.index = didx
print(sr)
Output:
2014-08-03 10:00:00+02:00    New York
2014-08-10 10:00:00+02:00     Chicago
2014-08-17 10:00:00+02:00     Toronto
2014-08-24 10:00:00+02:00      Lisbon
2014-08-31 10:00:00+02:00         Rio
2014-09-07 10:00:00+02:00      Moscow
Freq: W-SUN, dtype: object
To move the data down by two rows, use shift with a positive periods value:
sr.shift(periods = 2)
Output:
2014-08-03 10:00:00+02:00         NaN
2014-08-10 10:00:00+02:00         NaN
2014-08-17 10:00:00+02:00    New York
2014-08-24 10:00:00+02:00     Chicago
2014-08-31 10:00:00+02:00     Toronto
2014-09-07 10:00:00+02:00      Lisbon
Freq: W-SUN, dtype: object
Suppose we have the same data again:
import pandas as pd
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'])
didx = pd.date_range(start='2014-08-01 10:00', freq='W', periods=6, tz='Europe/Berlin')
sr.index = didx
print(sr)
Output:
2014-08-03 10:00:00+02:00    New York
2014-08-10 10:00:00+02:00     Chicago
2014-08-17 10:00:00+02:00     Toronto
2014-08-24 10:00:00+02:00      Lisbon
2014-08-31 10:00:00+02:00         Rio
2014-09-07 10:00:00+02:00      Moscow
Freq: W-SUN, dtype: object
To move the data up by two rows, use shift with a negative periods value:
sr.shift(periods = -2)
Output:
2014-08-03 10:00:00+02:00     Toronto
2014-08-10 10:00:00+02:00      Lisbon
2014-08-17 10:00:00+02:00         Rio
2014-08-24 10:00:00+02:00      Moscow
2014-08-31 10:00:00+02:00         NaN
2014-09-07 10:00:00+02:00         NaN
Freq: W-SUN, dtype: object
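In both examples the vacated positions are filled with NaN and the values pushed past the end are dropped. As a small sketch of two related options: fill_value replaces the NaN placeholder, and freq moves the index labels instead of the values, so nothing is lost. The placeholder string '(none)' and the names filled and moved are arbitrary choices for this illustration.

```python
import pandas as pd

sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'])
sr.index = pd.date_range(start='2014-08-01 10:00', freq='W', periods=6, tz='Europe/Berlin')

# Shift the values down two rows, filling the gap with a placeholder instead of NaN.
filled = sr.shift(periods=2, fill_value='(none)')
print(filled)

# Shift the index labels forward by two weekly steps; the values stay in order.
moved = sr.shift(periods=2, freq='W')
print(moved)
```

With freq, every city keeps its position but is relabelled with a date two weeks later.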
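Autocorrelation coefficient
The autocorrelation coefficients of a stationary series decay quickly toward zero, and the lag at which the values suddenly drop from a relatively large value to near zero indicates the order of the model. For example, if the autocorrelation function (ACF) plot tails off while the partial autocorrelation function (PACF) plot cuts off, with the PACF values staying inside the confidence interval once the lag passes 2 or 3, the series can be identified as an AR(2) or AR(3) model. If you are not familiar with time series, feel free to skip this paragraph; following it requires a systematic study of time series.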
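As a rough illustration of that decay, the sketch below simulates an AR(1) series and prints its sample autocorrelation at lags 1 to 10 using autocorr(lag=...). The coefficient 0.7, the seed, and the series length are arbitrary assumptions made for this example. It only covers the ACF side; the PACF needs a dedicated routine (for instance from a statistics package) and is not shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 2000                         # series length (assumption)
phi = 0.7                        # AR(1) coefficient (assumption)

# Simulate x[t] = phi * x[t-1] + noise[t]
noise = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

series = pd.Series(x)

# Sample autocorrelation at lags 1..10
acf = pd.Series({lag: series.autocorr(lag=lag) for lag in range(1, 11)})
print(acf.round(3))
```

For an AR(1) process the theoretical autocorrelation at lag k is phi to the power k, so the printed values should shrink roughly geometrically as the lag grows.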
Suppose we have the following data:
import pandas as pd
sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])
index_ = pd.date_range('2010-10-09 08:45', periods=11, freq='H')
sr.index = index_
print(sr)
Output:
2010-10-09 08:45:00 11.0
2010-10-09 09:45:00 21.0
2010-10-09 10:45:00 8.0
2010-10-09 11:45:00 18.0
2010-10-09 12:45:00 65.0
2010-10-09 13:45:00 18.0
2010-10-09 14:45:00 32.0
2010-10-09 15:45:00 10.0
2010-10-09 16:45:00 5.0
2010-10-09 17:45:00 32.0
2010-10-09 18:45:00 NaN
Freq: H, dtype: float64
Compute the autocorrelation coefficient of this Series:
result = sr.autocorr()
result
Output:
-0.13907359397344918
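If you have not studied time series but still want to know what autocorr actually does: I looked at the source code, and autocorr is simply the correlation between the Series and a copy of itself shifted by one row: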
result = sr.corr(sr.shift())
result
Output:
-0.13907359397344918
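The same relationship can be checked for other lags. The sketch below compares autocorr(lag=k) with corr applied to a copy shifted by k rows, for a few values of k; the name checks is introduced only for this illustration, and the datetime index is omitted because it does not affect the result.

```python
import numpy as np
import pandas as pd

sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])

checks = []
for lag in (1, 2, 3):
    via_autocorr = sr.autocorr(lag=lag)
    via_shift = sr.corr(sr.shift(lag))   # correlate with the lagged copy
    checks.append(np.isclose(via_autocorr, via_shift))
    print(lag, via_autocorr, via_shift)

print(all(checks))
```

Rows where either side is NaN are left out of the correlation, which is why the trailing None in the data does not break the calculation.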
Original article: https://www.cnblogs.com/tracydzf/p/14084096.html