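Learning pandas, Part 3: Working with Series (Sorting, Appending, Replacing, Updating, and Missing Values)

Date: 2019-09-23 11:24:06

This part covers common operations on a Series: sorting it, appending another Series to it, replacing values, updating values, and handling missing values.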

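I. Sorting a Series

A Series can be sorted by its values or by its index:

Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Series.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)

Parameters:

axis: for a Series, the only valid value is 0.
ascending: defaults to True (ascending order); set it to False to sort in descending order.
inplace: whether to sort the original Series in place. If True, the original Series is reordered and the method returns None; if False, the original is left unchanged and a sorted copy is returned.
kind: the sorting algorithm; valid values are quicksort, mergesort, and heapsort, with quicksort as the default.
na_position: 'first' puts NaN values at the beginning of the result, 'last' puts them at the end.
level: for a MultiIndex, the index level (or levels) to sort by. The default is None, which sorts lexicographically across all levels, starting with level 0.
sort_remaining: applies to a MultiIndex when level is given. If True, the remaining levels are also sorted, in order, after the specified level.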

1. Sort by value

Sort the Series by its values; by default NaN is placed last.

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([np.nan, 1, 3, 10, 5])
>>> s.sort_values()
1     1.0
2     3.0
4     5.0
3    10.0
0     NaN
dtype: float64
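
The ascending, na_position, and inplace parameters described above can be combined. The following is a small sketch on the same data; the variable names desc, t, and result are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, 3, 10, 5])

# Descending order, with the missing value moved to the front.
desc = s.sort_values(ascending=False, na_position="first")
print(desc)

# With inplace=True the object itself is reordered and None is returned.
t = s.copy()
result = t.sort_values(inplace=True)
print(result)  # None
print(t)
```

Because inplace=True returns None, chaining further calls onto it fails; the default inplace=False is usually the safer choice.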

2. Sort by index

Sort the Series by its index labels:

>>> s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, 4])
>>> s.sort_index()
1    c
2    b
3    a
4    d
dtype: object
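
The level parameter only matters for a MultiIndex. As a hedged illustration, the sketch below builds a small two-level index (the level names outer and inner and the sample values are made up for this example) and sorts it by the second level, leaving sort_remaining at its default of True.

```python
import pandas as pd

# A two-level index; the names "outer" and "inner" are illustrative.
idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
)
m = pd.Series([10, 20, 30, 40], index=idx)

# Sort by the "inner" level first; ties are then ordered by "outer"
# because sort_remaining defaults to True.
by_inner = m.sort_index(level="inner")
print(by_inner)

# Plain descending sort over the whole index.
descending = m.sort_index(ascending=False)
print(descending)
```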

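II. Appending one Series to another

append() concatenates another Series onto the end of a Series and returns the result as a new Series:

Series.append(self, to_append, ignore_index=False, verify_integrity=False)

Parameters:

to_append: the Series (or list of Series) to append.
ignore_index: defaults to False, which keeps the original index labels; if True, the result gets a fresh index 0, 1, ..., n-1.
verify_integrity: defaults to False; if True, an exception is raised when the resulting index contains duplicates.

For example, combine two Series into one. When the index is not ignored, the result keeps the index labels of both inputs; when it is ignored, the result is given a new index.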

>>> s1 = pd.Series([1, 2, 3])
>>> s2 = pd.Series([4, 5, 6])
>>> s1.append(s2)
0    1
1    2
2    3
0    4
1    5
2    6
dtype: int64
>>> s1.append(s2, ignore_index=True)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
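
Series.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent installation the calls above raise AttributeError. A minimal sketch of the same operations using pd.concat, which accepts the same ignore_index and verify_integrity options, might look like this (the names kept, rebuilt, and overlap_detected are illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6])

# Equivalent of s1.append(s2): index labels are kept, so 0..2 repeat.
kept = pd.concat([s1, s2])
print(kept)

# Equivalent of s1.append(s2, ignore_index=True): a fresh 0..5 index.
rebuilt = pd.concat([s1, s2], ignore_index=True)
print(rebuilt)

# verify_integrity=True rejects the duplicated labels.
try:
    pd.concat([s1, s2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```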

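III. Replacing values in a Series

replace() substitutes one value in a Series with another:

Series.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

Parameters:

to_replace: the existing value (or values) to look for; each match is replaced by the value given in the value parameter.
inplace: defaults to False, so the Series is not modified in place.
limit: the maximum number of replacements.
regex: whether to interpret to_replace as a regular expression; defaults to False. If True, to_replace must be a string.
method: valid values are pad, ffill, and bfill. It is used when to_replace is a scalar, list, or tuple and value is None.

Scenario 1: to_replace is a scalar and value is also a scalar.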

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

Scenario 2: to_replace is a list and value is a scalar. For example, replace every value that appears in the to_replace list with 5.

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace([0,2],5)
0    5
1    1
2    5
3    3
4    4
dtype: int64

Scenario 3: to_replace is a dict and value is None. Each value in the Series that matches a dict key is replaced by the corresponding dict value.

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace({0:5,1:7})
0    5
1    7
2    2
3    3
4    4
dtype: int64
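
The scenarios above do not exercise the regex parameter. As a sketch with made-up string data, a regular-expression replacement could look like the following; the pattern and the replacement text are assumptions chosen for the example.

```python
import pandas as pd

words = pd.Series(["foo", "bar", "baz"])

# With regex=True, to_replace is treated as a pattern:
# every three-letter string starting with "ba" becomes "new".
out = words.replace(to_replace=r"^ba.$", value="new", regex=True)
print(out)

# replace() returns a new Series; the original is untouched
# unless inplace=True is passed.
print(words)
```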

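IV. Updating a Series

There are three ways to update the values of a Series: assign a scalar to a single element, assign to an index slice to change several elements at once, or use another Series to update it.

1. Update with a scalar

Select a single element by position with iat and assign to it: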

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.iat[1]=7
>>> s
0    0
1    7
2    2
3    3
4    4
dtype: int64

2. Update through a slice

Use the loc accessor to select a slice of the Series and assign a list to change several values at once:

>>> s.loc[1:2]=[2,3]
>>> s
0    0
1    2
2    3
3    3
4    4
dtype: int64
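
Note that loc[1:2] changed two elements: loc slices by label and includes the end label, whereas iloc slices by position and excludes the end, as ordinary Python slicing does. A short sketch of the difference, using replacement values made up for this example:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

# Label-based slice: labels 1 and 2 are both included.
s.loc[1:2] = [20, 30]

# Position-based slice: only position 3 is included.
s.iloc[3:4] = [40]

print(s)
```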

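3. Update with another Series

update() modifies the Series in place, aligning on the index: a value in the original Series is overwritten only where its index label also appears in the argument Series, and all other values are left as they are.

Series.update(self, other)

For example, use update() to change the values at index 0 and 2 to 'd' and 'e':

>>> s = pd.Series(['a', 'b', 'c'])
>>> s.update(pd.Series(['d', 'e'], index=[0, 2]))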
>>> s
0    d
1    b
2    e
dtype: object
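
Two further consequences of index alignment are worth seeing in action: labels in the other Series that do not exist in the original are ignored, and NaN values in the other Series do not overwrite anything. The data below is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# Label 0 matches and is updated; label 1 carries NaN and is skipped;
# label 9 does not exist in s and is ignored.
s.update(pd.Series([4.0, np.nan, 7.0], index=[0, 1, 9]))
print(s)
```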

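V. Handling missing values in a Series

Missing values are represented by np.nan (NumPy's NaN) or None. Use isna() to detect them, dropna() to remove them, and fillna() to fill them.

1. Check whether the Series contains missing values

>>> s = pd.Series(data=[1, 2, np.nan, 4])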
>>> s.isna()
0    False
1    False
2     True
3    False
dtype: bool

2. Drop the missing values from the Series

>>> s.dropna()
0    1.0
1    2.0
3    4.0
dtype: float64
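
The boolean mask returned by isna() is often reduced to a count or a single flag, and its complement notna() can be used for filtering. A small sketch, with sample data and variable names chosen for this example:

```python
import numpy as np
import pandas as pd

# Both np.nan and None are treated as missing in a numeric Series.
s = pd.Series([1, 2, np.nan, None])

n_missing = int(s.isna().sum())    # number of missing entries
has_missing = bool(s.isna().any())  # is anything missing at all?

# Boolean filtering with notna() gives the same result as dropna().
valid = s[s.notna()]

print(n_missing, has_missing)
print(valid)
```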

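VI. Filling from neighboring valid values

fillna() can locate the valid value adjacent to a missing value and use it to fill the gap:

Series.fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

Parameters:

value: the value used to fill missing entries.
method: how to find the fill value: backward fill ('backfill', 'bfill'), forward fill ('pad', 'ffill'), or a fixed value (method=None). The default is None.
downcast: downcast the result to a smaller dtype where possible, for example float64 to int64. The default is None.

1. Backward fill

Backward fill ('backfill' or 'bfill') fills each missing value with the first valid value after it.

>>> s.fillna(method='bfill')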
0    1.0
1    2.0
2    4.0
3    4.0
dtype: float64

2. Forward fill

Forward fill ('pad' or 'ffill') fills each missing value with the last valid value before it.

>>> s.fillna(method='ffill')
0    1.0
1    2.0
2    2.0
3    4.0
dtype: float64

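3. Fill with a fixed value

When method is None, missing values are filled with the value given by the value parameter. Common choices for the fixed value are the mean, the median, or the mode.

>>> s.fillna(value=3, method=None)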
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64
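
Since pandas 2.1 the method argument of fillna() is deprecated in favor of the dedicated ffill() and bfill() methods. The sketch below shows those two calls and also fills with statistics computed from the Series itself rather than a hard-coded constant; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4])

# Dedicated methods for forward and backward fill.
forward = s.ffill()
backward = s.bfill()

# Fixed-value fill using statistics of the valid values;
# mean() and median() skip NaN by default.
filled_mean = s.fillna(s.mean())      # (1 + 2 + 4) / 3
filled_median = s.fillna(s.median())  # median of 1, 2, 4

print(forward.tolist(), backward.tolist())
print(filled_mean)
print(filled_median)
```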

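VII. Filling by interpolation

interpolate() estimates the missing values from the surrounding data and fills them with the estimates:

Series.interpolate(self, method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)

Parameters:

limit_direction: the direction in which consecutive NaN values are filled; valid values are 'forward', 'backward', and 'both'.
limit_area: valid values are None, 'inside', and 'outside'. None places no restriction on filling; 'inside' fills only NaN values surrounded by valid values (interpolation); 'outside' fills only NaN values outside the valid values (extrapolation).
method: the interpolation method, 'linear' by default. Implemented methods are 'linear', 'time', 'index', 'pad', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', 'barycentric', 'polynomial', 'krogh', 'piecewise_polynomial', 'pchip', 'akima', and 'from_derivatives'.
**kwargs: keyword arguments passed on to the interpolating function.

The most commonly used methods are linear interpolation and polynomial interpolation.

1. Linear interpolation

linear is the default method. It ignores the index and treats the values as equally spaced.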

>>> s.interpolate()
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64

2. Polynomial interpolation

polynomial interpolates with a polynomial through the valid values; the order argument is required:

>>> s.interpolate(method='polynomial', order=2)
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64
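
Because the example Series has an evenly spaced index and a single interior gap, it does not show what method='index' or limit_area change. The following sketch uses made-up data to illustrate both; the Series u and g and the result names are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Unevenly spaced index: labels 0, 1, 10.
u = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index and takes the midpoint of the neighbors.
by_position = u.interpolate()
# 'index' uses the index labels as x coordinates.
by_index = u.interpolate(method="index")
print(by_position.loc[1], by_index.loc[1])

# Leading and trailing gaps around an interior gap.
g = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# limit_area='inside' fills only the NaN values between valid values.
inside = g.interpolate(limit_area="inside")
print(inside)
```

With method='index' the gap at label 1 lies one tenth of the way from 0 to 10, so the estimate follows the actual spacing rather than the element count.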

Original article: https://www.cnblogs.com/ljhdo/p/10279486.html
