Jupyter Notebook Shortcuts and a First Look at the NumPy Module

Date: 2019-02-16 16:59:26

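Jupyter Notebook shortcuts

Insert a cell: a (above), b (below)
Delete a cell: x (cuts the selected cell; dd deletes it)
Switch cell mode: m for Markdown mode, y for code mode
Run a cell: shift+enter
tab: autocomplete
shift+tab: open the help documentation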

NumPy

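NumPy (Numerical Python) is an extension library for Python. It supports multi-dimensional array and matrix operations and provides a large library of mathematical functions that operate on arrays.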

I. Creating an ndarray

1. Creating with np.array()
Creating a one-dimensional array
import numpy as np
np.array([1,2,3,4,5],dtype=int)
Creating a two-dimensional array
np.array([[1,2,3],[4,5,6],[7.7,8,9]])

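Note:
By default, all elements of a NumPy ndarray have the same type.
If the list passed in contains different types, they are converted to one common type, with priority str > float > int.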
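The promotion order in the note can be checked directly. This is a minimal sketch; the variable names are only illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])            # all ints -> an integer dtype
mixed = np.array([1, 2.5, 3])         # int + float -> float64
texty = np.array([1, 2.5, "three"])   # any str -> a unicode string dtype

print(ints.dtype.kind)   # i
print(mixed.dtype)       # float64
print(texty.dtype.kind)  # U
```

Once a string is present, the numbers are stored as text, so arithmetic on such an array no longer works as it would on numbers.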

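Using matplotlib.pyplot to get a NumPy array whose data comes from an image

import matplotlib.pyplot as plt
img_arr = plt.imread('bobo.jpg')  # an image in the current directory
plt.imshow(img_arr)
img_arr.shape
# (626, 413, 3): the first two numbers are the height and width in pixels, the last is the number of color channels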

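Creating arrays with NumPy's routine functions

Common creation functions include:
np.ones(shape, dtype=None, order='C')
np.ones(shape=(4,5),dtype=float)  # filled with 1
np.zeros(shape, dtype=None, order='C')  # filled with 0
np.full(shape, fill_value, dtype=None, order='C')
np.full(shape=(6,7),fill_value=999)
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)  # arithmetic sequence
np.linspace(0,100,20)  # returns a 1-D array of 20 evenly spaced numbers from 0 to 100 (step 100/19, about 5.26)
np.arange([start, ]stop, [step, ]dtype=None)
arr1 = np.arange(3,10,2)  # 1-D array: 3 is the start, 10 is the (exclusive) stop, 2 is the step
np.random.seed(10)  # fix the random seed
np.random.randint(0,100,size=(3,4))  # with a fixed seed, the random numbers are reproducible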
array([[ 9, 15, 64, 28],
       [89, 93, 29,  8],
       [73,  0, 40, 36]])

Note: arange and linspace only produce one-dimensional arrays; the other functions can produce one-dimensional or multi-dimensional arrays.
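A short sketch of how linspace and arange differ in what they count: linspace takes the number of samples and includes the stop value by default, while arange takes a step and excludes the stop value. The variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between samples
values, step = np.linspace(0, 100, 20, retstep=True)
print(values.shape)    # (20,)
print(round(step, 2))  # 5.26

# arange excludes the stop value
odds = np.arange(3, 10, 2)
print(odds.tolist())   # [3, 5, 7, 9]

# full accepts any shape
filled = np.full(shape=(2, 3), fill_value=999)
print(filled.shape)    # (2, 3)
```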

ndarray attributes

Four attributes to remember: ndim: number of dimensions; shape: length of each dimension; size: total number of elements; dtype: element type
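The four attributes can be read off any array. A minimal example on a small array of zeros:

```python
import numpy as np

arr = np.zeros((3, 4))

print(arr.ndim)   # 2
print(arr.shape)  # (3, 4)
print(arr.size)   # 12
print(arr.dtype)  # float64
```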

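Basic ndarray operations

1. Indexing
One-dimensional arrays are indexed exactly like lists; multi-dimensional arrays follow the same idea, with one index per axis
np.random.seed(1)
arr = np.random.randint(0,100,size=(5,5))
# modify an element by index
arr[1][2] = 6666444
# get the first two rows of the 2-D array
arr[0:2]
# get the first two columns of the 2-D array
arr[:,0:2]
# get the first two rows and first two columns
arr[0:2,0:2]
# reverse the order of the rows
arr[::-1]
# reverse the order of the columns
arr[:,::-1]
# reverse both rows and columns
arr[::-1,::-1]
# flip the image along every axis (rows, columns and color channels)
plt.imshow(img_arr[::-1,::-1,::-1])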
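The same slicing patterns are easier to follow on a small array with predictable values. This sketch uses np.arange instead of random data so the results can be read off by eye.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = arr[0:2]      # rows 0 and 1
first_two_cols = arr[:, 0:2]   # columns 0 and 1 of every row
rows_reversed = arr[::-1]      # last row first
cols_reversed = arr[:, ::-1]   # last column first

print(first_two_cols.tolist())    # [[0, 1], [4, 5], [8, 9]]
print(rows_reversed[0].tolist())  # [8, 9, 10, 11]
print(cols_reversed[0].tolist())  # [3, 2, 1, 0]
```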
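Reshaping
arr
array([[     37,      12,      72,       9,      75],
       [      5,      79, 6666444,      16,       1],
       [     76,      71,       6,      25,      50],
       [     20,      18,      84,      11,      28],
       [     29,      14,      50,      68,      87]])
arr.shape
(5, 5)

Use arr.reshape(). The new shape is usually passed as a tuple, and the total number of elements must be the same before and after reshaping.

Reshape a multi-dimensional array into a one-dimensional array
a = arr.reshape((25,))
a.shape
(25,)
Reshape a one-dimensional array into a multi-dimensional array
a.reshape((5,-1))  # -1 tells NumPy to work out that dimension automatically
array([[     37,      12,      72,       9,      75],
       [      5,      79, 6666444,      16,       1],
       [     76,      71,       6,      25,      50],
       [     20,      18,      84,      11,      28],
       [     29,      14,      50,      68,      87]])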
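A small sketch of the element-count rule: -1 is inferred when the count divides evenly, and reshape raises ValueError when it does not. The flag name mismatch_rejected is only illustrative.

```python
import numpy as np

flat = np.arange(12)

grid = flat.reshape((3, -1))   # -1 is inferred as 4
print(grid.shape)              # (3, 4)

back = grid.reshape((12,))     # back to one dimension
print(back.shape)              # (12,)

try:
    flat.reshape((5, -1))      # 12 elements cannot fill 5 equal rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)       # True
```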
Flipping an image
img_arr.shape
(626, 413, 3)
img_arr.size
775614
# reshape the original 3-D array into a 1-D array
arr_1 = img_arr.reshape((775614,))
# reverse the elements of arr_1
arr_1 = arr_1[::-1]
# reshape arr_1 back into a 3-D array
a_img = arr_1.reshape((626, 413, 3))
plt.imshow(a_img)
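The flatten, reverse, reshape sequence gives the same result as reversing every axis with slices. The sketch below checks this on a small made-up array standing in for the image (fake_img is a hypothetical name, not the author's data).

```python
import numpy as np

# Hypothetical stand-in for an image: height 2, width 4, 3 color channels
fake_img = np.arange(24).reshape((2, 4, 3))

flat = fake_img.reshape((fake_img.size,))      # 3-D -> 1-D
flipped = flat[::-1].reshape(fake_img.shape)   # reverse, then 1-D -> 3-D

same = np.array_equal(flipped, fake_img[::-1, ::-1, ::-1])
print(same)  # True
```

Because the last axis is reversed too, the color channels swap order as well, so a real image comes out with its red and blue channels exchanged in addition to being rotated by 180 degrees.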

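Concatenation

np.concatenate((arr,arr),axis=1)  # axis=0 joins vertically (adds rows), axis=1 joins horizontally (adds columns)
Points to note when concatenating:
The arrays are passed as a sequence, so they must be wrapped in square brackets or parentheses
The arrays must have the same number of dimensions
The shapes must match: given the same number of dimensions, horizontal concatenation (axis=1) requires the arrays to have the same number of rows, and vertical concatenation (axis=0) requires them to have the same number of columns.
The axis parameter changes the direction of concatenation
np.vstack(): stacks arrays vertically
np.hstack(): stacks arrays horizontally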
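The effect of axis is easiest to see in the resulting shapes. A minimal sketch with two 2x3 arrays:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))
b = np.arange(6, 12).reshape((2, 3))

tall = np.concatenate((a, b), axis=0)  # rows added
wide = np.concatenate((a, b), axis=1)  # columns added
print(tall.shape)  # (4, 3)
print(wide.shape)  # (2, 6)

# For 2-D arrays, vstack and hstack match the two axis choices
print(np.array_equal(tall, np.vstack((a, b))))  # True
print(np.array_equal(wide, np.hstack((a, b))))  # True
```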

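Splitting

Like concatenation, splitting is handled by three functions:
np.split(arr, indices_or_sections, axis): the second argument is either an integer (the number of equal parts) or a list of row/column indices at which to split
plt.imshow(np.split(img_arr,(400,),axis=0)[0])
np.vsplit
np.hsplit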
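A brief sketch of the two forms of the second argument: a list of indices splits at those positions, while an integer splits into that many equal parts.

```python
import numpy as np

arr = np.arange(20).reshape((5, 4))

top, bottom = np.split(arr, [2], axis=0)  # split before row 2
print(top.shape, bottom.shape)            # (2, 4) (3, 4)

left, right = np.hsplit(arr, 2)           # two equal column blocks
print(left.shape, right.shape)            # (5, 2) (5, 2)
```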

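Copies

Assignment never creates a copy of an ndarray or of its elements. Operations on the newly assigned name also affect the original object.
Use copy() to create a copy
c_arr = arr.copy()
c_arr[1][4] = 100100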
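The difference between assignment and copy() can be seen by changing one element through each name. The names below are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])

alias = original          # plain assignment: same object, no copy
alias[0] = 100
print(original[0])        # 100

independent = original.copy()
independent[1] = 200
print(original[1])        # 2
print(alias is original)  # True
```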

ndarray aggregation operations

Sum: np.sum
arr.sum(axis=1)  # sum of each row
Maximum and minimum: np.max / np.min
Mean: np.mean()
Other aggregation operations
Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
np.power: exponentiation (element-wise, not an aggregation)
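A short sketch of how axis changes an aggregation, and of why the NaN-safe versions exist. The array values are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum())                 # 21
print(arr.sum(axis=0).tolist())  # [5, 7, 9]  one sum per column
print(arr.sum(axis=1).tolist())  # [6, 15]    one sum per row
print(arr.argmax())              # 5, an index into the flattened array

with_gap = np.array([1.0, np.nan, 2.0])
print(np.isnan(np.sum(with_gap)))  # True: NaN propagates through np.sum
print(np.nansum(with_gap))         # 3.0
```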

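Broadcasting

[Important] The three rules of ndarray broadcasting: an array with fewer dimensions is padded up to the number of dimensions of the array it is combined with, and its missing elements are filled in by repeating the existing ones.
Rule 1: if the arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with 1s on its left side
Rule 2: along any dimension where one array has length 1, its values are repeated to match the other array
Rule 3: if the lengths in some dimension differ and neither of them is 1, the arrays cannot be broadcast and an error is raised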

m = np.ones((2, 3))
a = np.arange(3)
display(m,a)
array([[1., 1., 1.],
       [1., 1., 1.]])
array([0, 1, 2])
m+a
array([[1., 2., 3.],
       [1., 2., 3.]])
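Two further cases, sketched under the standard NumPy broadcasting behavior: a column and a row stretching against each other, and a pair of shapes that cannot be broadcast. The flag name incompatible is only illustrative.

```python
import numpy as np

column = np.arange(3).reshape((3, 1))  # shape (3, 1)
row = np.arange(3)                     # shape (3,), treated as (1, 3)

table = column + row                   # both stretched to (3, 3)
print(table.tolist())                  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]

try:
    np.ones((2, 3)) + np.arange(2)     # trailing lengths 3 and 2 disagree
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)                    # True
```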

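Sorting an ndarray

Quicksort (NumPy's default sort kind)
Both np.sort() and ndarray.sort() work, but they differ:

np.sort() does not modify its input
ndarray.sort() sorts in place, using no extra space, but modifies its input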
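The difference shows up in what each call returns and in what happens to the input. A minimal sketch:

```python
import numpy as np

data = np.array([3, 1, 2])

sorted_copy = np.sort(data)  # returns a new sorted array
print(data.tolist())         # [3, 1, 2]  input unchanged
print(sorted_copy.tolist())  # [1, 2, 3]

result = data.sort()         # sorts in place and returns None
print(result)                # None
print(data.tolist())         # [1, 2, 3]
```

Since ndarray.sort() returns None, writing data = data.sort() loses the array.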

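Pandas data structures

1. Series

A Series is an object similar to a one-dimensional array, made up of two parts:

values: a set of data (an ndarray)
index: the index labels associated with the data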

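1) Creating a Series

import pandas as pd
from pandas import Series,DataFrame
import numpy as np

Two ways to create one:

(1) From a list or NumPy array

The default index is the integers 0 to N-1
# create a Series from a list
Series(data=[1,2,3,4,5],name='bobo')
# create a Series from a NumPy array
Series(data=np.random.randint(0,10,size=(5,)))
# the index parameter can also be used to specify the index
s = Series(data=np.random.randint(0,10,size=(5,)),index=['a','b','c','d','e'])
(2) From a dict: the keys become the index labels, so index does not need to be passed; positional (implicit) indexing is still available
dic = {
    'Chinese':100,
    'Math':90
}
s = Series(data=dic)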
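A small sketch of what a dict-built Series looks like: the keys serve as labels, and positions remain usable through .iloc. The variable name scores is illustrative.

```python
import pandas as pd

scores = pd.Series({"Chinese": 100, "Math": 90})

print(scores.index.tolist())  # ['Chinese', 'Math']
print(scores["Math"])         # 90   label lookup
print(scores.iloc[0])         # 100  positional lookup still works
```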

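Indexing and slicing a Series

Square brackets with a single label return one element; square brackets containing a list of labels return a Series.

(1) Explicit indexing:

- Use the elements of index as the index values
- Use s.loc[] (recommended): the brackets of loc must contain explicit index labels
Note: slices here are closed intervals and include the end label

(2) Implicit indexing:

- Use integers as the index values (newer pandas versions deprecate plain s[0] on a label-indexed Series, so prefer .iloc)
- Use .iloc[] (recommended): the brackets of iloc must contain implicit (positional) indexes
Note: slices here are half-open intervals and exclude the end position

Slicing: implicit-index slicing and explicit-index slicing
Explicit-index slicing: index labels and loc
s['a':'d']
s.loc['a':'c']
Implicit-index slicing: integer positions and iloc
s.iloc[0:3]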
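The closed versus half-open difference is clearest when both slices are run on the same Series. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["a":"c"]    # closed interval: a, b and c
by_position = s.iloc[0:2]    # half-open interval: positions 0 and 1

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [10, 20]
print(type(s.loc[["a", "e"]]).__name__)  # Series
```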

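Basic concepts of a Series

A Series can be thought of as a fixed-length, ordered dict

Adding a row to a Series works like adding a key-value pair to a dict
s['g'] = 10
Attributes such as shape, size, index and values describe the Series
s.index
s.values

s.head() and s.tail() show the first n and the last n values
s.head(3)
Removing duplicate elements from a Series
s.unique()  # returns an ndarray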
When an index label has no corresponding value, the missing data shows up as NaN (not a number)
For example, adding two Series:

s1 = Series([1,2,3],index=['a','b','c'])
s2 = Series([1,2,3],index=['a','b','d'])
s = s1+s2
a    2.0
b    4.0
c    NaN
d    NaN
dtype: float64

Missing data can be detected with pd.isnull() and pd.notnull(), or with the methods s.isnull() and s.notnull()
s.isnull()
a    False
b    False
c     True
d     True
dtype: bool
s.notnull()
a     True
b     True
c    False
d    False
dtype: bool
s[s.notnull()]  # filter out the missing data
a    2.0
b    4.0
dtype: float64
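The function and method forms give the same answer, and the boolean filter matches what dropna() returns. A sketch on a Series built directly with NaN values rather than through addition:

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, np.nan, np.nan], index=["a", "b", "c", "d"])

print(pd.isnull(s).tolist())              # [False, False, True, True]
print(s[s.notnull()].index.tolist())      # ['a', 'b']
print(s.dropna().equals(s[s.notnull()]))  # True
```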

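Series arithmetic

(1) + - * /
(2) Operations between Series

Data with different indexes is aligned automatically during the operation
Where the indexes do not match, the result is filled with NaN
s1 = Series([1,2,31,2],index=["a","d","s","r"])
s2 = Series([11,2,2,3],index=["a","d","s","b"])
s = s1+s2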
a    12.0
b     NaN
d     4.0
r     NaN
s    33.0
dtype: float64
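When NaN for unmatched labels is not wanted, one option is Series.add with fill_value, which substitutes a value for the missing side before adding. This is offered as an alternative sketch, not something the original text uses.

```python
import pandas as pd

s1 = pd.Series([1, 2, 31, 2], index=["a", "d", "s", "r"])
s2 = pd.Series([11, 2, 2, 3], index=["a", "d", "s", "b"])

plain = s1 + s2                    # unmatched labels become NaN
filled = s1.add(s2, fill_value=0)  # unmatched labels use 0 for the missing side

print(plain.isnull().sum())        # 2
print(filled.to_dict())            # {'a': 12.0, 'b': 3.0, 'd': 4.0, 'r': 2.0, 's': 33.0}
```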

Original article: https://www.cnblogs.com/yidashi110/p/10388297.html
