sklearn data preprocessing 1: StandardScaler

Date: 2018-07-18 18:41:40
Reposted from: https://blog.csdn.net/u012609509/article/details/78554709

StandardScaler

Purpose: remove the mean and scale to unit variance. This is done per feature dimension, not per sample. StandardScaler standardizes each column separately, because the shape of the data is [n_samples, n_features].
[Note:]
Standardization does not always benefit the estimator.
"Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance)."
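
To make the "per feature, not per sample" point concrete, here is a minimal sketch. The toy data, the variable names, and the choice of scales are assumptions made for illustration only. After scaling, every column should have a mean close to 0 and a standard deviation close to 1, while the rows are not normalized individually.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 100 samples, 3 features on very different scales.
rng = np.random.RandomState(0)
X = rng.randn(100, 3) * np.array([1.0, 10.0, 100.0]) + np.array([0.0, 5.0, -50.0])

# fit_transform learns the per-column mean/std and applies them in one step.
X_scaled = StandardScaler().fit_transform(X)

col_means = X_scaled.mean(axis=0)  # one value per feature (column)
col_stds = X_scaled.std(axis=0)    # one value per feature (column)
row_means = X_scaled.mean(axis=1)  # one value per sample (row)

print("column means:", np.round(col_means, 6))
print("column stds: ", np.round(col_stds, 6))
print("first 3 row means:", row_means[:3])
```

The column statistics are the ones that get normalized. The row means stay arbitrary, which is what distinguishes this from per-sample normalization.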

Example code

# coding=utf-8
# Compute the mean and std of the training set
from sklearn.preprocessing import StandardScaler
import numpy as np


def test_algorithm():
    np.random.seed(123)
    print('use sklearn')
    # Note: shape of data: [n_samples, n_features]
    data = np.random.randn(10, 4)
    scaler = StandardScaler()
    scaler.fit(data)
    trans_data = scaler.transform(data)
    print('original data: ')
    print(data)
    print('transformed data: ')
    print(trans_data)
    print('scaler info: scaler.mean_: {}, scaler.var_: {}'.format(scaler.mean_, scaler.var_))
    print('\n')

    print('use numpy by self')
    # Per-column statistics (axis=0 reduces over the samples)
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    var = std * std
    print('mean: {}, std: {}, var: {}'.format(mean, std, var))
    # NumPy broadcasting subtracts the per-column mean from every row
    another_trans_data = data - mean
    # Note: divide by the standard deviation, not the variance
    another_trans_data = another_trans_data / std
    print('another_trans_data: ')
    print(another_trans_data)


if __name__ == '__main__':
    test_algorithm()

The program output is as follows:

use sklearn
original data:
[[-1.0856306   0.99734545  0.2829785  -1.50629471]
 [-0.57860025  1.65143654 -2.42667924 -0.42891263]
 [ 1.26593626 -0.8667404  -0.67888615 -0.09470897]
 [ 1.49138963 -0.638902   -0.44398196 -0.43435128]
 [ 2.20593008  2.18678609  1.0040539   0.3861864 ]
 [ 0.73736858  1.49073203 -0.93583387  1.17582904]
 [-1.25388067 -0.6377515   0.9071052  -1.4286807 ]
 [-0.14006872 -0.8617549  -0.25561937 -2.79858911]
 [-1.7715331  -0.69987723  0.92746243 -0.17363568]
 [ 0.00284592  0.68822271 -0.87953634  0.28362732]]
transformed data:
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]
scaler info: scaler.mean_: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], scaler.var_: [ 1.54038781  1.29032409  1.04082479  1.16464894]

use numpy by self
mean: [ 0.08737571  0.33094968 -0.24989369 -0.50195303], std: [ 1.24112361  1.13592433  1.02020821  1.07918902], var: [ 1.54038781  1.29032409  1.04082479  1.16464894]
another_trans_data:
[[-0.94511643  0.58665507  0.5223171  -0.93064483]
 [-0.53659117  1.16247784 -2.13366794  0.06768082]
 [ 0.9495916  -1.05437488 -0.42049501  0.3773612 ]
 [ 1.13124423 -0.85379954 -0.19024378  0.06264126]
 [ 1.70696485  1.63376764  1.22910949  0.8229693 ]
 [ 0.52371324  1.02100318 -0.67235312  1.55466934]
 [-1.08067913 -0.85278672  1.13408114 -0.858726  ]
 [-0.18325687 -1.04998594 -0.00561227 -2.1281129 ]
 [-1.49776284 -0.9074785   1.15403514  0.30422599]
 [-0.06810748  0.31452186 -0.61717074  0.72793583]]
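
The two printed matrices match by eye. A programmatic check is less error-prone, so here is a minimal sketch of one. The names `manual`, `manual_ddof1`, and `new_data` are introduced here for illustration and are not part of the original program. The sketch also shows that the match depends on using the population standard deviation (`ddof=0`, NumPy's default), and that a fitted scaler reapplies the stored training statistics to data it has not seen.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(123)
data = np.random.randn(10, 4)

scaler = StandardScaler().fit(data)
sk_result = scaler.transform(data)

# Manual version with the population standard deviation (ddof=0).
manual = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)
same_as_sklearn = np.allclose(sk_result, manual)

# The sample standard deviation (ddof=1) gives slightly different values.
manual_ddof1 = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
differs_with_ddof1 = not np.allclose(sk_result, manual_ddof1)

# scale_ stores the per-feature standard deviation used for the division.
scale_matches_std = np.allclose(scaler.scale_, data.std(axis=0))

# Hypothetical unseen data: transform() reuses mean_ and scale_ from fit().
new_data = np.random.randn(3, 4)
new_scaled = scaler.transform(new_data)
expected_new = (new_data - scaler.mean_) / scaler.scale_

print("matches manual (ddof=0):", same_as_sklearn)
print("differs from manual (ddof=1):", differs_with_ddof1)
print("scale_ equals np.std:", scale_matches_std)
print("new data uses training statistics:", np.allclose(new_scaled, expected_new))
```

With only 10 samples the `ddof=1` variant differs by a factor of sqrt(9/10) per value, so mixing the two conventions would show up clearly in a comparison like this.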

Original: https://www.cnblogs.com/super-saiyan-blue/p/9330833.html
