Translation: 1_Project Overview, Data Wrangling and Exploratory Analysis-checkpoint

Posted: 2020-02-04 17:06:50

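Forecasting building energy demand to improve university energy efficiency

This post is a translation of a Harvard University energy analysis and forecasting report; see the original.

The data source is not available. In my view, studying the analysis method is enough.

Contents:

  1. Project overview
  2. Understanding the data
  3. Exploratory analysis
  4. Prediction with different machine learning methods
  5. Summary
  6. Conclusions
  7. Discussion

1. Project overview

Use machine learning to forecast energy consumption, with the aim of saving energy.

2. Understanding the data

There are three types of energy consumption: electricity, chilled water and hot water. The figure shows the buildings that Harvard's plants supply with chilled water and hot water.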

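Figure: Chilled water and hot water supply at Harvard. (Left: chilled water, marked in blue. Right: hot water, highlighted in yellow.)

We selected one building and obtained its energy consumption data from July 1, 2011 to October 31, 2014. Several months of data are missing because of meter malfunctions. The data resolution is hourly. In the raw data, each hourly value is a meter reading (a running total). To get the consumption in each hour, we shift the data by one step and subtract. We have hourly weather and energy data from January 2012 to October 2014 (2.75 years). The weather data come from the Cambridge weather station.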
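The "shift and subtract" step can be sketched as follows. This is a minimal illustration, not the report's own code: the four readings are copied from the first rows of the "Gund Bus-A EnergyReal - kWhr" column shown later in this post, and the variable names are made up for the example.

```python
import pandas as pd

# Cumulative meter readings (kWh), copied from the first rows of the
# "Gund Bus-A EnergyReal - kWhr" column printed later in this post.
readings = pd.Series(
    [1796757.502803, 1796800.145991, 1796840.146023, 1796879.023607],
    index=pd.date_range("2011-07-01 01:00", periods=4, freq=pd.Timedelta(hours=1)),
)

# Shift the series by one hour and subtract. Each result is the energy
# used during the hour that ends at that timestamp. The first value has
# no predecessor, so it is NaN.
hourly = readings - readings.shift(1)  # same result as readings.diff()

print(hourly)
```

A negative or implausibly large difference in such a series is a sign of a meter reset or malfunction, which is worth checking before the values are used for modelling.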

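In this section we complete the following tasks.

  - Manually download the raw data from the Harvard Energy Witness website and extract hourly electricity, chilled water and hot water.

  - Clean the weather data and add more features, including cooling degrees, heating degrees and humidity ratio.

  - Estimate daily occupancy based on holidays, the academic year and weekends.

  - Create an hour-related feature, namely cos(hourOfDay * 2 * pi / 24).

  - Merge the electricity, chilled water and hot water data streams with the weather, time and occupancy features.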
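The hour-related feature from the list above can be sketched like this. It is only an illustration under assumptions: the date range and the column names `hourOfDay` and `cosHour` are invented for the example and do not come from the report.

```python
import numpy as np
import pandas as pd

# One day of hourly timestamps (illustrative dates only).
hours = pd.date_range("2012-01-01 00:00", periods=24, freq=pd.Timedelta(hours=1))

features = pd.DataFrame(index=hours)
features["hourOfDay"] = features.index.hour

# Map the hour onto a 24-hour cycle, as in cos(hourOfDay * 2 * pi / 24).
features["cosHour"] = np.cos(features["hourOfDay"] * 2 * np.pi / 24)

print(features.head())
```

With this encoding, hour 23 and hour 1 get the same value, close to that of hour 0, so midnight no longer looks like a jump from 23 to 0. Note that the cosine alone gives hour h and hour 24 - h the same value (for example 6 and 18), so it cannot separate morning from evening on its own.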

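%matplotlib inline

import requests
from io import StringIO # Python 3 location of StringIO
import numpy as np
import pandas as pd # pandas
import matplotlib.pyplot as plt # module for plotting
import datetime as dt # module for manipulating dates and times
import numpy.linalg as lin # module for linear algebra operations
from math import log10, exp
# "from __future__ import division" is not needed on Python 3, where / is already true division

# pd.options.display.mpl_style was removed from pandas; 'ggplot' is an approximate substitute
plt.style.use('ggplot')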

Raw energy consumption data

The raw data were downloaded from the Harvard Energy Witness website.

Then we use Pandas to put them into a single dataframe.

file = 'Data/Org/0701-0930-2011.xls' 
df = pd.read_excel(file, header = 0, skiprows = np.arange(0,6))

files = ['Data/Org/1101-1130-2011.xls', 
         'Data/Org/1201-2011-0131-2012.xls',
         'Data/Org/0201-0331-2012.xls',
         'Data/Org/0401-0531-2012.xls',
         'Data/Org/0101-0228-2013.xls',
         'Data/Org/0301-0430-2013.xls',
         'Data/Org/0501-0630-2013.xls',
         'Data/Org/0701-0831-2013.xls',
         'Data/Org/0901-1031-2013.xls',
         'Data/Org/1101-1231-2013.xls',
         'Data/Org/0101-0228-2014.xls',
         'Data/Org/0301-0430-2014.xls', 
         'Data/Org/0501-0630-2014.xls', 
         'Data/Org/0701-0831-2014.xls',
         'Data/Org/0901-1031-2014.xls']

for file in files:
    data = pd.read_excel(file, header = 0, skiprows = np.arange(0,6))
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])

df.head()
WARNING *** file size (2481102) not 512 + multiple of sector size (512)
(19 more warnings of the same form omitted)
  | Unnamed: 0 | Unnamed: 1 | Gund Bus-A 15 Min Block Demand - kW | Gund Bus-A CurrentA - Amps | Unnamed: 4 | Unnamed: 5 | Gund Bus-A CurrentB - Amps | Unnamed: 7 | Gund Bus-A CurrentC - Amps | Unnamed: 9 | ... | Gund Main Demand - Tons | Gund Main Energy - Ton-Days | Gund Main FlowRate - gpm | Gund Main FlowTotal - kgal(1000) | Gund Main SignalAeration - Count | Gund Main SignalStrength - Count | Gund Main SonicVelocity - Ft/Sec | Gund Main TempDelta - Deg F | Gund Main TempReturn - Deg F | Gund Main TempSupply - Deg F
0 | 2011-07-01 01:00:00 | White | 48.458733 | 65.977882 | NaN | NaN | 52.631417 | NaN | 55.603840 | NaN | ... | 4.677294 | 17912.537804 | 6.916454 | 48168.083414 | 0.693405 | 57.208127 | 1437.640543 | 16.238684 | 59.757447 | 43.516103
1 | 2011-07-01 02:00:00 | White | 40.472697 | 57.230223 | NaN | NaN | 42.483092 | NaN | 50.243230 | NaN | ... | 4.586403 | 17912.853518 | 6.739337 | 48168.645429 | 0.567355 | 57.082909 | 1438.030719 | 16.263573 | 59.710199 | 43.495128
2 | 2011-07-01 03:00:00 | #d2e4b0 | 39.472809 | 55.487443 | NaN | NaN | 41.911784 | NaN | 48.482163 | NaN | ... | 4.462877 | 17913.169232 | 6.725142 | 48169.207444 | 0.441304 | 57.001646 | 1439.111130 | 15.797043 | 59.248158 | 43.457344
3 | 2011-07-01 04:00:00 | White | 39.198879 | 55.849806 | NaN | NaN | 41.525529 | NaN | 48.987457 | NaN | ... | 4.696993 | 17913.484946 | 7.041330 | 48169.769458 | 0.315254 | 57.000000 | 1440.768604 | 15.947392 | 59.207097 | 43.267682
4 | 2011-07-01 05:00:00 | White | 39.297522 | 55.736219 | NaN | NaN | 41.299381 | NaN | 48.710408 | NaN | ... | 4.550372 | 17913.800660 | 6.863004 | 48170.331473 | 0.189204 | 57.000000 | 1442.426077 | 15.903679 | 59.282707 | 43.372615

5 rows × 55 columns

Above is the raw hourly data.

As you can see, it is messy. The first step is to drop the columns that carry no information.

df.rename(columns={'Unnamed: 0':'Datetime'}, inplace=True)
nonBlankColumns = ['Unnamed' not in s for s in df.columns]
columns = df.columns[nonBlankColumns]
df = df[columns]
df = df.set_index(['Datetime'])
df.index.name = None
df.head()
                    | Gund Bus-A 15 Min Block Demand - kW | Gund Bus-A CurrentA - Amps | Gund Bus-A CurrentB - Amps | Gund Bus-A CurrentC - Amps | Gund Bus-A CurrentN - Amps | Gund Bus-A EnergyReal - kWhr | Gund Bus-A Freq - Hertz | Gund Bus-A Max Monthly Demand - kW | Gund Bus-A PowerApp - kVA | Gund Bus-A PowerReac - kVAR | ... | Gund Main Demand - Tons | Gund Main Energy - Ton-Days | Gund Main FlowRate - gpm | Gund Main FlowTotal - kgal(1000) | Gund Main SignalAeration - Count | Gund Main SignalStrength - Count | Gund Main SonicVelocity - Ft/Sec | Gund Main TempDelta - Deg F | Gund Main TempReturn - Deg F | Gund Main TempSupply - Deg F
2011-07-01 01:00:00 | 48.458733 | 65.977882 | 52.631417 | 55.603840 | 15.982278 | 1796757.502803 | 59.837524 | 96.117915 | 48.757073 | 12.344712 | ... | 4.677294 | 17912.537804 | 6.916454 | 48168.083414 | 0.693405 | 57.208127 | 1437.640543 | 16.238684 | 59.757447 | 43.516103
2011-07-01 02:00:00 | 40.472697 | 57.230223 | 42.483092 | 50.243230 | 13.423762 | 1796800.145991 | 60.005569 | 96.117915 | 42.238685 | 12.967984 | ... | 4.586403 | 17912.853518 | 6.739337 | 48168.645429 | 0.567355 | 57.082909 | 1438.030719 | 16.263573 | 59.710199 | 43.495128
2011-07-01 03:00:00 | 39.472809 | 55.487443 | 41.911784 | 48.482163 | 13.478933 | 1796840.146023 | 59.833880 | 96.117915 | 41.278573 | 12.732046 | ... | 4.462877 | 17913.169232 | 6.725142 | 48169.207444 | 0.441304 | 57.001646 | 1439.111130 | 15.797043 | 59.248158 | 43.457344
2011-07-01 04:00:00 | 39.198879 | 55.849806 | 41.525529 | 48.987457 | 13.603309 | 1796879.023607 | 59.673044 | 96.117915 | 41.345776 | 12.687845 | ... | 4.696993 | 17913.484946 | 7.041330 | 48169.769458 | 0.315254 | 57.000000 | 1440.768604 | 15.947392 | 59.207097 | 43.267682
2011-07-01 05:00:00 | 39.297522 | 55.736219 | 41.299381 | 48.710408 | 13.797331 | 1796918.273558 | 59.986672 | 96.117915 | 41.166736 | 12.437842 | ... | 4.550372 | 17913.800660 | 6.863004 | 48170.331473 | 0.189204 | 57.000000 | 1442.426077 | 15.903679 | 59.282707 | 43.372615

5 rows × 48 columns

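Then we print all the column names. Only a few columns are useful for obtaining the hourly electricity, chilled water and hot water consumption.

for item in df.columns:
    print(item)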
Gund Bus-A 15 Min Block Demand - kW
Gund Bus-A CurrentA - Amps
Gund Bus-A CurrentB - Amps
Gund Bus-A CurrentC - Amps
Gund Bus-A CurrentN - Amps
Gund Bus-A EnergyReal - kWhr
Gund Bus-A Freq - Hertz
Gund Bus-A Max Monthly Demand - kW
Gund Bus-A PowerApp - kVA
Gund Bus-A PowerReac - kVAR
Gund Bus-A PowerReal - kW
Gund Bus-A TruePF - PF
Gund Bus-A VoltageAB - Volts
Gund Bus-A VoltageAN - Volts
Gund Bus-A VoltageBC - Volts
Gund Bus-A VoltageBN - Volts
Gund Bus-A VoltageCA - Volts
Gund Bus-A VoltageCN - Volts
Gund Bus-B 15 Min Block Demand - kW
Gund Bus-B CurrentA - Amps
Gund Bus-B CurrentB - Amps
Gund Bus-B CurrentC - Amps
Gund Bus-B CurrentN - Amps
Gund Bus-B EnergyReal - kWhr
Gund Bus-B Freq - Hertz
Gund Bus-B Max Monthly Demand - kW
Gund Bus-B PowerApp - kVA
Gund Bus-B PowerReac - kVAR
Gund Bus-B PowerReal - kW
Gund Bus-B TruePF - PF
Gund Bus-B VoltageAB - Volts
Gund Bus-B VoltageAN - Volts
Gund Bus-B VoltageBC - Volts
Gund Bus-B VoltageBN - Volts
Gund Bus-B VoltageCA - Volts
Gund Bus-B VoltageCN - Volts
Gund Condensate Counter - count
Gund Condensate FlowTotal - LBS
Gund Main Demand - Tons
Gund Main Energy - Ton-Days
Gund Main FlowRate - gpm
Gund Main FlowTotal - kgal(1000)
Gund Main SignalAeration - Count
Gund Main SignalStrength - Count
Gund Main SonicVelocity - Ft/Sec
Gund Main TempDelta - Deg F
Gund Main TempReturn - Deg F
Gund Main TempSupply - Deg F

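Electricity

For electricity there are two meters, "Gund Bus A" and "Gund Bus B". "EnergyReal - kWhr" records the cumulative consumption. We are not sure what "PowerReal" is. Just in case, we also put it into the electricity dataframe.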

electricity=df[['Gund Bus-A EnergyReal - kWhr','Gund Bus-B EnergyReal - kWhr',
                'Gund Bus-A PowerReal - kW','Gund Bus-B PowerReal - kW',]]
electricity.head()
                    | Gund Bus-A EnergyReal - kWhr | Gund Bus-B EnergyReal - kWhr | Gund Bus-A PowerReal - kW | Gund Bus-B PowerReal - kW
2011-07-01 01:00:00 | 1796757.502803               | 3657811.582122               | 47.184015                 | 63.486186
2011-07-01 02:00:00 | 1796800.145991               | 3657873.464938               | 40.208796                 | 61.270542
2011-07-01 03:00:00 | 1796840.146023               | 3657934.837505               | 39.209866                 | 61.464394
2011-07-01 04:00:00 | 1796879.023607               | 3657995.470348               | 39.378507                 | 59.396581
2011-07-01 05:00:00 | 1796918.273558               | 3658054.470285               | 39.240837                 | 58.911729
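One way to probe what "PowerReal" might be is to compare it with the hourly differences of "EnergyReal". The sketch below uses only the five Bus-A rows printed above. The short column names and the idea of averaging two adjacent kW readings are assumptions made for this illustration, not something the report does.

```python
import pandas as pd

# The five Bus-A rows printed above (cumulative kWh and kW readings).
index = pd.to_datetime([
    "2011-07-01 01:00", "2011-07-01 02:00", "2011-07-01 03:00",
    "2011-07-01 04:00", "2011-07-01 05:00",
])
sample = pd.DataFrame({
    "EnergyReal_kWh": [1796757.502803, 1796800.145991, 1796840.146023,
                       1796879.023607, 1796918.273558],
    "PowerReal_kW": [47.184015, 40.208796, 39.209866, 39.378507, 39.240837],
}, index=index)

# Energy used during the hour ending at each timestamp (kWh).
sample["hourly_kWh"] = sample["EnergyReal_kWh"].diff()

# Mean of the kW readings at the start and end of that hour.
sample["mean_kW"] = sample["PowerReal_kW"].rolling(2).mean()

# If PowerReal is real power in kW, one hour at mean_kW should use
# roughly hourly_kWh, so the ratio should be close to 1.
sample["ratio"] = sample["hourly_kWh"] / sample["mean_kW"]

print(sample[["hourly_kWh", "mean_kW", "ratio"]].round(3))
```

On these rows the ratio stays within a few percent of 1, which is consistent with "PowerReal" being the real power draw in kW. Five rows are far too few to settle the question, so this is a plausibility check only.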

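Validating our data processing by checking monthly energy consumption

To check that we understand the data correctly, we compute the monthly electricity consumption from the hourly data and compare the result with the monthly figures provided by facilities, which are also available on Energy Witness.

Below is the monthly data provided by facilities. In the monthly data, "Bus A & B" are called "CE603B kWh" and "CE604B kWh". Note that the meter-reading periods are not calendar months.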

file = 'Data/monthly electricity.csv' 
monthlyElectricityFromFacility = pd.read_csv(file, header=0)
monthlyElectricityFromFacility
monthlyElectricityFromFacility = monthlyElectricityFromFacility.set_index(['month'])
monthlyElectricityFromFacility.head()
month  | startDate | endDate  | CE603B kWh | CE604B kWh
Jul 11 | 6/16/11   | 7/18/11  | 43968.1    | 106307.1
Aug 11 | 7/18/11   | 8/17/11  | 41270.1    | 83121.1
Sep 11 | 8/17/11   | 9/16/11  | 51514.1    | 107083.1
Oct 11 | 9/16/11   | 10/18/11 | 65338.1    | 114350.1
Nov 11 | 10/18/11  | 11/17/11 | 65453.1    | 115318.1

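We use the "EnergyReal - kWhr" columns of the two meters. For each meter-reading period we look up the cumulative reading at the start date and at the end date, then subtract the start reading from the end reading to get the electricity consumption for that month.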

monthlyElectricityFromFacility['startDate'] = pd.to_datetime(monthlyElectricityFromFacility['startDate'], format="%m/%d/%y")
values = monthlyElectricityFromFacility.index.values

keys = np.array(monthlyElectricityFromFacility['startDate'])

dates = {}
for key, value in zip(keys, values):
    dates[key] = value

sortedDates =  np.sort(list(dates.keys()))  # list() is needed on Python 3, where keys() returns a view
sortedDates = sortedDates[sortedDates > np.datetime64('2011-11-01')]

months = []
monthlyElectricityOrg = np.zeros((len(sortedDates) - 1, 2))
for i in range(len(sortedDates) - 1):
    begin = sortedDates[i]
    end = sortedDates[i+1]
    months.append(dates[sortedDates[i]])
    monthlyElectricityOrg[i, 0] = (np.round(electricity.loc[end,'Gund Bus-A EnergyReal - kWhr'] 
                           -  electricity.loc[begin,'Gund Bus-A EnergyReal - kWhr'], 1))
    monthlyElectricityOrg[i, 1] = (np.round(electricity.loc[end,'Gund Bus-B EnergyReal - kWhr'] 
                           -  electricity.loc[begin,'Gund Bus-B EnergyReal - kWhr'], 1))

monthlyElectricity = pd.DataFrame(data = monthlyElectricityOrg, index = months, columns = ['CE603B kWh', 'CE604B kWh'])


fig, ax = plt.subplots(figsize=(15, 6))
# DataFrame.plot draws on the Axes passed in as ax
monthlyElectricity.plot(marker = 'o', rot = 40, fontsize = 13, ax = ax, linestyle='')
ax.set_facecolor('w')  # set_axis_bgcolor was removed from newer matplotlib versions
plt.xlabel('Billing month', fontsize = 15)
plt.ylabel('kWh', fontsize = 15)
plt.tick_params(which=u'major', reset=False, axis = 'y', labelsize = 13)
plt.xticks(np.arange(0,len(months)),months)
plt.title('Original monthly consumption from hourly data',fontsize = 17)

text = 'Meter malfunction'
ax.annotate(text, xy = (9, 4500000), 
            xytext = (5, 2), fontsize = 15,
            textcoords = 'offset points', ha = 'center', va = 'top')

ax.annotate(text, xy = (8, -4500000), 
            xytext = (5, 2), fontsize = 15, 
            textcoords = 'offset points', ha = 'center', va = 'bottom')

ax.annotate(text, xy = (14, -2500000), 
            xytext = (5, 2), fontsize = 15, 
            textcoords = 'offset points', ha = 'center', va = 'bottom')

ax.annotate(text, xy = (15, 2500000), 
            xytext = (5, 2), fontsize = 15, 
            textcoords = 'offset points', ha = 'center', va = 'top')

plt.show()

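Figure: "Original monthly consumption from hourly data" (kWh by billing month, with "Meter malfunction" annotations).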

Source: https://www.cnblogs.com/wwj99/p/12260052.html
