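This post is a translation of an energy analysis and forecasting report from Harvard University.
No data source is available at the moment; in my view, learning the analysis method is enough.
Content:
The goal is to forecast energy consumption with machine learning, in the hope of saving energy.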
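There are three types of energy consumption: electricity, chilled water, and hot water. The figure shows the buildings that the Harvard plants supply with chilled water and hot water.
Figure: Chilled water and hot water supply at Harvard. (Left: chilled water, highlighted in blue. Right: hot water, highlighted in yellow.)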
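We picked one building and obtained its energy consumption data from July 1, 2011 to October 31, 2014. Several months of data are missing because of meter malfunctions. The data resolution is hourly. In the raw data, each hourly value is a cumulative meter reading, so to get the hourly consumption we need to shift the data by one step and subtract. We have hourly weather and energy data from January 2012 to October 2014 (2.75 years). The weather data comes from the Cambridge weather station.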
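The shift-and-subtract step described above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the readings are made up, and masking differences that span a gap is an assumption about how missing hours might be handled.

```python
import pandas as pd

# Synthetic cumulative meter readings (kWh); the values are made up for illustration.
index = pd.to_datetime([
    "2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00",
    "2012-01-01 05:00",  # two hourly readings are missing before this one
    "2012-01-01 06:00",
])
readings = pd.Series([1000.0, 1040.0, 1075.0, 1190.0, 1230.0], index=index)

# Offset the series by one row and subtract: reading(t) - reading(t-1).
hourly = readings - readings.shift(1)

# Assumption: keep only differences that span exactly one hour;
# a difference across a gap covers several hours and is set to NaN.
step = readings.index.to_series().diff()
hourly = hourly.where(step == pd.Timedelta(hours=1))

print(hourly)
```

The first row has no predecessor, so it is NaN as well. `readings.diff()` gives the same result as the explicit shift and subtraction.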
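In this section, we complete the following tasks.
- Manually download the raw data from the Harvard Energy Witness website and extract hourly electricity, chilled water, and hot water.
- Clean the weather data and add more features, including cooling degrees, heating degrees, and humidity ratio.
- Estimate daily occupancy based on holidays, the academic calendar, and weekends.
- Create hour-related features, i.e. cos(hourOfDay * 2 * pi / 24).
- Merge the electricity, chilled water, and hot water data streams with the weather, time, and occupancy features.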
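The hour-related feature in the task list can be sketched as below. The index and the column names `cosHour` and `sinHour` are hypothetical. The sine column is an assumption added here: the source names only the cosine term, and the cosine alone cannot tell 6:00 from 18:00.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly timestamps covering two days.
index = pd.Timestamp("2012-01-01") + pd.to_timedelta(np.arange(48), unit="h")

features = pd.DataFrame(index=index)
hour_of_day = np.asarray(index.hour)       # integers 0..23
angle = hour_of_day * 2 * np.pi / 24       # map the 24-hour cycle onto a circle

features["cosHour"] = np.cos(angle)        # the feature named in the task list
features["sinHour"] = np.sin(angle)        # assumption: companion term, not in the source

print(features.head())
```

With this encoding, hour 23 and hour 0 have nearby feature values, which a raw 0 to 23 integer would not give.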
from __future__ import division # must be the first statement; has no effect on Python 3
%matplotlib inline
import requests
from io import StringIO # works on Python 2.6+ and Python 3
import numpy as np
import pandas as pd # pandas
import matplotlib.pyplot as plt # module for plotting
import datetime as dt # module for manipulating dates and times
import numpy.linalg as lin # module for linear algebra operations
from math import log10, exp
# pd.options.display.mpl_style = 'default' # only available in old pandas releases
The raw data is downloaded from the Harvard Energy Witness website.
We then use Pandas to put it into a single dataframe.
file = 'Data/Org/0701-0930-2011.xls'
df = pd.read_excel(file, header = 0, skiprows = np.arange(0,6))
files = ['Data/Org/1101-1130-2011.xls',
'Data/Org/1201-2011-0131-2012.xls',
'Data/Org/0201-0331-2012.xls',
'Data/Org/0401-0531-2012.xls',
'Data/Org/0101-0228-2013.xls',
'Data/Org/0301-0430-2013.xls',
'Data/Org/0501-0630-2013.xls',
'Data/Org/0701-0831-2013.xls',
'Data/Org/0901-1031-2013.xls',
'Data/Org/1101-1231-2013.xls',
'Data/Org/0101-0228-2014.xls',
'Data/Org/0301-0430-2014.xls',
'Data/Org/0501-0630-2014.xls',
'Data/Org/0701-0831-2014.xls',
'Data/Org/0901-1031-2014.xls']
for file in files:
    data = pd.read_excel(file, header = 0, skiprows = np.arange(0,6))
    df = pd.concat([df, data]) # DataFrame.append was removed in pandas 2.0
df.head()
WARNING *** file size (2481102) not 512 + multiple of sector size (512)
WARNING *** file size (848833) not 512 + multiple of sector size (512)
WARNING *** file size (1694257) not 512 + multiple of sector size (512)
WARNING *** file size (1640459) not 512 + multiple of sector size (512)
WARNING *** file size (1667907) not 512 + multiple of sector size (512)
WARNING *** file size (847258) not 512 + multiple of sector size (512)
WARNING *** file size (1691449) not 512 + multiple of sector size (512)
WARNING *** file size (1666647) not 512 + multiple of sector size (512)
WARNING *** file size (1665736) not 512 + multiple of sector size (512)
WARNING *** file size (1614814) not 512 + multiple of sector size (512)
WARNING *** file size (1665980) not 512 + multiple of sector size (512)
WARNING *** file size (1667276) not 512 + multiple of sector size (512)
WARNING *** file size (1691736) not 512 + multiple of sector size (512)
WARNING *** file size (1666704) not 512 + multiple of sector size (512)
WARNING *** file size (1665920) not 512 + multiple of sector size (512)
WARNING *** file size (1614900) not 512 + multiple of sector size (512)
WARNING *** file size (1666228) not 512 + multiple of sector size (512)
WARNING *** file size (1666191) not 512 + multiple of sector size (512)
WARNING *** file size (1691845) not 512 + multiple of sector size (512)
WARNING *** file size (1663846) not 512 + multiple of sector size (512)
Unnamed: 0 | Unnamed: 1 | Gund Bus-A 15 Min Block Demand - kW | Gund Bus-A CurrentA - Amps | Unnamed: 4 | Unnamed: 5 | Gund Bus-A CurrentB - Amps | Unnamed: 7 | Gund Bus-A CurrentC - Amps | Unnamed: 9 | ... | Gund Main Demand - Tons | Gund Main Energy - Ton-Days | Gund Main FlowRate - gpm | Gund Main FlowTotal - kgal(1000) | Gund Main SignalAeration - Count | Gund Main SignalStrength - Count | Gund Main SonicVelocity - Ft/Sec | Gund Main TempDelta - Deg F | Gund Main TempReturn - Deg F | Gund Main TempSupply - Deg F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011-07-01 01:00:00 | White | 48.458733 | 65.977882 | NaN | NaN | 52.631417 | NaN | 55.603840 | NaN | ... | 4.677294 | 17912.537804 | 6.916454 | 48168.083414 | 0.693405 | 57.208127 | 1437.640543 | 16.238684 | 59.757447 | 43.516103 |
1 | 2011-07-01 02:00:00 | White | 40.472697 | 57.230223 | NaN | NaN | 42.483092 | NaN | 50.243230 | NaN | ... | 4.586403 | 17912.853518 | 6.739337 | 48168.645429 | 0.567355 | 57.082909 | 1438.030719 | 16.263573 | 59.710199 | 43.495128 |
2 | 2011-07-01 03:00:00 | #d2e4b0 | 39.472809 | 55.487443 | NaN | NaN | 41.911784 | NaN | 48.482163 | NaN | ... | 4.462877 | 17913.169232 | 6.725142 | 48169.207444 | 0.441304 | 57.001646 | 1439.111130 | 15.797043 | 59.248158 | 43.457344 |
3 | 2011-07-01 04:00:00 | White | 39.198879 | 55.849806 | NaN | NaN | 41.525529 | NaN | 48.987457 | NaN | ... | 4.696993 | 17913.484946 | 7.041330 | 48169.769458 | 0.315254 | 57.000000 | 1440.768604 | 15.947392 | 59.207097 | 43.267682 |
4 | 2011-07-01 05:00:00 | White | 39.297522 | 55.736219 | NaN | NaN | 41.299381 | NaN | 48.710408 | NaN | ... | 4.550372 | 17913.800660 | 6.863004 | 48170.331473 | 0.189204 | 57.000000 | 1442.426077 | 15.903679 | 59.282707 | 43.372615 |
5 rows × 55 columns
Above is the raw hourly data.
As you can see, it is messy. The first step is to remove the meaningless columns.
df.rename(columns={'Unnamed: 0':'Datetime'}, inplace=True)
nonBlankColumns = ['Unnamed' not in s for s in df.columns]
columns = df.columns[nonBlankColumns]
df = df[columns]
df = df.set_index(['Datetime'])
df.index.name = None
df.head()
Gund Bus-A 15 Min Block Demand - kW | Gund Bus-A CurrentA - Amps | Gund Bus-A CurrentB - Amps | Gund Bus-A CurrentC - Amps | Gund Bus-A CurrentN - Amps | Gund Bus-A EnergyReal - kWhr | Gund Bus-A Freq - Hertz | Gund Bus-A Max Monthly Demand - kW | Gund Bus-A PowerApp - kVA | Gund Bus-A PowerReac - kVAR | ... | Gund Main Demand - Tons | Gund Main Energy - Ton-Days | Gund Main FlowRate - gpm | Gund Main FlowTotal - kgal(1000) | Gund Main SignalAeration - Count | Gund Main SignalStrength - Count | Gund Main SonicVelocity - Ft/Sec | Gund Main TempDelta - Deg F | Gund Main TempReturn - Deg F | Gund Main TempSupply - Deg F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2011-07-01 01:00:00 | 48.458733 | 65.977882 | 52.631417 | 55.603840 | 15.982278 | 1796757.502803 | 59.837524 | 96.117915 | 48.757073 | 12.344712 | ... | 4.677294 | 17912.537804 | 6.916454 | 48168.083414 | 0.693405 | 57.208127 | 1437.640543 | 16.238684 | 59.757447 | 43.516103 |
2011-07-01 02:00:00 | 40.472697 | 57.230223 | 42.483092 | 50.243230 | 13.423762 | 1796800.145991 | 60.005569 | 96.117915 | 42.238685 | 12.967984 | ... | 4.586403 | 17912.853518 | 6.739337 | 48168.645429 | 0.567355 | 57.082909 | 1438.030719 | 16.263573 | 59.710199 | 43.495128 |
2011-07-01 03:00:00 | 39.472809 | 55.487443 | 41.911784 | 48.482163 | 13.478933 | 1796840.146023 | 59.833880 | 96.117915 | 41.278573 | 12.732046 | ... | 4.462877 | 17913.169232 | 6.725142 | 48169.207444 | 0.441304 | 57.001646 | 1439.111130 | 15.797043 | 59.248158 | 43.457344 |
2011-07-01 04:00:00 | 39.198879 | 55.849806 | 41.525529 | 48.987457 | 13.603309 | 1796879.023607 | 59.673044 | 96.117915 | 41.345776 | 12.687845 | ... | 4.696993 | 17913.484946 | 7.041330 | 48169.769458 | 0.315254 | 57.000000 | 1440.768604 | 15.947392 | 59.207097 | 43.267682 |
2011-07-01 05:00:00 | 39.297522 | 55.736219 | 41.299381 | 48.710408 | 13.797331 | 1796918.273558 | 59.986672 | 96.117915 | 41.166736 | 12.437842 | ... | 4.550372 | 17913.800660 | 6.863004 | 48170.331473 | 0.189204 | 57.000000 | 1442.426077 | 15.903679 | 59.282707 | 43.372615 |
5 rows × 48 columns
Then we print all the column names. Only a few of the columns are useful for obtaining hourly electricity, chilled water, and hot water.
for item in df.columns:
    print(item)
Gund Bus-A 15 Min Block Demand - kW
Gund Bus-A CurrentA - Amps
Gund Bus-A CurrentB - Amps
Gund Bus-A CurrentC - Amps
Gund Bus-A CurrentN - Amps
Gund Bus-A EnergyReal - kWhr
Gund Bus-A Freq - Hertz
Gund Bus-A Max Monthly Demand - kW
Gund Bus-A PowerApp - kVA
Gund Bus-A PowerReac - kVAR
Gund Bus-A PowerReal - kW
Gund Bus-A TruePF - PF
Gund Bus-A VoltageAB - Volts
Gund Bus-A VoltageAN - Volts
Gund Bus-A VoltageBC - Volts
Gund Bus-A VoltageBN - Volts
Gund Bus-A VoltageCA - Volts
Gund Bus-A VoltageCN - Volts
Gund Bus-B 15 Min Block Demand - kW
Gund Bus-B CurrentA - Amps
Gund Bus-B CurrentB - Amps
Gund Bus-B CurrentC - Amps
Gund Bus-B CurrentN - Amps
Gund Bus-B EnergyReal - kWhr
Gund Bus-B Freq - Hertz
Gund Bus-B Max Monthly Demand - kW
Gund Bus-B PowerApp - kVA
Gund Bus-B PowerReac - kVAR
Gund Bus-B PowerReal - kW
Gund Bus-B TruePF - PF
Gund Bus-B VoltageAB - Volts
Gund Bus-B VoltageAN - Volts
Gund Bus-B VoltageBC - Volts
Gund Bus-B VoltageBN - Volts
Gund Bus-B VoltageCA - Volts
Gund Bus-B VoltageCN - Volts
Gund Condensate Counter - count
Gund Condensate FlowTotal - LBS
Gund Main Demand - Tons
Gund Main Energy - Ton-Days
Gund Main FlowRate - gpm
Gund Main FlowTotal - kgal(1000)
Gund Main SignalAeration - Count
Gund Main SignalStrength - Count
Gund Main SonicVelocity - Ft/Sec
Gund Main TempDelta - Deg F
Gund Main TempReturn - Deg F
Gund Main TempSupply - Deg F
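Take electricity as an example. There are two meters, “Gund Bus A” and “Gund Bus B”. “EnergyReal - kWhr” records the cumulative consumption. We are not sure what “PowerReal” is; just in case, we also put it into the electricity dataframe.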
electricity=df[['Gund Bus-A EnergyReal - kWhr','Gund Bus-B EnergyReal - kWhr',
'Gund Bus-A PowerReal - kW','Gund Bus-B PowerReal - kW',]]
electricity.head()
Gund Bus-A EnergyReal - kWhr | Gund Bus-B EnergyReal - kWhr | Gund Bus-A PowerReal - kW | Gund Bus-B PowerReal - kW | |
---|---|---|---|---|
2011-07-01 01:00:00 | 1796757.502803 | 3657811.582122 | 47.184015 | 63.486186 |
2011-07-01 02:00:00 | 1796800.145991 | 3657873.464938 | 40.208796 | 61.270542 |
2011-07-01 03:00:00 | 1796840.146023 | 3657934.837505 | 39.209866 | 61.464394 |
2011-07-01 04:00:00 | 1796879.023607 | 3657995.470348 | 39.378507 | 59.396581 |
2011-07-01 05:00:00 | 1796918.273558 | 3658054.470285 | 39.240837 | 58.911729 |
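One quick way to probe what “PowerReal” might be is to compare it with the hourly difference of “EnergyReal - kWhr”. The sketch below uses only the five Bus-A rows printed above; the shortened column names and the `hourly kWh` column are introduced here for illustration.

```python
import pandas as pd

# Bus-A values copied from the five rows printed above.
index = pd.to_datetime([
    "2011-07-01 01:00", "2011-07-01 02:00", "2011-07-01 03:00",
    "2011-07-01 04:00", "2011-07-01 05:00",
])
sample = pd.DataFrame({
    "EnergyReal - kWhr": [1796757.502803, 1796800.145991, 1796840.146023,
                          1796879.023607, 1796918.273558],
    "PowerReal - kW": [47.184015, 40.208796, 39.209866, 39.378507, 39.240837],
}, index=index)

# Energy used in each hour = difference of consecutive cumulative readings.
sample["hourly kWh"] = sample["EnergyReal - kWhr"].diff()

print(sample[["hourly kWh", "PowerReal - kW"]].round(2))
```

For these rows the hourly differences (roughly 42.6, 40.0, 38.9, and 39.2 kWh) are close to the “PowerReal” readings in kW. Since energy over one hour in kWh equals average power in kW, this is consistent with “PowerReal” being a real-power reading. Five rows are only a hint, not a confirmation.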
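To check that we understand the data correctly, we compute the monthly electricity consumption from the hourly data and compare it with the monthly data provided by facilities, which is also available on Energy Witness.
Below is the monthly data provided by facilities. In the monthly data, “Bus A & B” are called “CE603B kWh” and “CE604B kWh”. Note that the meter reading period is not a calendar month.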
file = 'Data/monthly electricity.csv'
monthlyElectricityFromFacility = pd.read_csv(file, header=0)
monthlyElectricityFromFacility
monthlyElectricityFromFacility = monthlyElectricityFromFacility.set_index(['month'])
monthlyElectricityFromFacility.head()
startDate | endDate | CE603B kWh | CE604B kWh | |
---|---|---|---|---|
month | ||||
Jul 11 | 6/16/11 | 7/18/11 | 43968.1 | 106307.1 |
Aug 11 | 7/18/11 | 8/17/11 | 41270.1 | 83121.1 |
Sep 11 | 8/17/11 | 9/16/11 | 51514.1 | 107083.1 |
Oct 11 | 9/16/11 | 10/18/11 | 65338.1 | 114350.1 |
Nov 11 | 10/18/11 | 11/17/11 | 65453.1 | 115318.1 |
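We use the “EnergyReal - kWhr” columns of the two meters. We look up the readings at the start date and the end date of each meter reading period, then subtract the start reading from the end reading to get the monthly electricity consumption.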
# Parse the start date of each meter reading period
monthlyElectricityFromFacility['startDate'] = pd.to_datetime(monthlyElectricityFromFacility['startDate'], format="%m/%d/%y")
values = monthlyElectricityFromFacility.index.values
keys = np.array(monthlyElectricityFromFacility['startDate'])
# Map each start date to its billing month label
dates = {}
for key, value in zip(keys, values):
    dates[key] = value
sortedDates = np.sort(list(dates.keys())) # list() is needed on Python 3
sortedDates = sortedDates[sortedDates > np.datetime64('2011-11-01')]
months = []
monthlyElectricityOrg = np.zeros((len(sortedDates) - 1, 2))
for i in range(len(sortedDates) - 1):
    begin = sortedDates[i]
    end = sortedDates[i+1]
    months.append(dates[sortedDates[i]])
    # Monthly consumption = cumulative reading at the end date minus the reading at the start date
    monthlyElectricityOrg[i, 0] = (np.round(electricity.loc[end,'Gund Bus-A EnergyReal - kWhr']
                                   - electricity.loc[begin,'Gund Bus-A EnergyReal - kWhr'], 1))
    monthlyElectricityOrg[i, 1] = (np.round(electricity.loc[end,'Gund Bus-B EnergyReal - kWhr']
                                   - electricity.loc[begin,'Gund Bus-B EnergyReal - kWhr'], 1))
monthlyElectricity = pd.DataFrame(data = monthlyElectricityOrg, index = months, columns = ['CE603B kWh', 'CE604B kWh'])
fig, ax = plt.subplots()
monthlyElectricity.plot(marker = 'o', figsize=(15,6), rot = 40, fontsize = 13, ax = ax, linestyle='')
ax.set_facecolor('w') # set_axis_bgcolor was removed in matplotlib 2.2
plt.xlabel('Billing month', fontsize = 15)
plt.ylabel('kWh', fontsize = 15)
plt.tick_params(which=u'major', reset=False, axis = 'y', labelsize = 13)
plt.xticks(np.arange(0,len(months)),months)
plt.title('Original monthly consumption from hourly data',fontsize = 17)
text = 'Meter malfunction'
ax.annotate(text, xy = (9, 4500000),
xytext = (5, 2), fontsize = 15,
textcoords = 'offset points', ha = 'center', va = 'top')
ax.annotate(text, xy = (8, -4500000),
xytext = (5, 2), fontsize = 15,
textcoords = 'offset points', ha = 'center', va = 'bottom')
ax.annotate(text, xy = (14, -2500000),
xytext = (5, 2), fontsize = 15,
textcoords = 'offset points', ha = 'center', va = 'bottom')
ax.annotate(text, xy = (15, 2500000),
xytext = (5, 2), fontsize = 15,
textcoords = 'offset points', ha = 'center', va = 'top')
plt.show()
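The per-period subtraction done by the loop above can also be written without an explicit loop. The sketch below uses made-up cumulative readings at hypothetical period boundaries. Treating a negative difference as a sign of meter trouble is an assumption based on the "Meter malfunction" labels in the plot, not a rule stated in the source.

```python
import pandas as pd

# Synthetic cumulative readings (kWh) at the boundaries of the meter reading
# periods; dates and values are made up for illustration.
readings = pd.Series(
    [1000.0, 45000.0, 86000.0, 20000.0, 70000.0],
    index=pd.to_datetime(["2012-01-18", "2012-02-16", "2012-03-19",
                          "2012-04-17", "2012-05-17"]),
)

# Consumption of a period = reading at its end date - reading at its start date.
# diff() labels each value by the end date; shift(-1) relabels it by the start date.
consumption = readings.diff().shift(-1).dropna()

# Assumption: a cumulative counter should never decrease, so a negative
# difference is flagged as a possible meter reset or malfunction.
suspect = consumption[consumption < 0]

print(consumption)
print(suspect)
```

This check only catches drops in the counter. A jump back up in a later period would need a separate upper bound, which the source does not specify.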
Translation: 1_Project Overview, Data Wrangling and Exploratory Analysis-checkpoint
Original: https://www.cnblogs.com/wwj99/p/12260052.html