I can’t believe it has been nearly 6 months since I last posted. Given the sustained heat it seemed like a good idea to finish off this subject.
As hinted at in my last post, temperature is the missing
variable to make sense of Residential electrical usage. Fortunately there are some reasonable freely
available historical weather databases, most notably the one provided by
NOAA. As covered in the last posting, we
can download this data for a particular location, in this case Chicago O’Hare
airport, KORD.
Next we have to find a suitable model for the usage pattern
we observed in the last posting. Usage
is lowest at around 70F, and rises slightly as temperatures fall but rises
significantly as temperatures rise.
Operationally we can imagine that as temperatures rise beyond what is
comfortable, more and more cooling devices turn on to combat the heat. However, eventually all the air conditioning
equipment is on at full capacity and incremental demand drops off. Based on this, a logistic function [http://en.wikipedia.org/wiki/Logistic_function]
seems appropriate.
Furthermore, depending on the time of day, the high
temperature and the low temperature may both be influential predictors.
Continuing from the code presented in the last posting, we
can fit a logistic function in temperature to the usage data for each hour of
the day, taking into account whether it is a weekday, the time of the year and
the temperature function:
allResults=NULL
# create the variable for results for each hour
startList=list(intercept=1.5,weekday=-.02,Time=0,S1=.04,C1=-.03,S2=-.02,
C2=-.04,S3=.001,C3=.01,maxT=87,maxS=4,maxScalar=1,
maxSlope=0,minSlope=0)
# initial variable value
for (hourInd in 1:24){ # iterate through hours; create a model for each hour
test2<-nls(ComEd[,(hourInd+1)] ~ intercept
+ timcols%*%c(weekday,Time,S1,C1,S2,C2,S3,C3)
+
maxSlope*KORDtemps[dateInd,"MAX"]
+
minSlope*KORDtemps[dateInd,"MIN"]
+
maxScalar*plogis(KORDtemps[dateInd,"MAX"],
location = maxT, scale =
maxS),
start=startList,
control=nls.control(maxiter = 500, tol = 1e-05,
minFactor = 1/10000,printEval =
FALSE, warnOnly = TRUE))
# using
nonlinear least squares to find the best result
allResults=rbind(allResults,coef(test2))
# combine all results
startList=list(intercept=coef(test2)[1],weekday=coef(test2)[2],
Time=coef(test2)[3],S1=coef(test2)[4],
C1=coef(test2)[5],S2=coef(test2)[6],C2=coef(test2)[7],
S3=coef(test2)[8],C3=coef(test2)[9],maxT=coef(test2)[10],
maxS=coef(test2)[11],maxScalar=coef(test2)[12],
maxSlope=coef(test2)[13],minSlope=coef(test2)[14])
# update
starting value for next iteration to solution from previous hour
# although
not always necessary this reduces computational time
}
Then we can examine the regression results to see the
average response function in usage based on Max temperature:
maxResponse = mean(allResults[,'maxSlope']) * KORDtemps[dateInd,"MAX"]
+ mean(allResults[,'maxScalar'])*plogis(KORDtemps[dateInd,"MAX"],
location = mean(allResults[,'maxT']),
scale = mean(allResults[,'maxS']),
log = FALSE)
plot(KORDtemps[dateInd,"MAX"],maxResponse)
We can see that the fitted data (in black) overlay the actual data (in red) well
but there are still differences for very hot days.
So to summarize we can model residential power usage pretty accurately to first order based on knowledge of the high temperature for the day as well as the time of the year, the day of the week and the hour of the day. As expected, as temperatures rise, usage grows dramatically over a high temperature of 80F.
Stay cool!