After more than a decade of consistent losses from rolling a long NG position, the cumulative return has been positive for the past 12 months. This has only occurred briefly during the 'polar vortex' of early 2014 and during the 2008 commodities 'super cycle' peak.
For years, long only positions in NG have experienced losses on average in the last month before rolling the position. The traditional explanation for this has been an oversupplied market relative to storage capacity. Producers sell into the short term market; storage owners earn a 'negative convenience yield' for storing this into the future.
Fracking costs have increased, partly driven by input costs Fracking PPI and increasing interest rates. Industrial demand has recovered according to the EIA.
It seems likely that the natural gas market is in transition towards an equilibrium in which there is no longer an excess of supply being stored into the future relative to demand.
Could this winter be the first test of this phenomenon? Will we see significant backwardation in the curve as convenience yield accrues to holders of the physical commodity?
Commodity Stat Arb
Monday, November 5, 2018
Monday, July 8, 2013
Modeling Residential Electricity Usage with R – Part 2
I can’t believe it has been nearly 6 months since I last posted. Given the sustained heat it seemed like a good idea to finish off this subject.
As hinted at in my last post, temperature is the missing
variable to make sense of Residential electrical usage. Fortunately there are some reasonable freely
available historical weather databases, most notably the one provided by
NOAA. As covered in the last posting, we
can download this data for a particular location, in this case Chicago O’Hare
airport, KORD.
Next we have to find a suitable model for the usage pattern
we observed in the last posting. Usage
is lowest at around 70F, and rises slightly as temperatures fall but rises
significantly as temperatures rise.
Operationally we can imagine that as temperatures rise beyond what is
comfortable, more and more cooling devices turn on to combat the heat. However, eventually all the air conditioning
equipment is on at full capacity and incremental demand drops off. Based on this, a logistic function [http://en.wikipedia.org/wiki/Logistic_function]
seems appropriate.
Furthermore, depending on the time of day, the high
temperature and the low temperature may both be influential predictors.
Continuing from the code presented in the last posting, we
can fit a logistic function in temperature to the usage data for each hour of
the day, taking into account whether it is a weekday, the time of the year and
the temperature function:
allResults=NULL
# create the variable for results for each hour
startList=list(intercept=1.5,weekday=-.02,Time=0,S1=.04,C1=-.03,S2=-.02,
C2=-.04,S3=.001,C3=.01,maxT=87,maxS=4,maxScalar=1,
maxSlope=0,minSlope=0)
# initial variable value
for (hourInd in 1:24){ # iterate through hours; create a model for each hour
test2<-nls(ComEd[,(hourInd+1)] ~ intercept
+ timcols%*%c(weekday,Time,S1,C1,S2,C2,S3,C3)
+
maxSlope*KORDtemps[dateInd,"MAX"]
+
minSlope*KORDtemps[dateInd,"MIN"]
+
maxScalar*plogis(KORDtemps[dateInd,"MAX"],
location = maxT, scale =
maxS),
start=startList,
control=nls.control(maxiter = 500, tol = 1e-05,
minFactor = 1/10000,printEval =
FALSE, warnOnly = TRUE))
# using
nonlinear least squares to find the best result
allResults=rbind(allResults,coef(test2))
# combine all results
startList=list(intercept=coef(test2)[1],weekday=coef(test2)[2],
Time=coef(test2)[3],S1=coef(test2)[4],
C1=coef(test2)[5],S2=coef(test2)[6],C2=coef(test2)[7],
S3=coef(test2)[8],C3=coef(test2)[9],maxT=coef(test2)[10],
maxS=coef(test2)[11],maxScalar=coef(test2)[12],
maxSlope=coef(test2)[13],minSlope=coef(test2)[14])
# update
starting value for next iteration to solution from previous hour
# although
not always necessary this reduces computational time
}
Then we can examine the regression results to see the
average response function in usage based on Max temperature:
maxResponse = mean(allResults[,'maxSlope']) * KORDtemps[dateInd,"MAX"]
+ mean(allResults[,'maxScalar'])*plogis(KORDtemps[dateInd,"MAX"],
location = mean(allResults[,'maxT']),
scale = mean(allResults[,'maxS']),
log = FALSE)
plot(KORDtemps[dateInd,"MAX"],maxResponse)
We can see that the fitted data (in black) overlay the actual data (in red) well
but there are still differences for very hot days.
So to summarize we can model residential power usage pretty accurately to first order based on knowledge of the high temperature for the day as well as the time of the year, the day of the week and the hour of the day. As expected, as temperatures rise, usage grows dramatically over a high temperature of 80F.
Stay cool!
Wednesday, January 30, 2013
Modeling Residential Electricity Usage with R
Wow, I can’t believe it has been 11 months since my last
blog posting! The next series of
postings will be related to the retail energy field. Residential power usage is satisfying to
model as it can be forecast fairly accurately with the right inputs. Partly as a consequence of deregulation there
is now more data more available than before.
As in prior postings I will use reproducible R code each step of the
way.
For this posting I will be using data from Commonwealth
Edison [ComEd] in Chicago, IL. ComEd makes
available historical usage data for different rate classes. In this example I use rate class C23.
First we must download the data and fix column and header
names:
library(xlsx)
library(downloader)
# load historical electric usage data from ComEd website
download('https://www.comed.com/Documents/customer-service/rates-pricing/retail-electricity-metering/Historical-Load-Profiles.xls','ComEd.xls',mode='wb')
ComEd<-read.xlsx(file='ComEd.xls',sheetName='C23')
# edit row and column names
dimnames(ComEd)[2][[1]]<-{c('Date',1:24)}
dimnames(ComEd)[1][[1]]<-substr(ComEd[,1],5,15)
ComEd[,1]<-as.Date(substr(ComEd[,1],5,15),'%m/%d/%Y')
Next we hypothesize some explanatory variables. Presumably electricity usage is influenced by
day of the week, time of the year and the passage of time. Therefore we construct a set of explanatory
variables and use them to predict usage for a particular hour of the day [16] as
follows:
# construct time related explanatory variables
times<-as.numeric(ComEd[,1]-ComEd[1,1])/365.25
weekdayInd<-!(weekdays(ComEd[,1])=="Saturday"|weekdays(ComEd[,1])=="Sunday")
timcols<-cbind(weekday=weekdayInd,
Time=times,S1=sin(2*pi*times),C1=cos(2*pi*times),S2=sin(4*pi*times),C2=cos(4*pi*times),S3=sin(6*pi*times),C3=cos(6*pi*times))
lm1<-lm(ComEd[,16]~timcols)
summary(lm1)
While all the input variables are highly predictive,
something seems to be missing in overall predictive accuracy:
Call:
lm(formula = ComEd[, 16] ~ timcols)
Residuals:
Min 1Q
Median 3Q Max
-1.11876 -0.16639 -0.01833
0.12886 1.64360
Coefficients:
Estimate Std.
Error t value Pr(>|t|)
(Intercept) 1.30253 0.02577
50.545 < 2e-16 ***
timcolsweekday -0.08813
0.02226 -3.959 7.93e-05 ***
timcolsTime 0.06901 0.00959
7.196 1.03e-12 ***
timcolsS1 0.37507 0.01450
25.863 < 2e-16 ***
timcolsC1 -0.17110 0.01424 -12.016 < 2e-16 ***
timcolsS2 -0.23303 0.01425 -16.351 < 2e-16 ***
timcolsC2 -0.37622 0.01439 -26.136 < 2e-16 ***
timcolsS3 -0.05689 0.01434
-3.967 7.65e-05 ***
timcolsC3 0.13164 0.01424
9.246 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001
‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3679 on 1331 degrees of freedom
Multiple R-squared: 0.6122, Adjusted
R-squared: 0.6099
F-statistic: 262.7 on 8 and 1331 DF,
p-value: < 2.2e-16
A plot of the residuals clearly shows something
else is going on:
plot(lm1$residuals)
As you may have already concluded there is seasonal heteroscedasticity in the data, beyond what we have fitted with the seasonal variables.
Let’s load some temperature data for the Chicago area and
see how it relates to this data.
First we load the weather data from the NOAA database. It is stored in annual fixed width files
which we stitch together:
# Load weather data from NOAA ftp site
KORDtemps<-NULL
for (yearNum in 2009:2013)
{
ftpString<-paste('ftp://ftp.ncdc.noaa.gov/pub/data/gsod/',yearNum,'/725300-94846-',yearNum,'.op.gz',sep='')
fileString<-paste('KORD',yearNum,'.gz',sep='')
download.file(ftpString,fileString,mode='wb')
temp2<-read.fwf(fileString,c(6,6,10,8,3,8,3,8,3,8,3,7,3,7,3,7,7,8,1,7,1,6,1,6,8),
skip=1,
col.names=c('STN','WBAN','YEARMODA','TEMP','TEMPCOUNT','DEWP','DEWPCOUNT','SLP','SLPCOUNT','STP','STPCOUNT','VISIB','VISIBCOUNT','WDSP','WDSPCOUNT','MAXSPD','GUST','MAX','MAXFLAG','MIN','MINFLAG','PRCP','PRCPFLAG','SNDP','FRSHTT'))
KORDtemps<-rbind(KORDtemps,temp2)
}
# Change missing data to NAs
KORDtemps[KORDtemps==9999.9]<-NA
KORDtemps<-cbind(date=as.Date(as.character(KORDtemps[,3]),'%Y%m%d'),KORDtemps)
head(KORDtemps)
# create cross reference date index
dateInd<-sapply(ComEd[,1], function(x) which(KORDtemps[,1]==x))
Next we plot the residuals from our regression vs. the
prevailing max temperature for the day:
plot(KORDtemps[dateInd,"MAX"],lm1$residuals)
As expected there is something going on relating temperature
to residuals.
In my next blog posting I will discuss approaches to fitting
this weather data to the demand model...
Monday, March 5, 2012
Futures price prediction using the order book data
It has been a couple of months since my last post; busy with lots of projects.
I had some fun playing around with data from Interactive Brokers API. It turns out that it is relatively easily to get hold of the raw market data relating to both trades and order book changes for CME/NYMEX commodity futures. For the purposes of this analysis I focused on the order book itself and what if anything one can imply forwards in time based on it.
All of the analysis was done in R. I have been experimenting with R for a year or so and have found it to have many advantages for statistical analyses. I put all the code at the bottom of the posting for those of you interested in doing something similar.
The raw data needs some manipulation to make it useful. Each record is a single “order book event” as I think of it. It could be a new order at a new price or a change to the number of buys or sells in the existing order book. Only the first 10 prices are supplied. As a result there are over 500,000 records for a single day’s 7 hours of trading during the pit hours (when most of the trade volume occurs).
For example, reconstructing a time series of the order book itself we see that as of
2012-02-17 13:52:55.302678
time the order book consisted of these bids:
Price | 2.689 | 2.690 | 2.691 | 2.692 | 2.694 | 2.695 | 2.685 | 2.686 | 2.687 | 2.688 |
quantity | 18 | 37 | 44 | 34 | 10 | 1 | 16 | 8 | 6 | 16 |
And these offers:
Price | 2.701 | 2.702 | 2.703 | 2.704 | 2.705 | 2.696 | 2.697 | 2.698 | 2.699 | 2.700 |
quantity | 168 | 11 | 65 | 8 | 10 | 6 | 79 | 21 | 11 | 12 |
So a rudimentary analysis of the data might focus on the imbalance between the sum of buys and the sum of sells. In the above example there are a total of 190 bids showing and 391 offers showing. Surely a difference of this magnitude would be meaningful?
For this sort of time series analysis we turn to the Vector Autoregression [VAR] models (as described in previous postings). We have two time series, each potentially dependent on itself and the other.
The first step is to discretize the data into 1 second interval data since the raw data is event driven and not summarized. The coding steps for this are outlined in R below. Also we focus initially on the price implied by the average of the best bid and offer. This is because trades may not occur contemporaneously with order book changes. Therefore the best price indicator we have at any point in time is based on the best bid and offer.
The sample data we are using is from the 2nd hour of trading from 10am to 11am on Friday February 17th 2012.
Estimation results for equation P_chg:
======================================
P_chg = P_chg.l1 + imbalance.l1 + P_chg.l2 + imbalance.l2 + P_chg.l3 + imbalance.l3 + P_chg.l4 + imbalance.l4 + P_chg.l5 + imbalance.l5 + const
Estimate Std. Error t value Pr(>|t|)
P_chg.l1 -2.592e-04 1.675e-02 -0.015 0.9877
imbalance.l1 7.496e-07 2.681e-07 2.796 0.0052 **
P_chg.l2 7.043e-02 1.676e-02 4.202 2.71e-05 ***
imbalance.l2 1.594e-07 3.571e-07 0.447 0.6553
P_chg.l3 2.088e-02 1.679e-02 1.244 0.2138
imbalance.l3 -6.813e-07 3.572e-07 -1.907 0.0566 .
P_chg.l4 -6.725e-03 1.675e-02 -0.401 0.6881
imbalance.l4 3.668e-07 3.536e-07 1.037 0.2997
P_chg.l5 -6.367e-03 1.666e-02 -0.382 0.7023
imbalance.l5 -5.575e-07 2.625e-07 -2.124 0.0337 *
const 5.517e-06 9.708e-06 0.568 0.5698
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0005032 on 3584 degrees of freedom
Multiple R-Squared: 0.01049, Adjusted R-squared: 0.007733
F-statistic: 3.801 on 10 and 3584 DF, p-value: 4.028e-05
Looking at the results we see that the model is significant. This means that we have a model that can explain the change in price at time t based on information available at time t-1. Specifically it indicates that the price change at time t can be forecast with the price changes and order imbalance data at times t-1, t-2 and t-3.
Unfortunately when we look at the fitted values, we see that the model isn’t giving significant enough forecasts of price. Fitted values range from -6.543e-04 to 6.409e-04. I.e.: +/- 0.6 of a tick [see also chart 1]. Given that trading costs alone are approximately 1 tick we haven’t created a forecast that we can actually make any money from. However it is interesting to get the intuitive validation that prices have a momentum component on a short term basis, and that changes in the order book (increasing number of buys vs. sells) is also a positive predictor for price.
More explanatory variables is one solution to trying to get more explanatory power.
The data is distributed randomly in time; maybe the frequency of data or intensity of the order book event process is relevant [see chart 2]. Since intensity is highly skewed by adding 1 and taking the log of the data we get a more useful distribution for statistical fitting purposes.
Also, (borrowing from Hasbrouck’s Empirical Market Microstructure), it is also worth considering the serial covariance of prices. Deviations of this from normal can be a signal of private information entering the market. Chart 3 shows the rolling serial covariance of price changes. Significantly positive periods may be predictive of market moving activity.
Rerunning the same analysis with these two additional variables we get the results:
stimation results for equation P_chg:
======================================
P_chg = P_chg.l1 + order_chg.l1 + log_intensity.l1 + serial_cov.l1 + P_chg.l2 + order_chg.l2 + log_intensity.l2 + serial_cov.l2 + P_chg.l3 + order_chg.l3 + log_intensity.l3 + serial_cov.l3 + P_chg.l4 + order_chg.l4 + log_intensity.l4 + serial_cov.l4 + const
Estimate Std. Error t value Pr(>|t|)
P_chg.l1 3.031e-04 1.678e-02 0.018 0.98559
order_chg.l1 7.210e-07 2.685e-07 2.686 0.00727 **
log_intensity.l1 2.574e-06 1.396e-05 0.184 0.85376
serial_cov.l1 4.551e-01 1.147e+00 0.397 0.69145
P_chg.l2 7.220e-02 1.678e-02 4.304 1.72e-05 ***
order_chg.l2 1.500e-07 3.576e-07 0.419 0.67503
log_intensity.l2 -1.737e-05 1.446e-05 -1.202 0.22961
serial_cov.l2 -5.224e-01 1.650e+00 -0.317 0.75153
P_chg.l3 2.208e-02 1.678e-02 1.316 0.18837
order_chg.l3 -6.861e-07 3.541e-07 -1.938 0.05274 .
log_intensity.l3 8.787e-06 1.445e-05 0.608 0.54320
serial_cov.l3 2.059e+00 1.649e+00 1.248 0.21204
P_chg.l4 -9.888e-03 1.669e-02 -0.593 0.55352
order_chg.l4 -1.387e-07 2.630e-07 -0.528 0.59786
log_intensity.l4 1.606e-06 1.395e-05 0.115 0.90837
serial_cov.l4 -1.964e+00 1.146e+00 -1.714 0.08663 .
const 1.152e-05 2.271e-05 0.507 0.61204
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0005036 on 3579 degrees of freedom
Multiple R-Squared: 0.01048, Adjusted R-squared: 0.006059
F-statistic: 2.37 on 16 and 3579 DF, p-value: 0.001621
Unfortunately the range of fitted values is still about 0.6 of a tick: -6.601e-04 to 6.419e-04 and the p value is not as compelling as before. We have added 2 variables and the model is no better a predictor than it was before.
So in summary, price movements can be predicted using a vector autoregression model, looking at lagged price change, order imbalance, message intensity and serial correlation. However the strength of the price signal is small relative to transaction costs.
Many thanks to Bernhard Pfaff for his R package ‘vars’ as well as the companion book Analysis of Integrated and Cointegrated Time Series with R, and Joel Hasbrouck for his book Empirical Market Microstructure.
___________________________________________________
For those interested in the R code see below. I am aware there are some loops and other coding blunders but it does work:
Download data from IB:
library(IBrokers)
tws <- twsConnect(clientId = 1)
twsPrompt<-twsFuture(symbol="NG","NYMEX","20120227")
reqMktData(tws, twsList, CALLBACK = NULL, file = "data.dat")
Load and analyze data:
setwd('C:/Users/Lloyd/Documents/trading/analysis/2012/intraday/')
colType<-c("Date","POSIXct","integer","integer","integer","integer","integer","integer","numeric","integer","character")
depthData<-read.table("depthCL.dat",header=F, row.names = NULL, sep = " ", fill=TRUE) #,colClasses=colType
options(digits.secs=6) # option for formatting dates with millisecond timestamps
cleanDates<-strptime(paste(depthData[,1],depthData[,2]),"%Y%m%d %H:%M:%OS")
cleanData<-array(0,c(length(cleanDates),8))
for (col in 3:10) cleanData[,(col-2)]<-as.numeric(as.vector(depthData[,col]))
# remove bad data
keep<-which(!is.na(cleanDates))
cleanData<-cleanData[keep,]
cleanDates<-cleanDates[keep]
keep<-which(!is.na(cleanData[,5]))
cleanData<-cleanData[keep,]
cleanDates<-cleanDates[keep]
bidPrices<-rep(0,10)
bidSizes<-rep(0,10)
bidAge<-rep(0,10)
offerPrices<-rep(0,10)
offerSizes<-rep(0,10)
offerAge<-rep(0,10)
dataRows<-length(cleanData[,1])
bidTotal<-rep(0,dataRows)
offerTotal<-rep(0,dataRows)
midPrice<-rep(0,dataRows)
fwdIndex<-rep(0,dataRows)
# loop through the data, recreating the working order book at each instant in time.
for (rowNum in 1:dataRows) {
if ((cleanData[rowNum,6]==1)&(cleanData[rowNum,7]>0)) {
P<-cleanData[rowNum,7]
Q<-cleanData[rowNum,8]
if (P %in% bidPrices) {
bidSizes[which(bidPrices==P)]<-Q
bidAge[which(bidPrices==P)]<-1
bidAge[which(bidPrices!=P)]<-bidAge[which(bidPrices!=P)]+1
} else {
if (P>max(bidPrices)) {
delete<-which.min(bidPrices)
bidPrices[delete]<-P
bidSizes[delete]<-Q
} else if (P<min(bidPrices)) {
delete<-which.max(bidPrices)
bidPrices[delete]<-P
bidSizes[delete]<-Q
}
}
} else if ((cleanData[rowNum,6]==0)&(cleanData[rowNum,7]>0)) {
P<-cleanData[rowNum,7]
Q<-cleanData[rowNum,8]
if (P %in% offerPrices) {
offerSizes[which(offerPrices==P)]<-Q
offerAge[which(offerPrices==P)]<-1
offerAge[which(offerPrices!=P)]<-offerAge[which(offerPrices!=P)]+1
} else {
if (P>max(offerPrices)) {
delete<-which.min(offerPrices)
offerPrices[delete]<-P
offerSizes[delete]<-Q
} else if (P<min(offerPrices)) {
delete<-which.max(offerPrices)
offerPrices[delete]<-P
offerSizes[delete]<-Q
}
}
}
bidTotal[rowNum]<-sum(bidSizes)
offerTotal[rowNum]<-sum(offerSizes)
if ((min(offerPrices)>0)&(max(bidPrices)>0)) midPrice[rowNum]<-(min(offerPrices)+max(bidPrices))/2
}
# create second indices for dates with correct start of day
ints<-seq(cleanDates[36],cleanDates[length(cleanDates)],"sec")
secondIndex<-findInterval(as.numeric(ints),as.numeric(cleanDates))
intensity<-diff(secondIndex)+1
imbalance<-bidTotal[secondIndex]-offerTotal[secondIndex]
priceChange<-diff(midPrice[secondIndex])
priceDiffs<-diff(midPrice[secondIndex])
results=NULL
library(vars)
# first look at order imbalance
alldat<-as.matrix(cbind(priceDiffs,imbalance[2:(length(imbalance))]))[921:4520,]
colnames(alldat)<-c("P_chg","imbalance")
varest<-VAR(alldat,p=1,type="const", lag.max=20, ic="AIC")
summary(varest)
summary(varest$varresult$P_chg$fitted.values)
serialCov<-NULL
for (ii in (921-60):(4520-60)) serialCov<-rbind(serialCov,cov(priceDiffs[(ii-1):(ii+59)],priceDiffs[(ii):(ii+60)]))
plot(serialCov)
length(serialCov)
alldat<-(cbind(priceDiffs,imbalance[2:(length(imbalance))],log(intensity, base = 10)[2:(length(imbalance))])[921:4520,])
alldat<-cbind(alldat[,1:3],serialCov*1000) # 1000 times to avoid singularity
colnames(alldat)<-c("P_chg","order_chg","log_intensity","serial_cov")
varest<-VAR(alldat,type="const", lag.max=10, ic="AIC")
summary(varest)
summary(varest$varresult$P_chg$fitted.values)
Thursday, December 1, 2011
NG Spreads returns, a reliable earner.
I was reminded recently it has been a while since my last posting. So here is a quick analysis of a strategy that works pretty well.
Much has been written on the changes to the NG market structure over the past few years; the influence of financial investors and the impact of new technologies. One thing that hasn’t changed is the reliable earning power of spread trading.
The complexity in modeling returns from futures trading, particularly spread trading means that the best way to characterize the profitability of these types of strategies is in terms of $/contract. This can then be directly compared to the $/contract of margin requirements for a reasonable perspective on the investment return. For this analysis, I used recent data from the CME website for initial margin by strategy type for the NG contract:
But first I present the $/contract return side of the equation for some standard spread strategies. These analyses are all based on rolling a X-Y position in NG futures, with the contract roll being on the penultimate day available. In the usual convention X is the nearby contract and Y is the further out contract. The ratio is 1:1 in contracts. Other roll alternatives may make more sense but this is conveniently tradeable given the TAS contracts available. Other hedge ratios may be more statistically defensible; 1:1 has the advantage of being available as a directly tradable spread (also via TAS), and the analytics is trivial.
Looking at the 5 year historical performance of N-(N+1) spreads we see the following results:
spread | Daily avg | Daily stdev | Annualized avg/stdev |
1-2 | 13.27 | 232.51 | 0.90 |
2-3 | -4.89 | 172.79 | -0.45 |
3-4 | -10.81 | 128.42 | -1.33 |
4-5 | -3.90 | 126.33 | -0.49 |
5-6 | -1.30 | 97.10 | -0.21 |
6-7 | -5.48 | 108.67 | -0.80 |
7-8 | -9.22 | 123.99 | -1.18 |
8-9 | -17.55 | 179.98 | -1.54 |
9-10 | -2.78 | 129.50 | -0.34 |
10-11 | 0.96 | 123.55 | 0.12 |
11-12 | -7.32 | 755.13 | -0.15 |
1-3 | 18.16 | 193.36 | 1.48 |
1-2 -(3-4) | 24.08 | 232.20 | 1.64 |
This table shows the average daily return and standard deviation in $/contract. So for example, holding a long prompt contract and short 2nd month contract since January 1, 2007 made an average of $13.27 per day with a standard deviation of $232.51 per day. However on an annualized basis this translates into $3,317 and $3,676 respectively or a profit to risk ratio of about 0.9. Not great, but a nice diversified addition to an existing strategy.
The highest individual 1 month spread contract return is in the 3-4 contract. Specifically being short this contract, which yields a profit to risk ratio for the year of 1.33.
By extension, these strategies can be combined relatively easily:
Long 1-2 plus short 2-3 (which is equivalent to long 1-3) has a 1.48 profit to risk ratio.
Long 1-2 plus short 3-4 has a 1.64 profit to risk ratio.
What is nice about these strategies is that the capital (margin) commitment is not immense. An individual spread requires less than $1000 per contract of initial margin. Drawdowns have been limited in the past 5 years as the cumulative profit chart shows:
Overall, this is a nice uncorrelated strategy to add to a portfolio.
Labels:
NG spreads,
stat arb
Subscribe to:
Posts (Atom)