Top Twenty VIX Proxy ETFs and CEFs As Of August 2nd, 2007

    The following is a table showing the 5- and 20-session log-return correlations of the ETFs and CEFs to the CBOE VIX volatility index.  Score is a weighted average of the 5- and 20-session correlations, with the 5-day weighted 30% and the 20-day weighted 70%. 

Symbol   5-Day    20-Day   Score
DOG      68.93%   83.25%   0.79
SIJ      70.24%   83.64%   0.80
SH       73.77%   83.46%   0.81
DXD      76.40%   82.98%   0.81
TIP      87.45%   78.88%   0.81
SFK      77.67%   83.55%   0.82
GVI      85.75%   80.92%   0.82
SCC      79.74%   85.63%   0.84
SJH      78.04%   86.38%   0.84
BND      94.06%   80.03%   0.84
CIU      94.47%   81.39%   0.85
SHY      93.40%   82.50%   0.86
FXY      88.10%   85.57%   0.86
SKF      82.30%   88.18%   0.86
BIV      97.14%   82.40%   0.87
AGG      92.90%   84.43%   0.87
IEI      91.83%   85.89%   0.88
TLT      98.22%   83.56%   0.88
IEF      95.42%   86.78%   0.89
TLH      97.58%   87.05%   0.90
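
As a rough check on how the Score column is built, the weighting described above can be sketched in a few lines of Python. The function name `proxy_score` and its parameter names are made up for illustration; only the 30%/70% weights and the DOG figures come from the table.

```python
def proxy_score(corr_5d: float, corr_20d: float,
                w_5d: float = 0.30, w_20d: float = 0.70) -> float:
    """Weighted average of the 5- and 20-session correlations.

    Correlations are passed as fractions (0.6893 for 68.93%).
    The function and parameter names are hypothetical.
    """
    return w_5d * corr_5d + w_20d * corr_20d


# DOG row from the table: 5-day 68.93%, 20-day 83.25%.
dog_score = proxy_score(0.6893, 0.8325)
print(f"{dog_score:.2f}")  # prints 0.79, matching the table
```

Because the 20-session figure carries 70% of the weight, a fund such as TIP with a high 5-day but lower 20-day correlation ranks below funds with steadier 20-day readings.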

    As you can easily see, the majority of the funds listed are short funds and Treasury funds of various maturities.  Notable amongst these, however, is FXY, the CurrencyShares Yen Trust.  Though speculation over the carry trade has proven spotty at best over the past year, it seems reasonable to accept this as some evidence that the carry trade is unwinding.

I am not getting those values!

Mike, I hate to bother you about this, but I tried calculating the correlations and was unable to duplicate your values. Let's take the DOG 5-day correlation as an example. The (Pearson's) correlation is given by: Correlation (C1, C2, n-5, n) where:
C1 = LN(Price/Price 5d ago) for DOG;
C2 = LN(Price/Price 5d ago) for VIX;
[C1, C2 are series where each value is as given above]
n = Bar (or date of reference Price)
I even experimented using Log10 instead of LN but still didn't get your values.

I am obviously doing something wrong; can you correct me please? Or point me to a website that shows how to calculate this correctly? Caution: I am NOT an MBA though I do have a reasonable understanding of Math. Thanks in advance.
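
On the Log10 experiment mentioned above: Pearson correlation is unchanged when a series is rescaled by a positive constant, and switching from LN to Log10 only divides every log return by ln(10), so the base of the logarithm cannot explain a mismatch. A small Python sketch of that point follows. The helper names `pearson` and `log_returns` are made up for illustration, and the five closes are the DOG and VIX values from the Matlab session posted further down in this thread.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def log_returns(closes, log=math.log):
    """One-session log returns, using whichever log function is passed in."""
    return [log(b / a) for a, b in zip(closes, closes[1:])]


# Closes taken from the Matlab dump in this thread (30-Jul to 03-Aug-2007).
dog = [59.52, 60.14, 59.50, 58.96, 60.20]
vix = [20.87, 23.52, 23.67, 21.64, 24.15]

r_ln = pearson(log_returns(dog), log_returns(vix))
r_log10 = pearson(log_returns(dog, math.log10), log_returns(vix, math.log10))

# The two values agree up to floating-point rounding.
print(r_ln, r_log10)
```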


I am looking at the daily correlation of the log-return time series.  If you are looking at a weekly time series instead of a daily time series, you will likely see much different numbers.  I can calculate these numbers in bulk for weekly or monthly time series if you're interested. 

For full transparency, here is a dump from my Matlab session:

>> disp(names{178})
DOG
>> datedisp(dates(end-4:end))
30-Jul-2007  
31-Jul-2007  
01-Aug-2007  
02-Aug-2007  
03-Aug-2007  
>> [fts2mat(ts{178}.Adj(end-4:end)) fts2mat(vix.Close(end-4:end))]

ans =

   59.5200   20.8700
   60.1400   23.5200
   59.5000   23.6700
   58.9600   21.6400
   60.2000   24.1500
>> diff(log([fts2mat(ts{178}.Adj(end-4:end)) fts2mat(vix.Close(end-4:end))]))

ans =

    0.0104    0.1195
   -0.0107    0.0064
   -0.0091   -0.0897
    0.0208    0.1097
>> corrcoef(diff(log([fts2mat(ts{178}.Adj(end-4:end)) fts2mat(vix.Close(end-4:end))])))

ans =

    1.0000    0.8514
    0.8514    1.0000
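
For readers without Matlab, the same steps (take logs, difference, correlate) can be sketched with nothing but the Python standard library. This is a reconstruction under the assumption that `diff(log(...))` followed by `corrcoef` is all that happens in the session above. The helper names are made up, and the prices are the ones printed in the dump.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def log_returns(closes):
    """Equivalent of diff(log(closes)): log(P[t]) - log(P[t-1])."""
    return [math.log(b) - math.log(a) for a, b in zip(closes, closes[1:])]


# Adjusted DOG closes and VIX closes as printed in the dump.
dog = [59.52, 60.14, 59.50, 58.96, 60.20]
vix = [20.87, 23.52, 23.67, 21.64, 24.15]

columns = [log_returns(dog), log_returns(vix)]

# 2x2 matrix of pairwise correlations, laid out like the corrcoef output.
corr_matrix = [[pearson(a, b) for b in columns] for a in columns]
for row in corr_matrix:
    print("  ".join(f"{value:.4f}" for value in row))
```

The off-diagonal entries come out at roughly 0.8514, in line with the Matlab output, and the diagonal entries are each series correlated with itself.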

Got that value but questions remain...

Mike, thanks for your prompt reply, that too on a Sunday! Here are my comments, etc.:

1) I was looking at the daily prices, too. The difference in my values was b/c I was using an entirely different logic (log of 1-bar *price change*, etc.) whereas you are using the 1-bar *log price change* (i.e. log of price [bar] - log of price [bar -1]). I assume yours is the correct method, as taught in grad schools now? Or is that just how MatLab does it?
2) Another possible reason for the difference in values can be explained on the basis of difference in data values. e.g. I noted that for the 5 prices you posted for DOG & VIX each, I had one different price for each (incidentally, the VIX close on Aug 3 in my data is 25.16, as it is on Yahoo's website also; yours is 24.15).
3) It was a great idea to publish the MatLab dump! I retraced the logic in Excel and for Aug-03 I got an R value of 0.85136972, which MatLab has rounded to 0.8514. Perfect correlation (pun intended)! However, your dump shows
1.0000 0.8514
0.8514 1.0000
I assume they are both for Aug-03, with one symbol being arbitrarily given a value of 1 & the R value of the other symbol given in relation to that? [I can't imagine that one value is for Aug-03 and the other for Aug-02]? Can you clarify?
4) Mike, I think this R value is for 4 sessions, not 5. Let me explain. The diff between Aug-03 & Aug-02 gives us 1-session R, that between Aug-03 & Aug-01 gives us 2-session R...so that between Aug-03 & Jul-30 actually gives us a 4-session R, not 5. What do you think?
5) Finally, here are the R (correlation) values I got, using your method:
7/30/07 0.98336
7/31/07 0.96682
8/1/07 0.84728
8/2/07 0.78766
8/3/07 0.85136
Do you see similar values in MatLab? I still don't see a value similar to the one you had posted in your article (the value must have been for Aug-01 or July-31); since I am now using your method, I can only assume this is due to data differences?
Please clarify and educate. Thanks!


1)  Yes, this difference in logic would definitely cause trouble with the SPY, as it sees heavy trading in PM/AH, which would result in differences when returns are compounded.  The primary reason that I use adjusted-close-to-close logic, however, is that ex-dividend and dividend reinvestment dates introduce significant amounts of noise.  I have contemplated other methods such as a weighted combination of low, high, actual, and adjusted log-return correlations, but have yet to get around to this.  As far as "correct," I'm still an undergrad, but I've seen teachers vary quite dramatically in their opinions.

2)  Here's another very interesting problem.  So I see clearly the number you quote on the "Summary" page, but this section of my data is collected automatically from the Yahoo historical data here - http://finance.yahoo.com/q/hp?s=%5EVIX 

  As you can tell, Yahoo's own numbers vary.  Going straight to the CBOE (http://www.cboe.com/DelayedQuote/SimpleQuote.aspx?ticker=VIX), we see that your number is actually the right one for Friday's close.  Up until last week, I had been using CBOE's data as well, but I see that this has perhaps been a bad idea.  Note as well though that these were adjusted prices, in case that made the difference for the DOG. 

3) The output you see there is what's called a correlation matrix.  The 1's refer to the correlation between one of the columns and itself, which is always one, and thus you can sometimes identify correlation matrices by their diagonal of ones.  More info here - http://en.wikipedia.org/wiki/Covariance_matrix

4 & 5) Yes, you're correct there, I need to fix the session count.  I'll take a look at how the CBOE and Yahoo data are mismatched.  I have had horrible luck with pay-for data vendors as well, and these kinds of problems are always very frustrating but seemingly impossible to avoid given the volume of data involved.
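
To make the session-count point concrete: differencing N closes leaves N-1 log returns, so the five closes in the dump cover four sessions, and a true 5-session correlation needs six closes (in Matlab terms, presumably indexing `end-5:end` rather than `end-4:end`). A minimal Python sketch, with made-up helper names:

```python
import math


def log_returns(closes):
    """One-session log returns; always one fewer than the number of closes."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def closes_needed(sessions):
    """Number of closing prices required to get `sessions` returns."""
    return sessions + 1


# The five adjusted DOG closes from the dump (30-Jul to 03-Aug-2007).
dog_closes = [59.52, 60.14, 59.50, 58.96, 60.20]
returns = log_returns(dog_closes)

print(len(dog_closes), "closes ->", len(returns), "returns")
print("closes needed for a 5-session window:", closes_needed(5))
```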

Hope I've helped somewhat, and I'm very grateful for your input as well.  One of the reasons I began this site was that so few authors are clear about or even understand what they're calculating, and so I'm very happy to be open about my work.  Depending on how busy I become during this next school year and how close I'd like to keep my intellectual property, I may release portions of my code library on the site as well.  And again, thank you!

Data issues are important

Most people don't realize that there can be differences in the data even from 'reputable' vendors. And yes, I see your point about the VIX having a different price on the summary page than on the historical prices page. FWIW, I don't find Yahoo data to be of high quality. But at least it's free!!
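
To give a feel for how much a single disputed print matters over so short a window, the sketch below recomputes the four-return correlation twice: once with the 24.15 VIX close for Aug 3 from the Matlab dump, and once with the 25.16 close quoted earlier in this thread. Everything else is held fixed, which is an assumption, since the DOG series was also said to differ by one price that was never posted. Helper and variable names are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def log_returns(closes):
    """One-session log returns."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


dog = [59.52, 60.14, 59.50, 58.96, 60.20]
vix_first_four = [20.87, 23.52, 23.67, 21.64]

# Only the Aug 3 VIX close differs between the two series.
vix_dump = vix_first_four + [24.15]    # value in the Matlab dump
vix_quoted = vix_first_four + [25.16]  # value quoted in the comments

r_dump = pearson(log_returns(dog), log_returns(vix_dump))
r_quoted = pearson(log_returns(dog), log_returns(vix_quoted))

print(f"with 24.15: {r_dump:.4f}")
print(f"with 25.16: {r_quoted:.4f}")
```

With only four returns in the window, one changed close moves the coefficient visibly, which is consistent with small data mismatches producing different published values.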

I am a bit surprised to hear that academics have differing opinions on how to calculate the correlation coefficient; I would have thought that such a basic concept would have been "standardized" a long time ago. Feel free to opine on what is the "generally accepted" technique in the academic community.

I have found this interchange most helpful and am glad you also learnt something. Thank you!

JD