<p>
To apply the model, we first need historical data to fit it. The project divides into four parts: requesting historical data, training the model, making predictions, and executing trades.
</p>
<h3>Step 1: Request Historical Data</h3>
<p>
The first function, <code>_get_history</code>, takes two arguments: a symbol and the number of daily data points to request. It requests historical <a href="https://www.quantconnect.com/docs#Consolidating-Data-TradeBars-vs-QuoteBars">QuoteBars</a> and builds a pandas DataFrame of closing prices. For more information about pandas DataFrames, refer to the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">DataFrame</a> documentation. The <code>_calculate_return</code> function takes that DataFrame, calculates the mean and standard deviation of the price series, and adds three columns (the return, the reversal factor, and the momentum factor) to prepare the data for multiple linear regression.
</p>
<div class="section-example-container">
<pre class="python"># Assumes "import numpy as np" and "import pandas as pd" at the top of the algorithm file.
def _get_history(self, symbol, num):
    # request the daily closing prices for a single symbol
    history = self.history([symbol], num, Resolution.DAILY).loc[symbol]['close']
    # re-index by calendar date, dropping the time of day
    dates = []
    for time in history.index:
        t = time.to_pydatetime().date()
        dates.append(t)
    dates = pd.to_datetime(dates)
    df = pd.DataFrame(history)
    df.index = dates
    df.columns = ['price']
    return df

def _calculate_return(self, df):
    # calculate the mean for further use
    mean = np.mean(df.price)
    # calculate the standard deviation
    sd = np.std(df.price)
    # pandas method to take the last data point of each month
    # ('BM' is deprecated since pandas 2.2, where 'BME' is the newer alias)
    df = df.resample('BM').last()
    # the following three lines are for further experiment purposes
    # df['j1'] = df.price.shift(1) - df.price.shift(2)
    # df['j2'] = df.price.shift(2) - df.price.shift(3)
    # df['j3'] = df.price.shift(3) - df.price.shift(4)
    # take the return as the dependent variable
    df['log_return'] = df.price - df.price.shift(1)
    # calculate the reversal factor
    df['reversal'] = (df.price.shift(1) - mean)/sd
    # calculate the momentum factor
    df['mom'] = df.price.shift(1) - df.price.shift(4)
    df = df.dropna()  # remove rows with NaN values
    return (df, mean, sd)</pre>
</div>
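<p>
As a rough illustration of what <code>_calculate_return</code> produces, the sketch below rebuilds the three columns for a short list of month-end prices using only the standard library. The function name <code>monthly_factors</code> and the sample prices are made up for this example. <code>statistics.pstdev</code> is used because <code>np.std</code> computes the population standard deviation by default. One simplification to keep in mind: the algorithm computes the mean and standard deviation from the daily prices before resampling, whereas this sketch computes them from the monthly list it is given.
</p>
<div class="section-example-container">
<pre class="python">from statistics import fmean, pstdev

def monthly_factors(prices):
    """Return (mean, sd, rows) for a list of month-end prices.

    Each row is (ret, reversal, mom) for month t, mirroring the columns
    built in _calculate_return:
      ret      = p[t]   - p[t-1]
      reversal = (p[t-1] - mean) / sd
      mom      = p[t-1] - p[t-4]
    """
    mean = fmean(prices)
    sd = pstdev(prices)  # population standard deviation, like np.std's default
    rows = []
    # The first four months lack a full set of lags; these are the rows
    # that dropna() removes in the pandas version.
    for t in range(4, len(prices)):
        ret = prices[t] - prices[t - 1]
        reversal = (prices[t - 1] - mean) / sd
        mom = prices[t - 1] - prices[t - 4]
        rows.append((ret, reversal, mom))
    return mean, sd, rows

# Hypothetical month-end prices, for illustration only.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean, sd, rows = monthly_factors(prices)</pre>
</div>
<p>
With six monthly prices only two rows survive, which is why the regression in the next step needs a long history.
</p>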
<h3>Step 2: Build Predictive Model</h3>
<p>
The <code>_concat</code> function requests the history for each currency pair and joins the results into a single DataFrame. Because \(\mu \) varies by country, we store each pair's mean and standard deviation on its symbol for later use. The <code>_OLS</code> function takes the resulting DataFrame and runs an ordinary least squares (OLS) regression. We wrap the regression in its own function so that the formula is easy to change later.
</p>
<div class="section-example-container">
<pre class="python"># Assumes "import statsmodels.formula.api as sm" at the top of the algorithm file.
def _concat(self):
    frames = []
    # repeat the same procedure for each symbol
    for symbol in self._quoted:
        # request as many daily bars as we can
        his = self._get_history(symbol.value, 20*365)
        # get the clean DataFrame for linear regression
        df, mean, sd = self._calculate_return(his)
        # add properties to the symbol object for further use
        symbol.mean = mean
        symbol.sd = sd
        frames.append(df)
    # concatenate the DataFrames and sort them by date
    df = pd.concat(frames)
    df = df.sort_index()
    # remove outliers: keep rows where every column is within three standard
    # deviations of its mean (about 99.7% of a normal distribution)
    df = df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]
    return df

def _OLS(self, df):
    res = sm.ols(formula='log_return ~ reversal + mom', data=df).fit()
    return res</pre>
</div>
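<p>
The outlier filter keeps rows where every column lies within three standard deviations of its mean. For a normal distribution that band covers about 99.7% of observations, which can be checked with the standard library. The sketch below is a side calculation, not part of the algorithm, and the names <code>normal_coverage</code> and <code>keep_within</code> are hypothetical. It uses <code>statistics.stdev</code> (the sample standard deviation) because that is what the pandas <code>std</code> method computes by default.
</p>
<div class="section-example-container">
<pre class="python">import math
from statistics import fmean, stdev

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def keep_within(values, k=3):
    """Keep the values whose distance from the mean is below k sample standard deviations."""
    m = fmean(values)
    s = stdev(values)
    return [v for v in values if k > abs(v - m) / s]

coverage = normal_coverage(3)  # roughly 0.9973

# Hypothetical data: twenty ordinary observations and one extreme value.
sample = [0.0] * 20 + [100.0]
filtered = keep_within(sample)  # the extreme value is dropped</pre>
</div>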
<h3>Step 3: Apply Predictive Model</h3>
<p>
The <code>_predict</code> function uses the history for the last 3 months, merges it into a DataFrame and then calculates the updated factors. Using these updated factors (together with the model we built) we calculate the expected return.
</p>
<div class="section-example-container">
<pre class="python">def _predict(self, symbol):
    # get the current month as an integer
    month = self.time.month
    # request the data for the last three months
    res = self._get_history(symbol.value, 33*3)
    # pandas method to take the last data point of each month
    res = res.resample('BM').last()
    # remove the data points in the current month
    res = res[res.index.month != month]
    # calculate the variables
    res = self._calculate_input(res, symbol.mean, symbol.sd)
    res = res.iloc[0]
    # take the coefficients; the first one is the intercept, so it is not part of the sum-product
    params = self._formula.params[1:]
    # calculate the expected return; res[1:] skips the price column
    expected_return = sum([a*b for a, b in zip(res[1:], params)]) + self._formula.params.iloc[0]
    return expected_return

def _calculate_input(self, df, mean, sd):
    # df['j1'] = df.price - df.price.shift(1)
    # df['j2'] = df.price.shift(1) - df.price.shift(2)
    # df['j3'] = df.price.shift(2) - df.price.shift(3)
    df['reversal'] = (df.price - mean)/sd
    df['mom'] = df.price - df.price.shift(3)
    df = df.dropna()
    return df</pre>
</div>
<p>
There are a few points of note:
</p>
<ul>
<li>We need historical bars for the last three months. To get them, we request 99 daily bars and use pandas resampling to extract the data point at the end of each month.</li>
<li>We use a Scheduled Event to run the strategy on the first trading day of each month. That day is not always the 1st: if the 1st falls on a weekend, the first trading day is later. To handle this, we remove the data from the current month, leaving only the last 3 months of data.</li>
<li>We start from the second element of <code>res</code> (<code>res[1:]</code>) because <code>res</code> has one more element than <code>params</code>: its first element is the price, which has no coefficient. The mismatch was hard to detect because Python does not raise an error when running <code>[a*b for a,b in zip(res,params)]</code> on sequences of different lengths; <code>zip</code> silently stops at the shorter one.</li>
<li>This function also uses pandas DataFrame methods extensively. For more information, refer to the <a href="http://pandas.pydata.org">pandas</a> documentation.</li>
</ul>
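<p>
Two of the points above can be sketched with the standard library alone. The first function mimics dropping the current month from a list of month-end observations. The second computes the expected return as the intercept plus a sum-product, and checks the lengths explicitly so that a mismatch cannot go unnoticed. The function names, dates and numbers are illustrative assumptions, not part of the algorithm. In Python 3.10 and later, <code>zip(..., strict=True)</code> is another way to make a length mismatch raise an error.
</p>
<div class="section-example-container">
<pre class="python">from datetime import date

def drop_current_month(month_ends, today):
    """Keep only the (date, price) pairs whose month differs from today's month."""
    return [(d, p) for d, p in month_ends if d.month != today.month]

def expected_return(intercept, coefficients, factors):
    """Intercept plus the sum-product of coefficients and factors."""
    if len(coefficients) != len(factors):
        raise ValueError("coefficients and factors must have the same length")
    return intercept + sum(c * f for c, f in zip(coefficients, factors))

# Hypothetical month-end observations. Resampling labels a partial current
# month with that month's last business day, hence the 2015-06-30 entry.
month_ends = [
    (date(2015, 2, 27), 1.12),
    (date(2015, 3, 31), 1.07),
    (date(2015, 4, 30), 1.12),
    (date(2015, 5, 29), 1.10),
    (date(2015, 6, 30), 1.09),
]
kept = drop_current_month(month_ends, date(2015, 6, 1))  # the June entry is removed

# zip stops at the shorter input without raising an error.
pairs = list(zip([1, 2, 3], [10, 20]))  # [(1, 10), (2, 20)]

# Hypothetical intercept, coefficients (reversal, mom) and factor values.
prediction = expected_return(0.5, [2.0, 4.0], [1.0, 0.25])  # 0.5 + 2.0 + 1.0 = 3.5</pre>
</div>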
<h3>Step 4: Initializing the Model</h3>
<p>
In the <a href='/docs/v2/writing-algorithms/initialization'>initialize</a> function we prepare the data and run the linear regression. The instance attribute <code>self._formula</code> holds the result of the OLS regression. We use this object each time we rebalance the portfolio.
</p>
<div class="section-example-container">
<pre class="python">def initialize(self):
    self.set_start_date(2013,6,1)
    self.set_end_date(2016,6,1)
    self.set_cash(10000)
    syls = ['EURUSD','GBPUSD','USDCAD','USDJPY']
    self._quoted = []
    for i in range(len(syls)):
        self._quoted.append(self.add_forex(syls[i],Resolution.DAILY,Market.OANDA).symbol)
    df = self._concat()
    self.log(str(df))
    self._formula = self._OLS(df)
    self.log(str(self._formula.summary()))
    self.log(str(df))
    self.log(str(df.describe()))
    for i in self._quoted:
        self.log(str(i.mean) + ' ' + str(i.sd))
    self.schedule.on(self.date_rules.month_start(), self.time_rules.at(9,31), self._action)</pre>
</div>
<h3>Step 5: Performing Monthly Rebalancing</h3>
<p>
Every month we rebalance the portfolio using a <a href="https://www.quantconnect.com/docs#Scheduled-Events">Scheduled Event</a>. The predicted returns are appended to the <code>rank</code> list, which is then sorted by expected return in descending order. The first element is the symbol with the highest expected return and the last element is the one with the lowest. When the highest expected return is positive and the lowest is negative, we go long the former and short the latter. When all the expected returns are positive, we only go long the pair with the highest expected return. When all are negative, we only go short the pair with the lowest expected return.
</p>
<div class="section-example-container">
<pre class="python">def _action(self):
    rank = []
    long_short = []
    for i in self._quoted:
        rank.append((i, self._predict(i)))
    # rank the symbols by their expected return, highest first
    rank.sort(key=lambda x: x[1], reverse=True)
    # the first element in long_short has the highest expected return (the long candidate);
    # the second has the lowest expected return (the short candidate)
    long_short.append(rank[0])
    long_short.append(rank[-1])
    self.liquidate()
    # a product < 0 means the highest expected return is positive and the lowest is negative,
    # so we go long the first and short the second
    if long_short[0][1]*long_short[1][1] < 0:
        self.set_holdings(long_short[0][0], 1)
        self.set_holdings(long_short[1][0], -1)
    # long only, because all of the expected returns are positive
    elif long_short[0][1] > 0 and long_short[1][1] > 0:
        self.set_holdings(long_short[0][0], 1)
    # short only
    else:
        self.set_holdings(long_short[1][0], -1)</pre>
</div>
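<p>
The decision rule in <code>_action</code> can be isolated from the trading API, which makes it easier to check by hand. The sketch below is a minimal restatement under that assumption: the function name <code>choose_positions</code> and the sample predictions are hypothetical, and it returns target weights instead of placing orders.
</p>
<div class="section-example-container">
<pre class="python">def choose_positions(predictions):
    """Map {symbol: expected return} to {symbol: target weight}."""
    # rank the symbols by expected return, highest first
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    best, worst = ranked[0], ranked[-1]
    # opposite signs: long the best and short the worst
    if 0 > best[1] * worst[1]:
        return {best[0]: 1, worst[0]: -1}
    # every expected return is positive: long only
    if best[1] > 0 and worst[1] > 0:
        return {best[0]: 1}
    # otherwise: short only
    return {worst[0]: -1}

# Hypothetical expected returns, for illustration only.
mixed = choose_positions({'EURUSD': 0.02, 'GBPUSD': -0.01, 'USDJPY': 0.005})
all_up = choose_positions({'EURUSD': 0.02, 'GBPUSD': 0.01})
all_down = choose_positions({'EURUSD': -0.01, 'GBPUSD': -0.03})</pre>
</div>
<p>
Here <code>mixed</code> goes long EURUSD and short GBPUSD, <code>all_up</code> is long EURUSD only, and <code>all_down</code> is short GBPUSD only.
</p>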