Contents
Introduction
Theory
Modeling
Data
Parameter settings
Performance analysis

Strategy code: https://github.com/xcycharles/stock/blob/master/stats_arb.py
Data code: https://github.com/xcycharles/stock/blob/master/tushare_data_git.py

Introduction
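This article walks step by step, from the mathematical theory to the framework, through implementing a statistical arbitrage strategy on the A-share SSE 50 ETF basket. Along with the modeling approach, I also share ideas for improving the strategy and analyzing its performance. The strategy's advantages are that it is low-frequency and market-neutral: any investor can trade on the signals while paying relatively low turnover costs. The alpha comes from event-driven moves and market inefficiency. Charts first.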
Figure: Recent backtest
Figure: Backtest over the past three years
Figure: Backtest after maximizing capital utilization
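Theory
ACF time-series analysis of the SSE 50 constituents shows that, at the statistically significant lags (those outside the shaded band), a price shock caused by an event reverts within about one month.
The PACF analysis shows that the statistically significant day-to-day autocorrelation after a shock lasts for roughly 5 days.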
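The ACF reading above can be illustrated with a minimal pure-Python sketch. It computes the sample autocorrelation and flags lags outside an approximate 95% white-noise band of ±1.96/√n, which is an assumption standing in for the shaded region of the original plot. The AR(1) series is hypothetical, the function names are illustrative, and PACF is not covered here.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(series))  # white-noise approximation (assumption)
    acf = sample_acf(series, max_lag)
    return [lag for lag in range(1, max_lag + 1) if abs(acf[lag]) > band]


# Hypothetical AR(1) series: each shock decays gradually instead of vanishing at once.
random.seed(0)
values = [0.0]
for _ in range(499):
    values.append(0.8 * values[-1] + random.gauss(0.0, 1.0))

print(significant_lags(values, 10))
```

On real data the input would be a stock's price or return series; the number of consecutive significant lags gives a rough idea of how long a shock persists.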
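Modeling
Next, the model is built from the data and the theory: for a cointegrated stock pair, go long when a shock arrives and hold for 5 days. Used as a trend-breakout factor, the forward-return IC I tested is also highest with a 5-day holding period.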
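To make the IC comparison concrete, here is a minimal sketch that measures a rank IC (Spearman correlation between a signal and forward returns) for several holding horizons. The signal values and prices are made-up placeholders, the helper names are hypothetical, and using the rank version of IC is an assumption, since the article does not say which variant was tested.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def rank_ic(signal, forward_returns):
    """Spearman rank correlation between a signal and later returns."""
    return pearson(ranks(signal), ranks(forward_returns))


def forward_return(prices, start, horizon):
    """Simple return from holding `horizon` days starting at index `start`."""
    return prices[start + horizon] / prices[start] - 1.0


# Hypothetical cross-section: one signal value and one price path per stock.
signal = [2.1, -0.4, 1.3, 0.2]
prices = [
    [10.0, 10.1, 10.3, 10.4, 10.6, 10.9],
    [20.0, 20.1, 19.9, 19.8, 19.7, 19.5],
    [15.0, 14.9, 15.1, 15.2, 15.3, 15.5],
    [30.0, 30.2, 30.1, 30.1, 30.2, 30.3],
]
for horizon in (1, 3, 5):
    fwd = [forward_return(p, 0, horizon) for p in prices]
    print(horizon, rank_ic(signal, fwd))
```

In practice the IC would be averaged over many signal dates before comparing horizons; a single cross-section like this one is only an illustration of the calculation.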
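For the cointegration test I tried the methods below. Judging by the results and by case studies of the signals they triggered, the third works best. The statistical details are not covered here.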
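1. Engle-Granger
2. Johansen
3. OLS residual Adfuller
4. difference unit root

import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def cointegration_test(y, x):
    # Regress y on x (no intercept term), then run an ADF test on the residuals.
    ols_result = sm.OLS(y, x).fit()
    # adfuller returns a tuple; element [1] is the p-value.
    return adfuller(ols_result.resid)

Next is the signal-trigger code: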
import numpy as np

def find_good_pairs(df):
    #df = df.fillna(method='ffill')
    #df = df.fillna(method='bfill')
    dim = df.shape[1]  # number of columns
    #pvalue_matrix = np.ones((dim, dim))
    #correl_matrix = np.zeros((dim, dim))
    keys = df.keys()  # index object of df columns
    good_pairs = []
    short = []
    long = []
    for i in range(dim):
        for j in range(i + 1, dim):
            try:
                stock1 = df[keys[i]]  # first stock
                stock2 = df[keys[j]]  # second stock
                # correlation is about magnitude in short time
                correl = np.corrcoef(stock1, stock2)[0, 1]
                # cointegration is about possibility if stationary over long time
                #pvalue = coint(stock1, stock2)[1]
                pvalue = cointegration_test(stock1, stock2)[1]
                #pvalue_matrix[i, j] = pvalue
                #correl_matrix[i, j] = correl
                if pvalue < coint_param and correl > corr_param:
                    good_pairs.append((keys[i], keys[j]))
                    diff = stock1 - stock2
                    # .iloc makes the positional lookups explicit on a date-indexed Series
                    rmean = diff.rolling(rmeanwindow).mean().iloc[-1]
                    #rmean = diff.iloc[-2]
                    std = np.std(diff.iloc[-rmeanwindow:])
                    if diff.iloc[-1] > rmean + 2 * std:  # and diff.iloc[-1] < rmean + 3 * std:
                        # the printed direction matches style == 'trend'
                        print(f'long {keys[i]}, short {keys[j]}, corr is {correl}, coint is {pvalue}')
                        if style == 'reversal':
                            #if stock1.iloc[-1] < stock1.iloc[-2]:
                            short.append(keys[i])
                            #if stock2.iloc[-1] > stock2.iloc[-2]:
                            long.append(keys[j])
                        if style == 'trend':
                            if (stock1.iloc[-1] - stock1.iloc[-5]) / stock1.iloc[-5] < buyhighlimit:
                                long.append(keys[i])
                                short.append(keys[j])
                    elif diff.iloc[-1] < rmean - 2 * std:  # and diff.iloc[-1] > rmean - 3 * std:
                        print(f'long {keys[j]}, short {keys[i]}, corr is {correl}, coint is {pvalue}')
                        if style == 'reversal':
                            #if stock2.iloc[-1] < stock2.iloc[-2]:
                            short.append(keys[j])
                            #if stock1.iloc[-1] > stock1.iloc[-2]:
                            long.append(keys[i])
                        if style == 'trend':
                            if (stock2.iloc[-1] - stock2.iloc[-5]) / stock2.iloc[-5] < buyhighlimit:
                                long.append(keys[j])
                                short.append(keys[i])
            except Exception:
                # skip pairs whose test fails (for example because of missing data)
                continue
    return good_pairs, set(short), set(long)

Data
Daily A-share price data can be obtained from Tushare (挖地兔).
import datetime as dt
import pandas as pd
import tushare as ts

# Assumes a Tushare token has already been registered with ts.set_token(...).
pro = ts.pro_api()

def get_stock_data(start, end):
    ticker_list = ['603986.SH','603501.SH','603288.SH','603259.SH','601995.SH','601899.SH','601888.SH','601857.SH','601818.SH','601688.SH','601668.SH','601628.SH','601601.SH','601398.SH','601336.SH','601318.SH','601288.SH','601211.SH','601166.SH','601138.SH','601088.SH','601066.SH','601012.SH','600918.SH','600893.SH','600887.SH','600837.SH','600809.SH','600745.SH','600703.SH','600690.SH','600588.SH','600585.SH','600570.SH','600547.SH','600519.SH','600438.SH','600309.SH','600276.SH','600196.SH','600104.SH','600050.SH','600048.SH','600036.SH','600031.SH','600030.SH','600028.SH','600016.SH','600009.SH','600000.SH']
    ticker_list = ','.join(ticker_list)
    # One request returns the daily closes of all tickers for the date range.
    daily_df = pro.query('daily', ts_code=ticker_list, start_date=start, end_date=end, fields='ts_code,trade_date,close')
    return daily_df

df = pd.DataFrame()
for j in [f"{i:02d}" for i in range(17, 21)]:  # years 2017 to 2020
    for i in [f"{i:02d}" for i in range(1, 13)]:  # months 01 to 12
        start = '20' + j + i + '01'
        end = '20' + j + i + '31'  # '31' serves as the month-end bound for every month
        df = pd.concat([df, get_stock_data(start, end)], axis=0)
df.columns = ['stock', 'date', 'close']
# Reshape to one row per date and one column per stock.
df = df.pivot_table(index=['date'], columns='stock', values='close')
df.index = df.index.map(lambda x: dt.datetime.strptime(str(x), '%Y%m%d'))

Parameter settings
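amount = 1.0e6  # starting capital
cointwindow = 200  # cointegration window; ideally more than one year
rmeanwindow = 20  # signal-trigger window; directly affects the number and quality of signals
rebalance = 3  # number of days between rebalances
turnoveradj = 1 * rebalance  # capital is split according to the rebalance interval, so it may not be fully used when signals are few
coint_param = 0.05  # cointegration p-value threshold
corr_param = 0.6  # correlation threshold
buyhighlimit = 0.2  # safety threshold before buying, to avoid buying after too large a run-up
style = 'trend'  # use the statistical arbitrage signal for trend ('trend') or reversion ('reversal')

Performance analysis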
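The average daily return includes the days on which no signal fires. Turnover is the average daily ratio of the amount that has to be rebalanced to total exposure. The order ratio shows that, over a year, a signal fires on average once every two days. Maximum drawdown is meant in the sense of [formula not preserved in the source], that is, the largest pullback measured against accumulated profit. The explanation for such a high win rate is that once the daily candlestick prices of a paired stock diverge, a trend breakout tends to follow.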
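As a rough illustration of two of these metrics, the sketch below computes a peak-to-trough maximum drawdown and the share of days on which a signal fired. The equity curve and signal flags are hypothetical, and measuring drawdown as a fraction of the running peak is an assumption, because the article's own formula is not available.

```python
def max_drawdown(equity):
    """Largest peak-to-trough drop, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def signal_frequency(signal_flags):
    """Share of trading days on which at least one signal fired."""
    return sum(1 for flag in signal_flags if flag) / len(signal_flags)


# Hypothetical daily equity values and signal flags.
equity = [1.00e6, 1.05e6, 1.02e6, 1.10e6, 0.99e6, 1.08e6]
flags = [True, False, True, False, False, True]

print(max_drawdown(equity))
print(signal_frequency(flags))
```

A signal frequency near 0.5 would correspond to the "one signal every two days" figure quoted above.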
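Behind the high probability, however, there are also sharp reversals right after a breakout, like the one in the chart below. Performance analysis therefore has to examine such specific cases and then improve the strategy's protection mechanisms.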
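I will leave readers with a cliffhanger: for how to tell whether a move will be a breakout or a reversion, see the comment section at the end of the article.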