LUCIDA: How to Build a Robust Crypto-Asset Portfolio with Multi-Factor Strategies (Factor Synthesis)

Odaily Planet Daily, published 2024-02-06, last updated 2024-02-06

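Summary

This installment tests the correlation between factors within each broad category and, based on the results, either discards or synthesizes the factors.

Picking up where we left off: the series "Building a Robust Crypto-Asset Portfolio with Multi-Factor Models" already has three installments, "Theoretical Foundations", "Data Preprocessing", and "Factor Validity Testing".

Those three installments covered the theory behind multi-factor strategies and the steps of single-factor testing.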

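I. Why test factor correlation: multicollinearity

Single-factor testing leaves us with a set of effective factors, but these cannot be added to the factor library directly. Factors can be grouped into broad categories by their economic meaning, and factors of the same category tend to be strongly correlated. If they are added without correlation screening, the multiple linear regression that estimates expected returns from the factors will suffer from multicollinearity. In econometrics, multicollinearity means that some or all explanatory variables in a regression model have an exact ("perfect") or approximate linear relationship, that is, the variables are highly correlated.

Therefore, once the effective factors have been selected, the first step is to run a t-test on the correlations between factors within each category. For highly correlated factors, either drop the less significant ones or synthesize them into one factor.

Multicollinearity can be stated mathematically as follows: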

Two cases arise:
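
In standard textbook notation (the symbols are an assumption of this sketch, not taken from the article), the regression model with k explanatory variables and the two cases, perfect and approximate collinearity, can be written as:

```latex
% Regression model with k explanatory variables
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \mu_i,
\qquad i = 1, \dots, n

% Case 1: perfect collinearity (c_1, ..., c_k not all zero)
c_1 X_{1i} + c_2 X_{2i} + \cdots + c_k X_{ki} = 0

% Case 2: approximate collinearity (v_i is a random error term)
c_1 X_{1i} + c_2 X_{2i} + \cdots + c_k X_{ki} + v_i = 0
```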

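Consequences of multicollinearity:

1. Under perfect collinearity, the parameter estimates do not exist.

2. Under approximate collinearity, the OLS estimates lose precision: their variance is inflated.

3. The parameter estimates lose a sensible economic interpretation.

4. The significance test on individual variables (the t-test) becomes meaningless.

5. The model loses its predictive power: the returns predicted by the fitted multiple linear model are highly inaccurate, and the model fails.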
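
The first two consequences can be checked numerically. For two regressors with correlation r, the variance of each OLS coefficient is scaled by the variance inflation factor 1 / (1 - r^2). The sketch below, on synthetic data (all variable names and numbers are illustrative assumptions), compares an unrelated regressor, a near copy, and an exact multiple:

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X (rows = observations)."""
    # The diagonal of the inverse correlation matrix equals 1 / (1 - R_j^2).
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x_indep = rng.normal(size=n)               # unrelated to x1
x_near = x1 + 0.05 * rng.normal(size=n)    # almost a copy of x1

vif_indep = vif(np.column_stack([x1, x_indep]))
vif_near = vif(np.column_stack([x1, x_near]))

# Perfect collinearity: the design matrix has rank 1 with 2 columns,
# so X'X is singular and the OLS solution is not unique.
X_perfect = np.column_stack([x1, 2.0 * x1])
rank_perfect = np.linalg.matrix_rank(X_perfect)

print("VIF, independent regressors:", vif_indep)
print("VIF, near-collinear regressors:", vif_near)
print("rank under perfect collinearity:", rank_perfect)
```

A VIF close to 1 means the coefficient variance is essentially unaffected; a VIF in the hundreds means the estimate is dominated by noise.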

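II. Step 1: Correlation test for factors of the same category

Test the correlation between a newly computed factor and the factors already in the library. In general, correlation can be computed on two kinds of data:

1. The factor values of all tokens over the backtest period

2. The factor excess returns of all tokens over the backtest period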

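Each factor we compute contributes to, and explains, part of a token's return. The purpose of the correlation test is to find factors that explain and contribute to strategy returns in different ways, because the strategy's ultimate goal is return. If two factors describe returns in the same way, it does not matter how different their factor values are. So we are not looking for factors whose values differ; we are looking for factors that describe returns differently, which is why we compute correlation on factor excess returns.

Our strategy runs at daily frequency, so we compute the correlation matrix of factor excess returns across the dates in the backtest window.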

Code to find the n library factors most highly correlated with the new factor:

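import pandas as pd

def get_n_max_corr(self, factors, n=1):
    """Return the new factor's excess returns, its |corr| with every library
    factor, and the n library factors it is most correlated with."""
    # Excess returns of the factors already in the library (dict: name -> data)
    factors_excess = self.get_excess_returns(factors)
    # Excess returns of the new factor under test
    save_factor_excess = self.get_excess_return(self.factor_value, self.start_date, self.end_date)
    if len(factors_excess) < 1:
        # Empty library: nothing to compare against
        return save_factor_excess, 1.0, None
    factors_excess[self.factor_name] = save_factor_excess['excess_return']
    factors_excess = pd.concat(factors_excess, axis=1)
    # Keep only the factor-name level of the column index, in column order
    factors_excess.columns = factors_excess.columns.get_level_values(0)
    # Correlation matrix of daily factor excess returns
    factor_corr = factors_excess.corr()
    factor_corr_df = factor_corr.abs().loc[self.factor_name]
    # Skip the first entry, which is the factor's correlation with itself (1.0)
    max_corr_score = factor_corr_df.sort_values(ascending=False).iloc[1:].head(n)

    return save_factor_excess, factor_corr_df, max_corr_score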
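
The method above depends on class attributes and helpers that are not shown. As a minimal standalone sketch of the same idea, assuming the daily excess returns are already collected in one DataFrame with a column per factor (the function name, factor names, and data are hypothetical):

```python
import numpy as np
import pandas as pd

def top_n_correlated(excess: pd.DataFrame, new_factor: str, n: int = 1) -> pd.Series:
    """Return the n library factors with the highest |corr| to `new_factor`.

    `excess` holds one column of daily factor excess returns per factor.
    """
    corr = excess.corr().abs()[new_factor].drop(new_factor)
    return corr.sort_values(ascending=False).head(n)

rng = np.random.default_rng(1)
dates = pd.date_range("2023-01-01", periods=250, freq="D")
base = rng.normal(0, 0.01, size=250)
excess = pd.DataFrame(
    {
        "momentum_30d": base,
        "volume_30d": rng.normal(0, 0.01, size=250),
        "momentum_60d": base + rng.normal(0, 0.002, size=250),  # new factor
    },
    index=dates,
)

top = top_n_correlated(excess, "momentum_60d", n=1)
print(top)
```

Dropping the new factor by name, rather than skipping the first sorted entry, avoids relying on sort order when another factor is also perfectly correlated.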

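III. Step 2: Factor selection and factor synthesis

A set of highly correlated factors can be handled in two ways:

(1) Factor selection

Based on each factor's ICIR, return, turnover, and Sharpe ratio, keep the most effective factor along the chosen dimension and delete the others.

(2) Factor synthesis

Synthesize the factors in the set, retaining as much effective information as possible in the cross-section.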

Suppose there are currently 3 factor matrices to be processed:

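2.1 Equal weighting

Every factor gets the same weight (w = 1 / number of factors); the composite factor is the average of the factor values.

E.g. for momentum factors such as the one-month, two-month, three-month, six-month, and twelve-month returns, each factor loading gets an equal weight (1/N, i.e. 1/5 for the five listed here). The loadings are combined into a new momentum factor loading, which is then standardized again.

synthesis1 = synthesis.mean(axis=1)  # row-wise mean across factors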
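
A minimal sketch of the full equal-weighting step, including the re-standardization mentioned above. It assumes one cross-section with tokens as rows and factors as columns; the token names, factor names, and numbers are made up for illustration:

```python
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    """Cross-sectional standardization: zero mean, unit standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)

# Hypothetical loadings for one day: rows = tokens, columns = factors.
synthesis = pd.DataFrame(
    {
        "ret_1m": [0.10, -0.05, 0.30, 0.02],
        "ret_3m": [0.25, -0.10, 0.40, 0.05],
        "ret_6m": [0.50, -0.20, 0.90, 0.00],
    },
    index=["BTC", "ETH", "SOL", "BNB"],
).apply(zscore)  # standardize each factor before combining

# Equal weights (row-wise mean), then standardize the composite again.
composite = zscore(synthesis.mean(axis=1))
print(composite)
```

Standardizing each factor first keeps a factor with a larger numeric scale from dominating the average.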

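2.2 Historical IC weighting, historical ICIR weighting, historical return weighting

Weight the factors by their IC (or ICIR, or historical return) over the backtest period. The backtest covers many periods and each period has its own IC value, so the average is used as the factor's weight. Usually this is the arithmetic mean of the IC over the backtest period.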
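
A minimal sketch of IC and ICIR weighting under stated assumptions: the IC history is a DataFrame with periods as rows and factors as columns, ICIR is taken as mean IC divided by the standard deviation of IC, and the weights are normalized by the sum of their absolute values (that normalization is a choice made here, not something the article specifies):

```python
import pandas as pd

def ic_weights(ic_history: pd.DataFrame, use_icir: bool = False) -> pd.Series:
    """Weights from the mean IC (or ICIR = mean IC / std of IC) per factor."""
    score = ic_history.mean()
    if use_icir:
        score = score / ic_history.std(ddof=1)
    # Assumed normalization: absolute weights sum to 1.
    return score / score.abs().sum()

# Hypothetical IC series: rows = backtest periods, columns = factors.
ic_history = pd.DataFrame(
    {"ret_1m": [0.02, 0.04, 0.03, 0.03], "ret_3m": [0.06, 0.10, 0.08, 0.12]}
)
w_ic = ic_weights(ic_history)
w_icir = ic_weights(ic_history, use_icir=True)

# Composite factor for one cross-section (rows = tokens, columns = factors).
factor_values = pd.DataFrame(
    {"ret_1m": [1.0, -1.0], "ret_3m": [0.5, -0.5]}, index=["BTC", "ETH"]
)
composite = factor_values.mul(w_ic, axis=1).sum(axis=1)
print(w_ic, w_icir, composite, sep="\n")
```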

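2.3 Half-life-weighted historical IC, half-life-weighted historical ICIR

Both 2.1 and 2.2 rely on arithmetic means, so every IC or ICIR observation in the backtest period is implicitly assumed to matter equally for the factor.

In reality, past periods do not affect the current period equally; their influence decays over time. The closer a period is to the current one, the larger its influence, and the further away, the smaller. Based on this principle, before computing the IC weights we first define half-life weights: periods closer to the current one get larger weights and more distant periods get smaller weights.

Mathematical derivation of the half-life weights: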
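
One common parameterization, given here as an assumption rather than as the article's own derivation: with T periods and half-life H, the period that lies a periods back from the most recent one gets raw weight 2^(-a/H), and the raw weights are normalized to sum to 1. A minimal sketch with hypothetical IC values:

```python
import numpy as np

def half_life_weights(n_periods: int, half_life: float) -> np.ndarray:
    """Weights for periods ordered oldest -> newest; they sum to 1."""
    age = np.arange(n_periods - 1, -1, -1)   # newest period has age 0
    raw = 0.5 ** (age / half_life)           # weight halves every `half_life` periods
    return raw / raw.sum()

w = half_life_weights(n_periods=6, half_life=2)

# Half-life-weighted mean IC for one factor (hypothetical values, oldest first).
ic = np.array([0.01, 0.02, 0.00, 0.03, 0.05, 0.04])
ic_half_life = float(w @ ic)
ic_plain = float(ic.mean())
print(w, ic_half_life, ic_plain)
```

Because the recent ICs in this toy series are higher, the half-life-weighted mean comes out above the plain arithmetic mean.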

2.4 Maximum-ICIR weighting

Solve for the optimal factor weights w that maximize the ICIR.
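
If the composite ICIR is written as w'μ / sqrt(w'Σw), with μ the vector of mean ICs and Σ the covariance matrix of the ICs, the maximizing weights are proportional to Σ⁻¹μ. A minimal sketch on synthetic IC data (the data, the function names, and the scaling of the weights are assumptions of this example):

```python
import numpy as np

def max_icir_weights(ic_history: np.ndarray) -> np.ndarray:
    """Weights proportional to inv(cov(IC)) @ mean(IC).

    ic_history has shape (periods, factors). The result is scaled so that
    the absolute weights sum to 1 (a choice made for this sketch).
    """
    mean_ic = ic_history.mean(axis=0)
    cov_ic = np.cov(ic_history, rowvar=False)   # sample covariance of the ICs
    w = np.linalg.solve(cov_ic, mean_ic)
    return w / np.abs(w).sum()

def icir(w: np.ndarray, ic_history: np.ndarray) -> float:
    """ICIR of the composite factor: w'mean / sqrt(w' cov w)."""
    mean_ic = ic_history.mean(axis=0)
    cov_ic = np.cov(ic_history, rowvar=False)
    return float(w @ mean_ic / np.sqrt(w @ cov_ic @ w))

rng = np.random.default_rng(2)
# Hypothetical IC history: 120 periods, 3 factors.
ic_history = rng.normal(loc=[0.03, 0.05, 0.02], scale=[0.05, 0.10, 0.04], size=(120, 3))

w_opt = max_icir_weights(ic_history)
w_equal = np.full(3, 1 / 3)
print("optimal weights:", w_opt)
print("ICIR optimal vs equal:", icir(w_opt, ic_history), icir(w_equal, ic_history))
```

The ICIR is unchanged when w is multiplied by a positive constant, so only the direction of w matters; the scaling is purely for readability.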

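The covariance-matrix estimation problem: the covariance matrix measures how different assets move together. Statistics usually substitutes the sample covariance matrix for the population covariance matrix, but when the sample is small the two can differ substantially. Shrinkage estimation was proposed to address this; its principle is to minimize the mean squared error between the estimated covariance matrix and the true one.

Methods:

1. Sample covariance matrix

2. Ledoit-Wolf shrinkage: introduces a shrinkage coefficient and blends the original covariance matrix with the identity matrix to reduce the effect of noise.

3. Oracle approximating shrinkage (OAS): a refinement of Ledoit-Wolf shrinkage that adjusts the covariance matrix so that the true covariance matrix is estimated more accurately when the sample size is small. (The implementation is analogous to Ledoit-Wolf shrinkage.)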
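
The blending step itself is simple. The sketch below uses a fixed, arbitrarily chosen shrinkage coefficient to show only the mechanics; Ledoit-Wolf and OAS differ in how they estimate that coefficient from the data, which is not reproduced here. The function name and data are assumptions of this example:

```python
import numpy as np

def shrink_to_identity(sample_cov: np.ndarray, shrinkage: float) -> np.ndarray:
    """Blend the sample covariance with a scaled identity matrix.

    shrinkage = 0 returns the sample covariance; shrinkage = 1 returns mu * I,
    where mu is the average variance (trace / dimension).
    """
    p = sample_cov.shape[0]
    mu = np.trace(sample_cov) / p
    return (1.0 - shrinkage) * sample_cov + shrinkage * mu * np.eye(p)

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))                 # only 8 observations of 5 variables
sample_cov = np.cov(X, rowvar=False)
shrunk_cov = shrink_to_identity(sample_cov, shrinkage=0.3)  # 0.3 is arbitrary

cond_sample = np.linalg.cond(sample_cov)
cond_shrunk = np.linalg.cond(shrunk_cov)
print("condition number, sample vs shrunk:", cond_sample, cond_shrunk)
```

In scikit-learn, `sklearn.covariance.LedoitWolf` and `sklearn.covariance.OAS` estimate the coefficient from the data; after `fit(X)` they expose `covariance_` and `shrinkage_`. A lower condition number means the matrix is safer to invert, which matters for the weight calculation in 2.4.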

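2.5 Principal component analysis (PCA)

Principal component analysis (PCA) is a statistical method for reducing dimensionality and extracting the main features of data. It applies a linear transformation that maps the original data into a new coordinate system in which the variance of the data is maximized.

Specifically, PCA first finds the first principal component, the direction of greatest variance in the data. It then finds the second principal component, which is orthogonal to (uncorrelated with) the first and has the greatest remaining variance. The process repeats until all principal components have been found.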
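
A minimal sketch of these steps using an eigendecomposition of the covariance matrix, on synthetic data in which three hypothetical factors share one common driver (the function name and data are assumptions of this example):

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """PCA via eigendecomposition of the covariance matrix of the columns of X."""
    Xc = X - X.mean(axis=0)                      # center each factor
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]      # data in the new coordinates
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, eigvecs[:, :n_components], explained

rng = np.random.default_rng(4)
common = rng.normal(size=300)
# Three hypothetical, highly correlated factors driven by one common component.
X = np.column_stack([common + 0.1 * rng.normal(size=300) for _ in range(3)])

scores, components, explained = pca(X, n_components=2)
print("explained variance ratio:", explained)
```

When the first component explains most of the variance, as it does here, its scores can serve as the single composite factor for the group.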
