ISMCRP5007_kawasaki


January 28, 25

#volatility-forecasting #dynamic-topic-model #natural-language-processing #high-frequency-data #financial-time-series


Yoshinori Kawasaki


Text of each slide

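Extraction of Dynamic Topics from Text Series and Its Application to Volatility Forecasting
Yoshinori Kawasaki (The Institute of Statistical Mathematics)
Takayuki Morimoto (School of Science, Kwansei Gakuin University)
January 25, 2025
Institute of Statistical Mathematics Joint Research Meeting "Big Data Analysis and Reproducible Research"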

This talk is a continuation of our 2017 APFM paper:
• Morimoto, T. and Kawasaki, Y. (2017). Forecasting Financial Market Volatility Using a Dynamic Topic Model, Asia-Pacific Financial Markets, Vol. 24, pp. 149-167. DOI: 10.1007/s10690-017-9228-z

Motivation
• Keyword counts sometimes help
• (Ex.) Google SVI (Search Volume Index)
• But good keywords have to be found first.
• From news (text) data, we want to extract topics (defined by distributions of words) that may affect market sentiment
• Construct a topic score time series $SC_t$
• Investigate whether $SC_t$ improves volatility forecasting
• Seek a more effective specification of the multiscale dynamic topic model

Illustration: topic score and realized volatility
[Figure: realized volatility estimated from high-frequency data, shown together with an estimated topic score (one of 20 scores).]

“Bag-of-Words” model
• We focus only on word frequencies and neglect other information (word order, dependency, and so on).
• (Ex.) A document $D$ = “It is fine today” can be expressed as $D$ = {“it”, “is”, “fine”, “today”}.
• Usually we exclude so-called “stop words” such as “a”, “the”, “for”, etc.
• In this research, after morphological analysis, we keep nouns only, and remove numerals, suffixes, non-independent words, pronouns and symbols.
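As a rough illustration of the bag-of-words idea, the sketch below counts words in the example sentence while ignoring their order. It assumes simple whitespace tokenization of English text, whereas the talk applies morphological analysis to Japanese news; the function name `bag_of_words` and the extended stop-word list are hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; "it" and "is" are added
# here only so that the example sentence shows the filtering effect.
STOP_WORDS = frozenset({"a", "the", "for", "it", "is"})

def bag_of_words(text, stop_words=frozenset()):
    """Count words in a whitespace-tokenized document, ignoring word order."""
    tokens = [w.strip('.,!?"').lower() for w in text.split()]
    return Counter(w for w in tokens if w and w not in stop_words)

# Every word is kept when no stop words are given.
bow_all = bag_of_words("It is fine today")
# With the illustrative stop-word list only content words remain.
bow_filtered = bag_of_words("It is fine today", STOP_WORDS)
```

Reordering the sentence (for example "today fine is it") gives the same counts, which is exactly the information the model discards.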

Latent Dirichlet Allocation Model
• A standard method for topic modeling
• Often abbreviated as LDA
• The distribution of words follows a multinomial distribution (this gives the likelihood)
• A Dirichlet distribution gives the prior distribution of word frequencies
• The word distribution $\phi_z$ characterizes a topic $z$, and each document $d$ is a mixture of many topics with topic distribution $\theta_d$.

Typical MCMC cycle for LDA
This algorithm is for a single document. We run it day by day on Reuters news, and want to ensure some continuity of topics along the time axis.
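The slide's diagram is not reproduced in the text. One common form of an MCMC cycle for LDA is collapsed Gibbs sampling, sketched below; this may differ from the sampler the authors actually used. The symmetric hyperparameters `alpha` and `beta`, the function name `lda_gibbs`, and the encoding of documents as lists of integer word ids are all assumptions made for this illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampler for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    n_dz = [[0] * n_topics for _ in docs]               # topic counts per document
    n_zw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_z = [0] * n_topics                                # total word count per topic
    assign = []
    # Random initialization of the topic assignment of every word token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            n_dz[d][z] += 1
            n_zw[z][w] += 1
            n_z[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the current assignment from the counts.
                n_dz[d][z] -= 1
                n_zw[z][w] -= 1
                n_z[z] -= 1
                # Full conditional of the topic, up to a normalizing constant.
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][w] += 1
                n_z[z] += 1
    # Smoothed estimates of theta (document-topic) and phi (topic-word).
    theta = [[(n_dz[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(n_zw[k][w] + beta) / (n_z[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(n_topics)]
    return theta, phi
```

Running this independently for each day gives no link between the topics of consecutive days, which is the continuity problem the slide raises.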

Multiscale Dynamic Topic Model
• Proposed by Iwata et al. (2010)
• The parameter $\phi_{t,z}$ of the word distribution of topic $z$ at time $t$ has a time-dependent structure:
$$\phi_{t,z} \sim \mathrm{Dirichlet}\left(\sum_{s=0}^{S} \lambda_{t,z,s}\,\hat{\omega}^{s}_{t-1,z}\right)$$
• $\hat{\omega}^{s}_{t-1,z}$: distribution of words over topic $z$ with scale $s$ at time $t-1$
• $\lambda_{t,z,s}$: weight for scale $s$ in topic $z$ at time $t$

Original specification in Iwata et al.
• $\hat{\omega}^{s}_{t-1,z}$ indicates the word distribution (w.d.) from epoch $t-1-2^{s-1}+1$ to $t-1$
• If $S = 4$, $s$ runs through 0, 1, 2, 3, 4.
• $s = 4$ → w.d. comes from $t-8$ to $t-1$
• $s = 3$ → w.d. comes from $t-4$ to $t-1$
• $s = 2$ → w.d. comes from $t-2$ to $t-1$
• $s = 1$ → w.d. comes from $t-1$ only
• $s = 0$: assume a uniform distribution for $\hat{\omega}^{0}_{t-1,z}$
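The dyadic window rule can be checked with a few lines of code. The helper below is a minimal sketch (the name `dyadic_window` is hypothetical) that returns the first and last epoch feeding scale $s \ge 1$, using the start epoch $t-1-2^{s-1}+1 = t-2^{s-1}$.

```python
def dyadic_window(t, s):
    """Return (first_epoch, last_epoch) used by scale s >= 1 at time t."""
    if s < 1:
        raise ValueError("scale 0 is the uniform distribution and has no window")
    return (t - 2 ** (s - 1), t - 1)

# Windows for S = 4 at an arbitrary time t = 100.
windows = {s: dyadic_window(100, s) for s in range(1, 5)}
```

For $t = 100$ the windows span 1, 2, 4 and 8 epochs respectively, matching the list on the slide.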

10.

Illustration of Multiscale Word Distribution
Word distributions tend to become smoother as the time scale becomes longer.
Iwata, T. et al. (2010) Proceedings of the 16th ACM SIGKDD, pp. 663-672.

11.

HAR-like specification (key idea)
• $\hat{\omega}^{s}_{t-1,z}$ indicates the word distribution (w.d.), and we consider three different spans.
• $S = 3$, and $s$ runs through 0, 1, 2, 3.
• $s = 3$ → w.d. comes from $t-22$ to $t-1$
• $s = 2$ → w.d. comes from $t-5$ to $t-1$
• $s = 1$ → w.d. comes from $t-1$ only
• $s = 0$: assume a uniform distribution for $\hat{\omega}^{0}_{t-1,z}$
In this talk, we call this specification Heterogeneous MDTM (H-MDTM).
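To make the HAR-like spans concrete, the sketch below pools daily word counts of one topic over 1, 5 and 22 past epochs and mixes the resulting distributions with a uniform one to form the Dirichlet parameter. How the scale-specific distributions are actually estimated in the authors' sampler is not stated in the slides, so pooling raw counts is an assumption, as are the names `scale_distribution` and `dirichlet_parameter` and the truncation of windows at the start of the sample.

```python
# Scale s -> number of past epochs, as listed on the slide (s = 0 is uniform).
HAR_SPANS = {1: 1, 2: 5, 3: 22}

def scale_distribution(counts_by_day, t, span):
    """Normalized word counts pooled over epochs t-span .. t-1 (truncated at 0)."""
    vocab = len(counts_by_day[0])
    pooled = [0.0] * vocab
    for day in range(max(0, t - span), t):
        for w, c in enumerate(counts_by_day[day]):
            pooled[w] += c
    total = sum(pooled)
    # Fall back to the uniform distribution when the window holds no words.
    return [c / total for c in pooled] if total > 0 else [1.0 / vocab] * vocab

def dirichlet_parameter(counts_by_day, t, weights):
    """Sum over s of weights[s] * omega^s; weights[0] multiplies the uniform scale."""
    vocab = len(counts_by_day[0])
    param = [weights[0] / vocab] * vocab
    for s, span in HAR_SPANS.items():
        omega = scale_distribution(counts_by_day, t, span)
        param = [p + weights[s] * o for p, o in zip(param, omega)]
    return param
```

Because each component is a probability vector, the entries of the resulting parameter sum to the total weight, so the weights jointly control how concentrated the Dirichlet prior is.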

12.

MCMC cycle for MDTM
The weights $\lambda_{t,z,s}$ and the hyperparameter $\alpha_{t-1,z}$ are estimated in an outer loop of this cycle, by a stochastic EM algorithm and a fixed-point iteration method.

13.

Construction of topic scores
• Topic scores are built from the estimated topic proportions $\theta_{t,j,i}$ (percentage of topic $i$ included in the $j$-th document at time $t$):
$$SC_t^i = \sum_{j=1}^{D_t} \theta_{t,j,i}$$
• $SC_t^i$: score for topic $i$ at time $t$
• $D_t$: number of documents at time $t$
• $\theta_{t,j,i}$: $i$-th element of the topic distribution within the $j$-th document at time $t$
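The topic score is a column sum of the day's document-topic matrix. The sketch below assumes the estimated proportions for one day are available as a list of per-document lists; the function name `topic_scores` and the toy numbers are illustrative only.

```python
def topic_scores(theta_t):
    """theta_t[j][i] is the share of topic i in document j on one day.

    Returns the list of scores SC_t^i, one per topic.
    """
    n_topics = len(theta_t[0])
    return [sum(doc[i] for doc in theta_t) for i in range(n_topics)]

# Three documents, two topics (made-up proportions).
theta_day = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
scores = topic_scores(theta_day)
```

Since each document's proportions sum to one, the scores of all topics on a given day add up to the number of documents $D_t$, so a score also reflects how much news was published that day.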

14.

Word distribution (June 2, 2008)
• We consider 20 topics in all.
• Word distributions in Topic 1 and Topic 2
• Only the top 10 words are shown

15.

Data
• High-frequency data of a stock index (TOPIX)
• January 7, 2008 to December 28, 2012, $T = 1223$
• Generate 1-minute returns and calculate the daily realized volatility ($RV_t$) and realized quarticity ($RQ_t$)
• News data taken from Reuters Japan's website
• Language = Japanese
• 298,205 documents, 24,227 distinct words excluding stop words

16.

Forecasting models
• Heterogeneous Autoregressive (HAR) model: baseline model, Corsi (2009)
• HARQ model, which adds realized quarticity ($RQ_{t-1}$) to the coefficient of $RV_{t-1}$: Bollerslev, Patton and Quaedvlieg (2016)
• HAR + topic score (HAR-SC)
• HARQ + topic score (HARQ-SC)
• In our 2017 paper we compared AR vs. AR-SC and ARQ vs. ARQ-SC; that comparison is omitted here.

17.

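Aggregation of high-frequency data
• Let $r_{t,i}$ be a return series at (for example) 1-minute intervals constructed from some financial asset price, that is, the $i$-th return on day $t$. The original data are recorded in seconds, but are aggregated to an equally spaced one-minute grid.
• Let $n_t$ denote the number of returns defined within day $t$.
• The realized volatility (RV) of day $t$ is defined by $RV_t = \sum_{i=1}^{n_t} r_{t,i}^2$.
• Similarly, the realized quarticity (RQ) is defined by $RQ_t = \frac{n_t}{3} \sum_{i=1}^{n_t} r_{t,i}^4$. This is the sample fourth moment of the returns on day $t$, and it corresponds to the sample variance of $RV_t$.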
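The two realized measures follow directly from one day's intraday returns. The function below is a minimal sketch of these definitions; the name `realized_measures` and the toy return values are assumptions made for illustration.

```python
def realized_measures(returns):
    """Return (RV_t, RQ_t) computed from one day's intraday returns."""
    n = len(returns)
    rv = sum(r ** 2 for r in returns)              # sum of squared returns
    rq = (n / 3.0) * sum(r ** 4 for r in returns)  # scaled sum of fourth powers
    return rv, rq

# Four made-up intraday returns for a single day.
rv, rq = realized_measures([0.01, -0.02, 0.01, 0.0])
```

In practice the input would be the one-minute returns of a trading day, so $n_t$ is the number of minute intervals observed on that day.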

18.

HAR vs. HAR-SC
• The HAR-SC model is defined by
$$RV_t = \beta_0 + \beta_1 RV_{t-1} + \beta_2 RV_{t-1|t-5} + \beta_3 RV_{t-1|t-22} + \gamma\, SC_{t-1} + u_t$$
where
$$RV_{t-j|t-h} = \frac{1}{h+1-j} \sum_{i=j}^{h} RV_{t-i}$$
• Omitting $\gamma\, SC_{t-1}$ reduces it to the HAR model
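The HAR-SC regression can be sketched as building the lagged daily, weekly and monthly averages of RV, appending the lagged topic score, and fitting by ordinary least squares. OLS via `numpy.linalg.lstsq` is an assumption here, since the slides do not say how the coefficients are estimated; the function names `har_design` and `fit_har_sc` are hypothetical.

```python
import numpy as np

def har_design(rv, sc):
    """Build regressors [1, RV_{t-1}, RV_{t-1|t-5}, RV_{t-1|t-22}, SC_{t-1}] and targets RV_t."""
    rv = np.asarray(rv, dtype=float)
    sc = np.asarray(sc, dtype=float)
    rows, y = [], []
    for t in range(22, len(rv)):
        rows.append([
            1.0,
            rv[t - 1],              # daily lag
            rv[t - 5:t].mean(),     # average of RV_{t-1}, ..., RV_{t-5}
            rv[t - 22:t].mean(),    # average of RV_{t-1}, ..., RV_{t-22}
            sc[t - 1],              # lagged topic score
        ])
        y.append(rv[t])
    return np.array(rows), np.array(y)

def fit_har_sc(rv, sc):
    """OLS estimates (beta0, beta1, beta2, beta3, gamma) of the HAR-SC model."""
    X, y = har_design(rv, sc)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Dropping the last column of the design matrix gives the plain HAR regression, so the two models can be compared on identical samples.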

19.

HARQ vs. HARQ-SC
• The HARQ-SC model is defined by
$$RV_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, RQ_{t-1}^{1/2}\right) RV_{t-1} + \beta_2 RV_{t-1|t-5} + \beta_3 RV_{t-1|t-22} + \gamma\, SC_{t-1} + u_t$$
where
$$RV_{t-j|t-h} = \frac{1}{h+1-j} \sum_{i=j}^{h} RV_{t-i}$$
• Omitting $\gamma\, SC_{t-1}$ reduces it to the HARQ model

20.

21.

HARQ-HARSC: yet another complication
• The HARQ-HARSC model is defined by
$$RV_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, RQ_{t-1}^{1/2}\right) RV_{t-1} + \beta_2 RV_{t-1|t-5} + \beta_3 RV_{t-1|t-22} + \gamma_1 SC_{t-1} + \gamma_2 SC_{t-1|t-5} + \gamma_3 SC_{t-1|t-22} + u_t$$
where
$$SC_{t-j|t-h} = \frac{1}{h+1-j} \sum_{i=j}^{h} SC_{t-i}$$

22.

In-Sample Forecasting Comparison
Nikkei Index 2008-2012

Loss (window)   Scale     Topic#   Model        Error
MSE (RW)        H-MDTM    19       HARQ-HARSC   0.692
MSE (IW)        H-MDTM    4        HARQ-HARSC   0.971
QLIKE (RW)      H-MDTM    19       HAR-HARSC    0.973
QLIKE (IW)      MDTM(4)   14       HAR-HARSC    0.990

The accumulated value of the error function is rescaled by that of the HAR model.
MSE: $L(RV_t, X_t) = (RV_t - X_t)^2$
QLIKE: $L(RV_t, X_t) = \dfrac{RV_t}{X_t} - \log\dfrac{RV_t}{X_t} - 1$
IW: increasing window in regression
RW: rolling regression with fixed window size
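The two loss functions and the rescaling by the HAR benchmark can be written down in a few lines. The sketch below follows the definitions on the slide; the function names and the toy numbers in the usage lines are hypothetical and are not results from the talk.

```python
import math

def mse_loss(rv, x):
    """Squared error between realized volatility rv and forecast x."""
    return (rv - x) ** 2

def qlike_loss(rv, x):
    """QLIKE loss; zero when the forecast equals the realized value."""
    ratio = rv / x
    return ratio - math.log(ratio) - 1.0

def relative_error(rv, forecast, benchmark, loss):
    """Accumulated loss of a forecast divided by that of the benchmark model."""
    num = sum(loss(a, f) for a, f in zip(rv, forecast))
    den = sum(loss(a, b) for a, b in zip(rv, benchmark))
    return num / den

# Made-up realized values, candidate forecasts and benchmark forecasts.
ratio_mse = relative_error([1.0, 2.0], [1.1, 2.1], [1.2, 2.2], mse_loss)
```

A value below one means the candidate model accumulates less loss than the benchmark, which is how the Error column is to be read.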

23.

In-Sample Forecasting Comparison
TSE Bank Sector Index 2008-2012

Loss (window)   Scale     Topic#   Model        Error
MSE (RW)        H-MDTM    19       HARQ-HARSC   0.631
MSE (IW)        MDTM(3)   13       HARQ-SC      0.790
QLIKE (RW)      H-MDTM    19       HAR-HARSC    0.886
QLIKE (IW)      H-MDTM    19       HAR-HARSC    0.958

The accumulated value of the error function is rescaled by that of the HAR model.
MSE: $L(RV_t, X_t) = (RV_t - X_t)^2$
QLIKE: $L(RV_t, X_t) = \dfrac{RV_t}{X_t} - \log\dfrac{RV_t}{X_t} - 1$
IW: increasing window in regression
RW: rolling regression with fixed window size

24.

Quantitative Comparison of Forecasting
Model Confidence Set (MCS) by Hansen, Lunde and Nason (2011)
• $\mathcal{M}_0 = \{1, 2, \ldots, m_0\}$: initial model set
• A large discrepancy in the values of the error function indicates a difference in forecasting ability.
• $d_{ij,t} = \mathcal{L}(v_t, \hat{v}_{it}) - \mathcal{L}(v_t, \hat{v}_{jt}), \quad t = 1, \ldots, T$
• Test the null hypothesis $H_0: \mathrm{E}[d_{ij,t}] = 0, \ \forall i, j \in \mathcal{M} \subset \mathcal{M}_0, \ i > j$
• Perform the joint test of “all are equal”. If it is rejected, we eliminate the most inferior model from $\mathcal{M}$.

25.

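Eliminating models, constructing MCS
• Following Hansen et al. (2011), we employ the test statistic
$$\max_{i,j \in \mathcal{M}} \frac{|\bar{d}_{ij}|}{\sqrt{\mathrm{var}(\bar{d}_{ij})}}, \quad \text{where } \bar{d}_{ij} = \frac{1}{T} \sum_{t=1}^{T} d_{ij,t}$$
• $\bar{d}_{ij}$ and $\mathrm{var}(\bar{d}_{ij})$ are calculated by block bootstrap
• Elimination rule: set $|\mathcal{M}| = m$ and $\bar{d}_i = \sum_{j \in \mathcal{M}} \bar{d}_{ij} / (m-1)$, and eliminate the model $i^*$ which satisfies
$$i^* = \operatorname*{argmax}_{i \in \mathcal{M}} \frac{\bar{d}_i}{\sqrt{\mathrm{var}(\bar{d}_i)}}$$
Then repeat the procedure after shrinking $\mathcal{M}$.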
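One round of this procedure can be sketched as follows: compute the mean loss differentials, estimate their variances with a block bootstrap, form the range statistic, and identify the model the elimination rule would drop. This is a simplified illustration, not the authors' implementation. The circular moving-block scheme, the block length, the number of bootstrap replications, the bootstrap p-value construction and the names `block_bootstrap_indices` and `mcs_step` are all assumptions.

```python
import numpy as np

def block_bootstrap_indices(T, block, rng):
    """Indices of length T drawn by a circular moving-block bootstrap."""
    starts = rng.integers(0, T, size=int(np.ceil(T / block)))
    idx = (starts[:, None] + np.arange(block)[None, :]) % T
    return idx.ravel()[:T]

def mcs_step(losses, block=10, n_boot=500, seed=0):
    """One MCS round on a (T, m) matrix of losses, one column per model.

    Returns the range statistic, a bootstrap p-value for "all are equal",
    and the index of the model the elimination rule would remove.
    """
    rng = np.random.default_rng(seed)
    T, m = losses.shape
    mean_loss = losses.mean(axis=0)
    d_ij = mean_loss[:, None] - mean_loss[None, :]   # mean loss differentials
    d_i = d_ij.sum(axis=1) / (m - 1)                  # average against the other models
    boot_ij = np.empty((n_boot, m, m))
    for b in range(n_boot):
        mb = losses[block_bootstrap_indices(T, block, rng)].mean(axis=0)
        boot_ij[b] = mb[:, None] - mb[None, :]
    boot_i = boot_ij.sum(axis=2) / (m - 1)
    # Bootstrap variance estimates of the mean differentials.
    var_ij = ((boot_ij - d_ij) ** 2).mean(axis=0)
    var_i = ((boot_i - d_i) ** 2).mean(axis=0)
    off = ~np.eye(m, dtype=bool)                      # exclude i == j
    stat = np.max(np.abs(d_ij[off]) / np.sqrt(var_ij[off]))
    # Null distribution from the recentred bootstrap differentials.
    boot_stat = np.max(np.abs(boot_ij - d_ij)[:, off] / np.sqrt(var_ij[off]), axis=1)
    p_value = float(np.mean(boot_stat >= stat))
    worst = int(np.argmax(d_i / np.sqrt(var_i)))
    return stat, p_value, worst
```

A full MCS would call such a step repeatedly, deleting the returned column whenever the p-value falls below the chosen level, and stop at the first non-rejection; the surviving columns form the confidence set. The sketch does not guard against zero bootstrap variance, which occurs if two models produce identical losses.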

26.

Preliminary results of MCS
The most frequently chosen model is HARQ-HARSC. Scale 6 means $S = 6$, so it covers the lags 1, 2, 4, 8, 16, 32. Taking larger lags generally deteriorates the forecasting performance, although Scale 9 sometimes attains the best result.