Application of Transfer Learning and Text Mining on Reports to Shareholders


This study uses the natural language processing model BERT (Bidirectional Encoder Representations from Transformers) to build a text mining model, training it on labeled Reports to Shareholders (RTS) from Taiwan's semiconductor industry. We then examine whether BERT overcomes some weaknesses of earlier text mining methods. Finally, we use sentiment analysis to assess how the tone of RTS relates to companies' future financial performance. After hyperparameters are selected based on validation set performance, BERT reaches a classification accuracy of 0.86 on the test set, outperforming the other techniques compared. Visualizations of BERT's internal operation show that it captures word associations. For example, it recognizes words modified by negations and nouns modified by adjectives. However, our results differ from those of Li (2010a), who studied the tone of MD&A disclosures. We find no significant positive association between the sentiment of the current year's RTS and next year's earnings or change in earnings. We conjecture that features of Taiwan's capital market, such as its investor structure or level of information transparency, cause RTS to carry different information content from U.S. MD&A.
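The evaluation and tone-earnings test summarized above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. The sentence labels `"pos"`/`"neg"`/`"neu"`, the net-tone formula, the hyperparameter grid, and the univariate regression without control variables are all hypothetical assumptions. In the study, sentence labels would come from the trained BERT classifier.

```python
from math import inf, sqrt


def accuracy(predicted, gold):
    """Share of items whose predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def select_config(val_scores):
    """Return the hyperparameter config with the highest validation accuracy.

    val_scores maps a (hypothetical) config tuple, e.g. (learning_rate, epochs),
    to its validation accuracy; the chosen config is then scored once on the test set.
    """
    return max(val_scores, key=val_scores.get)


def net_tone(labels):
    """Hypothetical report-level tone: (pos - neg) / (pos + neg), in [-1, 1].

    Neutral sentences are ignored; a report with no polar sentences gets 0.0.
    """
    pos = labels.count("pos")
    neg = labels.count("neg")
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def ols_slope_t(x, y):
    """Simple OLS of y on x with an intercept; returns (slope, t-statistic of slope).

    A simplification: Li (2010a)-style tests add control variables, omitted here.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual sum of squares and the standard error of the slope.
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)
    t_stat = inf if se == 0 else beta / se
    return beta, t_stat
```

In this setting, `x` would hold each firm's current-year RTS tone and `y` its next-year earnings or earnings change. A slope whose t-statistic is not significantly positive corresponds to the null result reported above.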


