  • Thesis

財報項目全文的擷取和效能評估

Item Extraction for Annual Financial Report: Annotation and Evaluation

Advisor: 盧信銘

Abstract


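In recent years, textual analysis has been widely applied to financial reports. However, researchers are interested in only a few specific topics, and the quality of item extraction affects the results of subsequent analysis; this study therefore proposes a machine learning model for the task. In the first stage, financial reports are manually annotated to collect training data that reflects the actual state of the reports. In the second stage, six different features are designed for the training dataset, and a conditional random field is used so that the model labels the text sequence according to the latent rules it has learned. The results show that using a conditional random field for full-text item extraction effectively improves extraction accuracy and ensures data quality before analysis. Among the features, the item title text has a relatively large effect on the labeling results, while the item number and the word "item" have almost no effect.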

Parallel Abstract


Textual analysis is widely applied to financial reports. However, researchers are interested in only a few specific topics, and the quality of item extraction affects the results of subsequent analysis. In this research, we therefore propose a machine learning model to extract items from 10-K reports. First, we manually annotate financial reports to collect training data that reflects how the reports actually appear. Second, we design six different features for the training dataset and use a conditional random field (CRF) to label text sequences based on the latent rules the model learns. The experimental results show that using a CRF for full-text item extraction effectively improves extraction accuracy and helps ensure data quality before analysis. Among the features, the item title text has a large influence on the labeling results, whereas the item number and the word "item" have little influence.
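
As a rough illustration of the feature design described above, the sketch below computes per-line features for the three signals the abstract names: the item title text, the item number, and the word "item". It is not the thesis's implementation. The keyword list, the regular expression, and the function names are all hypothetical, and the remaining three features are not specified in the abstract, so they are omitted.

```python
import re

# Hypothetical title keywords; the abstract does not list the actual vocabulary.
TITLE_KEYWORDS = ("business", "risk factors", "properties", "legal proceedings")

# Assumed pattern for item numbers such as "Item 1A" or "Item 2".
ITEM_NUMBER = re.compile(r"\bitem\s+(\d{1,2}[ab]?)\b", re.IGNORECASE)


def line_features(line):
    """Return a feature dict for one line of a 10-K report."""
    text = line.strip().lower()
    match = ITEM_NUMBER.search(text)
    return {
        # Does the line contain the word "item" at all?
        "has_item_word": bool(re.search(r"\bitem\b", text)),
        # The item number following "item", or "" if none is found.
        "item_number": match.group(1) if match else "",
        # Does the line contain any of the assumed title keywords?
        "has_title_keyword": any(k in text for k in TITLE_KEYWORDS),
    }


def document_features(lines):
    """Turn a report (a list of lines) into a sequence of feature dicts."""
    return [line_features(line) for line in lines]


sample = [
    "Item 1A. Risk Factors",
    "Our business is subject to a number of risks.",
    "Item 2. Properties",
]
features = document_features(sample)
```

A sequence of feature dicts like this, paired with one label per line from the manual annotation, is the kind of input a CRF toolkit typically expects. Note that the second sample line also triggers the title-keyword feature, which suggests why a sequence model that weighs neighboring lines can help over isolated keyword rules.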

