Item Extraction for SEC 10-Q Reports

Advisor: 盧信銘

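Abstract

With the rapid development of data analysis and the rapid growth of financial data, a growing body of research applies text analysis to financial reports. The annual 10-K and quarterly 10-Q reports required by the United States Securities and Exchange Commission (SEC) have long been popular subjects for researchers in financial text analysis. However, a few items in these reports tend to be studied and discussed on their own. Because the quality of item extraction affects the results of downstream research, accurately extracting the specific items that researchers need has become an important problem. In our experiments, we propose an attention-based BiLSTM-CRF model to extract items from 10-Q reports. First, we designed human annotation rules and built a 10-Q item extraction dataset for training. Second, we built an attention-based BiLSTM-CRF model. The model is end-to-end and requires neither complex feature engineering nor task-specific background knowledge: it takes all the words of a 10-Q report as input and outputs a tagging decision for each line in sequence. Our results show that the attention-based BiLSTM-CRF model performs well both when extracting all items and when extracting specific items.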

Parallel Abstract

With rapid progress in data analysis and fast growth in financial data, more and more researchers are focusing on text analysis of financial reports. The 10-K and 10-Q reports required by the United States Securities and Exchange Commission (SEC) have long been popular sources for researchers in financial text analysis. However, a few items in these reports are often studied separately. Because the quality of the extracted items affects subsequent results and analysis, extracting items accurately becomes an important issue. In this study, we propose an attention-based BiLSTM-CRF model to extract items from 10-Q reports. We developed human annotation rules and an annotated 10-Q dataset for model training. We also developed an attention-based BiLSTM-CRF model for item extraction. The model is end-to-end and requires neither hand-crafted features nor task-specific knowledge. It takes the tokens of a 10-Q document as input and outputs a predicted tag for each line. Our experimental results show that the attention-based BiLSTM-CRF model performs well both when extracting all items and when extracting selected items.
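To make the last step concrete, the following is a minimal sketch of how per-line tags could be turned into extracted items. It is not the thesis's implementation. The BIO-style tag scheme (`B-<item>`, `I-<item>`, `O`), the item labels, and the function names `group_lines_by_item` and `select_items` are all assumptions introduced for illustration. The tags are written by hand here and stand in for model output.

```python
from typing import List, Optional, Set, Tuple


def group_lines_by_item(lines: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive tagged lines into (item_label, text) spans.

    Assumes a hypothetical BIO-style tag per line: "B-<item>" starts an item,
    "I-<item>" continues it, and "O" marks a line outside any item.
    """
    if len(lines) != len(tags):
        raise ValueError("lines and tags must have the same length")

    spans: List[Tuple[str, List[str]]] = []
    current_label: Optional[str] = None
    for line, tag in zip(lines, tags):
        if tag == "O":
            current_label = None  # close any open span
            continue
        prefix, _, label = tag.partition("-")
        # Open a new span on "B-", or when an "I-" tag does not continue the open span.
        if prefix == "B" or label != current_label:
            spans.append((label, [line]))
            current_label = label
        else:
            spans[-1][1].append(line)
    return [(label, "\n".join(text)) for label, text in spans]


def select_items(spans: List[Tuple[str, str]], wanted: Set[str]) -> List[Tuple[str, str]]:
    """Keep only the spans whose label is in `wanted` (selected-item extraction)."""
    return [(label, text) for label, text in spans if label in wanted]


# Hypothetical lines and hand-written tags standing in for model output.
example_lines = [
    "PART I",
    "Item 1. Financial Statements",
    "Condensed balance sheets ...",
    "Item 2. Management's Discussion and Analysis",
    "Revenue increased ...",
    "SIGNATURES",
]
example_tags = ["O", "B-Item1", "I-Item1", "B-Item2", "I-Item2", "O"]

all_items = group_lines_by_item(example_lines, example_tags)
selected = select_items(all_items, {"Item2"})
```

Under these assumptions, `all_items` corresponds to the all-items setting and `selected` to the selected-items setting described in the abstract.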
