

Application of AI Text Mining Methods to Different Genres of Linguistic Expressions: Focusing on Feature Extraction of Written and Spoken Language




This paper examines how the features extracted from AI text mining results differ with the genre of a text. The aim is to find new perspectives and features for reading and understanding a work, rather than to follow the conventional data mining direction of using quantitative methods to extract general trends. My earlier work focused mainly on literature and editorials; here the scope is widened to include papers in the humanities and social sciences (philosophy and thought, religion, history, psychology, society, and education) as well as natural discourse. The results show that for humanities and social sciences papers, which are written texts setting out the author's thinking, multidimensional scaling extracts the important points of the content quite accurately, whereas the co-occurrence network yields little beyond keywords. Compared with newspaper editorials and novels, papers resemble editorials under multidimensional scaling but resemble novels in the co-occurrence network. For discourse materials, each of the two text mining methods can extract the important keywords of the discourse content, but neither can recover the details of that content. The richness of a conversation can, however, be estimated from differences in the distribution of elements. Discourse thus shows tendencies different from those of written text, which suggests that the qualitative difference between written and spoken language could in future be examined on the basis of text mining.
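The abstract names two analyses, a co-occurrence network and multidimensional scaling, without saying how either was computed. The following is a minimal sketch of the kind of input both typically start from, not the author's procedure. It assumes the Jaccard coefficient as the co-occurrence measure, and the toy token lists, the function names `jaccard_cooccurrence`, `network_edges`, and `jaccard_distance_matrix`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def jaccard_cooccurrence(units):
    """Return {(w1, w2): Jaccard coefficient} for word pairs sharing at least one unit.

    `units` is a list of token lists (e.g. one list per sentence or paragraph).
    Keys are ordered so that w1 < w2 alphabetically.
    """
    # Map each word to the set of unit indices in which it appears.
    occurrences = {}
    for idx, tokens in enumerate(units):
        for word in set(tokens):
            occurrences.setdefault(word, set()).add(idx)

    scores = {}
    for w1, w2 in combinations(sorted(occurrences), 2):
        shared = occurrences[w1] & occurrences[w2]
        if shared:
            union = occurrences[w1] | occurrences[w2]
            scores[(w1, w2)] = len(shared) / len(union)
    return scores


def network_edges(scores, threshold):
    """Keep only the word pairs strong enough to draw as network edges."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


def jaccard_distance_matrix(scores, vocab):
    """Build the word-by-word dissimilarity matrix (1 - Jaccard) that an MDS routine would take."""
    matrix = []
    for a in vocab:
        row = []
        for b in vocab:
            if a == b:
                row.append(0.0)
            else:
                key = (a, b) if a < b else (b, a)
                # Pairs that never co-occur get the maximum distance of 1.0.
                row.append(1.0 - scores.get(key, 0.0))
        matrix.append(row)
    return matrix


# Hypothetical toy data: four already-tokenized units.
units = [
    ["language", "written", "text"],
    ["language", "spoken", "discourse"],
    ["written", "text", "genre"],
    ["spoken", "discourse", "genre"],
]

scores = jaccard_cooccurrence(units)
vocab = sorted({word for unit in units for word in unit})
edges = network_edges(scores, threshold=0.5)
distances = jaccard_distance_matrix(scores, vocab)

print(edges)
for word, row in zip(vocab, distances):
    print(word, [round(d, 2) for d in row])
```

The edge list is what a co-occurrence network draws, while the distance matrix would be handed to a multidimensional scaling routine to place words in two dimensions. Real Japanese or Chinese text would first need morphological analysis to produce the token lists, which this sketch takes as given.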


