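Using AI Text Mining Methods According to the Genre of Linguistic Expression: Focusing on Feature Extraction from Written and Spoken Language
(Original title: 言語表現ジャンルに応じたAIテキストマイニング手法の活用－書き言葉と話し言葉の特徴抽出を中心に－)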

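Within qualitative language analysis, this paper departs from conventional data mining, which relies on quantitative methods to extract general trends. Instead, it aims to extract the individual characteristics of a linguistic work in order to find new perspectives and features for reading and understanding it. From this standpoint, it examines how the features extracted by AI text mining differ depending on the genre of the text. Earlier studies focused mainly on literature and editorials. This study broadens the scope to include, for written language, humanities and social science literature on philosophy and thought, religion, history, psychology, sociology, and education, and, for spoken language, natural conversation. For written humanities and social science papers, in which the writer sets out his or her own ideas, multidimensional scaling extracts the main points of the content fairly accurately, whereas the co-occurrence network reveals little beyond the keywords. Compared with editorials (such as newspaper editorials) and with novels, papers resemble editorials under multidimensional scaling but resemble novels in the co-occurrence network. For the conversational data, each of the two text mining methods extracts the important keywords of the conversation, but neither allows the detailed content to be inferred. However, differences in the distribution of elements may make it possible to infer how substantive a conversation is. Because conversation shows tendencies different from those of written texts, text mining may serve as a yardstick for future study of the qualitative differences between written and spoken language.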
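The co-occurrence network mentioned above links words that tend to appear in the same sentence. The paper does not state its tool, weighting, or thresholds, so the following is only a minimal sketch under explicit assumptions: the text has already been split into sentences and tokenized (Japanese text would first need morphological analysis), and edge strength is measured with the Jaccard coefficient, a common but here assumed choice. The function `cooccurrence_edges` and its parameters `min_freq` and `top_n` are hypothetical illustrations, not the author's method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_freq=2, top_n=10):
    """Return the top_n word pairs ranked by Jaccard coefficient.

    sentences: list of token lists, one list per sentence (assumed
    already tokenized). Only words occurring in at least min_freq
    sentences become nodes of the network.
    """
    # Treat each sentence as a set: we count the sentences that contain
    # a word, not its raw token frequency.
    sets = [set(tokens) for tokens in sentences]
    doc_freq = Counter(word for s in sets for word in s)
    vocab = sorted(w for w, c in doc_freq.items() if c >= min_freq)

    edges = []
    for a, b in combinations(vocab, 2):
        both = sum(1 for s in sets if a in s and b in s)
        if both == 0:
            continue  # the pair never co-occurs, so no edge
        either = doc_freq[a] + doc_freq[b] - both
        edges.append((a, b, both / either))  # Jaccard = |A and B| / |A or B|

    # Strongest co-occurrences first; ties broken alphabetically.
    edges.sort(key=lambda e: (-e[2], e[0], e[1]))
    return edges[:top_n]


if __name__ == "__main__":
    sentences = [
        ["genre", "text", "mining"],
        ["text", "mining", "network"],
        ["genre", "discourse"],
        ["discourse", "network", "text"],
    ]
    for a, b, j in cooccurrence_edges(sentences, min_freq=2, top_n=3):
        print(a, b, round(j, 2))
    # mining text 0.67
    # network text 0.67
    # discourse genre 0.33
```

Counting sentences rather than tokens keeps a single long, repetitive sentence from dominating the network; a real analysis would also filter stop words and tune `min_freq` to the corpus size.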

Keywords

Genre; AI text mining; written text; discourse; feature extraction

Parallel Abstract

In this paper, I examine how the features extracted by AI text mining differ in relation to the genre of the text, in order to find new perspectives and features for reading and understanding a work, rather than following the conventional data mining direction based on quantitative methods that extract general trends. In the past, I have mainly focused on the genres of literature and editorials, but this time I expand the scope to include humanities and social science papers on philosophy and thought, religion, history, psychology, society, and education, as well as natural discourse. For written humanities and social science papers, which express the writer's thoughts, the multidimensional scaling method can extract the important points of the content quite accurately, but it is difficult to find anything other than keywords in the co-occurrence network. Compared with editorials (such as newspaper editorials) and with novels, papers are similar to editorials under multidimensional scaling but similar to novels in the co-occurrence network. For discourse materials, each of the two text mining methods can extract the important keywords of the discourse content, but neither can reveal its details. However, the richness of a conversation may be inferable from differences in the distribution of elements.
Discourse thus shows a tendency different from that of written text, and the qualitative differences between written and spoken language may be examined in the future using text mining as a standard.
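The multidimensional scaling (MDS) comparison in the abstract places items on a low-dimensional map so that similar items lie close together. The source does not say which MDS variant or distance measure was used, so the following is a hedged sketch assuming classical (Torgerson) MDS on a precomputed distance matrix; for words, the distance might be taken as 1 minus a co-occurrence similarity such as the Jaccard coefficient, which is also an assumption. `classical_mds` is a hypothetical helper, and eigenvectors are found by shifted power iteration only to keep the example standard-library-only; a library routine such as `numpy.linalg.eigh` would be the usual choice in practice.

```python
import math
import random


def classical_mds(dist, dims=2, iters=1000, seed=0):
    """Classical (Torgerson) MDS: embed n items in `dims` dimensions.

    dist: symmetric n x n distance matrix (list of lists, zero diagonal).
    Returns n coordinate lists. Eigenvectors come from shifted power
    iteration with deflation, so only the standard library is needed.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, an inner-product (Gram) matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else v

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shift by a bound on |eigenvalues| so iteration converges to the
        # most positive eigenvalue rather than the largest in magnitude.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = normalize([rng.random() - 0.5 for _ in range(n)])
        for _ in range(iters):
            w = [x + shift * y for x, y in zip(matvec(b, v), v)]
            if not any(w):
                break
            v = normalize(w)
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues are dropped
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


if __name__ == "__main__":
    # Four items evenly spaced on a line: the map should recover the line.
    line = [[abs(i - j) for j in range(4)] for i in range(4)]
    for point in classical_mds(line, dims=2):
        print([round(c, 3) for c in point])
```

Coordinates are determined only up to rotation and reflection, so what an MDS map conveys is the relative placement of items, not the sign or orientation of the axes.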

Parallel Keywords

genre; AI text mining; written language; natural discourse; feature extraction


Further Reading

葉姿吟（2022）。使用AI文字探勘調查漢字教育研究的現況－以論文要旨的分析為主－｜Using AI Text Mining techniques to analyze the research of teaching kanji: take the abstract as an example。銘傳日本語教育，(25)，18-36。https://www.airitilibrary.com/Article/Detail?DocID=10296271-202210-202209280013-202209280013-18-36
落合由治（2021）。AI文本探勘技術應用於比較文化素養方面：活用語言資訊之定量指標與情感分析｜Application of AI Text Mining Technology to Comparative Cultural Literacy: Using quantitative indicators of linguistic information and sentiment analysis。淡江日本論叢，()，77-101。https://www.airitilibrary.com/Article/Detail?DocID=2075356X-202112-202203280012-202203280012-77-101
落合由治（2018）。關於語態的語法範疇探討－針對主動語態和被動語態之外的區域－｜A consideration of grammatical categories on Voice: Aiming at areas beyond active and passive。淡江日本論叢，()，1-25。https://www.airitilibrary.com/Article/Detail?DocID=2075356X-201812-201903250012-201903250012-1-25
林欣慧（2020）。麥生本系統《源氏物語》須磨卷之文本－由異文所依據之資料來源見其生成背景－│The Text of Suma Volume of The Tale of Genji in The Case of The Version Represented by Munyuubon: Analyze The Way it Formatted by Researching The Sourse of The Differences。台大日本語文研究，(40)，51-75。https://doi.org/10.6183/NTUJP.202012_(40).0003
賴錦雀（2022）。從詞彙調查和搭配詞看“～やか”類型形容動詞－兼論與同根．同漢字表記形容詞之比較－｜＂-Yaka＂ type adjective as seen from vocabulary survey and collocation: For comparison with cognate and Chinese character notation adjectives。台灣日語教育學報，()，201-230。https://doi.org/10.29758/TWRYJYSB.202206_(38).0008
