
Applying Text Mining to Analyze Stylistic Change in the People's Daily

A Study of Writing Style of The People’s Daily

Advisors: 陳麗霞, 余清祥
This thesis will become available for download on 2026/08/01.

Abstract


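The growth of big data has driven the digitization of all kinds of data. Text mining is a prime example, with applications in many fields, and writing style is one of its common topics. However, an article's style is easily influenced by its subject matter: even the same author or text may use language differently depending on its historical and social context. The People's Daily, the official organ of the Chinese Communist Party, is a case in point. Its content and topics not only reflect the characteristics of their time but also take the official position and objectives into account, so changes in the paper's characteristics can reflect the political and social changes in China since the founding of the People's Republic. This study therefore takes stylistic change in the People's Daily as its research target. It compares differences in word choice across years and uses statistical methods and clustering to divide the years into distinct periods. It also applies several keyword-detection indices to select representative words for each period as explanatory variables for classification, aiming to balance accuracy, computation speed, and interpretability.

The corpus consists of front-page articles of the People's Daily from 1949 to 2019, because front-page content mostly concerns major national and international affairs, which avoids the lexical heterogeneity that local affairs would introduce. The study first conducts exploratory data analysis of characters and words, including the Jaccard and Yue similarity indices computed on them, to uncover the basic textual characteristics of the People's Daily. It then applies cluster analysis to divide modern China into several periods and compares the result with periodizations proposed by experts. The findings are as follows. Differences between periods are more visible in two-character words. When clustering is based on two-character words or on the similarity indices, the People's Daily divides into four periods (which might be named "Founding of the State," "Cultural Revolution," "Reform and Opening-Up," and "Modernization"); different clustering methods give highly consistent results, and word usage differs clearly between periods. In addition, for selecting explanatory variables for classification, the representative-word detection index proposed in this study performs best, outperforming the chi-square index and dimensionality-reduction methods in accuracy, computation speed, and interpretability alike.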
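The similarity indices and clustering step described above can be illustrated with a small Python sketch. It is not the thesis's implementation and rests on several assumptions: character bigrams serve as a crude stand-in for properly segmented two-character words; the "Yue" index is taken to be the Yue-Clayton similarity of relative frequencies, sum(p*q) / (sum(p^2) + sum(q^2) - sum(p*q)); and average-linkage agglomerative clustering on the distance 1 - similarity is used as one possible clustering method. The function names and the toy yearly texts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def char_bigrams(text: str) -> Counter:
    """Count overlapping two-character strings (assumed stand-in for two-character words)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard index of the two vocabularies: |A & B| / |A | B|."""
    va, vb = set(a), set(b)
    union = va | vb
    return len(va & vb) / len(union) if union else 1.0


def yue_clayton(a: Counter, b: Counter) -> float:
    """Assumed Yue-Clayton similarity of relative frequencies p and q:
    sum(p*q) / (sum(p^2) + sum(q^2) - sum(p*q)); equals 1 for identical profiles."""
    na, nb = sum(a.values()), sum(b.values())
    p = {w: c / na for w, c in a.items()}
    q = {w: c / nb for w, c in b.items()}
    pq = sum(pw * q.get(w, 0.0) for w, pw in p.items())
    pp = sum(v * v for v in p.values())
    qq = sum(v * v for v in q.values())
    return pq / (pp + qq - pq)


def average_linkage(profiles: dict, k: int, sim=yue_clayton) -> list:
    """Agglomerative clustering of years (average linkage on 1 - sim) until k clusters remain."""
    dist = {}
    for x, y in combinations(profiles, 2):
        d = 1.0 - sim(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    clusters = [[year] for year in profiles]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist[(x, y)] for x in clusters[i] for y in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Toy yearly "front pages" (hypothetical text, not real People's Daily data).
docs = {
    1950: "人民公社生產建設人民公社",
    1951: "人民公社生產建設生產",
    2010: "經濟發展市場改革經濟",
    2011: "經濟發展市場改革市場",
}
profiles = {year: char_bigrams(text) for year, text in docs.items()}
print(jaccard(profiles[1950], profiles[1951]))
print(average_linkage(profiles, k=2))  # -> [[1950, 1951], [2010, 2011]]
```

Plain agglomerative clustering does not require each cluster to consist of consecutive years, so a periodization built this way would need contiguity checked or imposed separately.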

English Abstract


Big data has extended quantitative analysis to all kinds of data, and text mining is a prominent example. Identifying authors' writing styles is a popular topic in text mining; however, writing style can be affected by factors such as an article's theme and language. Take the People's Daily, the official newspaper of the Central Committee of the Chinese Communist Party, as an example. The Chinese Communist Party attaches great importance to the People's Daily and has given strong guidance to its work in every period of revolution, construction, and reform. In other words, text analysis of the People's Daily may reveal changes in the political and social environment of the Chinese Communist Party, and we ask whether the text of its articles can distinguish different periods of China from 1949 to 2019. We first conduct exploratory data analysis of characters and words, including the Jaccard and Yue similarity indices. We then use cluster analysis to divide modern China into several periods and compare the result with experts' periodizations. We find that differences between periods are more clearly visible in two-character words, and that clustering on two-character words or on the similarity indices divides the People's Daily into four periods. In addition, we use multiple keyword indices to select representative words for each period and use these words as explanatory variables for classification. In accuracy, computation speed, and interpretability alike, this approach outperforms chi-square indices and dimensionality-reduction methods.
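The chi-square baseline mentioned above can be sketched as follows, assuming the usual 2x2 contingency formulation: for each word, the table contrasts its count inside a period with its count in all other years, and Pearson's chi-square statistic ranks the words over-represented in that period. The abstract does not describe the thesis's own representative-word index, so it is not reproduced here; the function names and toy counts are hypothetical.

```python
from collections import Counter


def chi_square_keyword(word: str, period: Counter, rest: Counter) -> float:
    """Pearson chi-square of the 2x2 table
    [[word in period, other tokens in period],
     [word in rest,   other tokens in rest]]."""
    a = period[word]
    b = sum(period.values()) - a
    c = rest[word]
    d = sum(rest.values()) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_keywords(period: Counter, rest: Counter, k: int = 5) -> list:
    """Return up to k words over-used in the period, ranked by chi-square (ties broken alphabetically)."""
    n_period, n_rest = sum(period.values()), sum(rest.values())
    # Chi-square is symmetric, so keep only words relatively more frequent in the period.
    over = [w for w in period if period[w] / n_period > rest[w] / n_rest]
    return sorted(over, key=lambda w: (-chi_square_keyword(w, period, rest), w))[:k]


# Hypothetical word counts for one period versus all other years.
period = Counter({"革命": 8, "人民": 10, "建設": 2})
rest = Counter({"革命": 1, "人民": 10, "經濟": 9})
print(top_keywords(period, rest))  # -> ['革命', '建設']
```

The relative frequencies of the selected words in each year could then serve as explanatory variables for a classifier that assigns years to periods.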

