Incremental Enterprise Email Classification Based on Cosine and Fuzzy Similarity Methods

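With today's well-developed and convenient networks, email usage has risen sharply. Many enterprises treat email as an important channel for communicating with customers and among internal employees, so managing the enterprise email system has become correspondingly important. However, it is unavoidable that many employees use the enterprise mail system to send private messages. As a consequence, private email not only consumes the mail system's bandwidth and degrades system performance, but may even delay important business email or prevent it from being sent, causing commercial losses for the company. Moreover, with growing awareness of privacy rights, the goal of this study is to classify email as private or business without monitoring message content, in order to improve the company's business benefit. To this end, the study uses only the email header data rather than the message content; although this may reduce classification accuracy, it protects employees' privacy. Using the extracted header data, enterprise email is classified with cosine and fuzzy similarity methods. More importantly, the incremental system proposed in this study effectively avoids processing the large accumulated volume of email data, and it also accounts for personnel turnover within the company and changes in the customer base over time.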

Keywords

Fuzzy similarity; cosine similarity; email classification

Parallel Abstract

Email usage has increased as the Internet has become more common. Many enterprises regard email as an essential way of contacting customers and employees, so managing the email system has become even more important for an enterprise. However, it is unavoidable that many employees send private emails through the enterprise email system. This has a negative effect on the email system because bandwidth is consumed for personal purposes. Worse, it may delay or prevent the sending of important business emails, which may harm the interests of the enterprise. Moreover, the public has become more concerned about privacy. The goal of this paper is to classify enterprise emails as either business or personal, without monitoring email contents, in order to improve business interests. To achieve this, only the email header is used; the email contents are not. Although this may lower classification accuracy, it protects employees' privacy rights. Enterprise emails are classified by applying cosine similarity and fuzzy similarity approaches to the extracted email headers. More importantly, the incremental system proposed in this paper effectively avoids handling the huge amount of accumulated emails, and it also accounts for changes in the internal staff or customers of an enterprise over time.

Parallel Keywords

fuzzy similarity; E-mail classification; cosine similarity
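
The abstract names cosine similarity over header-derived features and an incremental system, but gives no formulas or feature definitions. The following is only a minimal sketch of how header-only cosine classification with an incrementally updated class profile could look. The choice of header fields, the token-count representation, the profile structure, and all function names are hypothetical assumptions made for illustration, not the thesis's method. The fuzzy similarity measure is not specified in this record, so it is left out.

```python
import math
from collections import Counter
from email import message_from_string


def header_tokens(raw_message: str) -> Counter:
    """Count tokens from selected header fields only; the body is never read."""
    msg = message_from_string(raw_message)
    tokens = []
    # Hypothetical field choice; the thesis's actual header features are not given here.
    for field in ("From", "To", "Cc", "Subject"):
        value = msg.get(field, "")
        tokens.extend(value.lower().replace(",", " ").split())
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(vec: Counter, profiles: dict) -> str:
    """Return the label whose profile vector is most similar to vec."""
    return max(profiles, key=lambda label: cosine_similarity(vec, profiles[label]))


def update_profile(profiles: dict, label: str, vec: Counter) -> None:
    """Assumed incremental step: fold one message into a class profile
    so earlier messages do not have to be reprocessed."""
    profiles[label].update(vec)
```

In this sketch a new message is turned into a vector with `header_tokens`, labeled with `classify`, and then folded into the matching profile with `update_profile`. Only the running profiles are kept, not the accumulated mail.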


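Cited By

沈育信 (2015). An N-gram-based method for predicting reader emotion for online news [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2015.00802

宋志偉 (2013). Estimating highway bus travel time using data mining methods [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/cycu201300362

楊銘鴻 (2013). A study of automatic recommendation for resource integration optimization: the case of incubation resources for the health and leisure industry [Master's thesis, National Taipei University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0023-2607201314413300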
