Title

馬氏-田口系統:理論及其應用

Translated Titles

Mahalanobis-Taguchi System: Theory and Applications

Authors

蕭宇翔

Key Words

Mahalanobis-Taguchi System (MTS) ; Classification ; Class imbalance problem ; Threshold ; Feature selection ; Sound signal recognition ; Feature extraction ; Timbre

PublicationName

Degree theses of the Department of Industrial Engineering and Engineering Management, Tsing Hua University

Volume or Term/Year and Month of Publication

2009

Academic Degree Category

Doctoral

Advisor

蘇朝墩

Content Language

English

Chinese Abstract

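In recent years, data mining has received considerable attention in the information industry, owing to major advances in the availability of large volumes of data and the pressing need to turn existing data into usable information or knowledge. The Mahalanobis-Taguchi System (MTS), a new data mining tool developed by Dr. Genichi Taguchi, is a multivariate data analysis technique that can be applied to diagnosis, forecasting, binary classification, and feature selection, and it has been used successfully to analyze a variety of practical problems. This thesis examines and extends MTS theory in depth, in order to remedy the main limitations and shortcomings of MTS in theory and in application and to strengthen its reliability and practicality. Finally, the study solves several real cases to show concretely the benefits of these research results. The main contents of the study are summarized as follows:

On the theoretical side, the study examines the robustness of MTS in handling the "class imbalance problem", in which the number of samples in one class is significantly larger than in the other. Class imbalance usually lowers the sensitivity of existing classification techniques in identifying minority-class samples. In practice, however, the minority class is often the more important one, so misclassifying it causes losses for the overall system. In addition, determining the classification threshold of MTS has long been an unresolved problem in practical applications; the study therefore also uses Chebyshev's theorem to propose a "probabilistic thresholding method" that yields an appropriate threshold for binary classification problems. Furthermore, because multi-class problems are common in applications, the study builds on existing MTS theory to propose a new multi-class classification and feature selection method, called the "multi-class Mahalanobis-Taguchi System". These theoretical investigations and extensions are validated on several datasets.

On the application side, the study presents three real cases to show the practicality of the proposed methods. Case 1: reducing the inspection attributes for the radio frequency function of mobile phones. Because far more phones pass the test than fail it, the data in this case are typically "imbalanced". The study uses MTS and the probabilistic thresholding method to remove unnecessary inspection attributes from the test process, while retaining 100% inspection accuracy with the reduced set of attributes. Case 2: predicting the progression of gestational diabetes to type 2 diabetes and identifying the risk factors. The data fall into three classes: those who did not develop type 2 diabetes, those showing signs of type 2 diabetes, and those who had developed type 2 diabetes. The proposed multi-class Mahalanobis-Taguchi System can effectively predict whether pregnant women with gestational diabetes will become type 2 diabetes patients after pregnancy. Moreover, the identified risk factors can serve several purposes in disease prevention and health education. Finally, Case 3: building a "multi-class automatic timbre inspection system" for the saxophone manufacturing industry. The system aims to reduce the instability and bias of human hearing in recognizing timbre. Saxophone timbre in this case is classified as unqualified, ordinary quality, or high quality. To this end, besides adopting the proposed multi-class Mahalanobis-Taguchi System, the study also proposes a waveform feature extraction method designed for recognizing one-dimensional signals such as sound or vibration, and uses it to extract features from saxophone sound signals. The results show that the multi-class automatic timbre inspection system can achieve 100% inspection accuracy.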

English Abstract

In recent years, data mining has attracted a great deal of attention in the information industry because of the wide availability of huge amounts of data and the pressing need to turn such data into useful information and knowledge. The information and knowledge gained can be used in business management, production control, engineering design, and other applications. The Mahalanobis-Taguchi System (MTS), developed by Dr. Taguchi, is a relatively new data mining tool. MTS is a collection of methods for diagnosis, forecasting, binary classification, and feature selection using multivariate data, and it has been used successfully in various applications. This study explores and extends the theory of MTS and seeks to remedy its existing limitations and drawbacks in both the theoretical and practical domains, in order to reinforce the reliability and practicality of MTS. Finally, several real cases are solved to show concretely the benefit of these studies. The contents of this study are described as follows: In the theoretical aspect, this study investigates the reliability and robustness of MTS in dealing with the “class imbalance problem”. In a class imbalance problem, one class is represented by a large number of examples, while the other class, usually the more important one, is represented by only a few. Class imbalance usually diminishes the performance of classification algorithms and causes classification bias: the classifier tends to achieve high predictive accuracy on the majority class but predicts poorly on the minority class. This may lead to a great loss for the whole system. In addition, to resolve the pending practical issue of determining the classification threshold for MTS, we develop a “probabilistic thresholding method” based on Chebyshev’s theorem to derive an appropriate threshold for binary classification.
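As a rough illustration of how Chebyshev's theorem can yield a classification threshold, the sketch below sets an upper bound on the Mahalanobis distances (MDs) of the normal group. It is not the thesis's probabilistic thresholding method, whose formula is not given in this abstract; it only applies the generic inequality P(|X - mu| >= k*sigma) <= 1/k^2, with the sample mean and standard deviation standing in for the true ones. The function names, the `alpha` parameter, and the MD values are hypothetical.

```python
import math
import statistics


def chebyshev_threshold(normal_md, alpha=0.05):
    """Upper MD threshold from Chebyshev's inequality (illustrative only).

    Choosing k = 1 / sqrt(alpha) gives 1 / k**2 = alpha, so at most a
    fraction alpha of a distribution with this mean and standard deviation
    can lie k standard deviations or more from the mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    mu = statistics.mean(normal_md)
    sigma = statistics.pstdev(normal_md)  # population standard deviation
    k = 1.0 / math.sqrt(alpha)
    return mu + k * sigma


def classify(md, threshold):
    """Label an observation by comparing its MD with the threshold."""
    return "abnormal" if md > threshold else "normal"


# Hypothetical MDs of normal-group observations (not data from the thesis).
normal_md = [0.8, 1.1, 0.9, 1.2, 1.0, 0.7, 1.3]
threshold = chebyshev_threshold(normal_md, alpha=0.04)  # k = 5
print(round(threshold, 6), classify(2.5, threshold), classify(1.5, threshold))
```

Because the bound holds for any distribution with finite variance, the resulting threshold tends to be conservative; a smaller `alpha` pushes it further from the normal group.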
On the other hand, because multi-class problems occur frequently in real applications, a novel multi-class classification and feature selection method, the multi-class Mahalanobis-Taguchi System (MMTS), is developed on the basis of MTS theory. By establishing an individual Mahalanobis space for each class and applying the proposed “weighted Mahalanobis distance” as the distance metric for classification, MMTS can handle multiple classes. To validate our point of view and the proposed methodologies, several datasets are used in numerical experiments and comparisons. In the application aspect, three real cases are solved using MTS and the proposed MMTS. The purpose of the first case is to reduce the number of radio frequency inspection attributes in the mobile phone manufacturing process. In this case, there are two inspection outcomes, pass and fail, and the collected data are typically imbalanced. Thus, MTS with the proposed probabilistic thresholding method is employed to detect and remove the redundant inspection attributes. The results show that the number of attributes is significantly reduced without losing inspection accuracy. The second case concerns predicting the development of type 2 diabetes mellitus from gestational diabetes mellitus. This case is a multi-class application, and therefore we use the proposed MMTS to identify the significant risk factors for developing type 2 diabetes mellitus from gestational diabetes mellitus and to predict the occurrence of type 2 diabetes mellitus. With MMTS, good prediction accuracy is obtained and the risk factors are identified. By monitoring the risk factors, medical personnel can care effectively for women with gestational diabetes mellitus and thus help prevent the occurrence of type 2 diabetes mellitus and protect their health.
The final case attempts to establish an automatic multi-class timbre classification system (AMTCS) to prevent the timbre judgment bias caused by human hearing and to increase the accuracy and reliability of timbre quality inspection in alto saxophone manufacture. For this purpose, in addition to employing MMTS, a feature extraction method for one-dimensional signal recognition (such as vibration and sound), called the “waveform shape-based feature extraction method (WFEM)”, is developed and used to extract the saxophone sound features. The AMTCS provides strong assistance in the final timbre inspection of alto saxophones. The results show that AMTCS achieves 100% saxophone timbre inspection accuracy.
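The idea of one Mahalanobis space per class, mentioned in the abstract, can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's MMTS: it uses the ordinary scaled Mahalanobis distance of MTS (standardize with the reference group's mean and standard deviation, then divide the quadratic form by the number of variables) instead of the proposed “weighted Mahalanobis distance”, which the abstract does not define. All function names and the training data are hypothetical.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def build_space(samples):
    """Mahalanobis space of one class: means, std devs, correlation matrix."""
    cols = list(zip(*samples))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    z = [[(x - mu) / s for x, mu, s in zip(row, means, stds)] for row in samples]
    n, k = len(z), len(means)
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]
    return means, stds, corr


def scaled_md(x, space):
    """Scaled Mahalanobis distance z' R^-1 z / k with respect to one space."""
    means, stds, corr = space
    z = [(v - mu) / s for v, mu, s in zip(x, means, stds)]
    y = solve(corr, z)  # y = R^-1 z without forming the inverse explicitly
    return sum(a * b for a, b in zip(z, y)) / len(z)


def classify(x, spaces):
    """Assign x to the class whose Mahalanobis space is nearest."""
    return min(spaces, key=lambda label: scaled_md(x, spaces[label]))


# Hypothetical two-feature training data for three classes.
training = {
    "A": [(1.0, 2.0), (1.2, 1.8), (0.8, 2.1), (1.1, 2.3), (0.9, 1.9)],
    "B": [(5.0, 5.0), (5.2, 4.7), (4.8, 5.3), (5.1, 5.2), (4.9, 4.8)],
    "C": [(9.0, 1.0), (9.3, 1.2), (8.8, 0.9), (9.1, 0.7), (8.9, 1.1)],
}
spaces = {label: build_space(rows) for label, rows in training.items()}
print(classify((1.05, 2.0), spaces), classify((5.0, 5.1), spaces))
```

With population statistics, the scaled MDs of a class's own training samples average to 1, which is why MTS treats values far above 1 as departures from the reference group.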

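Topic Category College of Engineering > Department of Industrial Engineering and Engineering Management
Engineering > General Engineering
Social Sciences > Management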
Times Cited
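  1. 李翊禎 (2017). Applying the Mahalanobis-Taguchi System to construct a machine health index: a case study of the grinding process in semiconductor packaging. Master's thesis, In-service Master's Program, Department of Industrial and Information Management, Cheng Kung University. 2017. 1-70.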