

Hierarchical Expansion Approach for Ensemble Learning and Prediction in Mixed Datasets

Advisor: 吳政鴻


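Mixed datasets in industrial and business applications often exhibit complex interactions between categorical and numerical attributes. In business, price prediction is an important problem, and the prices of products and services are affected by many categorical and numerical attributes at once; for example, electricity demand and price vary with the season, commercial activity, and fossil fuel prices. In manufacturing systems, different combinations of machine types and product attributes affect the throughput rate to different degrees. Previous approaches to prediction on mixed datasets have generally built a prediction model on a single dataset, even for complex product-mix patterns. Such a model cannot adjust dynamically to changes in the product mix on the shop floor, which makes processing times hard to schedule and delivery dates hard to plan precisely. This study partitions mixed datasets hierarchically by categorical attributes. The hierarchical expansion method addresses shortcomings of traditional machine learning methods: it builds more accurate prediction models while reducing computational complexity. When features or attributes are added to the system, the number of models to be trained does not grow substantially, so computation remains efficient. In addition, hierarchical expansion combined with model selection makes it possible to trace the inference results of the machine learning models, which improves their interpretability. The hierarchical expansion model is also applied to the unrelated parallel machine scheduling problem. Numerical analysis shows that, on a semiconductor mixed dataset, the hierarchical expansion method reduces the root mean square error (RMSE) by 17.7% compared with an XGBoost model, and that it is both more computationally efficient and more accurate than the partial combination model proposed by 張鈺欣 (Chang, 2020). Because the hierarchical expansion prediction model is accurate and precise, using it to predict the parameters of the optimization model for the unrelated parallel machine scheduling problem effectively improves scheduling performance and captures the uncertainty factors in the production system.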


Hierarchical method, clustering, prediction model, machine learning, scheduling


Mixed datasets in engineering and business often exhibit complicated interactions between categorical and numerical attributes. In business applications, price prediction is an important task, yet product and service prices vary widely across different categorical and numerical attributes. In manufacturing systems, different combinations of machine types and product attributes have different impacts on the throughput rate. In the past, throughput rate forecasting has usually relied on a single prediction model, which cannot adjust dynamically in response to changes in the job mix at the production site and therefore makes scheduling and delivery planning difficult. In this study, we propose a hierarchical expansion method that segments the data hierarchically by categorical attributes in order to overcome these disadvantages of traditional machine learning methods. The proposed method reduces computational effort while building prediction models with higher accuracy. Numerical results on a mixed dataset from a semiconductor manufacturer demonstrate the potential of the proposed method: compared with an XGBoost model, it reduces the root mean square error (RMSE) by about 17.7%. The hierarchical expansion method also achieves higher accuracy and requires less computational effort than the partial combination prediction models proposed by Chang (2020). Because of its high accuracy and precision, using the proposed method to predict the parameters of the unrelated parallel machine scheduling problem can improve optimization performance and help control uncertainty in the manufacturing system.
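The idea of segmenting a mixed dataset hierarchically by categorical attributes can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names `fit_hierarchy` and `predict`, the `min_samples` threshold, the toy data, and the use of a per-segment mean in place of a trained regressor (such as XGBoost on the numerical attributes) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_hierarchy(rows, cat_keys, target, min_samples=2):
    """Fit one placeholder model (the target mean) per hierarchical segment.

    Level 0 is the whole dataset; level k groups rows by the values of the
    first k categorical attributes. Segments with fewer than min_samples
    rows get no model of their own (hypothetical rule), except the root.
    """
    groups = defaultdict(list)
    for row in rows:
        for depth in range(len(cat_keys) + 1):
            prefix = tuple(row[k] for k in cat_keys[:depth])
            groups[prefix].append(row[target])
    return {
        prefix: mean(values)
        for prefix, values in groups.items()
        if len(values) >= min_samples or prefix == ()
    }


def predict(models, row, cat_keys):
    """Predict with the deepest fitted segment that matches the row."""
    for depth in range(len(cat_keys), -1, -1):
        prefix = tuple(row[k] for k in cat_keys[:depth])
        if prefix in models:
            return models[prefix]
    raise ValueError("no model was fitted")


# Toy data (hypothetical): throughput rate by machine type and product.
rows = [
    {"machine": "A", "product": "p1", "rate": 10.0},
    {"machine": "A", "product": "p1", "rate": 12.0},
    {"machine": "A", "product": "p2", "rate": 20.0},
    {"machine": "B", "product": "p1", "rate": 30.0},
    {"machine": "B", "product": "p1", "rate": 34.0},
]
cat_keys = ["machine", "product"]
models = fit_hierarchy(rows, cat_keys, "rate")
seen = predict(models, {"machine": "A", "product": "p1"}, cat_keys)
sparse = predict(models, {"machine": "A", "product": "p2"}, cat_keys)
unseen = predict(models, {"machine": "C", "product": "p9"}, cat_keys)
```

In this sketch, a combination with enough data is predicted from its own segment, a sparse combination falls back to its parent segment (here, the machine type), and an unseen combination falls back to the model for the whole dataset. Because the prediction names the segment that produced it, the result can be traced back to a specific level of the hierarchy.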


Afzalirad, M., Rezaeian, J. (2016). Resource-constrained unrelated parallel machine scheduling problem with sequence dependent setup times, precedence constraints and machine eligibility restrictions. Computers & Industrial Engineering, 98, 40-52.
Andreopoulos, B., An, A., Wang, X. (2006). Bi-level clustering of mixed categorical and numerical biomedical data. International Journal of Data Mining and Bioinformatics, 1(1), 19-56. doi:10.1504/ijdmb.2006.009920
Asi, H., Duchi, J. C. (2019). The importance of better models in stochastic optimization. Proceedings of the National Academy of Sciences, 116(46), 22924-22930.
Barcelo-Rico, F., Diez, J.-L. (2012). Geometrical codification for clustering mixed categorical and numerical databases. Journal of Intelligent Information Systems, 39(1), 167-185. doi:10.1007/s10844-011-0187-y
