Applying Data Mining Techniques to Flight Delay Analysis in the Aviation Industry: A Case Study of Company C

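Flight delays have many causes, which can be broadly divided into controllable factors, such as the airline's own operations, airport ground handling, inadequate aircraft maintenance, and improper flight scheduling, and uncontrollable factors, such as weather, air traffic control, and mechanical failure. Existing research on flight delays includes legal discussions of delay compensation and statistical analyses, but few studies have applied data mining. This study analyzes the causes of delay using the case company's data on flights departing from Taipei between 2004 and 2014, applying classification techniques to examine the relationship between flight delays and the factors above, in the hope of uncovering information useful to the case company and to academia. The experiments use the WEKA 3.6.10 data mining tool. The data set consists of flights departing from Taipei from 2004 to 2013, partitioned by year. Two class labels are designed: delay level and whether the flight was delayed. Depending on the class label, the data cover either all flights or delayed flights only, which yields three data sets. These are combined with three data-processing options, information gain, genetic algorithm, and no feature selection, and classified with decision trees (C4.5, CART) and a support vector machine; for multiple classifiers, AdaBoost and Bagging are used. Once the best prediction model and the best model on average have been determined, the 2014 flight data are used to validate whether these models predict well. The experimental results indicate that, overall, the best prediction accuracy is obtained by the model that applies no feature selection, uses the AdaBoost - Simple CART multiple classifier, and is trained on the 2004 flight data. The number of records and prediction accuracy show an inverse trend: accuracy declines as the number of records grows. Regarding misclassification cost, predicting "on time" for a flight that is actually delayed costs an airline more than predicting "delayed" for a flight that is actually on time; the model built in this study produces a relatively low rate of the former error, which indicates a favorable misclassification cost. A consolidated analysis of the delays predicted by the model shows that aircraft maintenance is the largest category; future delays may be reduced by improving ground-stop maintenance checks and repair procedures to shorten operation time. These findings are offered as a reference for the case company and for future research.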
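Information gain, one of the feature selection options named in the abstract, scores an attribute by how much it reduces the entropy of the class label. The following is a minimal Python sketch of that calculation only; it is not the WEKA implementation used in the study, and the attribute names (`weather`, `weekday`) and the six toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records, NOT taken from the case company's data.
weather = ["clear", "storm", "clear", "storm", "clear", "storm"]
weekday = ["mon", "mon", "tue", "tue", "wed", "wed"]
delayed = ["no", "yes", "no", "yes", "no", "yes"]

gain_weather = information_gain(weather, delayed)
gain_weekday = information_gain(weekday, delayed)
print(f"weather: {gain_weather:.3f}, weekday: {gain_weekday:.3f}")
```

In this toy case `weather` separates the two classes perfectly and `weekday` carries no information, so a ranking by information gain would keep the former and drop the latter.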

Keywords

flight delays; data mining; data pre-processing; single classifier; multiple classifiers

Parallel Abstract

Flight delays have many causes. Some are controllable, such as the airline's own operations, airport ground handling, aircraft maintenance, and improper flight scheduling; others are uncontrollable, such as weather, air traffic control, and mechanical failure. Among studies of flight delays, very few have used data mining methods. This research focuses on one airline and uses data on flights departing from Taipei between 2004 and 2014, together with the main delay factors, as its dataset. Data mining techniques are applied to discover useful information about flight delays and to offer the company and academia some guidance on the delay factors. The experiments were conducted with WEKA 3.6.10. The data cover flights departing from 2004 to 2013, partitioned by year, and the class labels are designed around flight delay. Two feature selection methods, information gain and the genetic algorithm, are used to select representative features from the dataset. Decision trees (C4.5 and CART), a support vector machine (SVM), and multiple classifiers built with bagging and boosting are developed as prediction models for comparison. The 2014 data are then used to validate the better-performing models. The results show that the models trained on the 2004 flight data achieve the highest prediction accuracy. Data volume and prediction performance move in opposite directions: accuracy decreases as the amount of data increases. Regarding misclassification, a delayed flight that is incorrectly predicted to be on time incurs the higher cost for the airline.
This research identifies the better prediction models of flight delays for airline companies. According to these models, the largest cause of flight delays is aircraft maintenance. Improving maintenance checks and reorganizing flight schedules could help prevent future delays and reduce both operation time and delay time.
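The cost argument in the abstract rests on treating the two kinds of prediction error differently: a flight predicted on time that is actually delayed is assumed to cost more than a flight predicted delayed that is actually on time. A minimal Python sketch of such a cost-weighted evaluation follows; the cost weights (5.0 and 1.0), the label strings, and the five toy predictions are hypothetical illustrations, not values from the study.

```python
def misclassification_cost(actual, predicted, cost_missed_delay=5.0, cost_false_alarm=1.0):
    """Return (average cost per flight, missed delays, false alarms).

    cost_missed_delay: assumed cost of predicting "on_time" for a delayed flight.
    cost_false_alarm:  assumed cost of predicting "delayed" for an on-time flight.
    Both weights are hypothetical and only encode that the first error is costlier.
    """
    missed = sum(1 for a, p in zip(actual, predicted)
                 if a == "delayed" and p == "on_time")
    false_alarms = sum(1 for a, p in zip(actual, predicted)
                       if a == "on_time" and p == "delayed")
    total_cost = missed * cost_missed_delay + false_alarms * cost_false_alarm
    return total_cost / len(actual), missed, false_alarms


# Hypothetical toy predictions, NOT results from the thesis.
actual = ["delayed", "delayed", "on_time", "on_time", "on_time"]
predicted = ["delayed", "on_time", "on_time", "delayed", "on_time"]

avg_cost, missed, false_alarms = misclassification_cost(actual, predicted)
print(avg_cost, missed, false_alarms)
```

Under this kind of weighting, two models with the same accuracy can differ in cost, which is why a low rate of missed delays is reported as a favorable property of the model.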

Parallel Keywords

flight delays; data mining; data pre-processing; single classifier; multiple classifiers

