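The construction industry has for many years been one of the industries with the highest rates of fatal occupational accidents. Although companies and the Ministry of Labor's Occupational Safety and Health Administration (OSHA) have worked to reduce the accident rate, the number of violations recorded by OSHA remains high. Good safety planning, especially in the early stages of a project, is a necessary condition for preventing future accidents. To this end, a large body of research has been carried out over the years, including applications of computer vision, Building Information Modeling (BIM), rule-based programming, and Natural Language Processing (NLP). The purpose of this research is to build a hazard identification system for construction schedules so that hazards can be identified in the early stages of a project. The method chosen is Term Frequency-Inverse Document Frequency (TF-IDF), combined with keyword mapping, to create a model that can identify the type, frequency, and source of hazards. By extracting keywords from the schedule and using them as search terms against the OSHA database, TF-IDF retrieves relevant hazard records from the final narratives of accidents. The final narratives are then filtered according to the threshold obtained during training and testing of the model. Overall, the positive training and testing results indicate that TF-IDF can present the types and sources of hazards without sacrificing precision. This research will support faster and more precise hazard identification and can serve as a basis for further hazard analysis.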
The construction industry has for years accounted for a high number of work fatalities. Companies and the Occupational Safety and Health Administration (OSHA) have made numerous attempts to lower the number of accidents, yet the number of violations cited by OSHA remains high. Good safety planning, especially in the early stages of a project, is necessary to prevent future accidents. To this end, much research has been done over the years using technologies that include computer vision, building information modeling (BIM), rule-based programming, and natural language processing (NLP). This research aims to create an NLP-based hazard identification system that works from a construction schedule, so that hazards can be identified in the early stages of the project. The method chosen is term frequency-inverse document frequency (TF-IDF) combined with keyword mapping, used to build a prototype that identifies the type, frequency, and source of hazards. Keywords extracted from the schedule are used as input queries against the OSHA database, and TF-IDF searches the Final Narrative of each accident record to find relevant hazards. The final narratives are then filtered using the threshold obtained from the training and testing process. Overall, the training and testing results are positive and indicate that TF-IDF can surface the types and sources of hazards without sacrificing precision. This research contributes to faster and more precise hazard identification that can later serve as a basis for further hazard analysis.
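
To make the retrieval step concrete, the following is a minimal sketch of TF-IDF search with threshold filtering, not the prototype itself. It assumes cosine similarity over smoothed TF-IDF vectors as the relevance score. The function names, the three sample narratives (invented for illustration, not OSHA records), and the threshold of 0.2 are all hypothetical; the abstract does not state the actual threshold or scoring details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """L2-normalised TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank_narratives(query, narratives, threshold):
    """Return (score, index) pairs with score >= threshold, best first."""
    docs = [tokenize(n) for n in narratives]
    idf = build_idf(docs)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, tfidf_vector(d, idf)), i)
              for i, d in enumerate(docs)]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


# Invented narratives for illustration only; these are not OSHA records.
narratives = [
    "Employee fell from scaffold while installing formwork",
    "Worker struck by excavator bucket during trench excavation",
    "Employee received electric shock while wiring a panel",
]

# Keywords as they might be extracted from one schedule activity (hypothetical).
query = "scaffold formwork installation"

# 0.2 is an assumed cut-off, not the threshold obtained in this research.
for score, idx in rank_narratives(query, narratives, threshold=0.2):
    print(f"{score:.3f}  {narratives[idx]}")
```

In this sketch the threshold plays the filtering role described above: narratives whose similarity to the schedule keywords falls below it are discarded, and the remaining ones would then be mapped to hazard types and sources.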