Monitoring construction progress is crucial to completing building projects on schedule. This is especially true during the interior construction phase, which is often delayed by a variety of factors. Recent studies have increasingly collected on-site interior data, such as 2D images and 3D point clouds, and applied various techniques to identify the status of interior components. They then compare the results with Building Information Models (BIM) or project scheduling documents to assess the progress of individual interior work items, such as partition walls, electrical systems, and piping systems. However, the progress information these methods produce lacks sufficient textual semantic information. Progress measurement and recording for interior work still rely largely on inspectors manually checking the status of every interior work item and component, then recording changes with simple photographs and text. This process is time-consuming and labor-intensive, and the quality and thoroughness of the records depend on each inspector's subjective judgment and expertise. To fill this gap in semantic information, and to help construction personnel document status changes of interior components quickly and objectively, this study proposes a framework that uses a Multimodal Large Language Model (MLLM) to describe changes in the status of interior components. The framework records additions, removals, and relocations of interior components, for example a wall changing from an unpainted to a painted state, or whether lighting fixtures and windows have been installed. The goal is to further automate the inspection and documentation of interior components.
Specifically, two 3D point clouds of the same room, captured at different times, are merged and input into an MLLM. The MLLM then generates a textual description of how the status of interior components changed between the two time points. When tested on data collected from an actual construction site, the framework successfully described the status of interior components and the changes between them. The generated descriptions were evaluated against human-annotated descriptions. The ROUGE-L score, which emphasizes sentence structure, reached 0.505. The METEOR score, which accounts for exact matches, semantic matches, and word order, reached 0.415. In addition, GPT-4 was used to compare and rate the two sets of descriptions and assigned a score of 55, indicating that the model-generated descriptions reached 55% of the human baseline. These results demonstrate the model's ability to detect and describe changes in the status of interior components.
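ROUGE-L scores a generated description by the longest common subsequence (LCS) of words it shares with the human reference. It therefore rewards matching words that appear in the same order, even when they are not contiguous, which is why it is read as a measure of sentence structure. The Python below is a minimal sketch of that computation. It assumes whitespace tokenization, lowercasing, and the balanced F-measure (β = 1). The tokenizer and β used in this study are not stated in the abstract, so this is illustrative and is not the evaluation code behind the reported 0.505. The functions `lcs_length` and `rouge_l` and the example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists.

    dp[i][j] holds the LCS length of a[:i] and b[:j].
    """
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between a generated and a reference description.

    Assumption (hypothetical setup): lowercase + whitespace tokenization,
    beta = 1 (equal weight on precision and recall).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated words in the LCS
    recall = lcs / len(ref)      # share of reference words in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    reference = "the wall was painted white"
    candidate = "the wall has been painted"
    # LCS = "the wall painted" (3 tokens): P = 3/5, R = 3/5, F = 0.6
    print(round(rouge_l(candidate, reference), 3))
```

In this example the candidate and reference share "the wall ... painted" in order, so both precision and recall are 3/5 and the score is 0.6. A score of 0.505 therefore means roughly half of the word-order structure of the human descriptions is reproduced. Libraries used in practice may also apply stemming or a different β, which would change the absolute values.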