

A Hybrid-Approach Method for Schema Matching Problem in Data Exchange

Advisor: 項衛中


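For enterprises in a supply chain environment to communicate with one another, business transaction data must be both transmitted and interpreted correctly. For transmission, XML has become the data exchange format adopted by most enterprises. For data format definition, however, each enterprise may follow a different industry data exchange standard, so exchanged data often cannot be interpreted or is misinterpreted. This is known as the schema conflict problem in data exchange. Two approaches are commonly used to address it: linguistic matching and structural matching. Considering either one alone may fail to resolve one-to-many or many-to-one pairings, so the correct correspondence cannot be determined.

This study proposes a hybrid algorithm that combines linguistic matching and structural matching, a modified SF (Similarity Flooding) algorithm, to solve the one-to-one schema conflict problem in business transaction data exchange, with the aim of matching business transaction data in a supply chain environment quickly and correctly. The original SF algorithm performs structural matching in four phases. Phase one represents the two schemas to be matched as OEM structures. Phase two combines the two OEM graphs into a pairwise connecting graph (PCG) that contains all possible pairings, restructuring the graphs. Phase three computes structural similarity. Phase four passes the structural matching results through a filtering mechanism that finds the most likely pairings and presents them to the user for the final decision. The modified SF algorithm mainly improves phase two: while combining the OEM graphs, it consults linguistic matching information and excludes the less likely pairings, which simplifies the structure of the PCG.

The modified SF algorithm uses several mechanisms to resolve the one-to-many and many-to-one schema conflicts that arise from linguistic matching, and it achieves better computational efficiency and matching accuracy. In terms of running time, because the PCG structure is simplified, the modified SF algorithm needs less processing time than the original SF algorithm. In terms of matching accuracy, the performance of each algorithm is measured with the match quality measures Recall and Precision. By combining linguistic matching with structural matching, supplemented by a second matching pass, the modified SF algorithm achieves better match quality than the original SF algorithm in most cases.

Keywords: data exchange, schema matching, linguistic similarity, structural similarity.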
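The Recall and Precision measures named in this abstract can be illustrated with a minimal sketch. It assumes a match result is a set of (source element, target element) pairs compared against a manually prepared reference set. The function name `precision_recall` and the element names are hypothetical and are not taken from the thesis.

```python
def precision_recall(found, reference):
    """Return (precision, recall) for a set of proposed correspondences.

    Both arguments are collections of (source_element, target_element) pairs.
    Precision: share of proposed pairs that are correct.
    Recall: share of reference pairs that were proposed.
    """
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: three correct pairs exist, the matcher proposes two,
# and one of the proposals is wrong.
reference = {
    ("PO/BuyerName", "Order.customer"),
    ("PO/Date", "Order.order_date"),
    ("PO/Total", "Order.amount"),
}
found = {
    ("PO/BuyerName", "Order.customer"),
    ("PO/Date", "Order.ship_date"),
}
precision, recall = precision_recall(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall indicates a cautious matcher that misses correspondences. The reverse indicates a matcher that proposes many wrong pairs, which is why the two measures are reported together.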


Data exchange between companies in a supply chain environment requires a common data format and a common data representation to ensure accurate communication. XML has emerged as a common data format for cross-platform information exchange over the Internet. Because information systems are developed independently, identical data is commonly represented with a different schema in each system, so a receiving system may not understand the true meaning of exchanged data. This kind of communication problem is called schema conflict. The core technique for solving schema conflict in data exchange is correctly matching imported XML documents to internal relational database schemas. There are two major methods in schema matching: linguistic matching and structural matching. Previous research shows that a single method alone cannot effectively solve linguistic matching problems in one-to-many and many-to-one cases. Similarity flooding (SF) was originally a purely structure-oriented algorithm that uses a pairwise connecting graph (PCG), a propagation graph, and a fixpoint computation to detect similar schema structures. This study proposes a modified similarity flooding method that uses linguistic similarity values to simplify the PCG and thereby improve the effectiveness of schema matching. With a simpler PCG, this hybrid method reduces the computational effort of matching schemas. The experimental results show that in most cases the method increases matching accuracy and takes less computing time than the original SF method. The likely main reason is that only linguistically qualified candidates are included in the PCG, a modification that may increase the matching ability of the proposed method.

Keywords: data exchange, schema matching, similarity flooding, XML.
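The idea of pruning the PCG with linguistic similarity before the fixpoint computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the thesis: `difflib.SequenceMatcher` stands in for the linguistic matcher, the 0.3 threshold and the schema fragments are hypothetical, and the fixpoint uses the basic form "add propagated similarity, then normalize by the maximum" with one common choice of propagation weights.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Stand-in linguistic measure (assumption): string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_pcg(edges_a, edges_b, sim=name_similarity, threshold=0.0):
    """Pair edges that share a label; keep a pair only if both of its
    node pairings reach `threshold` (0.0 keeps every pairing)."""
    pcg = []
    for s1, label1, t1 in edges_a:
        for s2, label2, t2 in edges_b:
            if (label1 == label2
                    and sim(s1, s2) >= threshold
                    and sim(t1, t2) >= threshold):
                pcg.append(((s1, s2), label1, (t1, t2)))
    return pcg


def flood(pcg, sim=name_similarity, max_iter=100, eps=1e-6):
    """Basic fixpoint: sigma <- normalize(sigma + propagated sigma)."""
    if not pcg:
        return {}
    out_count, in_count = defaultdict(int), defaultdict(int)
    for src, label, dst in pcg:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1
    # Each PCG edge carries similarity in both directions; the weight is
    # split evenly among same-label edges leaving (or entering) a node.
    prop = []
    for src, label, dst in pcg:
        prop.append((src, dst, 1.0 / out_count[(src, label)]))
        prop.append((dst, src, 1.0 / in_count[(dst, label)]))
    nodes = {n for src, _, dst in pcg for n in (src, dst)}
    sigma = {n: sim(*n) for n in nodes}  # initial similarity per node pair
    for _ in range(max_iter):
        nxt = dict(sigma)
        for frm, to, weight in prop:
            nxt[to] += sigma[frm] * weight
        top = max(nxt.values()) or 1.0  # avoid dividing by zero
        nxt = {n: v / top for n, v in nxt.items()}
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:
            break
    return sigma


# Hypothetical schema fragments as (parent, edge label, child) triples.
source = [("Order", "child", "OrderDate"), ("Order", "child", "Buyer"),
          ("Buyer", "child", "Name")]
target = [("PurchaseOrder", "child", "Date"),
          ("PurchaseOrder", "child", "Customer"),
          ("Customer", "child", "CustomerName")]

full = build_pcg(source, target)                   # every pairing
pruned = build_pcg(source, target, threshold=0.3)  # hypothetical cut-off
print(len(full), "PCG edges before pruning,", len(pruned), "after")
for pair, score in sorted(flood(pruned).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Because the pruned graph is always a subset of the full one, each fixpoint iteration touches fewer propagation edges, which is the source of the time saving the abstract describes. A final filtering step, for example keeping the best-scoring target for each source element, would still be needed to produce the candidate list shown to the user.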




