

The Study of Applying Edit Distance to Calculate Structural Similarity and Enhance Schema Matching in Data Exchange

Advisor: 項衛中


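Abstract (translated from the Chinese) Supply chains have become the norm in business operations, so integrating the information flows of the enterprises in a supply chain, that is, enabling their different information systems to communicate with one another, is an important and urgent problem. Although XML has become the data exchange format adopted by most enterprises, each enterprise may use a different data exchange standard, so data passed between them is often unreadable or misinterpreted. This thesis uses semantic mapping and structural mapping to resolve schema conflicts in data exchange, and determines the semantic similarity of data from the candidate matches produced by thesaurus-based semantic expansion. Because a thesaurus yields multiple candidate matches when it handles synonyms and polysemous words, the structural similarity between the tree structures of the XML document and the database schema is also computed to confirm the relationships. This thesis uses edit distance to calculate the structural similarity between the XML document and the database schema, and uses that similarity to determine whether the one-to-many and many-to-one mappings produced by semantic mapping are plausible matches. The candidate matches obtained from structural mapping are then evaluated with judgment rules to decide whether they are correct. For each candidate match received, the judgment rules consider whether the parent of the source XML document node and the parent of the target schema node have a mapping relationship. Matches that the rules cannot evaluate are presented to the user for a decision. To validate the method, four examples show that the thesaurus, edit distance, and judgment rules together can find the correct mappings and improve the effectiveness of data exchange, although a human must make the final decision when the judgment rules do not apply.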
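The abstract does not give the exact formulation of the edit-distance computation, so the following is only a minimal sketch under stated assumptions: each node is represented by its root-to-node path of labels (assumed to be already normalized by the semantic step), the distance is a unit-cost Levenshtein distance over those label sequences, and the similarity is normalized by the longer path. The function names `edit_distance` and `structural_similarity` and the sample paths are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two sequences of node labels."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # keep (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def structural_similarity(path_a: Sequence[str], path_b: Sequence[str]) -> float:
    """Similarity in [0, 1]; 1.0 means identical paths (assumed normalization)."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(path_a, path_b) / longest


# Hypothetical example: one XML node path against two candidate schema paths.
xml_path = ["order", "customer", "name"]
candidate_1 = ["order", "customer", "address", "name"]
candidate_2 = ["catalog", "item", "name"]

score_1 = structural_similarity(xml_path, candidate_1)
score_2 = structural_similarity(xml_path, candidate_2)
```

Under these assumptions, the candidate whose path needs fewer edits receives the higher score, which is one way a one-to-many suggestion from the thesaurus could be ranked.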


Abstract Data exchange between different information systems in a supply chain requires schema integration. Because ERP systems backed by relational databases are developed independently, schema conflicts between databases are a common problem for schema integration. The core technique for resolving schema conflicts is matching exchanged XML documents against relational database schemas. Linguistic similarity and structural similarity are proposed as the two major techniques for developing data matching solutions, and the procedures of both approaches were integrated to find matching candidates between the XML document and the database. Linguistic similarity, using a repository of synonyms and a common-words dictionary as references, provided schema mapping suggestions. For one-to-many or many-to-one mappings, the multiple mapping suggestions need further analysis to find more accurate results. Structural similarity was calculated with the edit distance method to evaluate the multiple mapping suggestions, and a set of guidelines was applied to check them. In essence, among multiple mapping suggestions at a lower level, those whose parent nodes also map to each other are more likely to be accurate matching candidates. The integrated method presents the matching candidates as results for users to verify. It was applied to four mapping problems as examples to show its capabilities and limitations in finding matching candidates.
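The parent-level guideline can be illustrated with a small sketch. It is not the thesis's implementation: the function name `check_candidates`, the dictionary-based tree representation, and the sample node names are all hypothetical, and the sketch assumes that parent-level mappings have already been confirmed. A candidate pair is accepted when the parent of the source XML node is mapped to the parent of the target schema node; every other pair is left for the user to verify.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source XML node, target schema node)


def check_candidates(
    candidates: List[Pair],
    source_parent: Dict[str, str],
    target_parent: Dict[str, str],
    confirmed: Dict[str, str],
) -> Tuple[List[Pair], List[Pair]]:
    """Split candidate pairs into (accepted, undecided) using the parent guideline.

    `confirmed` maps already-matched source nodes to their target nodes.
    """
    accepted: List[Pair] = []
    undecided: List[Pair] = []
    for source, target in candidates:
        sp = source_parent.get(source)
        tp = target_parent.get(target)
        # Accept only when both parents exist and the parents map to each other.
        if sp is not None and tp is not None and confirmed.get(sp) == tp:
            accepted.append((source, target))
        else:
            undecided.append((source, target))  # left for the user to verify
    return accepted, undecided


# Hypothetical one-to-many suggestion: one XML node, two candidate columns.
candidates = [("Customer.Name", "CLIENT.NAME"), ("Customer.Name", "ITEM.NAME")]
source_parent = {"Customer.Name": "Customer"}
target_parent = {"CLIENT.NAME": "CLIENT", "ITEM.NAME": "ITEM"}
confirmed = {"Customer": "CLIENT"}  # parent-level mapping assumed to be known

accepted, undecided = check_candidates(
    candidates, source_parent, target_parent, confirmed
)
```

Keeping an explicit `undecided` bucket mirrors the abstract's point that the method shows matching candidates to users for verification when the guidelines cannot settle a case.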


Keywords: data exchange; XML; schema matching




