A Primary Study of XML Document Exchange with Shared Schema Components

Advisor: 項衛中


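In recent years, the rapid development of network technology has moved enterprises worldwide toward e-business, and XML has gradually become the standard format for data transmission. XML Schema can be used to define the structure, data types, and other properties of XML documents. One property of XML Schema is that definitions can be referenced repeatedly; such reusable definitions are called shared schema components. Using shared schema components reduces the time needed to write XML documents, but it increases the complexity of data mapping in heterogeneous data exchange. The main reason is that if a shared schema component and the schema elements that reference it cannot be matched correctly, confusion arises and data errors result. Based on the fragment-based schema matching approach, this thesis addresses the schema matching problem between XML documents that contain shared schema components. First, the XML documents are decomposed into fragments according to whether a part is a shared schema component. Next, similarity is computed using a semantic lexicon and the edit distance algorithm. Finally, the matching results of the fragments are consolidated and the most likely matches are suggested. After decomposition the matching complexity is lower, which speeds up the matching of each fragment. Taking the whole document into account when the individual results are recombined increases the overall matching correctness. This thesis also implements the XML schema matching method and uses examples to verify the effectiveness and limitations of the module. With this matching method, correct suggested matches are obtained in most of the examples, even when the source or target XML documents differ in design.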


XML has recently emerged as a common data format for cross-platform information exchange over the Internet. XML Schema is used to define the structure, data types, and other properties of XML documents. One feature of XML Schema is reusability, which is realized through shared schema components. Using shared schema components reduces the time needed to define XML documents. In heterogeneous data exchange, however, shared schema components increase the complexity of schema matching: a shared schema component and the schema elements that reference it must be matched together, or the mismatch may cause data errors during exchange. This research proposes a matching method, based on the fragment-based approach, to solve the schema matching problem for documents with shared schema components. The method first decomposes the source and target XML documents into fragments according to their shared schema components; this decomposition reduces the complexity of the overall matching problem and makes the matching of individual fragments more efficient. It then calculates the similarity between source and target fragments by combining linguistic similarity values from WordNet with structural similarity values from the edit distance method. Finally, the fragment match results are integrated into an overall match result, and the most likely matching candidates are suggested to the user. The method is implemented as a matching module in Java, and several examples are tested to show its capabilities and limitations. In the experiments, the method increases matching accuracy and precision in most cases, even when the source and target XML documents structure their shared components differently.
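The similarity step described above can be illustrated with a minimal Java sketch. It is not the thesis implementation, and several parts are assumptions made only for illustration: the class name FragmentSimilarity and its methods are hypothetical, the WordNet lookup is replaced by a tiny hand-written synonym table with an arbitrary synonym score of 0.8, the edit distance is applied to the ordered list of child element names of a fragment, and the two scores are combined by a simple weighted average.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: scores one source fragment against one target fragment.
class FragmentSimilarity {

    // Assumption: a tiny synonym table standing in for a real WordNet lookup.
    private static final Map<String, Set<String>> SYNONYMS = Map.of(
            "customer", Set.of("client", "buyer"),
            "address", Set.of("location"));

    // Levenshtein edit distance over two ordered lists of element names.
    public static int editDistance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // turning an empty list into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // turning a[0..i) into an empty list takes i deletions
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equalsIgnoreCase(b.get(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    // Structural similarity in [0, 1]: 1 minus the normalized edit distance.
    public static double structuralSimilarity(List<String> a, List<String> b) {
        int maxLen = Math.max(a.size(), b.size());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Linguistic similarity: 1.0 for equal names, 0.8 (assumed) for synonyms, else 0.0.
    public static double linguisticSimilarity(String a, String b) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        if (x.equals(y)) {
            return 1.0;
        }
        boolean synonyms = SYNONYMS.getOrDefault(x, Set.of()).contains(y)
                || SYNONYMS.getOrDefault(y, Set.of()).contains(x);
        return synonyms ? 0.8 : 0.0;
    }

    // Weighted average of the two scores; the weighting scheme is an assumption.
    public static double combinedSimilarity(String nameA, List<String> childrenA,
                                            String nameB, List<String> childrenB,
                                            double linguisticWeight) {
        return linguisticWeight * linguisticSimilarity(nameA, nameB)
                + (1.0 - linguisticWeight) * structuralSimilarity(childrenA, childrenB);
    }

    public static void main(String[] args) {
        double score = combinedSimilarity(
                "Customer", List.of("name", "street", "city"),
                "Client", List.of("name", "city"),
                0.5);
        System.out.println("combined similarity = " + score);
    }
}
```

In a fuller matcher, a score like this would be computed for every pair of source and target fragments, and the highest-scoring pairs would be offered to the user as suggested matches. The integration step that reconciles fragment-level results into an overall match result is not shown here.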


