

The Study of Document Structurization and Automatically Saving to Original Database

Advisor: 姚修慎


Most companies today exchange business information in some standard format, and XML (eXtensible Markup Language) documents are the most commonly used carrier for transferring data. However, because each company designs its own database schema, 100 companies may use 100 different sets of field names, and mismatched field names between two companies make data exchange difficult. Companies in the same supply chain can agree on a common schema or build a conversion mechanism to exchange information with each other, but some unavoidable problems remain. Our goal is for a company to be able to import external electronic documents into its own database using only the information already held in that database and the assistance of a term dictionary.

This thesis proposes a method that, with the help of certain specific terms, automatically extracts the data contained in electronic documents, converts it into structured information, and stores it back into the original database. The system works in three steps:

1. Find the tables of the original database that are related to the document.
2. Separate the related tables found in step 1 into destination tables and reference tables. The former are represented by diamonds in the E-R model, the latter by rectangles.
3. Extract the information related to those tables from the document and save it into the original database.

The thesis consists of six parts: introduction, analysis of related techniques, research design model, implementation steps and methods, experimental results and discussion, and conclusion.
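As an illustration only, the three steps might be sketched as follows. This is not the implementation described in the thesis: the schema, the `TERMS` dictionary, the sample XML, and the helper names are all hypothetical, and the rule used in step 2 (a table with foreign keys to other related tables is treated as a destination table) is an assumption made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical term dictionary: external XML tag -> local column name.
TERMS = {"client": "customer_name", "item": "product_title", "qty": "quantity"}

def columns(db, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]

def foreign_keys(db, table):
    # PRAGMA foreign_key_list rows are (id, seq, table, from, to, ...).
    # Returns a mapping: local foreign-key column -> referenced table.
    return {row[3]: row[2] for row in db.execute(f"PRAGMA foreign_key_list({table})")}

def insert(db, table, row):
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    cur = db.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values()))
    return cur.lastrowid

def store_document(db, xml_text):
    # Translate the external tags into local column names via the dictionary.
    fields = {TERMS[el.tag]: el.text for el in ET.fromstring(xml_text) if el.tag in TERMS}

    # Step 1: related tables share at least one column with the document.
    tables = [r[0] for r in db.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    related = {t: [c for c in columns(db, t) if c in fields] for t in tables}
    related = {t: cols for t, cols in related.items() if cols}

    # Step 2 (assumed rule): a destination table points at other related tables.
    destination = {t for t in related
                   if any(ref in related for ref in foreign_keys(db, t).values())}
    reference = set(related) - destination

    # Step 3: store reference rows first, then destination rows that link to them.
    new_ids = {t: insert(db, t, {c: fields[c] for c in related[t]})
               for t in sorted(reference)}
    for t in sorted(destination):
        row = {c: fields[c] for c in related[t]}
        for fk_col, ref_table in foreign_keys(db, t).items():
            if ref_table in new_ids:
                row[fk_col] = new_ids[ref_table]
        insert(db, t, row)
    return destination, reference

# Hypothetical "original database" and incoming document.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, product_title TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    product_id INTEGER REFERENCES product(id),
    quantity INTEGER);
""")
doc = "<order><client>Acme</client><item>Widget</item><qty>3</qty></order>"
destination, reference = store_document(db, doc)
print(sorted(destination), sorted(reference))
```

In this sketch the dictionary does the work of reconciling differing field names, while the table structure itself is read from the existing database, which mirrors the idea of relying on the original database plus a term dictionary rather than on a schema agreed between the two companies.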


