Building a Lattice Based HTML Query System


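Most companies provide a web search tool to help users look up company information. These tools, however, are usually general-purpose Internet search engines, which typically offer only keyword-style queries. Company web pages usually follow a fixed format, so a company's query engine should cover page format as well as page content. In HTML, page format comprises URLs, tags, and their attributes; together these make up a document with hierarchical and ordering structure. This paper uses lattice theory to represent the structural relationships of web pages, records the data and structural relationships of the pages in a database, and provides a SQL-like query language through which ordinary users can query the text of pages, HTML tags and their attributes, hyperlink relationships between pages, and the ordering and hierarchical relationships among these items. Because the underlying theory has already been published in [15], this paper describes how to implement the system. The system consists of a web page retrieval agent, a data translation module, and a query module. Since XML data likewise carries a great deal of document structure, this query system also suggests a research direction for building XML query systems.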


Most companies provide portals for internal and external users, and with the portals come query systems. However, most of these systems are Internet search engines that provide only keyword or full-text search. Since most organizations define special and consistent formats for their pages, i.e., many fields have unique meanings in the pages, a good query system should let MIS users and end users query the structure of pages as well as their contents. The contents include HTML tags, attributes, URL links, and text in HTML files. The structures include the hierarchies and orders in which the contents are organized. The structures are modeled as lattices, which can describe any partial order relationship. Both contents and structures are then stored in relational databases. In [15], an algebra was developed for querying a system based on this lattice theory. This paper proposes a SQL query interface and shows how to implement such a system in practice. The system includes a web page retrieval agent, a translation module, and a query module. Since XML also has strict field definitions, the proposed theory and system should also point a direction for developing XML query systems.
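As a rough illustration of storing page contents together with their hierarchy and order in a relational database, the following minimal sketch parses one page and answers two structural queries in plain SQL. It is not the system described in this paper: the table names `node` and `attr`, the columns `parent_id` and `pos`, the helper functions, and the sample page are all assumptions made for the example, and the parser assumes well-formed HTML in which every element is explicitly closed.

```python
import sqlite3
from html.parser import HTMLParser


class NodeCollector(HTMLParser):
    """Record each element with its parent (hierarchy) and document position (order)."""

    def __init__(self):
        super().__init__()
        self.rows = []   # [id, parent_id, pos, tag, text] per element
        self.attrs = []  # (node_id, name, value) per attribute
        self.stack = []  # ids of the currently open elements

    def handle_starttag(self, tag, attrs):
        node_id = len(self.rows) + 1          # ids follow document order
        parent = self.stack[-1] if self.stack else None
        self.rows.append([node_id, parent, node_id, tag, ""])
        self.attrs.extend((node_id, name, value) for name, value in attrs)
        self.stack.append(node_id)

    def handle_endtag(self, tag):
        # Assumes well-formed input: the closing tag matches the innermost open element.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attach non-blank text to the innermost open element.
        if self.stack and data.strip():
            self.rows[self.stack[-1] - 1][4] += data.strip()


def load(html):
    """Parse one page and store its elements and attributes in an in-memory database."""
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
               " pos INTEGER, tag TEXT, text TEXT)")
    db.execute("CREATE TABLE attr (node_id INTEGER, name TEXT, value TEXT)")
    db.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", parser.rows)
    db.executemany("INSERT INTO attr VALUES (?, ?, ?)", parser.attrs)
    return db


def product_links(db):
    """Hierarchy query: links inside list items of the list with class 'products'."""
    return db.execute(
        "SELECT a.text, href.value FROM node AS a"
        " JOIN attr AS href ON href.node_id = a.id AND href.name = 'href'"
        " JOIN node AS li ON li.id = a.parent_id AND li.tag = 'li'"
        " JOIN node AS ul ON ul.id = li.parent_id AND ul.tag = 'ul'"
        " JOIN attr AS cls ON cls.node_id = ul.id AND cls.name = 'class'"
        " AND cls.value = 'products'"
        " WHERE a.tag = 'a' ORDER BY a.pos"
    ).fetchall()


def sibling_after_heading(db):
    """Order query: tag of the first sibling that follows the h1 element."""
    row = db.execute(
        "SELECT n.tag FROM node AS n JOIN node AS h"
        " ON h.tag = 'h1' AND n.parent_id = h.parent_id AND n.pos > h.pos"
        " ORDER BY n.pos LIMIT 1"
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample page, used only to exercise the sketch.
PAGE = """
<html><body>
<h1>Catalog</h1>
<ul class="products">
<li><a href="/p/1">Widget</a></li>
<li><a href="/p/2">Gadget</a></li>
</ul>
<p><a href="/about">About</a></p>
</body></html>
"""

db = load(PAGE)
links = product_links(db)
next_tag = sibling_after_heading(db)
print(links)
print(next_tag)
```

Storing a parent reference and a document position for every element is only one simple way to make hierarchy and order queryable with ordinary joins; a lattice-based encoding such as the one developed in [15] may represent these partial orders differently.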


Internet Search Engine, Query Language, Lattice, HTML


