
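Geographic Information Retrieval on Web Pages-Taking Homestay as an Example

Original title: 網頁地理資訊檢索與探勘-以民宿主題為例 (Web Geographic Information Retrieval and Mining: Taking the Homestay Topic as an Example)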

Abstract


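The Internet hosts a vast amount of web page data on all kinds of topics, and a great deal of knowledge is implicit in it. Most of this content, however, is semi-structured or even unstructured. How to manage these data efficiently and extract information and knowledge from them has long been a focus of research and development, which has led to a wide variety of web search engines, data mining techniques, and Internet marketing technologies. Yet current general-purpose web search techniques mostly concentrate on keyword retrieval, and their analysis of page content and topic remains less than ideal. Moreover, the geographic information in web page content cannot be retrieved and analyzed effectively, so much of the geographic information the pages contain is lost. This study takes the homestay topic on web pages as an example. Using the Google Search Web Service as the basis for web search, and combining it with the Chinese word segmentation system developed by the CKIP group (中央研究院詞庫小組) at Academia Sinica and with text mining techniques, it mines, retrieves, and ranks the pages returned by Google by their spatial and semantic content, in order to find the pages most relevant to the queried topic in both content and geographic information. Next, through geographic information retrieval and regular expressions, useful geographic information is extracted from the content of these filtered pages. Then, using the address-matching (geocoding) technique of the Google Map API, the extracted geographic information is displayed together with the text content on a Google Map. The results found in this way combine maps and text containing geographic information and match the user's needs more closely. The approach can be applied to querying and analyzing content related to spatial topics, to collecting geographic data, and to managing spatial knowledge.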
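The regular-expression step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the patterns the authors used, so the pattern below is an assumption: a simplified matcher for Taiwanese street addresses (county or city, optional township or district, road, optional section and lane, house number). The names `ADDRESS_RE` and `extract_addresses` are hypothetical, and the geocoding call to the Google Map API is deliberately left out.

```python
import re

# Assumed, simplified pattern for Taiwanese street addresses.
# It is NOT the pattern used in the paper, which the abstract does not specify.
ADDRESS_RE = re.compile(
    r"[\u4e00-\u9fff]{2}[縣市]"               # county or city, e.g. 花蓮縣
    r"(?:[\u4e00-\u9fff]{1,3}[鄉鎮市區])?"     # optional township or district
    r"[\u4e00-\u9fff]{1,5}(?:路|街|大道)"      # road, street, or boulevard
    r"(?:[0-9一二三四五六七八九十]+段)?"        # optional section
    r"(?:[0-9]+巷)?(?:[0-9]+弄)?"             # optional lane and alley
    r"[0-9]+(?:[-之][0-9]+)?號"               # house number
)


def extract_addresses(text):
    """Return the unique address-like strings in text, in order of appearance."""
    matches = ADDRESS_RE.findall(text)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    # Made-up sample page text for illustration only.
    sample = "歡迎來到山海民宿,地址:花蓮縣吉安鄉中山路三段100號,電話03-1234567。"
    for address in extract_addresses(sample):
        print(address)
```

Each extracted string would then be passed to a geocoding service to obtain coordinates for display on the map. A production pattern would need to handle more address variants (villages, floor numbers, full-width digits) than this sketch does.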

Parallel Abstract


The World Wide Web (WWW) offers an enormous spread of information and data, and assembles a tremendous amount of knowledge. Much of this knowledge, however, comprises either non-structured or semi-structured data. To make use of these unexploited or underexploited resources more efficiently, the management of information and data gathering has become an essential direction for research and development. At present, however, the ability of regular search engines to access and use this data is still far from perfect, since it is limited to the retrieval of basic keywords rather than analysis of the subject matter and content of the webpage itself. In addition, there are limited capabilities for effective retrieval and analysis of the implicit geographic information contained within the webpage. This paper focuses on the task of researching a hostel or homestay by using the Google Search Web Service as a base search engine. From the search results, location and semantic data were mined, retrieved, and sorted by combining the Chinese Word Segmentation System with text mining technology in order to find geographic information that can be derived from the webpage. The results obtained from this search method allowed users to get closer to the answers they sought and achieve greater accuracy, since the results included graphics and associated textual geographic information. In the future, this method may be applicable to various types of queries, analyses, and geographic data collection, and to managing spatial knowledge related to different keywords within a document.
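The sorting of search results by semantic and location content can be sketched as follows. The abstract does not state the scoring formula, so this is only an illustration under stated assumptions: pages are already segmented into tokens (for example by a word segmentation system), and each page is scored by a weighted mix of the relative frequency of topic terms and of place names. The functions `score_page` and `rank_pages`, the `geo_weight` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def score_page(tokens, topic_terms, place_names, geo_weight=0.5):
    """Score one segmented page by topic-term and place-name frequency.

    This weighting is an assumption for illustration, not the paper's formula.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty pages
    topic_score = sum(counts[t] for t in topic_terms) / total
    geo_score = sum(counts[p] for p in place_names) / total
    return (1 - geo_weight) * topic_score + geo_weight * geo_score


def rank_pages(pages, topic_terms, place_names, geo_weight=0.5):
    """Return page identifiers sorted from most to least relevant."""
    return sorted(
        pages,
        key=lambda url: score_page(pages[url], topic_terms, place_names, geo_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up, pre-segmented pages keyed by a placeholder identifier.
    pages = {
        "page_a": ["民宿", "花蓮", "住宿", "民宿"],
        "page_b": ["新聞", "天氣", "花蓮", "報導"],
        "page_c": ["股票", "新聞"],
    }
    print(rank_pages(pages, {"民宿", "住宿"}, {"花蓮"}))
```

Raising `geo_weight` favors pages that mention known place names, while lowering it favors pages that match the topic vocabulary. Either choice is a tuning decision rather than something the abstract prescribes.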


