  • Degree thesis

商品展覽會深網整合及其關鍵字查詢排名策略

Deep web integration of product exhibitions and its ranking strategy for keyword search

Advisor: 周清江

Abstract


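As Internet use keeps growing, search engines have become an important tool for gathering information. Yet much valuable data remains hidden in deep-web databases and cannot be found efficiently through traditional search engines. This study offers a solution, using online databases of product exhibitions as an example. Staff of small and medium enterprises and exhibitors often cannot reliably find out online when and where international exhibitions will be held, or which companies and related products will be exhibited; searching takes too long, the data found may be incomplete, and exhibition information cannot be gathered effectively. This study integrates data from different exhibitions in the same field and lets users query products by keyword; the results include the company a product belongs to and that company's other products related to the keyword. The work consists of two systems: (1) a data integration system, which uses web robots to collect data from multiple exhibition websites and integrates the information those sites provide into a relational database; and (2) a ranking system, which processes keyword queries and provides a ranking strategy. In addition to the adjustment factors drawn from prior research (tuple tree size normalization, document length normalization, inverse document frequency normalization, and inter-document weight normalization), this study adds a specific-field occurrence weight and a heterogeneous-data multiplier weight to adjust the ordering, so that company and product information more relevant to the user's keywords ranks higher. User evaluation shows that with a specific-field occurrence weight of 9 and a heterogeneous-data multiplier weight of 2-7, the Mean Average Precision (MAP) is 0.6471, a 59.70% improvement over the approach that does not consider these two weights.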

Parallel Abstract


With the rapid development of the World Wide Web, search engines have become an important tool for collecting information. However, a great deal of valuable information in the deep web still cannot be found efficiently by traditional search engines. We tackle this problem using online exhibition product databases. Personnel of small and medium enterprises (SMEs) and exhibitors often cannot find out from the web exactly when and where an international exhibition will be held, or which companies and related products will appear in it. Collecting this information takes time, and the result may still be incomplete. In this study, we integrate information from different exhibition websites in the same field and let users search for products through keyword queries. The query results include the company that owns each product and that company's other products related to the keyword. The work is implemented as a combination of two systems. The first is a crawler extraction system that uses web robots to collect data from many exhibition sites in the same field and integrates these data into a relational database. The second is a query processing system that answers keyword queries using its ranking strategies. In addition to the tuple tree size normalization, the document length normalization reconsidered, the document frequency normalization, and the inter-document weight normalization used in past research, we add a specific field occurrences weight and heterogeneous data weights to adjust the ranked list. The more closely a company's and its products' descriptions relate to the keywords, the nearer to the top of the results they are placed. When the specific field occurrences weight is 9 and the heterogeneous data weights are 2-7, our experiments yielded a Mean Average Precision (MAP) of 0.6471, a 59.70% improvement compared with past practices.
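For readers unfamiliar with the metric, the following is a minimal sketch of how Mean Average Precision is commonly computed from per-query relevance judgments. It is not the thesis's evaluation code: the function names and the toy relevance lists are hypothetical, and average precision is normalized here by the number of relevant items that appear in the ranked list, which is an assumption about the evaluation setup. The `relative_improvement` helper assumes the reported 59.70% is a relative gain over a baseline MAP.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked result list.

    relevance[i] is True when the result at rank i + 1 was judged relevant.
    Assumption: normalize by the number of relevant results in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank = relevant results so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline` (0.5 means 50% better)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical judgments for two keyword queries (not data from the thesis).
    toy_queries = [
        [True, False, True],  # AP = (1/1 + 2/3) / 2
        [False, True],        # AP = (1/2) / 1
    ]
    print(f"MAP = {mean_average_precision(toy_queries):.4f}")
```

Under the relative-gain reading, a MAP of 0.6471 with a 59.70% improvement would correspond to a baseline MAP of roughly 0.6471 / 1.5970, or about 0.405. That baseline is a back-calculation, not a figure stated in the abstract.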


Cited By


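曹家瑜 (2013). 以模糊自動機解決排名問題 (Solving ranking problems with fuzzy automata) [Master's thesis, National Taipei University of Business]. Airiti Library. https://doi.org/10.6818%2fNTUB.2013.00009
陳秋燕 (2016). 以模糊平衡計分卡探討企業績效排名 (Exploring enterprise performance ranking with a fuzzy balanced scorecard) [Master's thesis, National Taipei University of Business]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0064-0901201715233604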
