
最佳優先及同心球樹:優化球樹在最近鄰居法的效能

Best-First and Concentric Ball Tree: Improving the Efficiency of K-Nearest Neighbors Search

Advisor: 鄭卜壬

Abstract


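K-nearest neighbors is a very common algorithm in machine learning and data mining applications. There are many ways to implement it; the tree-based algorithms include the k-d tree and the ball tree. Ball-tree search performs quite well on high-dimensional data. This work focuses on improving the efficiency of ball-tree search. We propose concentric ball-tree search, which changes the root node structure of the ball tree. We also propose several strategies that change the order of the tree search. Experimental results show that our methods effectively reduce the number of visited data points and the number of visited tree nodes, which improves efficiency and saves considerable search time. We also find that the concentric ball tree performs quite well on high-dimensional data, where it yields larger efficiency gains. Finally, our experiments show that, like the traditional ball tree, the concentric ball tree's performance varies considerably across datasets.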

Parallel Abstract


K-nearest neighbors (KNN) search is commonly needed in machine learning and data mining applications. Several tree-based algorithms implement KNN search, including k-d tree search and ball-tree search. Ball-tree search is an effective KNN search algorithm for high-dimensional data. In this work, we focus on improving the efficiency of ball-tree search. We propose the concentric ball tree, which changes the leaf node structure of the ball tree. We also use several heuristics to change the traversal order of ball-tree search. We show empirically that our approach reduces the number of visited data points and the number of visited tree nodes, which saves KNN search time. In addition, we find that the concentric ball tree scales well with the number of dimensions: its efficiency gain over the traditional ball tree is larger at high dimension. We also show that the performance of the ball tree is data dependent, and so is that of the concentric ball tree.
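
To make the two quantities in the abstract concrete (visited data points and traversal order), the following is a minimal sketch of a plain ball tree with a generic best-first KNN search. It is not the thesis's concentric ball tree or any of its heuristics, which the abstract does not specify. The names `Ball`, `knn_best_first`, and `leaf_size`, the centroid-based balls, and the median split on the widest dimension are all assumptions chosen for illustration.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Ball:
    """Hypothetical ball-tree node: a centroid, a covering radius, and
    either two children or a small list of points (a leaf)."""

    def __init__(self, points, leaf_size=8):
        dim = len(points[0])
        n = len(points)
        self.center = [sum(p[i] for p in points) / n for i in range(dim)]
        self.radius = max(dist(self.center, p) for p in points)
        self.points, self.children = points, []
        if n > leaf_size:
            # Assumed split rule: median cut along the widest dimension.
            spread = [max(p[i] for p in points) - min(p[i] for p in points)
                      for i in range(dim)]
            axis = spread.index(max(spread))
            ordered = sorted(points, key=lambda p: p[axis])
            mid = n // 2
            self.children = [Ball(ordered[:mid], leaf_size),
                             Ball(ordered[mid:], leaf_size)]
            self.points = []

    def lower_bound(self, q):
        # Triangle inequality: no point in the ball is closer to q than this.
        return max(0.0, dist(q, self.center) - self.radius)


def knn_best_first(root, q, k):
    """Return (sorted [(distance, point)], number of data points visited)."""
    best = []                       # max-heap of the k best, as (-distance, point)
    frontier = [(root.lower_bound(q), 0, root)]
    counter = 1                     # tie-breaker so nodes are never compared
    visited_points = 0
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if len(best) == k and bound >= -best[0][0]:
            break                   # no remaining ball can improve the result
        if node.children:
            for child in node.children:
                heapq.heappush(frontier, (child.lower_bound(q), counter, child))
                counter += 1
        else:
            for p in node.points:
                visited_points += 1
                d = dist(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), visited_points


if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.random() for _ in range(5)) for _ in range(300)]
    neighbours, visited = knn_best_first(Ball(pts), pts[0], k=3)
    print(neighbours[0][1] == pts[0], visited, "of", len(pts), "points visited")
```

In this sketch the frontier is ordered by each ball's lower-bound distance to the query, so the search stops as soon as the closest unexplored ball cannot beat the current k-th neighbour. The `visited_points` counter corresponds to the kind of cost measure the abstract reports, although the thesis's own measurement setup is not described here.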

