DAHC-tree: An Effective Index for Approximate Search in High-Dimensional Metric Spaces

Jurandy Almeida, Eduardo Valle, Ricardo da S. Torres, Neucimar J. Leite

Abstract


Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry of the data and the organization of an index structure. In this paper, we study the performance of a new index structure, called Divisive-Agglomerative Hierarchical Clustering tree (DAHC-tree), which reduces the effects imposed by the above liability. DAHC-tree is constructed by dividing and grouping the data set into compact clusters. We perform a rigorous experimental design and analyze the trade-offs involved in building such an index structure. Additionally, we present extensive experiments comparing our method against state-of-the-art of exact and approximate solutions. The conducted analysis and the reported comparative test results demonstrate that our technique significantly improves the performance of similarity queries.

Keywords


clustering methods; database indexing; metric access methods; metric spaces; similarity search

Full Text:

PDF


An official publication of the Brazilian Computer Society Special Interest Group on Databases.