Nearest Neighbor Queries with Counting Aggregate-based Conditions

Daniel S Kaster, Willian D Oliveira, Renato Bueno, Agma J M Traina, Caetano Traina Jr.


The amount of complex data, such as images, videos and time series, has been increasing in the past few years in a very fast pace. The main property employed to query such kind of data based on its content is the similarity between elements. One of the most common similarity queries is the k-Nearest Neighbor query (k-NNq), which returns the k elements most similar to a given reference element. Although this kind of query is useful for many applications, it does not consider additional conditions that may modify the basic nearest neighbor search, which would allow retrieving more relevant information to the user. Complex data are usually associated with other information and usually have metadata describing the data content and/or context. Such external (non-content-based) information can be employed to define the conditions modifying the k-NN search to answer advanced queries over complex data. This paper presents a new variation to the k-NNq, which includes counting aggregate-based conditions.
This extension allows answering queries that are not easily definable using only external conditions and regular k-NN operation. Our work defines the counting aggregate-based conditions and presents algorithms to execute k-NN queries with these conditions, using either sequential scan or a metric access method.
Experiments over real datasets are also presented, comparing the developed algorithms and showing that search performance can be largely improved using an appropriate access method.


Metric Access Methods, Nearest Neighbor Search, Similarity Queries

Full Text:


An official publication of the Brazilian Computer Society Special Interest Group on Databases.