Analyzing Missing Data in Metric Spaces

Safia Brinis, Agma J. M. Traina, Caetano Traina Jr.

Abstract


Similarity search in multimedia databases has challenged researchers for the last two decades, and their studies have produced several advances. However, searching in incomplete databases, i.e., databases with missing attribute values, has received far less attention.
In this article, we present a set of experimental analyses that evaluate the impact of missing data on query performance in metric spaces. The results show that as little as 2% of missing values causes severe skew in the metric space and drastically degrades the performance of metric indexing techniques. Interestingly, our analyses, confirmed by the presented experiments, show that data missing not at random are more prone to skew and favor the conditions for the distance concentration phenomenon, in which the distances between pairs of elements in the space become homogeneous. Thus, this study provides an understanding of the issues that arise in metric spaces when indexing incomplete databases and lays the groundwork for research on advanced metric access methods that handle missing attribute values.
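
As an illustration of the distance concentration phenomenon mentioned above, the following minimal sketch measures how homogeneous the pairwise distances of a small dataset are. It is not the experimental setup of the article: the names `partial_euclidean` and `distance_spread` are hypothetical, missing values are assumed to be encoded as `None`, and the choice of skipping attributes missing in either vector is only one possible convention (the resulting function is not guaranteed to satisfy the triangle inequality).

```python
import itertools
import math
import statistics


def partial_euclidean(a, b):
    """Euclidean distance over the attributes present in both vectors.

    Assumption: a missing value is encoded as None and the corresponding
    dimension is skipped. This is an illustrative convention only.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def distance_spread(points, dist=partial_euclidean):
    """Ratio of the standard deviation to the mean of all pairwise distances.

    Values close to 0 indicate that the distances are nearly homogeneous,
    i.e., concentrated. Requires at least two points.
    """
    dists = [dist(a, b) for a, b in itertools.combinations(points, 2)]
    mean = statistics.mean(dists)
    # Guard against a dataset whose points all coincide.
    return statistics.pstdev(dists) / mean if mean > 0 else 0.0


if __name__ == "__main__":
    complete = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 7.0)]
    # Same data with the second attribute missing in two elements.
    incomplete = [(0.0, None), (3.0, 4.0), (6.0, None), (1.0, 7.0)]
    print("spread (complete):  ", distance_spread(complete))
    print("spread (incomplete):", distance_spread(incomplete))
```

Comparing the spread of a complete dataset with that of the same dataset after removing values gives a rough indication of how much the missing attributes change the distance distribution. A spread near zero suggests that an index has little distance contrast to exploit for pruning.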


Keywords


Distance Concentration, Data Distribution, Missing Attribute Values, Similarity Search



An official publication of the Brazilian Computer Society Special Interest Group on Databases.