Hyperbolic Vision Transformers: Combining Improvements in Metric Learning (CVPR)
Metric learning aims to learn a highly discriminative model that encourages embeddings of similar classes to lie close together under the chosen metric and pushes embeddings of dissimilar classes apart. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is used. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric
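As an illustration of what "mapping embeddings to hyperbolic space" can look like, the following is a minimal sketch assuming the Poincaré ball model with curvature parameter `c`. The function names `expmap0` and `poincare_dist` are illustrative choices, not taken from the authors' repository, and the sketch covers only the mapping and the distance, not the loss described in the abstract.

```python
import math

def expmap0(v, c=1.0):
    """Exponential map at the origin: send a Euclidean vector v into the
    Poincare ball of curvature -c (assumed model, not the authors' code)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    def sq_norm(u):
        return sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

# Usage: map two encoder outputs into the ball and compare them.
a = expmap0([0.3, 0.4])
b = expmap0([-0.2, 0.1])
d = poincare_dist(a, b)
```

With this convention, a point obtained from `expmap0(v)` lies at hyperbolic distance `2 * ||v||` from the origin, so the Euclidean norm of the encoder output controls how far the embedding sits from the center of the ball.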
Neural Side-by-Side: Predicting Human Preferences for No-Reference Super-Resolution Evaluation (CVPR)
Super-resolution based on deep convolutional networks is currently gaining much attention from both academia and industry. However, the lack of proper evaluation measures makes it difficult to compare approaches, hampering progress in the field. Traditional measures, such as PSNR or SSIM, are known to correlate poorly with the human perception of image quality. Therefore, existing works commonly also report the Mean Opinion Score (MOS), i.e., the results of human evaluation of super-resolved images. Unfortunately, the MOS values from different papers are not directly comparable, due to the varying number of raters, their subjectivity, etc. In this paper, we introduce Neural Side-By-Side, a new measure that allows super-resolution models to be compared automatically, effectively approximating human preferences. Namely, we collect a large dataset of aligned image pairs produced by different super-resolution models. Each pair is then annotated by several raters, who were instructed to choose the more visually appealing image. Given the dataset and the labels, we trained a CNN model that takes a pair of images and, for each image, predicts the probability of it being preferred over its counterpart. In this work, we show that Neural Side-By-Side generalizes across both new models and new data. Hence, it can serve as a natural approximation of human preferences, which can be used to compare models or tune hyperparameters without raters' assistance. We open-source the dataset and the pretrained model and expect that it will become a handy tool for researchers and practitioners.
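To make the side-by-side idea concrete, here is a minimal sketch of how pairwise preference probabilities could be turned into a comparison between two super-resolution models. It assumes that each image has already been reduced to a scalar quality score (in the paper this role is played by a trained CNN, which is not reproduced here), and the names `preference_probs` and `compare_models` are hypothetical.

```python
import math

def preference_probs(score_a, score_b):
    """Softmax over two scalar scores, giving
    (P(image A preferred), P(image B preferred)). Assumed form, for illustration."""
    m = max(score_a, score_b)  # subtract the max for numerical stability
    ea = math.exp(score_a - m)
    eb = math.exp(score_b - m)
    return ea / (ea + eb), eb / (ea + eb)

def compare_models(scores_a, scores_b):
    """Average probability that model A's output is preferred over model B's,
    taken over aligned image pairs."""
    probs = [preference_probs(a, b)[0] for a, b in zip(scores_a, scores_b)]
    return sum(probs) / len(probs)

# Usage with made-up scores for three aligned image pairs.
win_rate_a = compare_models([1.2, 0.4, 2.0], [0.9, 0.7, 1.1])
```

A value above 0.5 would indicate that model A's outputs are preferred on average, which is the kind of rater-free comparison the abstract describes.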
Non-metric Similarity Graphs for Maximum Inner Product Search (NeurIPS)
In this paper, we address the problem of Maximum Inner Product Search (MIPS), which is currently the computational bottleneck in a large number of machine learning applications. While similar to nearest neighbor search (NNS), the MIPS problem was shown to be more challenging, as the inner product is not a proper metric function. We propose to solve the MIPS problem using similarity graphs, i.e., graphs where each vertex is connected to the vertices that are the most similar in terms of some similarity function. Originally, the framework of similarity graphs was proposed for metric spaces, and in this paper we naturally extend it to the non-metric MIPS scenario. We demonstrate that, unlike existing approaches, similarity graphs do not require any data transformation to reduce MIPS to the NNS problem and should be applied to the original data. Moreover, we explain why such a reduction is detrimental for similarity graphs. Through an extensive comparison with existing approaches, we show that the proposed method is a game-changer in terms of the runtime/accuracy trade-off for the MIPS problem.
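The similarity-graph idea can be sketched in a few lines. The example below is a simplified illustration, not the authors' implementation: it builds the graph by brute force (each vertex linked to its `k` highest inner-product neighbors in the original, untransformed data) and answers a query with plain greedy search. The names `build_graph` and `greedy_search` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def build_graph(data, k):
    """Connect every vertex to the k vertices with the largest inner product
    to it (brute-force construction, for illustration only)."""
    graph = []
    for i, x in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        others.sort(key=lambda j: dot(x, data[j]), reverse=True)
        graph.append(others[:k])
    return graph

def greedy_search(data, graph, query, entry=0):
    """Walk the graph, moving to the neighbor with the largest inner product
    with the query, until no neighbor improves on the current vertex."""
    current = entry
    best = dot(query, data[current])
    while True:
        improved = False
        for j in graph[current]:
            score = dot(query, data[j])
            if score > best:
                current, best, improved = j, score, True
        if not improved:
            return current, best

# Usage on a tiny two-dimensional dataset.
data = [[1.0, 0.0], [0.9, 0.5], [0.5, 0.9], [0.0, 1.0], [2.0, 2.0]]
graph = build_graph(data, k=2)
index, score = greedy_search(data, graph, query=[1.0, 1.0])
```

Greedy search of this kind can stop at a local maximum, so it returns an approximate answer in general; practical systems typically keep a candidate queue and use more careful graph construction.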
Learning to rank is a central problem in information retrieval. The objective is to rank a given set of items to optimize the overall utility of the list.