Research areas
Our research covers the most significant areas in modern machine learning.
Computer vision
Yandex Research team regularly contributes to the computer vision research community, mostly in the field of image retrieval and generative modelling.32 publications4 posts1 datasetNatural language processing
Language is one of the key forms of communication. We study methods of language representation and understanding to simplify human-computer interactions.21 publication2 posts2 datasetsLarge-scale machine learning
Today, training most powerful models often takes significant resources. Our research aims to make large-scale training more efficient and accessible to the entire machine learning community.8 publications1 postMachine learning theory
We study various aspects related to theoretical understanding of ML models and algorithms.24 publications2 postsGraph machine learning
Graphs are a natural way to represent data from various domains such as social networks, molecules, text, code, etc. We develop and analyze algorithms for graph-structured data.11 publications3 posts1 datasetProbabilistic machine learning
Probabilistic machine learning describes methods which enable reasoning and inference over unknown quantities. Commonly used in generative modelling, regression and uncertainty quantification.6 publicationsDistributional shift
Distributional shift is the mismatch between training and deployment data that is ubiquitous in the real-world. Studying this phenomenon can enable safer and more reliable ML systems.5 publications3 posts1 datasetUncertainty estimation
Uncertainty estimation enables detecting when ML models make mistakes. This is of critical importance in high risk machine learning applications, such as autonomous vehicle and medical ML.8 publications3 posts1 datasetGradient boosting
Gradient boosting iteratively combines weak learners (usually decision trees) to create a stronger model. It achieves state-of-the-art results on tabular data with heterogeneous features.12 publications1 postOptimization
Most machine learning algorithms build an optimization model and learn its parameters from the given data. Thus, developing effective and efficient optimization methods is of the essence.13 publicationsMachine translation
Language barriers hinder global communication and access to worldwide knowledge. By improving machine translation systems, we hope to facilitate the exchange of culture and information.9 publications1 datasetSpeech processing
Speech is an important data modality and relatives to applications such as speech recognition and speech synthesis, which are core technologies in products such as vocal assistants.5 publicationsNearest neighbor search
Nearest neighbor search is a long-standing problem arising in a large number of machine learning applications, such as recommender services, information retrieval, and others.14 publications3 posts1 datasetGenerative models
Generative models in computer vision are powerful tool for various applications.10 publications5 postsSegmentation
Image segmentation is a long-standing pixel-level problem in computer vision, which can also serve as a testbed for other dense prediction tasks.1 publicationRepresentations
Constructing high-quality data representations are a necessary component in common machine learning pipelines.8 publicationsRanking
Learning to rank is a central problem in information retrieval. The objective is to rank a given set of items to optimize the overall utility of the list.17 publicationsTabular data
Tabular data involves two-dimensional tables with objects (rows) and features (columns), which are used in numerous applied tasks such as classification, regression, ranking and many others.6 publications2 posts1 datasetBayesian methods
Bayesian Methods are a subset of Probabilistic ML which provides a normative theory of ML and the ability to reason over subjective unknowns and handle epistemological uncertainties.4 publications1 post