Bayesian methods

Bayesian Methods are a subset of Probabilistic ML which provides a normative theory of ML and the ability to reason over subjective unknowns and handle epistemological uncertainties.

Area 1. Bayesian Methods.svg



  • Uncertainty Measures in Neural Belief Tracking and the Effects on Dialogue Policy Performance

    Natural language processing Distributional shiftUncertainty estimation Bayesian methods
    Carel van Niekerk
    Andrey Malinin
    Christian Geishauser
    Michael Heck
    Hsien-chin Lin
    Nurul Lubis
    Shutong Feng
    Milica Gašic

    The ability to identify and resolve uncertainty is crucial for the robustness of a dialogue system. Indeed, this has been confirmed empirically on systems that utilise Bayesian approaches to dialogue belief tracking. However, such systems consider only confidence estimates and have difficulty scaling to more complex settings. Neural dialogue systems, on the other hand, rarely take uncertainties into account. They are therefore overconfident in their decisions and less robust. Moreover, the performance of the tracking task is often evaluated in isolation, without consideration of its effect on the downstream policy optimisation. We propose the use of different uncertainty measures in neural belief tracking. The effects of these measures on the downstream task of policy optimisation are evaluated by adding selected measures of uncertainty to the feature space of the policy and training policies through interaction with a user simulator. Both human and simulated user results show that incorporating these measures leads to improvements both of the performance and of the robustness of the downstream dialogue policy. This highlights the importance of developing neural dialogue belief trackers that take uncertainty into account.

  • Uncertainty Estimation in Autoregressive Structured Prediction

    Natural language processing Large-scale machine learningMachine translationSpeech processingDistributional shiftUncertainty estimation Bayesian methods
    Andrey Malinin
    Mark Gales

    Uncertainty estimation is important for ensuring safety and robustness of AI systems. While most research in the area has focused on un-structured prediction tasks, limited work has investigated general uncertainty estimation approaches for structured prediction. Thus, this work aims to investigate uncertainty estimation for structured prediction tasks within a single unified and interpretable probabilistic ensemble-based framework. We consider: uncertainty estimation for sequence data at the token-level and complete sequence-level; interpretations for, and applications of, various measures of uncertainty; and discuss both the theoretical and practical challenges associated with obtaining them. This work also provides baselines for token-level and sequence-level error detection, and sequence-level out-of-domain input detection on the WMT’14 English-French and WMT’17 English-German translation and LibriSpeech speech recognition datasets.

  • Uncertainty in Gradient Boosting via Ensembles

    Gradient boostingTabular dataDistributional shiftUncertainty estimation Probabilistic machine learningBayesian methods
    Andrey Malinin
    Liudmila Prokhorenkova
    Aleksei Ustimenko

    For many practical, high-risk applications, it is essential to quantify uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting. However, gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real datasets and investigated the applicability of ensemble approaches to gradient boosting models that are themselves ensembles of decision trees. Our analysis shows that ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve the predicted total uncertainty. Importantly, we also propose a concept of a virtual ensemble to get the benefits of an ensemble via only one gradient boosting model, which significantly reduces complexity.