Welcome

Welcome to PPS, the workshop on probabilistic programming semantics, held on Tuesday, 9 January 2018, co-located with and immediately preceding POPL. This informal workshop aims to bring programming-language and machine-learning researchers together to advance the semantic foundations of probabilistic programming.

We are delighted that Erik Meijer has accepted our invitation to give a talk, “Software is eating the world, but ML is going to eat software”, and that Rif A. Saurous and Dustin Tran have accepted our invitation to give a tutorial on “Deep Probabilistic Programming: TensorFlow Distributions and Edward”. In addition, as listed on the workshop website, we have accepted 22 extended abstracts submitted by a wide range of contributors. We accepted 14 submissions as posters and 8 as talks, not on the basis of reviewer scores but on which medium we thought would be most effective in conveying the material. As a result, some highly ranked submissions of a more technical nature were accepted as posters.

To foster collaboration and establish common ground, we ask all accepted contributors to post their revised extended abstracts on this site, along with any other materials such as preprints they want to share.

Everyone is welcome to post comments, questions, and other discussion on the posts. Because probabilistic programming is a research area that bridges multiple communities with different vocabularies, comments of the flavor “I don’t understand what you mean by X” are particularly valuable!

Slides from Erik Meijer’s talk: “Software is eating the world, but ML is going to eat software”

Slides from Rif A. Saurous and Dustin Tran’s tutorial: “Deep Probabilistic Programming: TensorFlow Distributions and Edward”

Colab notebook from the tutorial


Probabilistic Programming for Robotics

Autonomous agents use noisy sensors and actuators to interact with a complex external world. As such, inference engines and point estimators are essential to making modern robots work. Often, these are highly specialized and optimized to run in real time on autonomous agents. These methods are hand-coded and typically make approximations to gain speed, e.g., linearization, approximate independence, or static-world assumptions. Extended Kalman filters (EKFs) are used in a variety of applications, from low-level state estimation for feedback control to high-level mapping.
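
To make the flavor of such hand-coded estimators concrete, here is a minimal sketch of one EKF predict/update cycle in Python (illustrative only; the models, Jacobians, and noise covariances are all assumed to be supplied by the caller):

    import numpy as np

    def ekf_step(x, P, u, z, f, F, h, H, Q, R):
        """One predict/update cycle of an extended Kalman filter.

        x, P : state estimate and its covariance
        u, z : control input and measurement
        f, F : process model and its Jacobian (the linearization)
        h, H : measurement model and its Jacobian
        Q, R : process and measurement noise covariances
        """
        # Predict: push the estimate through the (nonlinear) dynamics,
        # using the Jacobian to propagate the covariance.
        x_pred = f(x, u)
        F_k = F(x, u)
        P_pred = F_k @ P @ F_k.T + Q

        # Update: fold in the measurement via the linearized model.
        H_k = H(x_pred)
        S = H_k @ P_pred @ H_k.T + R            # innovation covariance
        K = P_pred @ H_k.T @ np.linalg.inv(S)   # Kalman gain
        x_new = x_pred + K @ (z - h(x_pred))
        P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
        return x_new, P_new

Every such approximation (the linearization above, the Gaussian noise assumption) is a potential source of silent error in hand-written inference code.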

As a high-level language and inference tool, we believe that probabilistic programming has much to offer to the robotics community. As robots become increasingly capable, they also pose greater risks: a wrong state estimate can result in serious damage and injury when robots are equipped with powerful motors. Eliminating the possibility of coding errors in the inference engine, by compiling engines from high-level specifications, would be beneficial from both a safety and a design-time perspective. Conversely, robotics can be a rich source of challenging inference problems related to robust long-term autonomy that probabilistic programming might help answer. We present a specific example of a common estimation task in autonomous robots and show how PP can address a vexing practical calibration issue that practitioners face.

[Extended Abstract][Poster]


Probabilistic Program Equivalence for NetKAT

Over the past decade, formal methods have found successful application in the verification of computer networks, and various network verification tools have been developed in academia and industry. These tools are all based on deterministic network models, and as such cannot reason about network features involving uncertainty: faulty links, randomized routing algorithms, fault-tolerance mechanisms, etc. This is a serious limitation, as such features are pervasive in real-world networks.

NetKAT is one such framework. In NetKAT, verification questions such as waypointing, reachability, and isolation can be naturally phrased as questions about program equivalence, and then answered by a symbolic (worst-case PSPACE) decision procedure that scales well in practice.

This work studies the equivalence problem for Probabilistic NetKAT, i.e. NetKAT extended with a probabilistic choice operator.  We show that the problem is decidable for the history-free fragment of the language and use the system to study resilient routing algorithms.
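
For readers new to the extension: the probabilistic choice operator p ⊕_r q executes p with probability r and q with probability 1 − r, so its denotation (in our informal notation) is the convex combination

\[
\llbracket p \oplus_r q \rrbracket \;=\; r \cdot \llbracket p \rrbracket \;+\; (1 - r) \cdot \llbracket q \rrbracket .
\]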

It remains open whether the equivalence problem for the full language is decidable. The problem is technically challenging because ProbNetKAT's semantics is defined over the uncountable space of packet history sets and the calculus is powerful enough to encode continuous (i.e., atomless) distributions.

[ poster | extended-abstract | technical-report ]


Contextual Equivalence for a Probabilistic Language with Continuous Random Variables and Recursion

We present a complete reasoning principle for contextual equivalence in an untyped probabilistic programming language. The language includes continuous random variables, conditionals, and scoring. It also includes recursion, since in an untyped language the standard call-by-value fixpoint combinator is expressible. The language is similar to that of [Borgström et al. 2016].

To demonstrate the usability of our characterization, we use it to prove that reordering the draws in a probabilistic program preserves contextual equivalence (i.e., that sampling in our language is a commutative effect). This allows us to show, for example, that

let x = e1 in let y = e2 in e3   and   let y = e2 in let x = e1 in e3

are contextually equivalent (provided x is not among the free variables of e2, and y is not among the free variables of e1), despite the fact that e1 and e2 may have the effect of drawing from the source of entropy.
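
As a quick empirical illustration (not a substitute for the proof, which quantifies over all contexts), we can run both orderings with concrete, illustrative choices of e1, e2, and e3 and check that the output distributions agree:

    import random

    def prog_xy():
        # let x = e1 in let y = e2 in e3
        x = random.gauss(0, 1)       # e1: a draw from the entropy source
        y = random.expovariate(1)    # e2: another, independent draw
        return x + 2 * y             # e3

    def prog_yx():
        # let y = e2 in let x = e1 in e3: the draws reordered
        y = random.expovariate(1)
        x = random.gauss(0, 1)
        return x + 2 * y

    n = 100_000
    print(sum(prog_xy() for _ in range(n)) / n)  # both means approach
    print(sum(prog_yx() for _ in range(n)) / n)  # E[x] + 2*E[y] = 2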

[pdf]


Comparing the speed of probabilistic processes

A key aspect of many probabilistic systems, such as continuous-time Markov chains, is time: time passes as the system moves from state to state, and many properties of interest, such as safety properties, are concerned with time, e.g., “the airbag has a probability of 0.99 or greater of being deployed within 0.1 seconds”.

We therefore propose a natural and intuitive way of comparing such probabilistic systems with respect to time by way of a faster-than relation. This relation is trace-based and requires that for any word and time bound, the faster process has a higher probability of producing a trace that begins with that word within the time bound. This means that time is accumulated along the trace, and it allows for the faster process to be slower than the slower process in some states along the trace, as long as the overall trace is faster. Our investigation of the faster-than relation is carried out in the context of semi-Markov processes, where the time behaviour can be governed by probability distributions other than the exponential distribution, since many systems encountered in practice do not have exponential time behaviour.
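
Writing Pr_s[w, t] for the probability that process s produces a trace beginning with the word w within time t, the relation just described can be stated as (our notation; ≥ rather than > so that the relation is reflexive):

\[
s \succeq s' \iff \forall w \in \Sigma^{*},\ \forall t \in \mathbb{R}_{\ge 0} :\ \Pr_{s}[w, t] \;\ge\; \Pr_{s'}[w, t] .
\]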

We investigate the computational complexity of the proposed relation and show that it is a difficult problem. Through a connection to probabilistic automata, we show that the relation is undecidable in general, and that even if we restrict to a single output label, the problem remains as hard as the Positivity problem for linear recurrence sequences, which has been open in number theory for many decades. Exploiting the connection to probabilistic automata further, we also show that the relation cannot even be approximated.

Although the problem is difficult, we also obtain some decidability results. If we restrict ourselves to unambiguous processes, where every output label leads to a unique successor state, the problem becomes decidable in coNP. If we instead only require the faster process to be faster up to a given time bound, we can approximate the relation under the assumption that all timing distributions are slow, meaning that every transition must take some amount of time to fire.

In this work we have investigated the computational complexity of the faster-than relation, but there are many other interesting aspects of the relation that we hope to investigate in future work, such as logical and compositional aspects.

Authors: Mathias Ruggaard Pedersen, Nathanaël Fijalkow, Giorgio Bacci, Kim Guldstrand Larsen, and Radu Mardare.

Extended abstract

Poster

This extended abstract is based on a paper that has been accepted for LATA 2018, a preprint of which can be found on arXiv here.


S-finite Kernels and Game Semantics for Probabilistic Programming

Staton has recently argued convincingly that first-order functional probabilistic programs with sampling from continuous distributions and soft constraints correspond precisely to so-called s-finite kernels. This class of possibly infinite kernels is a little-studied extension of their better-known σ-finite cousins, obtained by closing the latter under kernel integration, or pushforwards along measurable functions.
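
Concretely, a kernel k from X to Y is s-finite when it decomposes as a countable sum of finite kernels:

\[
k \;=\; \sum_{n \in \mathbb{N}} k_n , \qquad \text{where each } k_n \text{ satisfies } \sup_{x \in X} k_n(x, Y) < \infty .
\]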

Staton has demonstrated that this class of kernels has various desirable properties, such as closure under composition and a Fubini-Tonelli result on swapping the order of integration. Even so, s-finite measures and kernels remain poorly understood, and their basic theory has yet to be established.

We characterise their precise relationship to σ-finite and probability measures and kernels. This lets us establish basic results in the s-finite setting like Randomisation, Radon-Nikodým and Disintegration Theorems. These results explicate, respectively, how to understand s-finite probabilistic computation as deterministic computation with access to a Lebesgue measure oracle, when we can apply importance sampling in this setting and that s-finite measures have a well-defined theory of conditional probability.

Further, we sketch how s-finite kernels combine well with traditional game-semantics techniques to give fully abstract models of higher-order probabilistic programming languages. The key insight is that the intensional level of description in game semantics restricts higher-order functions to be tame enough that classical measure theory (standard Borel spaces) suffices to describe function types, thus circumventing Aumann's celebrated no-go theorem. Perhaps this explains why the no-go theorem is not encountered as an obstruction when implementing higher-order probabilistic languages?

Authors: Luke Ong, Matthijs Vákár.

Extended abstract: [pdf]

Poster: [pdf]

 


The Support Method for Computing Expectations

In this abstract, we discuss an alternative to sampling for answering probabilistic expectation queries: enumerating values in the support of the model and adding their contributions to the expectation. We propose several criteria for a good enumeration scheme and discuss several issues that arise when implementing this idea. We present a method for enumerating the support of continuous variables that meets these criteria, as well as a general method for enumerating the support of models consisting of many variables. Preliminary experiments show that this method can be better than sampling methods on some queries.
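
A minimal sketch of the core idea for a finite discrete model (the enumeration schemes for continuous variables and for large models are more involved; all names here are illustrative):

    from itertools import product

    def expectation_by_support(supports, joint_prob, f):
        """Compute E[f] by enumerating the support of a discrete model.

        supports   : one iterable of values per variable
        joint_prob : maps an assignment tuple to its probability
        f          : the function whose expectation is queried
        """
        total = 0.0
        for assignment in product(*supports):
            p = joint_prob(assignment)
            if p > 0.0:                  # only support points contribute
                total += p * f(assignment)
        return total

    # Example: two independent biased coins; expected number of heads.
    coin = [0, 1]
    def joint(a):
        px = 0.3 if a[0] == 1 else 0.7
        py = 0.6 if a[1] == 1 else 0.4
        return px * py

    print(expectation_by_support([coin, coin], joint,
                                 lambda a: a[0] + a[1]))  # 0.9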

Extended abstract: Support Method


Using Reinforcement Learning for Probabilistic Program Inference

Probabilistic program inference often involves choices between various strategies. Rather than make the choices in advance or delegate them to the user, we can use reinforcement learning to try different strategies and see which performs well. When a compositional inference process is used, we get a network of reinforcement learners. In our approach, the solution to an inference task is represented as a stream of successive approximations. We present strategies for choosing between a fixed set of such streams, for combining multiple streams to produce a single output stream, and for merging a stream of streams into a single stream.
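
As a toy illustration of the stream-selection piece (the compositional network of learners is richer than this), an epsilon-greedy bandit can decide which approximation stream to advance at each step; all names here are hypothetical:

    import random

    def choose_stream(streams, score, steps=1000, eps=0.1):
        """Epsilon-greedy selection among inference strategies.

        streams : infinite iterators of successive approximations
        score   : judges the quality of an approximation (higher is better)
        """
        n = len(streams)
        counts = [0] * n
        values = [0.0] * n                 # running mean reward per strategy
        latest = [None] * n
        for _ in range(steps):
            if random.random() < eps:      # explore a random strategy
                i = random.randrange(n)
            else:                          # exploit the best one so far
                i = max(range(n), key=lambda j: values[j])
            latest[i] = next(streams[i])   # advance that stream one step
            r = score(latest[i])
            counts[i] += 1
            values[i] += (r - values[i]) / counts[i]
        best = max(range(n), key=lambda j: values[j])
        return latest[best]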

Extended abstract: Reinforcement Learning for Inference


The semantic structure of quasi-Borel spaces

Higher-order functions — functions that take functions as arguments — are incredibly useful as a programming abstraction. But what does a higher-order probabilistic program actually mean? We are used to thinking of program functions as mathematical functions, but does this make sense in the context of measure and integration? What’s a semantic foundation for probability theory that supports higher-order functions and a monad of measures? We argue that quasi-Borel spaces are just that.

A quasi-Borel space is a set X together with a set of functions M ⊆ (ℝ → X). The idea is that the reals ℝ are the source of randomness and M is the set of allowed random elements. Thus quasi-Borel spaces put the focus on random elements rather than σ-algebras, just as probabilists do.
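
For completeness, M must satisfy three closure conditions (our paraphrase of the definition from the LICS 2017 paper cited below):

\[
\begin{aligned}
&\text{(i) every constant function } \mathbb{R} \to X \text{ is in } M; \\
&\text{(ii) if } \alpha \in M \text{ and } f : \mathbb{R} \to \mathbb{R} \text{ is measurable, then } \alpha \circ f \in M; \\
&\text{(iii) if } \mathbb{R} = \biguplus_{n \in \mathbb{N}} S_n \text{ with each } S_n \text{ Borel, and each } \alpha_n \in M, \\
&\qquad \text{then the map sending } r \in S_n \text{ to } \alpha_n(r) \text{ is in } M.
\end{aligned}
\]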

Authors: Chris Heunen, Ohad Kammar, Sean Moss, Adam Ścibior, Sam Staton, Matthijs Vákár, Hongseok Yang.

Extended abstract: [pdf]

See also our LICS 2017 paper, A Convenient Category for Higher-Order Probability Theory, and our POPL 2018 paper, Denotational validation of higher-order Bayesian inference.


Constructive probabilistic semantics with non-spatial locales

Ideally, a probabilistic programming language should admit a computable semantics. But languages often provide operators that denote uncomputable functions, such as comparison of real numbers. While using these operators may result in uncomputable programs, a programmer can employ them productively and still produce a computable program, such as one that compares a real number drawn from a normal distribution with 0.
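
To see why such a comparison is computable with probability 1: refine an interval approximation of the sample until the interval excludes 0. The sketch below (hypothetical; it fakes an exact-real sample with a float, where a real implementation would compute rigorous bounds) fails to terminate only when the draw is exactly 0, a probability-zero event under a normal distribution:

    import random

    def compare_with_zero(sample_interval):
        """Decide the sign of a real from ever-tighter interval bounds.

        sample_interval(k) returns (lo, hi) bracketing the same
        underlying real sample, with width at most 2**-k.
        """
        k = 0
        while True:
            lo, hi = sample_interval(k)
            if hi < 0:
                return "negative"
            if lo > 0:
                return "positive"
            k += 1                  # interval still straddles 0: refine

    x = random.gauss(0, 1)          # stand-in for an exact-real sample
    print(compare_with_zero(lambda k: (x - 2.0**-k, x + 2.0**-k)))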

We propose locale theory, and particularly non-spatial sublocales, as a constructive semantic framework for probabilistic programming. Whereas in measure theory a measurable space may not have a smallest probability-1 subspace (for a given probability distribution), some locales have smallest probability-1 sublocales, called random sublocales. Partial functions that almost surely terminate and discontinuous functions that are almost everywhere continuous become terminating and continuous, respectively, when restricted to random sublocales. We present a definition of disintegration and give an example distribution for which, in locale theory, a unique continuous disintegration exists using random sublocales, whereas classically the disintegration is discontinuous and unique only up to null sets.

Authors: Benjamin Sherman, Jared Tramontano, Michael Carbin

Extended Abstract: Constructive probabilistic semantics with non-spatial locales


TensorFlow Distributions

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.
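
A small taste of the two abstractions, using the present-day tensorflow_probability packaging (import paths may differ from the version described here):

    import tensorflow_probability as tfp

    tfd = tfp.distributions
    tfb = tfp.bijectors

    # A Distribution: fast sampling and numerically stable statistics.
    normal = tfd.Normal(loc=0., scale=1.)
    samples = normal.sample(3)
    log_density = normal.log_prob(samples)

    # A Bijector: a composable, volume-tracking transformation. Pushing
    # a standard normal forward through exp gives a log-normal.
    log_normal = tfd.TransformedDistribution(distribution=normal,
                                             bijector=tfb.Exp())
    print(log_normal.log_prob(1.0))  # includes the log-det-Jacobian term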

Authors: Joshua V. Dillon, Ian Langmore, Eugene Brevdo, Srinivas Vasudevan, Brian Patton, Matt Hoffman, Dave Moore, Dustin Tran, Rif A. Saurous

Extended Abstract

Long Version
