ab.layers¶

Network layers and utilities.

class aboleth.layers.Activation(h=<function Activation.<lambda>>)¶

Bases: aboleth.baselayers.Layer

Activation function layer.

Parameters:	h (callable) – the element-wise activation function.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='SAME', l1_reg=0.0, l2_reg=0.0, use_bias=True, init_fn='glorot_trunc')¶

Bases: aboleth.layers.SampleLayer

A 2D convolution layer.

This layer uses maximum likelihood or maximum a-posteriori inference to learn the convolutional kernels and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:

filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions
padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot_trunc’, the truncated normal glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
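
How padding and strides determine the spatial size of the output can be sketched as below. This assumes the layer follows the usual TensorFlow convention for ‘SAME’ and ‘VALID’ padding. The helper conv2d_output_size is hypothetical and is not part of aboleth; it also assumes kernel_size and strides are ordered (height, width).

```python
import math


def conv2d_output_size(height, width, kernel_size, strides=(1, 1), padding="SAME"):
    """Spatial output size of a 2D convolution (TensorFlow padding convention)."""
    # A single int applies to both spatial dimensions, as in the layer arguments.
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(strides, int):
        strides = (strides, strides)

    sizes = []
    for size, k, s in zip((height, width), kernel_size, strides):
        if padding == "SAME":
            # Input is zero-padded so that only the stride reduces the size.
            sizes.append(math.ceil(size / s))
        elif padding == "VALID":
            # No padding: the window must fit entirely inside the input.
            sizes.append(math.ceil((size - k + 1) / s))
        else:
            raise ValueError("padding must be 'SAME' or 'VALID'")
    return tuple(sizes)


print(conv2d_output_size(28, 28, kernel_size=3, strides=1, padding="SAME"))   # (28, 28)
print(conv2d_output_size(28, 28, kernel_size=3, strides=2, padding="VALID"))  # (13, 13)
```

The number of output channels is given separately by filters.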

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2DVariational(filters, kernel_size, strides=(1, 1), padding='SAME', prior_std='glorot', learn_prior=False, use_bias=True)¶

Bases: aboleth.layers.SampleLayer

A 2D convolution layer, with variational inference.

(Does not currently support full covariance weights.)

Parameters:

filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions
padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function. Defaults to ‘glorot’. If a string, must be one of ‘glorot’ or ‘autonorm’.
learn_prior (bool, optional) – Whether to learn the prior standard deviation.
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Dense(output_dim, l1_reg=0.0, l2_reg=0.0, use_bias=True, init_fn='glorot')¶

Bases: aboleth.layers.SampleLayer

Dense (fully connected) linear layer.

This implements a linear layer, and when called returns

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum likelihood or maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:

output_dim (int) – the dimension of the output of this layer
l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot’, the uniform glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
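
The forward pass and the penalty terms above can be sketched in NumPy as follows. This is an illustration of the formulas only, not the aboleth implementation; it reads the norms element-wise and shows only the weight penalty, both of which are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D_in, D_out = 5, 3, 2
X = rng.normal(size=(N, D_in))
W = rng.normal(size=(D_in, D_out))
b = np.zeros(D_out)

l1_reg, l2_reg = 0.1, 0.01

# Forward pass: f(X) = X W + b, with b broadcast over the N rows.
f = X @ W + b

# Complexity penalties as written in the parameter descriptions.
l1_penalty = l1_reg * np.abs(W).sum()
l2_penalty = 0.5 * l2_reg * np.square(W).sum()
penalty = l1_penalty + l2_penalty

print(f.shape)  # (5, 2)
```

With l1_reg and l2_reg both left at 0.0 the penalty vanishes, which corresponds to plain maximum likelihood.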

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseNCP(output_dim, prior_std=1.0, learn_prior=False, use_bias=True, latent_mean=0.0, latent_std=1.0)¶

Bases: aboleth.layers.DenseVariational

A DenseVariational layer with Noise Contrastive Prior.

This is basically just a DenseVariational layer, but with an added Kullback Leibler penalty on the latent function, as derived in Equation (6) in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.

This should be the last layer in a network, and needs to be used in conjunction with NCPContinuousPerturb and/or NCPCategoricalPerturb layers (after an input layer). For example:

net = (
    ab.InputLayer(name="X", n_samples=n_samples_) >>
    ab.NCPContinuousPerturb() >>
    ab.Dense(output_dim=32) >>
    ab.Activation(tf.nn.selu) >>
    ...
    ab.Dense(output_dim=8) >>
    ab.Activation(tf.nn.selu) >>
    ab.DenseNCP(output_dim=1)
)

As you can see from this example, we have only made the last layer probabilistic/Bayesian (DenseNCP), and have left the rest of the network maximum likelihood/MAP. This is also how the original authors of the algorithm have implemented it. While this layer also works with DenseVariational layers (etc.), this is not how it was originally implemented, and the contribution of uncertainty from these layers to the latent function will not be accounted for in this layer. This is because the nonlinear activations between layers make evaluating this density intractable, unless we had something like normalising flows.

Parameters:

output_dim (int) – the dimension of the output of this layer
prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) in DenseVariational). The user can also provide a string to specify an initialisation function, in which case it must be one of ‘glorot’ or ‘autonorm’. Defaults to 1.0.
learn_prior (bool, optional) – Whether to learn the prior on the weights.
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
latent_mean (float) – The prior mean over the latent function(s) on the output of this layer. This specifies what value the latent function should take away from the support of the training data.
latent_std (float) – The prior standard deviation over the latent function(s) on the output of this layer. This controls the strength of the regularisation away from the latent mean.

Note

This implementation is inspired by: https://github.com/brain-research/ncp/blob/master/ncp/models/bbb_ncp.py

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseVariational(output_dim, prior_std=1.0, learn_prior=False, full=False, use_bias=True)¶

Bases: aboleth.layers.SampleLayer3

A dense (fully connected) linear layer, with variational inference.

This implements a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).

This layer will use variational inference to learn the posterior parameters. The prior_std parameter can optionally be learned as well if learn_prior is set to True, in which case the given prior_std value is used for initialization.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]

with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:

output_dim (int) – the dimension of the output of this layer
prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function, in which case it must be one of ‘glorot’ or ‘autonorm’. Defaults to 1.0.
learn_prior (bool, optional) – Whether to learn the prior standard deviation.
full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
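
The sampling step and the \(\text{KL}[q\|p]\) term for the diagonal (full=False) case can be sketched in NumPy as below. This is a minimal illustration of the equations above rather than the aboleth code: biases are omitted, and kl_diag_normal is a hypothetical helper using the standard closed form for the KL divergence between univariate Normals.

```python
import numpy as np


def kl_diag_normal(m, c, prior_std):
    """KL[N(m, c) || N(0, prior_std^2)], summed over all weights."""
    prior_var = prior_std ** 2
    return 0.5 * np.sum(c / prior_var + m ** 2 / prior_var - 1.0 + np.log(prior_var / c))


rng = np.random.default_rng(0)

n_samples, N, D_in, D_out = 4, 6, 3, 2
X = rng.normal(size=(N, D_in))

# Diagonal posterior parameters: means m_ij and variances c_ij.
m = rng.normal(scale=0.1, size=(D_in, D_out))
c = np.full((D_in, D_out), 0.5)

# Draw one weight matrix per sample, W^(s) ~ q(W).
W_samples = m + np.sqrt(c) * rng.normal(size=(n_samples, D_in, D_out))

# f^(s)(X) = X W^(s); matmul broadcasts X over the leading sample axis.
f_samples = X @ W_samples

kl = kl_diag_normal(m, c, prior_std=1.0)
print(f_samples.shape)  # (4, 6, 2)
```

The KL term is zero only when the posterior equals the prior, so it acts as the complexity penalty returned alongside the layer output.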

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DropOut(keep_prob, independent=True, observation_axis=1, alpha=False)¶

Bases: aboleth.baselayers.Layer

Dropout layer, where keep_prob is the Bernoulli probability of not setting an input to zero.

This is just a thin wrapper around tf.nn.dropout.

Parameters:

keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
independent (bool) – Use independently sampled dropout for each observation if True. This may dramatically speed up convergence, but the layer will then no longer sample only the latent function.
observation_axis (int) – The axis that indexes the observations (N). The default assumes the observations are on the second axis, i.e. (n_samples, N, ...). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer. This is only active if independent is set to False.
alpha (bool) – Use alpha dropout (tf.contrib.nn.alpha_dropout) that maintains the self normalising property of SNNs.

Note

If a more complex noise shape, or some other modification to dropout is required, you can use an Activation layer. E.g. ab.Activation(lambda x: tf.nn.dropout(x, **your_args)).
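
The difference between independent and shared dropout masks can be sketched in NumPy as follows. This only illustrates the mask shapes described for independent and observation_axis; it is not the aboleth implementation, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

keep_prob = 0.8
n_samples, N, D = 3, 4, 5
X = np.ones((n_samples, N, D))

# independent=True: a separate Bernoulli mask entry for every element.
mask_indep = rng.random(size=(n_samples, N, D)) < keep_prob

# independent=False: one mask per sample, repeated over the observation axis (axis 1).
mask_shared = rng.random(size=(n_samples, 1, D)) < keep_prob

# Kept inputs are rescaled by 1 / keep_prob, as tf.nn.dropout does.
out_indep = X * mask_indep / keep_prob
out_shared = X * mask_shared / keep_prob
```

In the shared case every observation within a sample sees the same dropped columns, which is what makes each sample behave like a single draw of the layer function.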

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Embed(output_dim, n_categories, l1_reg=0.0, l2_reg=0.0, init_fn='glorot')¶

Bases: aboleth.layers.SampleLayer3

Dense (fully connected) embedding layer.

This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Note, this only works on a single column, see the PerFeature layer to embed multiple columns. E.g.

cat_layers = [ab.Embed(10, k) for k in x_categories]

net = (
    ab.InputLayer(name="X", n_samples=n_samples_) >>
    ab.PerFeature(*cat_layers) >>
    ab.Activation(tf.nn.selu) >>
    ...
)

It is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W}\]

Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\), though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). This layer uses maximum likelihood or maximum a-posteriori inference to learn the weights, and so also returns complexity penalties (l1 or l2) for the weights.

Parameters:

output_dim (int) – the dimension of the output (embedding) of this layer
n_categories (int) – the number of categories in the input variable
l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot’, the uniform glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
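
The equivalence between indexing and the one-hot product \(\mathbf{X} \mathbf{W}\) can be checked with a short NumPy sketch. The variable names are illustrative and the snippet does not use aboleth itself.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories, output_dim = 4, 3
W = rng.normal(size=(n_categories, output_dim))

# Category indices in a single column: shape (N, 1).
X_idx = np.array([[0], [2], [3], [2]])

# Efficient form: look up the rows of W directly.
embedded = W[X_idx[:, 0]]

# Equivalent one-hot form: f(X) = X W with X in {0, 1}^(N x K).
X_onehot = np.eye(n_categories)[X_idx[:, 0]]
assert np.allclose(embedded, X_onehot @ W)

print(embedded.shape)  # (4, 3)
```

The lookup avoids building the N x K one-hot matrix, which is where the efficiency gain comes from.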

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.EmbedVariational(output_dim, n_categories, prior_std=1.0, learn_prior=False, full=False)¶

Bases: aboleth.layers.DenseVariational

Dense (fully connected) embedding layer, with variational inference.

This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Note, this only works on a single column, see the PerFeature layer to embed multiple columns. E.g.

cat_layers = [ab.EmbedVariational(10, k) for k in x_categories]

net = (
    ab.InputLayer(name="X", n_samples=n_samples_) >>
    ab.PerFeature(*cat_layers) >>
    ab.Activation(tf.nn.selu) >>
    ...
)

This layer is effectively a DenseVariational layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\), distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\), though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).

This layer will use variational inference to learn the posterior parameters, and optionally the prior_std parameter can be learned if learn_prior is set to True. The prior_std value given will be used for initialization.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]

with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:

output_dim (int) – the dimension of the output (embedding) of this layer
n_categories (int) – the number of categories in the input variable
prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function, in which case it must be one of ‘glorot’ or ‘autonorm’. Defaults to 1.0.
learn_prior (bool, optional) – Whether to learn the prior standard deviation.
full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Flatten¶

Bases: aboleth.baselayers.Layer

Flattening layer.

Reshape a tensor so that the output is always rank 3, keeping the first dimension (samples) and the second dimension (observations).

E.g. if X.shape is (3, 100, 5, 5, 3), this flattens the trailing dimensions to give (3, 100, 75).
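
The same reshape can be reproduced in NumPy as a quick check of the shapes. This is a sketch of the operation, not the layer's TensorFlow code.

```python
import numpy as np

X = np.zeros((3, 100, 5, 5, 3))

# Keep the sample and observation axes, collapse everything else.
flat = X.reshape(X.shape[0], X.shape[1], -1)

print(flat.shape)  # (3, 100, 75)
```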

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.InputLayer(name, n_samples=1)¶

Bases: aboleth.baselayers.MultiLayer

Create an input layer.

This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a tensor of shape (N, ...). The input is tiled along a new first axis creating a (n_samples, N, ...) tensor for propagating samples through a variational deep net.

Parameters:

name (string) – The name of the input. Used as the argument for input into the net.
n_samples (int, Tensor) – The number of samples to propagate through the network. We recommend making this a tf.placeholder so you can vary it as required.

Note

We recommend making n_samples a tf.placeholder so it can be varied between training and prediction!
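
The tiling this layer performs can be sketched in NumPy as below. This is only an illustration of the resulting shapes, with illustrative variable names, and not the TensorFlow graph the layer builds.

```python
import numpy as np

n_samples = 5
X = np.arange(12.0).reshape(4, 3)  # shape (N, D) = (4, 3)

# Add a new leading axis and repeat the input n_samples times.
X_tiled = np.tile(X[np.newaxis, ...], (n_samples, 1, 1))

print(X_tiled.shape)  # (5, 4, 3)
```

Each slice along the first axis is an identical copy of the input, so any differences between samples downstream come only from stochastic layers.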

__call__(**kwargs)¶

Construct the subgraph for this layer.

Parameters:	**kwargs – the inputs to this layer (Tensors)
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')¶

Bases: aboleth.baselayers.Layer

Max pooling layer for 2D inputs (e.g. images).

This is just a thin wrapper around tf.nn.max_pool.

Parameters:

pool_size (tuple or list of 2 ints) – width and height of the pooling window.
strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.NCPCategoricalPerturb(n_categories, flip_prob=0.1)¶

Bases: aboleth.layers.SampleLayer

Noise Contrastive Prior categorical variable perturbation layer.

This layer doubles the number of samples going through the model, and randomly flips the categories in the second set of samples. This implements (the categorical version of) Equation 3 in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.

The choice to randomly flip a category is drawn from a Bernoulli distribution per sample (with probability flip_prob), then the new category is randomly chosen with probability 1 / n_categories.

This should be the first layer in a network after an input layer, and needs to be used in conjunction with DenseNCP. Also, like the embedding layers, this only applies to one column of categorical inputs, so we advise you use it with the PerFeature layer. For example:

cat_layers = [
    (ab.NCPCategoricalPerturb(k) >> ab.Embed(10, k))
    for k in x_categories
]

net = (
    ab.InputLayer(name="X", n_samples=n_samples_) >>
    ab.PerFeature(*cat_layers) >>
    ab.Activation(tf.nn.selu) >>
    ab.Dense(output_dim=32) >>
    ab.Activation(tf.nn.selu) >>
    ...
    ab.Dense(output_dim=8) >>
    ab.Activation(tf.nn.selu) >>
    ab.DenseNCP(output_dim=1)
)

Parameters:

n_categories (int) – the number of categories in the input variable.
flip_prob (float) – the probability of flipping a category in the second set of samples.
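
The doubling and flipping behaviour described for this layer can be sketched in NumPy as follows. This is a rough illustration rather than the aboleth implementation: it assumes the flip decision is drawn per input entry and that the perturbed copies are appended after the originals along the sample axis.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, N = 2, 6
n_categories, flip_prob = 4, 0.1

# Category indices with a leading sample axis: (n_samples, N, 1).
X = rng.integers(n_categories, size=(n_samples, N, 1))

# Bernoulli(flip_prob) decision per entry, then a uniformly drawn replacement,
# so each category is chosen with probability 1 / n_categories.
flip = rng.random(size=X.shape) < flip_prob
new_cats = rng.integers(n_categories, size=X.shape)
X_flipped = np.where(flip, new_cats, X)

# Double the samples: originals first, perturbed copies second.
X_out = np.concatenate([X, X_flipped], axis=0)

print(X_out.shape)  # (4, 6, 1)
```

Because the replacement is drawn uniformly over all categories, a flipped entry can land on its original value.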

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.NCPContinuousPerturb(input_noise=1.0)¶

Bases: aboleth.layers.SampleLayer

Noise Contrastive Prior continuous variable perturbation layer.

This layer doubles the number of samples going through the model, and adds a random normal perturbation to the second set of samples. This implements Equation 3 in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.

This should be the first layer in a network after an input layer, and needs to be used in conjunction with DenseNCP. For example:

net = (
    ab.InputLayer(name="X", n_samples=n_samples_) >>
    ab.NCPContinuousPerturb() >>
    ab.Dense(output_dim=32) >>
    ab.Activation(tf.nn.selu) >>
    ...
    ab.Dense(output_dim=8) >>
    ab.Activation(tf.nn.selu) >>
    ab.DenseNCP(output_dim=1)
)

Parameters:	input_noise (float, tf.Tensor, tf.Variable) – The standard deviation of the random perturbation to add to the inputs.
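
The perturbation itself can be sketched in NumPy as below. This is a minimal illustration of the description above, not the aboleth code; it assumes the perturbed copies are appended after the originals along the sample axis.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, N, D = 2, 5, 3
input_noise = 1.0

# Inputs with a leading sample axis: (n_samples, N, D).
X = rng.normal(size=(n_samples, N, D))

# Second set of samples: inputs plus zero-mean Normal noise with std input_noise.
X_perturbed = X + input_noise * rng.normal(size=X.shape)

# Double the samples: originals first, perturbed copies second.
X_out = np.concatenate([X, X_perturbed], axis=0)

print(X_out.shape)  # (4, 5, 3)
```

A larger input_noise pushes the perturbed inputs further from the training data, which is where the DenseNCP prior is meant to apply.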

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:

Net (Tensor) – the output of this layer.
KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.RandomArcCosine(n_features, lenscale=None, p=1, variational=False, learn_lenscale=False)¶

Bases: aboleth.layers.RandomFourier

Random arc-cosine kernel layer.

Parameters:

n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
lenscale (float, ndarray, optional) – The length scales of the arc-cosine kernel. This can be a scalar for an isotropic kernel, or a vector of shape (input_dim,) for an automatic relevance determination (ARD) kernel. If not provided, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel). If learn_lenscale is True, lenscale will be its initial value.
p (int) – The order of the arc-cosine kernel, this must be an integer greater than or equal to zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 to quadratic-relu kernels, etc.
variational (bool) – use variational features instead of random features, (i.e. VAR-FIXED in [2]).
learn_lenscale (bool) – Whether to learn the length scale. If True, the lenscale value provided is used for initialisation.

Note

This should be followed by a dense layer to properly implement a kernel approximation.