ab.layers¶

Network layers and utilities.

class aboleth.layers.Activation(h=<function Activation.<lambda>>)¶

Bases: aboleth.baselayers.Layer

Activation function layer.

Parameters:	h (callable) – the element-wise activation function.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2DMAP(filters, kernel_size, strides=(1, 1), padding='SAME', l1_reg=1.0, l2_reg=1.0, use_bias=True)¶

Bases: aboleth.layers.SampleLayer

A 2D convolution layer, with maximum a posteriori (MAP) inference.

This layer uses maximum a-posteriori inference to learn the convolutional kernels and biases, and so also returns complexity penalities (l1 or l2) for the weights and biases.

Parameters:

filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions
padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2DVariational(filters, kernel_size, strides=(1, 1), padding='SAME', std=1.0, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)¶

Bases: aboleth.layers.SampleLayer

A 2D convolution layer, with variational inference.

(Does not currently support full covariance weights.)

Parameters:

filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions
padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the std and use_bias parameters.
post_W (tf.distributions.Distribution, optional) – It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
post_b (tf.distributions.Distributions, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias parameters. See also distributions.norm_posterior.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)¶

Bases: aboleth.layers.SampleLayer

Dense (fully connected) linear layer, with MAP inference.

This implements a linear layer, and when called returns

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalities (l1 or l2) for the weights and biases.

Parameters:	output_dim (int) – the dimension of the output of this layer l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \\|\mathbf{W}\\|_1\) l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \\|\mathbf{W}\\|^2_2\) use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseVariational(output_dim, std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)¶

Bases: aboleth.layers.SampleLayer3

A dense (fully connected) linear layer, with variational inference.

This implements a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]

with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:

output_dim (int) – the dimension of the output of this layer
std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the std and use_bias parameters.
post_W (tf.distributions.Distribution, optional) – It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
post_b (tf.distributions.Distributions, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias parameters. See also distributions.norm_posterior.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DropOut(keep_prob, observation_axis=1)¶

Bases: aboleth.baselayers.Layer

Dropout layer, Bernoulli probability of not setting an input to zero.

This is just a thin wrapper around tf.dropout

Parameters:

keep_prob (float, Tensor) –
the probability of keeping an input. See tf.dropout.
observation_axis (int) – The axis that indexes the observations (N). This will assume the obserations are on the second axis, i.e. (n_samples, N, ...). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.EmbedMAP(output_dim, n_categories, l1_reg=1.0, l2_reg=1.0)¶

Bases: aboleth.layers.SampleLayer3

Dense (fully connected) embedding layer, with MAP inference.

This layer works directly inputs of K category indices rather than one-hot representations, for efficiency. Each column of the input is embedded seperately, and the result concatenated along the last axis. It is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W}\]

Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). This layer uses maximum a-posteriori inference to learn the weights and so also returns complexity penalities (l1 or l2) for the weights.

Parameters:	output_dim (int) – the dimension of the output (embedding) of this layer n_categories (int) – the number of categories in the input variable l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \\|\mathbf{W}\\|_1\) l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \\|\mathbf{W}\\|^2_2\)

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.EmbedVariational(output_dim, n_categories, std=1.0, full=False, prior_W=None, post_W=None)¶

Bases: aboleth.layers.DenseVariational

Dense (fully connected) embedding layer, with variational inference.

This layer works directly inputs of K category indices rather than one-hot representations, for efficiency. Each column of the input is embedded seperately, and the result concatenated along the last axis. It is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]

with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:

output_dim (int) – the dimension of the output (embedding) of this layer
n_categories (int) – the number of categories in the input variable
std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.InputLayer(name, n_samples=1)¶

Bases: aboleth.baselayers.MultiLayer

Create an input layer.

This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a tensor of shape (N, ...). The input is tiled along a new first axis creating a (n_samples, N, ...) tensor for propagating samples through a variational deep net.

Parameters:	name (string) – The name of the input. Used as the argument for input into the net. n_samples (int, Tensor) – The number of samples to propagate through the network. We recommend making this a `tf.placeholder` so you can vary it as required.

Note

We recommend making n_samples a tf.placeholder so it can be varied between training and prediction!

__call__(**kwargs)¶

Construct the subgraph for this layer.

Parameters:	**kwargs – the inputs to this layer (Tensors)
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')¶

Bases: aboleth.baselayers.Layer

Max pooling layer for 2D inputs (e.g. images).

This is just a thin wrapper around tf.nn.max_pool

Parameters:	pool_size (tuple or list of 2 ints) – width and height of the pooling window. strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width. padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding

__call__(X)¶

Construct the subgraph for this layer.

Parameters:	X (Tensor) – the input to this layer
Returns:	Net (Tensor) – the output of this layer KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)¶

Bases: aboleth.layers.RandomFourier

Random arc-cosine kernel layer.

Parameters:

n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
lenscale (float, ndarray, Tensor) – the lenght scales of the ar-cosine kernel, this can be a scalar for an isotropic kernel, or a vector for an automatic relevance detection (ARD) kernel.
p (int) – The order of the arc-cosine kernel, this must be an integer greater than, or eual to zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 quadratic-relu kernels etc.
variational (bool) – use variational features instead of random features, (i.e. VAR-FIXED in [2]).
lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit learn SVM with a RBF kernel).

Note

This should be followed by a dense layer to properly implement a kernel approximation.