ab.layers

Network layers and utilities.

class aboleth.layers.Activation(h=<function Activation.<lambda>>)

Bases: aboleth.baselayers.Layer

Activation function layer.

Parameters: h (callable) – the element-wise activation function.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
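
For example, a minimal usage sketch (assuming TensorFlow 1.x, aboleth imported as ab, and aboleth's >> layer-composition operator; dimensions are illustrative):

    import tensorflow as tf
    import aboleth as ab

    # A dense MAP layer followed by an element-wise tanh activation.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.DenseMAP(output_dim=32) >>
        ab.Activation(h=tf.tanh)
    )

    X_ = tf.placeholder(tf.float32, [None, 10])  # (N, D_in)
    Net, KL = net(X=X_)  # Net has shape (1, N, 32)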
class aboleth.layers.Conv2DMAP(filters, kernel_size, strides=(1, 1), padding='SAME', l1_reg=1.0, l2_reg=1.0, use_bias=True)

Bases: aboleth.layers.SampleLayer

A 2D convolution layer, with maximum a posteriori (MAP) inference.

This layer uses maximum a posteriori inference to learn the convolutional kernels and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:
  • filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
  • kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
  • strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
  • padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
  • l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
  • l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
  • use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
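
A minimal sketch of this layer in a small network (assuming TensorFlow 1.x and 28x28 single-channel image inputs; the regularizer values are illustrative only):

    import tensorflow as tf
    import aboleth as ab

    # One MAP convolution followed by a relu activation.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.Conv2DMAP(filters=16, kernel_size=(5, 5), l1_reg=0., l2_reg=0.1) >>
        ab.Activation(h=tf.nn.relu)
    )

    X_ = tf.placeholder(tf.float32, [None, 28, 28, 1])
    Net, KL = net(X=X_)  # here KL is the l1/l2 complexity penalty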
class aboleth.layers.Conv2DVariational(filters, kernel_size, strides=(1, 1), padding='SAME', prior_std=1.0, post_std=1.0, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)

Bases: aboleth.layers.SampleLayer

A 2D convolution layer, with variational inference.

(Does not currently support full covariance weights.)

Parameters:
  • filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
  • kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
  • strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
  • padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
  • prior_std (float, np.array, tf.Tensor) – the value of the weight prior standard deviation (\(\sigma\) above)
  • post_std (float) – the initial value of the posterior standard deviation.
  • use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
  • prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the prior_std parameter.
  • prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the prior_std and use_bias parameters.
  • post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the post_std parameter. See also distributions.norm_posterior.
  • post_b (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the post_std and use_bias parameters. See also distributions.norm_posterior.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
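
A minimal sketch (assuming TensorFlow 1.x; note the placeholder for n_samples, per the InputLayer documentation below):

    import tensorflow as tf
    import aboleth as ab

    n_samples_ = tf.placeholder(tf.int32)  # vary between training and prediction

    # A variational convolution; each call draws n_samples kernel samples.
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.Conv2DVariational(filters=8, kernel_size=(3, 3), prior_std=1.0) >>
        ab.Activation(h=tf.nn.relu)
    )

    X_ = tf.placeholder(tf.float32, [None, 28, 28, 1])
    Net, KL = net(X=X_)  # KL is the Kullback-Leibler cost for the ELBO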
class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)

Bases: aboleth.layers.SampleLayer

Dense (fully connected) linear layer, with MAP inference.

This implements a linear layer, and when called returns

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum a posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:
  • output_dim (int) – the dimension of the output of this layer
  • l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
  • l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
  • use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
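
For instance, a sketch of l2-regularized linear regression with this layer (assuming TensorFlow 1.x; the regularizer values are illustrative):

    import tensorflow as tf
    import aboleth as ab

    # f(X) = XW + b with a 0.5 * 0.05 * ||W||^2_2 penalty (plus the bias penalty).
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.DenseMAP(output_dim=1, l1_reg=0., l2_reg=0.05)
    )

    X_ = tf.placeholder(tf.float32, [None, 5])
    F, reg = net(X=X_)  # reg is the complexity penalty to add to the loss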
class aboleth.layers.DenseVariational(output_dim, prior_std=1.0, post_std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)

Bases: aboleth.layers.SampleLayer3

A dense (fully connected) linear layer, with variational inference.

This implements a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).

This layer will use variational inference to learn the posterior parameters, and optionally the prior_std parameter can be passed in as a tf.Variable, in which case it will also be learned.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]

with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:
  • output_dim (int) – the dimension of the output of this layer
  • prior_std (float, np.array, tf.Tensor) – the value of the weight prior standard deviation (\(\sigma\) above)
  • post_std (float) – the initial value of the posterior standard deviation.
  • full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
  • use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
  • prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the prior_std parameter.
  • prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the prior_std and use_bias parameters.
  • post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full and post_std parameters. See also distributions.gaus_posterior.
  • post_b (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias and post_std parameters. See also distributions.norm_posterior.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
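
A minimal sketch of a Bayesian linear model with this layer (assuming TensorFlow 1.x):

    import tensorflow as tf
    import aboleth as ab

    n_samples_ = tf.placeholder(tf.int32)  # e.g. 5 to train, more to predict

    # Each call draws n_samples weight samples, W^(s) ~ q(W).
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.DenseVariational(output_dim=1, prior_std=1.0, full=False)
    )

    X_ = tf.placeholder(tf.float32, [None, 5])
    F, KL = net(X=X_)  # F has shape (n_samples, N, 1); KL goes into the ELBO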
class aboleth.layers.DropOut(keep_prob, observation_axis=1)

Bases: aboleth.baselayers.Layer

Dropout layer, Bernoulli probability of not setting an input to zero.

This is just a thin wrapper around tf.nn.dropout

Parameters:
  • keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.

  • observation_axis (int) – The axis that indexes the observations (N). This will assume the observations are on the second axis, i.e. (n_samples, N, ...). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
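
A sketch of dropout between dense layers (assuming TensorFlow 1.x; making keep_prob a placeholder lets it be set to 1.0 at prediction time):

    import tensorflow as tf
    import aboleth as ab

    keep_prob_ = tf.placeholder(tf.float32)

    # The dropout pattern is repeated over the observation axis (see above).
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.DenseMAP(output_dim=64) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.DropOut(keep_prob=keep_prob_) >>
        ab.DenseMAP(output_dim=1)
    )

    X_ = tf.placeholder(tf.float32, [None, 20])
    Net, reg = net(X=X_)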
class aboleth.layers.EmbedMAP(output_dim, n_categories, l1_reg=1.0, l2_reg=1.0)

Bases: aboleth.layers.SampleLayer3

Dense (fully connected) embedding layer, with MAP inference.

This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Each column of the input is embedded separately, and the results are concatenated along the last axis. It is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W}\]

Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). This layer uses maximum a posteriori inference to learn the weights and so also returns complexity penalties (l1 or l2) for the weights.

Parameters:
  • output_dim (int) – the dimension of the output (embedding) of this layer
  • n_categories (int) – the number of categories in the input variable
  • l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
  • l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
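
A minimal sketch of embedding one categorical column (assuming TensorFlow 1.x and integer category indices as input; the dimensions are illustrative):

    import tensorflow as tf
    import aboleth as ab

    # Embed 1000 categories into 10 dimensions; input is (N, 1) indices.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.EmbedMAP(output_dim=10, n_categories=1000)
    )

    X_ = tf.placeholder(tf.int32, [None, 1])
    Net, reg = net(X=X_)  # Net has shape (1, N, 10)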
class aboleth.layers.EmbedVariational(output_dim, n_categories, prior_std=1.0, post_std=1.0, full=False, prior_W=None, post_W=None)

Bases: aboleth.layers.DenseVariational

Dense (fully connected) embedding layer, with variational inference.

This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Each column of the input is embedded separately, and the results are concatenated along the last axis. It is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).

This layer will use variational inference to learn the posterior parameters, and optionally the prior_std parameter can be passed in as a tf.Variable, in which case it will also be learned.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]

with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:
  • output_dim (int) – the dimension of the output (embedding) of this layer
  • n_categories (int) – the number of categories in the input variable
  • prior_std (float, np.array, tf.Tensor) – the value of the weight prior standard deviation (\(\sigma\) above)
  • post_std (float) – the initial value of the posterior standard deviation.
  • full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
  • prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the prior_std parameter.
  • post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full and post_std parameters. See also distributions.gaus_posterior.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
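
A corresponding variational sketch (assuming TensorFlow 1.x and integer category indices as input):

    import tensorflow as tf
    import aboleth as ab

    # Variational embedding; 5 posterior samples of the embedding weights.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.EmbedVariational(output_dim=10, n_categories=1000)
    )

    X_ = tf.placeholder(tf.int32, [None, 1])
    Net, KL = net(X=X_)  # Net has shape (5, N, 10)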
class aboleth.layers.InputLayer(name, n_samples=1)

Bases: aboleth.baselayers.MultiLayer

Create an input layer.

This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a tensor of shape (N, ...). The input is tiled along a new first axis creating a (n_samples, N, ...) tensor for propagating samples through a variational deep net.

Parameters:
  • name (string) – The name of the input. Used as the argument for input into the net.
  • n_samples (int, Tensor) – The number of samples to propagate through the network.

Note

We recommend making n_samples a tf.placeholder so it can be varied between training and prediction!

__call__(**kwargs)

Construct the subgraph for this layer.

Parameters: **kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
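
For example (assuming TensorFlow 1.x), the tiling behaviour can be seen directly:

    import tensorflow as tf
    import aboleth as ab

    n_samples_ = tf.placeholder(tf.int32)  # e.g. 5 to train, 20 to predict
    X_ = tf.placeholder(tf.float32, [None, 3])

    inp = ab.InputLayer(name="X", n_samples=n_samples_)
    Net, KL = inp(X=X_)  # Net is X tiled to (n_samples, N, 3);
                         # this layer has no parameters, so KL is zero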
class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')

Bases: aboleth.baselayers.Layer

Max pooling layer for 2D inputs (e.g. images).

This is just a thin wrapper around tf.nn.max_pool

Parameters:
  • pool_size (tuple or list of 2 ints) – width and height of the pooling window.
  • strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
  • padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
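
A sketch of a typical convolution/pooling block (assuming TensorFlow 1.x and 28x28 single-channel images):

    import tensorflow as tf
    import aboleth as ab

    # Convolution, relu, then 2x2 max pooling with stride 2.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.Conv2DMAP(filters=16, kernel_size=(3, 3)) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
    )

    X_ = tf.placeholder(tf.float32, [None, 28, 28, 1])
    Net, reg = net(X=X_)  # spatial dimensions are halved to 14 x 14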
class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)

Bases: aboleth.layers.RandomFourier

Random arc-cosine kernel layer.

Parameters:
  • n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
  • lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel; this can be a scalar for an isotropic kernel, or a vector for an automatic relevance detection (ARD) kernel.
  • p (int) – The order of the arc-cosine kernel; this must be an integer greater than or equal to zero. 0 will lead to sigmoid-like kernels, 1 to relu-like kernels, 2 to quadratic relu-like kernels, etc.
  • variational (bool) – use variational features instead of random features (i.e. VAR-FIXED in [2]).
  • lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel).

Note

This should be followed by a dense layer to properly implement a kernel approximation.

See also

[1] Cho, Youngmin, and Lawrence K. Saul. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712 (2011).
[2] Cutajar, K., Bonilla, E., Michiardi, P., and Filippone, M. “Random Feature Expansions for Deep Gaussian Processes.” In ICML, 2017.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
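
A sketch of the kernel-approximation pattern described in the Note above (assuming TensorFlow 1.x; the hyperparameters are illustrative):

    import tensorflow as tf
    import aboleth as ab

    # Arc-cosine (p=1, relu-like) features followed by a dense layer,
    # approximating a GP layer in the style of [2].
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomArcCosine(n_features=100, lenscale=1.0, p=1) >>
        ab.DenseVariational(output_dim=1)
    )

    X_ = tf.placeholder(tf.float32, [None, 4])
    F, KL = net(X=X_)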
class aboleth.layers.RandomFourier(n_features, kernel)

Bases: aboleth.layers.SampleLayer3

Random Fourier feature (RFF) kernel approximation layer.

Parameters:
  • n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
  • kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.

Note

This should be followed by a dense layer to properly implement a kernel approximation.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
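
A sketch of an RBF kernel approximation (assuming TensorFlow 1.x and the RBF kernel object from the ab.kernels module):

    import tensorflow as tf
    import aboleth as ab

    # Random Fourier features for an RBF kernel, then a dense layer.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomFourier(n_features=100, kernel=ab.kernels.RBF(lenscale=0.5)) >>
        ab.DenseVariational(output_dim=1)
    )

    X_ = tf.placeholder(tf.float32, [None, 4])
    F, KL = net(X=X_)  # the RFF layer's output dimension is 2 * 100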
class aboleth.layers.Reshape(target_shape)

Bases: aboleth.baselayers.Layer

Reshape layer.

Reshape and output a tensor with a specified shape.

Parameters: target_shape (tuple of ints) – the target shape of the output tensor. Does not include the sample or batch axes.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
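
For instance, flattening convolutional feature maps before a dense layer (assuming TensorFlow 1.x, ‘SAME’ padding, and stride-1 convolutions, so the spatial size is unchanged):

    import tensorflow as tf
    import aboleth as ab

    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.Conv2DMAP(filters=8, kernel_size=(3, 3)) >>
        ab.Reshape(target_shape=(28 * 28 * 8,)) >>  # excludes sample/batch axes
        ab.DenseMAP(output_dim=10)
    )

    X_ = tf.placeholder(tf.float32, [None, 28, 28, 1])
    Net, reg = net(X=X_)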
class aboleth.layers.SampleLayer

Bases: aboleth.baselayers.Layer

Sample Layer base class.

This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.SampleLayer3

Bases: aboleth.layers.SampleLayer

Special case of SampleLayer restricted to rank == 3 input Tensors.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.