
Network layers and utilities.

class aboleth.layers.Activation(h=<function Activation.<lambda>>)

Bases: aboleth.baselayers.Layer

Activation function layer.

Parameters:h (callable) – the element-wise activation function.

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)

Bases: aboleth.layers.SampleLayer

Dense (fully connected) linear layer, with MAP inference.

This implements a linear layer, and when called returns

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum a posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:
  • output_dim (int) – the dimension of the output of this layer
  • l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
  • l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
  • use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
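
As a rough illustration of the linear map and the penalty terms described above, the following plain-Python sketch evaluates \(\mathbf{X} \mathbf{W} + \mathbf{b}\) and the l1 and l2 terms on nested lists. The name dense_map is hypothetical, penalising only \(\mathbf{W}\) and adding the two terms into a single number are assumptions, and this is not the library's TensorFlow implementation.

```python
def dense_map(X, W, b, l1_reg=1.0, l2_reg=1.0):
    """Return (net, penalty) for f(X) = X W + b with l1/l2 penalties on W.

    Hypothetical helper for illustration only; X is (N, D_in), W is
    (D_in, D_out) and b is (D_out,), all as nested Python lists.
    """
    n_out = len(b)
    # Linear map: one output row per input row.
    net = [
        [sum(x_d * W[d][j] for d, x_d in enumerate(x)) + b[j] for j in range(n_out)]
        for x in X
    ]
    # Penalty terms as written in the parameter descriptions (assumed summed).
    l1_norm = sum(abs(w) for row in W for w in row)
    l2_norm_sq = sum(w * w for row in W for w in row)
    penalty = l1_reg * l1_norm + 0.5 * l2_reg * l2_norm_sq
    return net, penalty


X = [[1.0, 2.0], [0.0, -1.0]]              # N = 2, D_in = 2
W = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]    # D_in = 2, D_out = 3
b = [0.5, 0.0, -0.5]
net, penalty = dense_map(X, W, b)
print(net)      # [[5.5, 2.0, -1.5], [-1.5, -1.0, -0.5]]
print(penalty)  # 8.5
```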

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.DenseVariational(output_dim, std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)

Bases: aboleth.layers.SampleLayer3

A dense (fully connected) linear layer, with variational inference.

This implements a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights and biases,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]

with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:
  • output_dim (int) – the dimension of the output of this layer
  • std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above); this is optimized in the manner of type II maximum likelihood.
  • full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
  • use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
  • prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
  • prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the std and use_bias parameters.
  • post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
  • post_b (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias parameter. See also distributions.norm_posterior.
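
To make the default (diagonal) case above more concrete, here is a small plain-Python sketch of drawing one weight sample \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and evaluating \(\text{KL}[q\|p]\) in closed form against the zero-mean Normal prior. The helper names are hypothetical, \(c_{ij}\) is assumed to be a variance (matching \(\sigma^2\) in the prior), and this is only a sketch of the mathematics, not the library's code.

```python
import math
import random


def sample_weights(m, c, rng):
    """Draw one W^(s) with w_ij ~ N(m_ij, c_ij); c_ij is assumed to be a variance."""
    return [
        [rng.gauss(m_ij, math.sqrt(c_ij)) for m_ij, c_ij in zip(m_row, c_row)]
        for m_row, c_row in zip(m, c)
    ]


def kl_diag_normal(m, c, std):
    """KL[q || p] summed over weights, for q = N(m_ij, c_ij) and p = N(0, std**2)."""
    var_p = std ** 2
    return sum(
        0.5 * (c_ij / var_p + m_ij ** 2 / var_p - 1.0 + math.log(var_p / c_ij))
        for m_row, c_row in zip(m, c)
        for m_ij, c_ij in zip(m_row, c_row)
    )


rng = random.Random(0)
m = [[0.0, 1.0], [-1.0, 0.5]]   # posterior means, shape (D_in, D_out)
c = [[1.0, 0.25], [0.5, 2.0]]   # posterior variances, same shape
W_s = sample_weights(m, c, rng)
kl = kl_diag_normal(m, c, std=1.0)
```

The KL term is zero exactly when the posterior equals the prior (all means zero and all variances equal to std squared), and grows as the posterior moves away from it.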

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.DropOut(keep_prob)

Bases: aboleth.baselayers.Layer

Dropout layer. Each input is kept (not set to zero) with a Bernoulli probability.

This is just a thin wrapper around tf.nn.dropout.

Parameters:keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
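
For intuition about keep_prob, the following plain-Python sketch applies Bernoulli dropout to a nested list. It follows the behaviour documented for tf.nn.dropout, where kept elements are scaled by 1 / keep_prob so that the expected sum is unchanged. The function name and the explicit rng argument are hypothetical, and this is not the layer's actual implementation.

```python
import random


def dropout(X, keep_prob, rng):
    """Keep each element with probability keep_prob, scaling survivors by 1 / keep_prob."""
    return [
        [x / keep_prob if rng.random() < keep_prob else 0.0 for x in row]
        for row in X
    ]


rng = random.Random(0)
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dropped = dropout(X, keep_prob=0.5, rng=rng)
# Every entry of `dropped` is either 0.0 or twice the corresponding input.
```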


Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.EmbedVariational(output_dim, n_categories, std=1.0, full=False, prior_W=None, post_W=None)

Bases: aboleth.layers.DenseVariational

Dense (fully connected) embedding layer, with variational inference.

This layer works directly on shape (N, 1) inputs of K category indices rather than one-hot representations, for efficiency, and is a dense linear layer,

\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]

where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\), though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,

\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]

and a different Normal posterior is learned for each of the layer weights,

\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]

where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]

with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.

Parameters:
  • output_dim (int) – the dimension of the output (embedding) of this layer
  • n_categories (int) – the number of categories in the input variable
  • std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above); this is optimized in the manner of type II maximum likelihood.
  • full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
  • prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
  • post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
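
The equivalence between the index representation and the one-hot product \(\mathbf{X} \mathbf{W}\) described above can be checked with a small plain-Python sketch. All helper names here are hypothetical, the weights are fixed numbers rather than posterior samples, and the code only illustrates why a row lookup gives the same result as the one-hot matrix product.

```python
def one_hot(indices, n_categories):
    """Expand (N, 1) category indices into an (N, K) one-hot matrix."""
    return [[1.0 if k == row[0] else 0.0 for k in range(n_categories)] for row in indices]


def matmul(A, B):
    """Plain nested-list matrix product A B."""
    return [
        [sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
        for row in A
    ]


def embed(indices, W):
    """Look up one row of W per (N, 1) category index."""
    return [list(W[row[0]]) for row in indices]


W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # K = 3 categories, D_out = 2
indices = [[2], [0], [2], [1]]             # shape (N, 1) with N = 4
looked_up = embed(indices, W)
via_one_hot = matmul(one_hot(indices, 3), W)
print(looked_up == via_one_hot)  # True
```

The lookup touches only N rows of \(\mathbf{W}\), whereas the one-hot product multiplies by K - 1 zeros per input, which is the efficiency argument made above.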

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.InputLayer(name, n_samples=None)

Bases: aboleth.baselayers.MultiLayer

Create an input layer.

This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a 2D tensor of shape (N, D). If n_samples is specified, the input is tiled along a new first axis, creating a (n_samples, N, D) tensor for propagating samples through a variational deep net.

Parameters:
  • name (string) – The name of the input. Used as the argument for input into the net.
  • n_samples (int > 0) – The number of samples.
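
The tiling step can be pictured with the following plain-Python sketch, which copies an (N, D) nested list along a new first axis. The function name is hypothetical and the real layer operates on TensorFlow tensors, so this is only meant to show the resulting shape.

```python
def tile_samples(X, n_samples):
    """Copy an (N, D) nested list n_samples times along a new first axis."""
    return [[list(row) for row in X] for _ in range(n_samples)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N = 3, D = 2
tiled = tile_samples(X, n_samples=4)
shape = (len(tiled), len(tiled[0]), len(tiled[0][0]))
print(shape)  # (4, 3, 2)
```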

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')

Bases: aboleth.baselayers.Layer

Max pooling layer for 2D inputs (e.g. images).

This is just a thin wrapper around tf.nn.max_pool.

Parameters:
  • pool_size (tuple or list of 2 ints) – width and height of the pooling window.
  • strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
  • padding (str) – the type of padding, one of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’.
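
As a minimal sketch of what a pooling window and stride do, the following plain-Python function pools a single-channel 2D image with ‘VALID’ padding (no padding, windows that would overrun the edge are dropped). The function name is hypothetical, the (height, width) ordering of both arguments is an assumption made for this sketch, and the real layer delegates to tf.nn.max_pool on batched multi-channel tensors.

```python
def max_pool_2d(image, pool_size, strides):
    """Max pooling over a single-channel image (list of rows) with 'VALID' padding.

    pool_size and strides are taken as (height, width) here; that ordering is
    an assumption of this sketch.
    """
    pool_h, pool_w = pool_size
    stride_h, stride_w = strides
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - pool_h + 1, stride_h):
        pooled_row = []
        for left in range(0, width - pool_w + 1, stride_w):
            # Collect every pixel under the current window and keep the largest.
            window = [
                image[r][c]
                for r in range(top, top + pool_h)
                for c in range(left, left + pool_w)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2d(image, pool_size=(2, 2), strides=(2, 2)))  # [[6, 8], [14, 16]]
```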

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)

Bases: aboleth.layers.RandomFourier

Random arc-cosine kernel layer.

NOTE: This should be followed by a dense layer to properly implement a kernel approximation.

Parameters:
  • n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
  • lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel; this can be a scalar for an isotropic kernel, or a vector for an automatic relevance determination (ARD) kernel.
  • p (int) – the order of the arc-cosine kernel; this must be an integer greater than or equal to zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 to quadratic-relu kernels, etc.
  • variational (bool) – use variational features instead of random features (i.e. VAR-FIXED in [2]).
  • lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel).

See also

[1] Cho, Youngmin, and Lawrence K. Saul. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712 (2011).
[2] Cutajar, K., Bonilla, E., Michiardi, P., and Filippone, M. “Random Feature Expansions for Deep Gaussian Processes.” In ICML, 2017.

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.RandomFourier(n_features, kernel)

Bases: aboleth.layers.SampleLayer3

Random Fourier feature (RFF) kernel approximation layer.

NOTE: This should be followed by a dense layer to properly implement a kernel approximation.

Parameters:
  • n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
  • kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.
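
To show where the 2 * n_features output dimension comes from, here is a plain-Python sketch of the usual random Fourier feature construction for an RBF kernel: frequencies are drawn from the kernel's spectrum, and each input is mapped to the cosines and sines of its projections. The helper names, the 1 / sqrt(n_features) scaling and the choice of RBF kernel are assumptions of this sketch rather than a description of the library's internals.

```python
import math
import random


def rbf_spectrum_samples(n_features, input_dim, lenscale, rng):
    """Sample frequencies for an RBF kernel: each component ~ N(0, 1 / lenscale**2)."""
    return [
        [rng.gauss(0.0, 1.0 / lenscale) for _ in range(input_dim)]
        for _ in range(n_features)
    ]


def rff_features(X, weights):
    """Map each row x to [cos(w_k . x)..., sin(w_k . x)...] / sqrt(n_features)."""
    scale = 1.0 / math.sqrt(len(weights))
    features = []
    for x in X:
        proj = [sum(w_d * x_d for w_d, x_d in zip(w, x)) for w in weights]
        features.append(
            [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]
        )
    return features


rng = random.Random(0)
weights = rbf_spectrum_samples(n_features=500, input_dim=2, lenscale=1.0, rng=rng)
phi_x, phi_y = rff_features([[0.0, 0.0], [0.5, -0.5]], weights)

# The inner product of the features approximates the kernel value.
approx = sum(a * b for a, b in zip(phi_x, phi_y))
exact = math.exp(-0.5 * (0.5 ** 2 + 0.5 ** 2))   # RBF kernel with lenscale 1
```

A following dense layer then learns a linear combination of these features, which is what the NOTE above refers to.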

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.Reshape(target_shape)

Bases: aboleth.baselayers.Layer

Reshape layer.

Reshape and output a tensor to a specified shape.

Parameters:target_shape (tuple of ints) – the target shape. Does not include the samples or batch axes.
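
Since the target shape excludes the samples and batch axes, the resulting full shape can be worked out as in the following sketch. The function name is hypothetical, and it assumes the first two axes of the input are the samples and batch axes, which are carried over unchanged.

```python
from functools import reduce
from operator import mul


def reshaped_shape(input_shape, target_shape):
    """Full output shape when only the trailing axes are reshaped.

    Assumes input_shape = (n_samples, N, ...) and that the first two axes
    are kept as they are.
    """
    n_samples, n_batch = input_shape[:2]
    n_in = reduce(mul, input_shape[2:], 1)
    n_out = reduce(mul, target_shape, 1)
    if n_in != n_out:
        raise ValueError("target_shape must hold the same number of elements")
    return (n_samples, n_batch) + tuple(target_shape)


print(reshaped_shape((5, 10, 784), (28, 28, 1)))  # (5, 10, 28, 28, 1)
```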

Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.SampleLayer

Bases: aboleth.baselayers.Layer

Sample Layer base class.

This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.


Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.SampleLayer3

Bases: aboleth.layers.SampleLayer

Special case of SampleLayer restricted to rank == 3 input Tensors.


Construct the subgraph for this layer.

Parameters:X (Tensor) – the input to this layer
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.