# ab.layers

Network layers and utilities.

class aboleth.layers.Activation(h=<function Activation.<lambda>>)

Activation function layer.

Parameters: h (callable) – the element-wise activation function.
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2DMAP(filters, kernel_size, strides=(1, 1), padding='SAME', l1_reg=1.0, l2_reg=1.0, use_bias=True)

A 2D convolution layer, with maximum a posteriori (MAP) inference.

This layer uses maximum a-posteriori inference to learn the convolutional kernels and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:
- filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
- kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
- padding (str) – one of ‘SAME’ or ‘VALID’ (default ‘SAME’); the type of padding algorithm to use.
- l1_reg (float) – the value of the l1 weight regularizer, $$\mathrm{l1\_reg} \times \|\mathbf{W}\|_1$$.
- l2_reg (float) – the value of the l2 weight regularizer, $$\frac{1}{2} \mathrm{l2\_reg} \times \|\mathbf{W}\|^2_2$$.
- use_bias (bool) – if true, also learn a bias weight, i.e. a constant offset weight.

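As a rough guide to how kernel_size, strides and padding interact, the pure-Python sketch below computes the spatial output size of a 2D convolution. It assumes the layer follows TensorFlow's standard ‘SAME’/‘VALID’ conventions and reads each pair as (height, width). The helper names _pair and conv2d_output_shape are hypothetical and not part of aboleth. The number of output channels is always filters.

```python
import math

def _pair(v):
    """Expand a single int to (v, v), mirroring the 'single integer' shorthand."""
    return (v, v) if isinstance(v, int) else tuple(v)

def conv2d_output_shape(in_hw, kernel_size, strides=(1, 1), padding="SAME"):
    """Spatial (height, width) of a 2D convolution output, TensorFlow-style."""
    kh, kw = _pair(kernel_size)
    sh, sw = _pair(strides)
    h, w = in_hw
    if padding == "SAME":
        # zero-padding keeps ceil(in / stride) window positions
        return (math.ceil(h / sh), math.ceil(w / sw))
    if padding == "VALID":
        # only positions where the whole window fits inside the input
        return (math.ceil((h - kh + 1) / sh), math.ceil((w - kw + 1) / sw))
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv2d_output_shape((28, 28), 5))                   # (28, 28)
print(conv2d_output_shape((28, 28), 5, padding="VALID"))  # (24, 24)
print(conv2d_output_shape((28, 28), 3, 2, "VALID"))       # (13, 13)
```

With ‘SAME’ padding, the stride alone sets the output size. With ‘VALID’ padding, each side also shrinks by kernel size minus one.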
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Conv2DVariational(filters, kernel_size, strides=(1, 1), padding='SAME', std=1.0, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)

A 2D convolution layer, with variational inference.

(Does not currently support full covariance weights.)

Parameters:
- filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
- kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
- padding (str) – one of ‘SAME’ or ‘VALID’ (default ‘SAME’); the type of padding algorithm to use.
- std (float) – the initial value of the weight prior standard deviation, $$\sigma$$; this is optimized à la maximum likelihood type II.
- use_bias (bool) – if true, also learn a bias weight, i.e. a constant offset weight.
- prior_W (tf.distributions.Distribution, optional) – the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. If given, the std parameter is ignored.
- prior_b (tf.distributions.Distribution, optional) – the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. If given, the std and use_bias parameters are ignored.
- post_W (tf.distributions.Distribution, optional) – the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. See also distributions.gaus_posterior.
- post_b (tf.distributions.Distribution, optional) – the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. If given, the use_bias parameter is ignored. See also distributions.norm_posterior.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)

Dense (fully connected) linear layer, with MAP inference.

This implements a linear layer, and when called returns

$f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}$

where $$\mathbf{X} \in \mathbb{R}^{N \times D_{in}}$$, $$\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}$$ and $$\mathbf{b} \in \mathbb{R}^{D_{out}}$$. This layer uses maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.

Parameters:
- output_dim (int) – the dimension of the output of this layer.
- l1_reg (float) – the value of the l1 weight regularizer, $$\mathrm{l1\_reg} \times \|\mathbf{W}\|_1$$.
- l2_reg (float) – the value of the l2 weight regularizer, $$\frac{1}{2} \mathrm{l2\_reg} \times \|\mathbf{W}\|^2_2$$.
- use_bias (bool) – if true, also learn a bias weight, i.e. a constant offset weight.

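The computation that DenseMAP describes can be sketched in NumPy as follows. This illustrates the formulas above and is not aboleth's TensorFlow implementation. The function name dense_map is hypothetical, and the two penalties are simply summed. For brevity, only W is penalized, even though the text above says the bias is regularized too.

```python
import numpy as np

def dense_map(X, W, b, l1_reg=1.0, l2_reg=1.0):
    """Return f(X) = X W + b and an l1 + l2 complexity penalty on W."""
    Net = X @ W + b  # matmul broadcasts over leading (e.g. sample) axes
    penalty = l1_reg * np.sum(np.abs(W)) + 0.5 * l2_reg * np.sum(W ** 2)
    return Net, penalty

X = np.ones((2, 3, 4))     # (n_samples, N, D_in)
W = np.full((4, 5), 0.5)   # (D_in, D_out)
b = np.zeros(5)            # (D_out,)
Net, penalty = dense_map(X, W, b)
print(Net.shape, penalty)  # (2, 3, 5) 12.5
```

Here the l1 term contributes 20 × 0.5 = 10 and the l2 term contributes ½ × 20 × 0.25 = 2.5.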
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DenseVariational(output_dim, std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)

A dense (fully connected) linear layer, with variational inference.

This implements a dense linear layer,

$f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}$

where prior, $$p(\cdot)$$, and approximate posterior, $$q(\cdot)$$, distributions are placed on the weights and also the biases. Here $$\mathbf{X} \in \mathbb{R}^{N \times D_{in}}$$, $$\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}$$ and $$\mathbf{b} \in \mathbb{R}^{D_{out}}$$. By default, the same Normal prior is placed on each of the layer weights and biases,

$w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),$

and a different Normal posterior is learned for each of the layer weights and biases,

$w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).$

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

$\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),$

where $$\mathbf{m}_j \in \mathbb{R}^{D_{in}}$$ and $$\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}$$.
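One standard way to draw the full-covariance weights is through a Cholesky factor, $$\mathbf{C}_j = \mathbf{L}_j \mathbf{L}_j^\top$$, so that $$\mathbf{w}_j = \mathbf{m}_j + \mathbf{L}_j \boldsymbol{\epsilon}$$ with $$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$. The NumPy sketch below does this for every output column at once. It is not necessarily how aboleth parameterizes $$\mathbf{C}_j$$. The means and covariances are made up purely for the demonstration, and sample_full_cov_weights is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_out, S = 3, 2, 200_000

M = rng.normal(size=(D_in, D_out))                 # column j holds m_j
A = 0.5 * rng.normal(size=(D_out, D_in, D_in))
C = A @ np.swapaxes(A, 1, 2) + 0.1 * np.eye(D_in)  # one SPD covariance C_j per column
L = np.linalg.cholesky(C)                          # C_j = L_j L_j^T, batched over j

def sample_full_cov_weights(M, L, n_samples, rng):
    """Draw W^(s) whose columns are w_j = m_j + L_j eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(size=(n_samples, L.shape[0], L.shape[1]))  # (S, D_out, D_in)
    cols = M.T + np.einsum("jab,sjb->sja", L, eps)                      # (S, D_out, D_in)
    return np.swapaxes(cols, 1, 2)                                       # (S, D_in, D_out)

W = sample_full_cov_weights(M, L, S, rng)
# the empirical covariance of column 0 should approach C_0 as S grows
emp_C0 = np.cov(W[:, :, 0], rowvar=False)
```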

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

$f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}$

with samples from the posteriors, $$\mathbf{W}^{(s)} \sim q(\mathbf{W})$$ and $$\mathbf{b}^{(s)} \sim q(\mathbf{b})$$. The number of samples can be controlled either with the n_samples argument of an InputLayer used to feed the first layer of a model, or by tiling $$\mathbf{X}$$ along the first dimension. This layer also returns the result of $$\text{KL}[q\|p]$$ for all parameters.
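Below is a minimal NumPy sketch of one sampled forward pass with the default diagonal posterior. It also computes the closed-form divergence between each Normal posterior and the $$\mathcal{N}(0, \sigma^2)$$ prior, $$\text{KL} = \frac{1}{2}\sum \left(c/\sigma^2 + m^2/\sigma^2 - 1 - \log(c/\sigma^2)\right)$$. The function names are hypothetical. The posterior variances c and o are passed in directly, which may differ from aboleth's own parameterization. The input is assumed to be already tiled to shape (S, N, D_in).

```python
import numpy as np

def kl_normal(m, c, sigma):
    """Sum over entries of KL[N(m, c) || N(0, sigma^2)], with c a variance."""
    r = c / sigma ** 2
    return 0.5 * np.sum(r + m ** 2 / sigma ** 2 - 1.0 - np.log(r))

def dense_variational(X, M, C, l, o, sigma, rng):
    """X: (S, N, D_in), already tiled; returns Net (S, N, D_out) and KL."""
    S = X.shape[0]
    W = M + np.sqrt(C) * rng.standard_normal(size=(S,) + M.shape)  # W^(s) ~ q(W)
    b = l + np.sqrt(o) * rng.standard_normal(size=(S,) + l.shape)  # b^(s) ~ q(b)
    Net = X @ W + b[:, np.newaxis, :]  # each sample slice uses its own draw
    KL = kl_normal(M, C, sigma) + kl_normal(l, o, sigma)
    return Net, KL

rng = np.random.default_rng(0)
X = np.tile(rng.normal(size=(4, 3)), (5, 1, 1))  # S=5 copies of N=4, D_in=3
M, C = np.zeros((3, 2)), np.ones((3, 2))
l, o = np.zeros(2), np.ones(2)
Net, KL = dense_variational(X, M, C, l, o, sigma=1.0, rng=rng)
```

In this usage, the posterior equals the prior, so KL is zero. The five slices of Net still differ, because each uses a different weight draw.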

Parameters:
- output_dim (int) – the dimension of the output of this layer.
- std (float) – the initial value of the weight prior standard deviation ($$\sigma$$ above); this is optimized à la maximum likelihood type II.
- full (bool) – if true, use a full-covariance Gaussian posterior for each of the output weight columns; otherwise use an independent (diagonal) Normal posterior.
- use_bias (bool) – if true, also learn a bias weight, i.e. a constant offset weight.
- prior_W (tf.distributions.Distribution, optional) – the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. If given, the std parameter is ignored.
- prior_b (tf.distributions.Distribution, optional) – the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. If given, the std and use_bias parameters are ignored.
- post_W (tf.distributions.Distribution, optional) – the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. If given, the full parameter is ignored. See also distributions.gaus_posterior.
- post_b (tf.distributions.Distribution, optional) – the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. If given, the use_bias parameter is ignored. See also distributions.norm_posterior.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.DropOut(keep_prob, observation_axis=1)

Dropout layer: each input is kept (rather than set to zero) with Bernoulli probability keep_prob.

This is a thin wrapper around tf.nn.dropout.

Parameters:
- keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
- observation_axis (int) – the axis that indexes the observations (N). By default, the observations are assumed to be on the second axis, i.e. (n_samples, N, ...). The dropout pattern is repeated over the observations. This drops out weights consistently, which samples the “latent function” of the layer.

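The effect of observation_axis can be illustrated in NumPy. The Bernoulli mask is drawn with a size-1 observation axis, so every observation in a given sample sees the same mask. In TensorFlow, this corresponds to passing a noise_shape such as (n_samples, 1, D) to tf.nn.dropout, which also rescales kept values by 1/keep_prob. Whether aboleth builds its mask exactly this way is an assumption, and dropout_shared_over_obs is a hypothetical name.

```python
import numpy as np

def dropout_shared_over_obs(X, keep_prob, rng, observation_axis=1):
    """Bernoulli dropout whose mask is shared across the observation axis."""
    noise_shape = list(X.shape)
    noise_shape[observation_axis] = 1                # e.g. (n_samples, 1, D)
    keep = rng.random(size=noise_shape) < keep_prob  # True with probability keep_prob
    return np.where(keep, X / keep_prob, 0.0)        # kept values rescaled by 1/keep_prob

rng = np.random.default_rng(0)
X = np.ones((3, 10, 6))  # (n_samples, N, D)
out = dropout_shared_over_obs(X, keep_prob=0.5, rng=rng)
```

Within each sample, all N rows of out are identical. Different samples get independent masks.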
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.EmbedMAP(output_dim, n_categories, l1_reg=1.0, l2_reg=1.0)

Dense (fully connected) embedding layer, with MAP inference.

For efficiency, this layer works directly on integer category indices (for K categories) rather than one-hot representations. Each column of the input is embedded separately, and the results are concatenated along the last axis. It is a dense linear layer,

$f(\mathbf{X}) = \mathbf{X} \mathbf{W}$

Here $$\mathbf{X} \in \mathbb{N}_2^{N \times K}$$ and $$\mathbf{W} \in \mathbb{R}^{K \times D_{out}}$$. In code, however, $$\mathbf{X}$$ is represented as a vector of indices in $$\mathbb{N}_K^{N \times 1}$$. This layer uses maximum a-posteriori inference to learn the weights, and so also returns complexity penalties (l1 or l2) for the weights.
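The efficiency argument can be checked directly. Multiplying a one-hot X by W just selects rows of W, so an index lookup gives the same result without building the N × K matrix. Here is a small NumPy sketch for a single categorical column; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D_out = 4, 3                       # n_categories, output_dim
W = rng.normal(size=(K, D_out))
idx = np.array([[0], [2], [3], [2]])  # X as indices, shape (N, 1)

one_hot = np.eye(K)[idx[:, 0]]        # the equivalent one-hot X, shape (N, K)
dense = one_hot @ W                   # f(X) = X W
lookup = W[idx[:, 0]]                 # gather rows of W instead of multiplying
```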

Parameters:
- output_dim (int) – the dimension of the output (embedding) of this layer.
- n_categories (int) – the number of categories in the input variable.
- l1_reg (float) – the value of the l1 weight regularizer, $$\mathrm{l1\_reg} \times \|\mathbf{W}\|_1$$.
- l2_reg (float) – the value of the l2 weight regularizer, $$\frac{1}{2} \mathrm{l2\_reg} \times \|\mathbf{W}\|^2_2$$.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.EmbedVariational(output_dim, n_categories, std=1.0, full=False, prior_W=None, post_W=None)

Dense (fully connected) embedding layer, with variational inference.

For efficiency, this layer works directly on integer category indices (for K categories) rather than one-hot representations. Each column of the input is embedded separately, and the results are concatenated along the last axis. It is a dense linear layer,

$f(\mathbf{X}) = \mathbf{X} \mathbf{W},$

where prior, $$p(\cdot)$$, and approximate posterior, $$q(\cdot)$$, distributions are placed on the weights. Here $$\mathbf{X} \in \mathbb{N}_2^{N \times K}$$ and $$\mathbf{W} \in \mathbb{R}^{K \times D_{out}}$$. In code, however, $$\mathbf{X}$$ is represented as a vector of indices in $$\mathbb{N}_K^{N \times 1}$$. By default, the same Normal prior is placed on each of the layer weights,

$w_{ij} \sim \mathcal{N}(0, \sigma^2),$

and a different Normal posterior is learned for each of the layer weights,

$w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).$

We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,

$\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),$

where $$\mathbf{m}_j \in \mathbb{R}^{K}$$ and $$\mathbf{C}_j \in \mathbb{R}^{K \times K}$$.

This layer will use variational inference to learn all of the non-zero prior and posterior parameters.

Whenever this layer is called, it will return the result,

$f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}$

with samples from the posterior, $$\mathbf{W}^{(s)} \sim q(\mathbf{W})$$. The number of samples can be controlled either with the n_samples argument of an InputLayer used to feed the first layer of a model, or by tiling $$\mathbf{X}$$ along the first dimension. This layer also returns the result of $$\text{KL}[q\|p]$$ for all parameters.
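With variational weights, each sample s uses its own draw $$\mathbf{W}^{(s)}$$, and the row lookup is done per sample. Here is a NumPy sketch that assumes a diagonal Normal posterior with made-up means and variances. It is not aboleth's code.

```python
import numpy as np

rng = np.random.default_rng(0)
S, K, D_out = 5, 4, 3
M = rng.normal(size=(K, D_out))   # posterior means m_ij
C = np.full((K, D_out), 0.1)      # posterior variances c_ij
idx = np.array([0, 2, 3, 2])      # N=4 category indices

W = M + np.sqrt(C) * rng.standard_normal(size=(S, K, D_out))  # W^(s) ~ q(W)
Net = W[:, idx, :]                # row lookup per sample: (S, N, D_out)
```

Within a sample, repeated categories map to the same embedding. Across samples, embeddings differ.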

Parameters:
- output_dim (int) – the dimension of the output (embedding) of this layer.
- n_categories (int) – the number of categories in the input variable.
- std (float) – the initial value of the weight prior standard deviation ($$\sigma$$ above); this is optimized à la maximum likelihood type II.
- full (bool) – if true, use a full-covariance Gaussian posterior for each of the output weight columns; otherwise use an independent (diagonal) Normal posterior.
- prior_W (tf.distributions.Distribution, optional) – the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. If given, the std parameter is ignored.
- post_W (tf.distributions.Distribution, optional) – the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. If given, the full parameter is ignored. See also distributions.gaus_posterior.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.InputLayer(name, n_samples=1)

Create an input layer.

This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a tensor of shape (N, ...). The input is tiled along a new first axis creating a (n_samples, N, ...) tensor for propagating samples through a variational deep net.
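The tiling that InputLayer performs can be mimicked in NumPy. In TensorFlow, tf.expand_dims followed by tf.tile has the same effect. This sketch shows the shape transformation only. tile_samples is a hypothetical helper, not aboleth's code.

```python
import numpy as np

def tile_samples(X, n_samples):
    """(N, ...) -> (n_samples, N, ...) by repeating X along a new first axis."""
    return np.tile(X[np.newaxis], (n_samples,) + (1,) * X.ndim)

X = np.arange(12.0).reshape(4, 3)  # N=4 observations, 3 features
Xs = tile_samples(X, n_samples=5)
```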

Parameters:
- name (string) – the name of the input, used as the keyword argument for feeding input into the net.
- n_samples (int, Tensor) – the number of samples to propagate through the network.

Note

We recommend making n_samples a tf.placeholder so it can be varied between training and prediction!

__call__(**kwargs)

Construct the subgraph for this layer.

Parameters: **kwargs – the inputs to this layer (Tensors).
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')

Max pooling layer for 2D inputs (e.g. images).

This is a thin wrapper around tf.nn.max_pool.

Parameters:
- pool_size (tuple or list of 2 ints) – width and height of the pooling window.
- strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
- padding (str) – one of ‘SAME’ or ‘VALID’ (default ‘SAME’); the type of padding algorithm to use.

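Consider the common case where the stride equals the pool size and the image height and width are divisible by it. There, max pooling reduces to a reshape followed by a max. The NumPy sketch below assumes (..., height, width, channels) inputs, matching TensorFlow's default NHWC layout. max_pool_2d is a hypothetical name, and it does not handle overlapping windows or padding.

```python
import numpy as np

def max_pool_2d(X, pool_size=(2, 2)):
    """Non-overlapping max pooling over the (H, W) axes of a (..., H, W, C) array.

    Assumes strides == pool_size and H, W divisible by the pool size, in which
    case 'SAME' and 'VALID' padding give the same result.
    """
    ph, pw = pool_size
    *lead, H, W, C = X.shape
    blocks = X.reshape(*lead, H // ph, ph, W // pw, pw, C)
    return blocks.max(axis=(-4, -2))  # reduce within each ph x pw window

X = np.arange(16.0).reshape(1, 4, 4, 1)  # one 4x4 single-channel image
pooled = max_pool_2d(X)                  # pooled[0, :, :, 0] is [[5, 7], [13, 15]]
```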
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)

Random arc-cosine kernel layer.

Parameters:
- n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel. This can be a scalar for an isotropic kernel, or a vector for an automatic relevance determination (ARD) kernel.
- p (int) – the order of the arc-cosine kernel; this must be an integer greater than or equal to zero. 0 leads to sigmoid-like kernels, 1 to relu-like kernels, 2 to quadratic-relu kernels, etc.
- variational (bool) – use variational features instead of random features (i.e. VAR-FIXED in [2]).
- lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational == True, and can be a scalar or a vector (a different initial value per input dimension). If left as None, it is set to sqrt(1 / input_dim), which is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel.

Note

This should be followed by a dense layer to properly implement a kernel approximation.

[1] Cho, Y., and Saul, L. K. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712, 2011.
[2] Cutajar, K., Bonilla, E., Michiardi, P., and Filippone, M. “Random Feature Expansions for Deep Gaussian Processes.” In ICML, 2017.
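Here is a sketch of the textbook random-feature construction for arc-cosine kernels from [1]. With $$\mathbf{w}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$ and $$\Theta$$ the Heaviside step function, the features $$\phi_i(\mathbf{x}) = \sqrt{2/n}\,\Theta(\mathbf{w}_i^\top \mathbf{x})\,(\mathbf{w}_i^\top \mathbf{x})^p$$ have inner products that approximate the order-p kernel. This is not aboleth's implementation. It returns n_features columns rather than the layer's 2 * n_features, and dividing the inputs by lenscale is an assumption about how the length scale enters. arccos_features is a hypothetical name.

```python
import numpy as np

def arccos_features(X, n_features, p=1, lenscale=1.0, rng=None):
    """Random features whose inner products approximate the order-p arc-cosine kernel."""
    rng = np.random.default_rng() if rng is None else rng
    Z = X / lenscale  # assumed length-scale handling
    Omega = rng.standard_normal(size=(Z.shape[-1], n_features))
    P = Z @ Omega
    # phi_i(x) = sqrt(2 / n) * step(w_i . x) * (w_i . x)^p
    return np.sqrt(2.0 / n_features) * np.where(P > 0, P ** p, 0.0)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
Phi = arccos_features(X, n_features=100_000, p=1, rng=np.random.default_rng(0))
K_approx = Phi @ Phi.T

# closed-form order-1 kernel: (1/pi) |x||y| (sin t + (pi - t) cos t), t = angle(x, y)
t = np.pi / 2
k1 = (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi  # = 1/pi for orthogonal unit vectors
```

For p = 1 and orthogonal unit vectors, the exact kernel value is 1/π ≈ 0.318, and the diagonal is |x|² = 1. K_approx approaches both as n_features grows.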
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.RandomFourier(n_features, kernel)

Random Fourier feature (RFF) kernel approximation layer.

Parameters:
- n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- kernel (kernels.ShiftInvariant) – the kernel object that yields random samples from the Fourier spectrum of the particular kernel to approximate. See the ab.kernels module.

Note

This should be followed by a dense layer to properly implement a kernel approximation.
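The classic random Fourier feature construction for an RBF kernel shows where the 2 * n_features output comes from. Each sampled frequency contributes one cosine and one sine feature. In this NumPy sketch, the frequencies are drawn directly from the RBF spectral density rather than from an ab.kernels object, and rff_rbf_features is a hypothetical name.

```python
import numpy as np

def rff_rbf_features(X, n_features, lenscale=1.0, rng=None):
    """Random Fourier features for the RBF kernel exp(-|x - y|^2 / (2 lenscale^2))."""
    rng = np.random.default_rng() if rng is None else rng
    # frequencies drawn from the RBF kernel's spectral density, N(0, I / lenscale^2)
    Omega = rng.standard_normal(size=(X.shape[-1], n_features)) / lenscale
    P = X @ Omega
    # concatenating cos and sin gives the 2 * n_features output dimension
    return np.concatenate([np.cos(P), np.sin(P)], axis=-1) / np.sqrt(n_features)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Phi = rff_rbf_features(X, n_features=50_000, rng=np.random.default_rng(0))
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))  # exp(-1/2) for lenscale = 1
```

Since cos² + sin² = 1, the diagonal of K_approx is exactly one. The off-diagonal entry is a Monte Carlo estimate of K_exact.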

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.Reshape(target_shape)

Reshape layer.

Reshape the input tensor to a specified shape.

Parameters: target_shape (tuple of ints) – the target shape, not including the samples or batch axes.
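Suppose the input carries the (n_samples, N) leading axes described under SampleLayer. Then Reshape only changes the trailing axes. Here is a NumPy sketch of that shape logic; reshape_layer is hypothetical.

```python
import numpy as np

def reshape_layer(X, target_shape):
    """Reshape (n_samples, N, ...) to (n_samples, N, *target_shape)."""
    return X.reshape(X.shape[:2] + tuple(target_shape))

X = np.zeros((3, 5, 784))          # e.g. flattened 28x28 images
Y = reshape_layer(X, (28, 28, 1))  # ready for a 2D convolution layer
```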
__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.SampleLayer

Sample Layer base class.

This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.

class aboleth.layers.SampleLayer3

Special case of SampleLayer restricted to rank == 3 input Tensors.

__call__(X)

Construct the subgraph for this layer.

Parameters: X (Tensor) – the input to this layer.
Returns:
- Net (Tensor) – the output of this layer.
- KL (float, Tensor) – the regularizer/Kullback-Leibler ‘cost’ of the parameters in this layer.