ab.impute

Layers that impute missing data.

class aboleth.impute.ExtraCategoryImpute(datalayer, masklayer, ncategory_list)

Bases: aboleth.impute.ImputeColumnWise

Impute missing values from categorical data with an extra category.

Given categorical data, a mask of missing values, and the number of categories for each feature (last dimension), this layer assigns missing values to an extra category whose index equals the number of categories. For example, with 2 categories (0 and 1), missing data is assigned the value 2.

Parameters:
  • datalayer (callable) – A layer that returns a data tensor. Must be an InputLayer.
  • masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be an InputLayer.
  • ncategory_list (list) – A list that provides the total number of categories for each feature (last dimension) of the input. Length of the list must be equal to the size of the last dimension of X.
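
The rule described above can be illustrated without TensorFlow. The following is a minimal sketch on plain nested lists of shape (N, D), not the library's implementation; the function name extra_category_impute is hypothetical.

```python
def extra_category_impute(X, mask, ncategory_list):
    """Replace masked entries with the category count of their column.

    Hypothetical helper: X and mask are nested lists of shape (N, D),
    and ncategory_list holds one category count per column.
    """
    if any(len(row) != len(ncategory_list) for row in X):
        raise ValueError("ncategory_list needs one entry per column of X")
    return [
        [ncat if m else x for x, m, ncat in zip(row, mrow, ncategory_list)]
        for row, mrow in zip(X, mask)
    ]


# Column 0 has 2 categories (0, 1); column 1 has 3 categories (0, 1, 2).
X = [[0, 2], [1, 0], [1, 1]]
mask = [[False, True], [True, False], [False, False]]
imputed = extra_category_impute(X, mask, [2, 3])
print(imputed)  # [[0, 3], [2, 0], [1, 1]]
```

Because the extra category is the next unused index, a downstream embedding or one-hot encoding would need room for ncategory + 1 values per feature.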
__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.ImputeColumnWise(datalayer, masklayer)

Bases: aboleth.impute.ImputeOp3

Abstract class for imputing column-wise from a vector or scalar.

This class implements _impute2D, which calls the _impute_columns method. That method returns a vector or scalar used to impute X column-wise (as opposed to element-wise). Subclasses must supply the _impute_columns method.
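
The division of labour between the two methods can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the aboleth class: ColumnWiseImputer, impute2d and ZeroImputer are hypothetical names, and nested lists stand in for tensors.

```python
from abc import ABC, abstractmethod


class ColumnWiseImputer(ABC):
    """Hypothetical stand-in for the column-wise imputation pattern."""

    def impute2d(self, X, mask):
        # Ask the subclass for the fill values, then apply them per column.
        fill = self._impute_columns(X, mask)
        if not isinstance(fill, (list, tuple)):
            # A scalar fill value is reused for every column.
            fill = [fill] * len(X[0])
        return [
            [f if m else x for x, m, f in zip(row, mrow, fill)]
            for row, mrow in zip(X, mask)
        ]

    @abstractmethod
    def _impute_columns(self, X, mask):
        """Return a scalar, or one fill value per column."""


class ZeroImputer(ColumnWiseImputer):
    """Toy subclass: impute every missing entry with zero."""

    def _impute_columns(self, X, mask):
        return 0.0


X = [[1.0, 2.0], [3.0, 4.0]]
mask = [[False, True], [True, False]]
result = ZeroImputer().impute2d(X, mask)
print(result)  # [[1.0, 0.0], [0.0, 4.0]]
```

The base class owns the masking logic once, so each concrete imputer only has to decide which value belongs in each column.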

__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.ImputeOp3(datalayer, masklayer)

Bases: aboleth.baselayers.MultiLayer

Abstract Base Impute operation for rank 3 Tensors (samples, N, D).

These specialise MultiLayers and they expect a data InputLayer and a mask InputLayer. They return layers in which the masked values have been imputed.

Parameters:
  • datalayer (callable) – A layer that returns a data tensor. Must be of form f(**kwargs).
  • masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be of form f(**kwargs).
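
As a rough picture of the rank 3 layout, the sketch below applies a 2D imputation function to each slice along the leading samples dimension. It assumes a single (N, D) mask is shared by all samples, which is an assumption of this sketch rather than something stated above; impute_rank3 and fill_minus_one are hypothetical names.

```python
def impute_rank3(X3, mask, impute2d):
    """Apply a 2D imputation function to every (N, D) slice of X3.

    X3 is a nested list of shape (samples, N, D); mask has shape (N, D)
    and is assumed to be shared across the samples dimension.
    """
    return [impute2d(X2, mask) for X2 in X3]


def fill_minus_one(X2, mask):
    """Toy 2D imputer: replace masked entries with -1.0."""
    return [
        [-1.0 if m else x for x, m in zip(row, mrow)]
        for row, mrow in zip(X2, mask)
    ]


X3 = [
    [[1.0, 2.0], [3.0, 4.0]],  # sample 0
    [[5.0, 6.0], [7.0, 8.0]],  # sample 1
]
mask = [[False, True], [False, False]]
imputed = impute_rank3(X3, mask, fill_minus_one)
print(imputed)  # [[[1.0, -1.0], [3.0, 4.0]], [[5.0, -1.0], [7.0, 8.0]]]
```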
__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.MaskInputLayer(name)

Bases: aboleth.baselayers.MultiLayer

Create an input layer for a binary mask tensor.

This layer defines input kwargs so that a user may easily provide the right binary mask inputs to a complex set of layers to enable imputation.

Parameters:name (string) – The name of the input. Used as the argument for input into the net.
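
A mask of the kind this layer expects has to be built by the caller. One common convention, assumed here and not prescribed by the library, is to mark NaN entries as missing; the helper name missing_mask is hypothetical.

```python
import math


def missing_mask(X):
    """Return a boolean mask that is True wherever X holds NaN.

    Hypothetical helper: X is a nested list of floats with shape (N, D).
    """
    return [[math.isnan(x) for x in row] for row in X]


X = [[1.0, float("nan")], [float("nan"), 4.0]]
mask = missing_mask(X)
print(mask)  # [[False, True], [True, False]]
```

The resulting mask would then be supplied to the net under the keyword given by name.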
__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.MeanImpute(datalayer, masklayer)

Bases: aboleth.impute.ImputeColumnWise

Impute the missing values using the stochastic mean of their column.

Takes two layers, one that returns a data tensor and another that returns a mask tensor. Returns a layer that returns a tensor in which the masked values have been imputed as the column means calculated from the batch.

Parameters:
  • datalayer (callable) – A layer that returns a data tensor. Must be of form f(**kwargs).
  • masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be of form f(**kwargs).
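
A minimal sketch of batch mean imputation on nested lists follows. It assumes each column mean is computed from the observed (unmasked) entries of the batch only, which is an assumption of this sketch; mean_impute is a hypothetical name, not the library's implementation.

```python
def mean_impute(X, mask):
    """Fill masked entries with the mean of the observed values per column.

    Hypothetical helper: X and mask are nested lists of shape (N, D).
    A column with no observed values would raise ZeroDivisionError.
    """
    n_cols = len(X[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row, mrow in zip(X, mask) if not mrow[j]]
        means.append(sum(observed) / len(observed))
    return [
        [mu if m else x for x, m, mu in zip(row, mrow, means)]
        for row, mrow in zip(X, mask)
    ]


X = [[1.0, 10.0], [3.0, 0.0], [0.0, 30.0]]
mask = [[False, False], [False, True], [True, False]]
imputed = mean_impute(X, mask)
print(imputed)  # [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
```

Because the means come from the current batch, the imputed values change from batch to batch, which is why the mean is described as stochastic.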
__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.NormalImpute(datalayer, masklayer, loc, scale)

Bases: aboleth.impute.ImputeColumnWise

Impute the missing values using marginal Gaussians over each column.

Takes two layers, one that returns a data tensor and another that returns a mask tensor. Creates a layer that returns a tensor in which the masked values have been imputed as random draws from the marginal Gaussians.

Parameters:
  • datalayer (callable) – A layer that returns a data tensor. Must be of form f(**kwargs).
  • masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be of form f(**kwargs).
  • loc (float, array-like, tf.Variable) – A list of the global mean values of each data column
  • scale (float, array-like, tf.Variable) – A list of the global standard deviation of each data column

Note

loc and scale can be tf.Variable if you wish to learn these statistics from the data.
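
The sampling behaviour can be sketched with the standard library alone. This is an illustrative stand-in that uses fixed loc and scale values rather than learned variables; normal_impute is a hypothetical name, and nested lists stand in for tensors.

```python
import random


def normal_impute(X, mask, loc, scale, rng):
    """Fill masked entries with draws from per-column Gaussians.

    Hypothetical helper: X and mask are nested lists of shape (N, D);
    loc and scale hold one mean and one standard deviation per column.
    """
    return [
        [rng.gauss(mu, sd) if m else x
         for x, m, mu, sd in zip(row, mrow, loc, scale)]
        for row, mrow in zip(X, mask)
    ]


rng = random.Random(0)  # seeded so repeated runs give the same draws
X = [[1.0, 5.0], [2.0, 6.0]]
mask = [[False, True], [False, False]]
imputed = normal_impute(X, mask, loc=[1.5, 5.5], scale=[0.5, 0.5], rng=rng)
# Only imputed[0][1] is a random draw; all observed entries are unchanged.
```

Each masked entry receives its own independent draw, so two missing values in the same column will generally be imputed with different numbers.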

__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.impute.ScalarImpute(datalayer, masklayer, scalars)

Bases: aboleth.impute.ImputeColumnWise

Impute the missing values using a scalar for each column.

Takes two layers, one that returns a data tensor and another that returns a mask tensor. Creates a layer that returns a tensor in which the masked values have been imputed with a provided scalar value per column.

Parameters:
  • datalayer (callable) – A layer that returns a data tensor. Must be an InputLayer.
  • masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be an InputLayer.
  • scalars (float, array-like, tf.Variable) – A scalar or an array of the values with which to impute each data column. This can be learned if it is a tf.Variable.
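
The two accepted forms of scalars, a single value or one value per column, can be sketched as follows. This is a plain-Python illustration with fixed (not learned) values; scalar_impute is a hypothetical name.

```python
def scalar_impute(X, mask, scalars):
    """Fill masked entries with a fixed value per column.

    Hypothetical helper: X and mask are nested lists of shape (N, D);
    scalars is either one number or a list with one number per column.
    """
    n_cols = len(X[0])
    if isinstance(scalars, (int, float)):
        # A single scalar is reused for every column.
        scalars = [scalars] * n_cols
    if len(scalars) != n_cols:
        raise ValueError("scalars needs one entry per column of X")
    return [
        [s if m else x for x, m, s in zip(row, mrow, scalars)]
        for row, mrow in zip(X, mask)
    ]


X = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [False, True]]
per_column = scalar_impute(X, mask, [-1.0, 99.0])
single = scalar_impute(X, mask, 0.0)
print(per_column)  # [[-1.0, 2.0], [3.0, 99.0]]
print(single)      # [[0.0, 2.0], [3.0, 0.0]]
```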
__call__(**kwargs)

Construct the subgraph for this layer.

Parameters:**kwargs – the inputs to this layer (Tensors)
Returns:
  • Net (Tensor) – the output of this layer
  • KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.