
Building a Neural Network

The Network class enables the easy creation of a PyTorch neural network from .json files, without requiring knowledge of the output shape of each layer, which makes inputs and layers easy to modify. Two things are required to make a PyTorch neural network from a .json file:

  1. Define the layers of the network in a .json file
  2. Build the network object using the Network class

1. Constructing the .json Architecture

The file is structured as a dictionary containing two sub-dictionaries:

  • net: Global network parameters with the following options:
    • checkpoints: bool = False, if checkpoints should be used exclusively; otherwise, the output from every layer will be cached, which is more user-friendly but consumes more memory
    • layer_name: dict, default parameters for the specific layer given by layer_name, overriding the default parameters for that type of layer; the layer types and corresponding parameters are found in the section Layer Types
  • layers: list[dict], dictionaries containing information on each layer, where the first layer takes the input and the last layer produces the output

Examples of the layers can be found under the section Layer Types, an example of creating the .json file can be found in the section Loading & Using the Network, and more examples are in the directory network_configs.

Layer Compatibilities

Linear layers take inputs of shape $(N,\ldots,L)$, where $N$ is the batch size and $L$ is the length of the input.
Recurrent layers require an input shape of $(N,C,L)$, where $C$ is the number of channels (equivalently, the sequence length).
Some layers, such as convolutional layers, require the dimension $C$ but can take 1D, 2D, & 3D inputs, so the inputs would have shapes $(N,C,L)$, $(N,C,H,W)$, or $(N,C,D,H,W)$, respectively, where $D$ is the depth, $H$ is the height, $W$ is the width, and $L$ is the length for 1D data.

The Reshape layer can be used to change the shape of the inputs for compatibility between layers.
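
As a minimal sketch (parameters as documented under Layer Types below), a Reshape layer can convert the flat output of a Linear layer into the $(C,L)$ shape that a 1D Conv layer expects; the shape [8, 16] is an arbitrary illustrative choice whose product matches the 128 Linear features:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 128
    },
    {
      "type": "Reshape",
      "shape": [8, 16]
    },
    {
      "type": "Conv",
      "filters": 4,
      "padding": "same"
    }
  ]
}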

2. Loading & Using the Network

The following steps import the architecture into PyTorch:

  1. Import Network from netloader.network.
  2. Create a network object by calling Network with the arguments: name, config_dir, in_shape, & out_shape.
  3. To use the network object, such as for training or evaluation, call the network with the argument x, and the network will return the forward pass.
  4. Alternatively, use the network object with the network classes in section 2. Network Architectures and Training.

All shapes given to the network or layers should exclude the batch dimension $N$.
The network input, and therefore in_shape, can be list[list[int]] if multiple inputs are used at different points in the network; in this case, the first layer in the network must be an Unpack layer.

The Network object can be safely loaded with torch.load('/path/to/network.pth', weights_only=True) if Network is first registered with torch.serialization.add_safe_globals([Network]), or by importing netloader, which does this automatically.
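
As a minimal sketch, the explicit registration route looks like the following (the path is illustrative):

import torch

from netloader.network import Network

# Register Network as safe for unpickling with weights_only=True
torch.serialization.add_safe_globals([Network])

network = torch.load('/path/to/network.pth', weights_only=True)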

Network Attributes

  • name: str, name of the network configuration file (without extension)
  • check_shapes: list[list[int]], checkpoint output shapes
  • shapes: list[list[int] | list[list[str]]], layer output shapes
  • checkpoints: list[Tensor], cloned values from the network's checkpoint layers
  • config: dict[str, Any], network configuration dictionary
  • net: ModuleList, network construction
  • layer_num: int | None = None, number of layers to use, if None use all layers
  • group: int = 0, which group is the active group if a layer has the group attribute
  • kl_loss: Tensor = 0, KL divergence loss on the latent space, if using a Sample layer

Example decoder.json

{
  "net": {
    "checkpoints": false,
    "Linear": {
      "dropout": 0.1
    }
  },
  "layers": [
    {
      "type": "Linear",
      "features": 120
    },
    {
      "type": "Linear",
      "features": 120
    },
    {
      "type": "Linear",
      "factor": 1,
      "activation": false
    }
  ]
}

Example code

import torch

from netloader.network import Network

# Build the network from decoder.json with input shape [5] and output shape [240]
# (shapes exclude the batch dimension N)
decoder = Network('decoder', '../network_configs/', [5], [240])

x = torch.rand((10, 5))  # batch of 10 inputs of length 5
output = decoder(x)      # forward pass, output shape (10, 240)
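
The attributes listed above can then be inspected on the network object; a brief sketch:

print(decoder.shapes)   # output shape of each layer, excluding the batch dimension N
print(decoder.kl_loss)  # 0 unless a Sample layer is used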

Layer Types

Layers come in several types, each with its own parameters.

All layers can take the optional group parameter, which means that the layer will only be active if the network attribute group is equal to the layer's group.
This is most useful if the head of the network is changed during training.
Skip layers should be used between groups so that the expected input shape is correct, as shown in the sketch below.
See layer_examples.json under network_configs for how to use groups and other layers.
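
As a sketch of the group mechanism, assuming a network whose head is swapped during training: the two heads below are alternatives, only the one whose group matches the network's group attribute is active, and the Skip layer re-fetches the trunk's output (layer 0) when the second group is active:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 64
    },
    {
      "type": "Linear",
      "features": 32,
      "group": 0
    },
    {
      "type": "Skip",
      "layer": 0,
      "group": 1
    },
    {
      "type": "Linear",
      "features": 32,
      "group": 1
    }
  ]
}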

Linear

  • Activation: Activation
    • activation: str = 'ELU', which activation function to use from PyTorch
  • Linear: Linear/fully connected
    • features: optional int, number of output features for the layer; if factor is provided, features will not be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to; if None, the network output will be used
    • factor: optional float, the number of output features equals factor multiplied by the network's output features (or those of the layer given by layer); will be used if provided, else features will be used (see the sketch after this list)
    • batch_norm: bool = False, if batch normalisation should be used
    • flatten_target: bool = False, if the target should be flattened so that features is equal to the product of the target multiplied by factor, if factor is provided
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'SELU', which activation function to use from PyTorch
  • OrderedBottleneck: Information-ordered bottleneck to randomly change the size of the bottleneck in an autoencoder to encode the most important information in the first values of the latent space
    • min_size: int = 0, minimum gate size
  • Sample: Gets the mean and standard deviation of a Gaussian distribution from $C$ in the previous layer, halving $C$, and randomly samples from it, mainly for a variational autoencoder
  • Upsample: Linear interpolation to scale the layer input
    • shape: optional list[int], shape of the output; will be used if provided, else scale will be used
    • scale: float | tuple[float, ...] = 2, factor to upscale all or individual dimensions, first dimension is ignored; won't be used if shape is provided
    • mode: {'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear'}, which interpolation method to use for upsampling
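
A minimal sketch of the factor mechanism for Linear layers: the middle layer is sized to half the network's output length and the final layer to the full output length, so neither hard-codes a feature count:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 64
    },
    {
      "type": "Linear",
      "factor": 0.5
    },
    {
      "type": "Linear",
      "factor": 1,
      "activation": false
    }
  ]
}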

Convolutional

  • Conv: Convolution with padding using replicate
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • groups: int = 1, number of input channel groups, each with its own convolutional filter(s), input and output channels must both be divisible by the number of groups
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDepth: Depth-wise convolution
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDepthDownscale: Reduces $C$ to one; uses a kernel size of 1 and 'same' padding
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDownscale: Downscales the layer input using strided convolution
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int = 2, stride and size of the kernel, which acts as the downscaling factor
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvTranspose: Transposed convolution, typically for input upscaling
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • out_padding: int | list[int] = 0, padding applied to the output
    • dilation: int | list[int] = 1, spacing between kernel points
    • padding: int | str | list[int] = 0, inverse of convolutional padding which removes rows from each dimension in the output
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvTransposeUpscale: Scales the layer input using fractional stride
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int | list[int] = 2, stride and size of the kernel, which acts as the upscaling factor
    • out_padding: int | list[int] = 0, padding applied to the output
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvUpscale: Scales the layer input using convolution and pixel shuffle; uses a stride of 1 and 'same' padding
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int = 2, factor to upscale the input by
    • kernel: int | list[int] = 3, size of the kernel
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • PixelShuffle: Equivalent to torch.nn.PixelShuffle, but for N-dimensions
    • scale: int, upscaling factor
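
A minimal sketch of a 1D convolutional stack using the parameters above: a 'same'-padded Conv, a strided ConvDownscale that halves $L$, and a ConvDepthDownscale that reduces $C$ to one:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Conv",
      "filters": 16,
      "kernel": 3,
      "padding": "same"
    },
    {
      "type": "ConvDownscale",
      "filters": 32,
      "scale": 2
    },
    {
      "type": "ConvDepthDownscale"
    }
  ]
}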

Pooling

  • AdaptivePool: Uses pooling to downscale the layer input to the desired shape
    • shape: int | list[int], output shape of the layer
    • channels: bool = True, if the input includes a channels dimension
    • mode: {'average', 'max'}, whether to use 'max' or 'average' pooling
  • Pool: Performs pooling
    • kernel: int | list[int] = 2, size of the kernel
    • stride: int | list[int] = 2, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • mode: {'max', 'average'}, whether to use 'max' or 'average' pooling
  • PoolDownscale: Downscales the input using pooling
    • scale: int, stride and size of the kernel, which acts as the downscaling factor
    • mode: {'max', 'average'}, whether to use 'max' or 'average' pooling
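
A minimal pooling sketch: a stride-2 max Pool followed by an AdaptivePool that forces the output length to 8, whatever the input length:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Pool",
      "kernel": 2,
      "stride": 2,
      "mode": "max"
    },
    {
      "type": "AdaptivePool",
      "shape": 8,
      "mode": "average"
    }
  ]
}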

Recurrent

  • Recurrent: Recurrent layer
    • batch_norm: bool = False, if batch normalisation should be used
    • layers: int = 2, number of stacked recurrent layers
    • filters: int = 1, number of output filters
    • dropout: float = 0, probability of dropout, requires layers > 1
    • mode: {'gru', 'rnn', 'lstm'}, type of recurrent layer
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • bidirectional: {None, 'sum', 'mean', 'concatenate'}, if a bidirectional GRU should be used and the method for combining the two directions
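
A minimal recurrent sketch, assuming an input of shape $(N,C,L)$ as required above: a two-layer bidirectional GRU whose two directions are summed:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Recurrent",
      "layers": 2,
      "filters": 4,
      "mode": "gru",
      "bidirectional": "sum"
    }
  ]
}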

Normalizing Flow

  • SplineFlow: Neural spline flow
    • transforms: int, number of transforms
    • hidden_features: list[int], number of features in each of the hidden layers
    • context: bool = False, if the output from the previous layer should be used to condition the normalizing flow
    • features: optional int, dimensions of the probability distribution, if factor is provided, features will not be used
    • factor: optional float, output features is equal to the factor of the network's output, will be used if provided, else features will be used
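
A minimal normalizing-flow sketch, assuming a 5-dimensional target distribution; the transforms and hidden_features values are illustrative choices:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "SplineFlow",
      "transforms": 4,
      "hidden_features": [32, 32],
      "features": 5
    }
  ]
}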

Utility

  • Checkpoint: Saves the output from the previous layer for use in future layers
  • Concatenate: Concatenates the previous layer with a specified layer
    • layer: int, layer index to concatenate the previous layer output with
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
    • dim: int = 0, dimension to concatenate along (not including $N$)
  • DropPath: Drop path to drop samples in a batch
    • prob: float, probability of dropout
  • Index: Slices the output from the previous layer
    • number: int, number of values to slice, can be negative
    • greater: bool = True, if the slice should take values greater than or less than number
  • LayerNorm: Layer normalisation with priority from the first dimension after batch dimension
    • dims: optional int, number of dimensions to normalise starting with the first dimension, ignoring batch dimension, won't be used if shape is provided
    • shape: optional list[int], input shape or shape of the first dimension to normalise, will be used if provided, else dims will be used
  • Reshape: Reshapes the dimensions
    • shape: list[int], desired shape of the output tensor, ignoring first dimension
    • layer: optional int, if factor is True, which layer for factor to be relative to, if None, network output will be used
    • factor: bool = False, if reshape should be relative to the network output shape, or if layer is provided, which layer to be relative to
  • Scale: Scales the output by a learnable tensor
    • dims: int, number of dimensions to have individual scales for
    • scale: float, initial scale factor
    • first: bool = True, if dims should count from the first dimension after the batch dimension, or from the final dimension backwards
  • Shortcut: Adds the previous layer with the specified layer
    • layer: int, layer index to add to the previous layer output
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
  • Skip: Passes the output from layer into the next layer
    • layer: int, layer index to get the output from
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
  • Unpack: Enables a list of Tensors as input into the network, then selects which Tensor in the list to output
    • index: int, index of input Tensor list
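
As a sketch of a residual-style connection built from the utility layers: a Checkpoint saves the trunk output, and a Shortcut with checkpoint set to true adds it back after two 'same'-padded convolutions:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Checkpoint"
    },
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Shortcut",
      "layer": 0,
      "checkpoint": true
    }
  ]
}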

Composite Layers

Custom blocks can be made from the layers above and inserted into the network. This is useful for making repetitive blocks such as the Inception block (Szegedy et al. 2015).
The block is created as a .json file in the same way as a network.
In the network's .json file, the block can be inserted by creating a composite layer with the following parameters:

  • Composite: Custom layer that combines multiple layers in a .json file for repetitive use
    • name: str, name of the subnetwork
    • config_dir: str, path to the directory with the network configuration file
    • checkpoint: bool = True, if layer index should be relative to checkpoint layers
    • channels: optional int, number of output channels, won't be used if shape is provided, if channels and shape aren't provided, the input dimensions will be preserved
    • shape: optional list[int], output shape of the block, will be used if provided; otherwise, channels will be used
    • defaults: optional dict[str, Any], default values for the parameters for each type of layer
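
As a sketch, assuming a block saved as inception.json (a hypothetical name) in the network_configs directory, the composite layer entry in the parent network's layers list might look like:

{
  "type": "Composite",
  "name": "inception",
  "config_dir": "../network_configs/",
  "channels": 64
}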
