
Building a Neural Network

The Network class enables the easy creation of a PyTorch neural network from .json files, without requiring knowledge of the output shape of each layer, which makes inputs and layers easy to modify. Two things are required to make a PyTorch neural network from a .json file:

  1. Define the layers of the network in a .json file
  2. Build the network object using the Network class

1. Constructing the .json Architecture

The file is structured as a dictionary containing two sub-dictionaries:

  • net: Global network parameters with the following options:
    • checkpoints: bool = False, if checkpoints should be used exclusively; otherwise, the output from every layer will be cached, which is more user-friendly but consumes more memory
    • layer_name: dict, default parameters for the specific layer given by layer_name, overriding the default parameters for that type of layer; the layer types and corresponding parameters are found in the section Layer Types
  • layers: list[dict], dictionaries containing information on each layer, where the first layer takes the input and the last layer produces the output

Examples of the layers can be found under the section Layer Types, an example of creating the .json file can be found in the section Loading & Using the Network, and more examples are in the directory network_configs.

Layer Compatibilities

Linear layers take inputs of shape $(N,\ldots,L)$, where $N$ is the batch size and $L$ is the length of the input.
Recurrent layers require an input shape of $(N,C,L)$, where $C$ is the number of channels (equivalently, the sequence length).
Some layers, such as convolutional layers, require the dimension $C$ but can take 1D, 2D, & 3D inputs, so the inputs would have shapes $(N,C,L)$, $(N,C,H,W)$, or $(N,C,D,H,W)$, respectively, where $D$ is the depth, $H$ is the height, $W$ is the width, and $L$ is the length for 1D data.

The Reshape layer can be used to change the shape of the inputs for compatibility between layers.
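
As a minimal sketch (parameters as documented under Layer Types below), a Reshape layer can convert the flat output of a Linear layer into the $(C,L)$ shape that a 1D Conv layer expects; the shape [8, 16] is an arbitrary illustrative choice whose product matches the 128 Linear features:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 128
    },
    {
      "type": "Reshape",
      "shape": [8, 16]
    },
    {
      "type": "Conv",
      "filters": 4,
      "padding": "same"
    }
  ]
}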

2. Loading & Using the Network

The following steps import the architecture into PyTorch:

  1. Import Network from netloader.network.
  2. Create a network object by calling Network with the arguments: name, config_dir, in_shape, & out_shape.
  3. To use the network object, such as for training or evaluation, call the network with the argument x, and the network will return the forward pass.
  4. Alternatively, use the network object with the network classes in section 2. Network Architectures and Training.

All shapes given to the network or layers should exclude the batch dimension $N$.
The network input, and therefore in_shape, can be list[list[int]] if multiple inputs are used at different points in the network; in this case, the first layer in the network must be an Unpack layer.

The Network object can be safely loaded with torch.load('/path/to/network.pth', weights_only=True) if Network is first registered with torch.serialization.add_safe_globals([Network]), or by importing netloader, which does this automatically.
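
As a minimal sketch, the explicit registration route looks like the following (the path is illustrative):

import torch

from netloader.network import Network

# Register Network as safe for unpickling with weights_only=True
torch.serialization.add_safe_globals([Network])

network = torch.load('/path/to/network.pth', weights_only=True)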

Network Attributes

  • name: str, name of the network configuration file (without extension)
  • check_shapes: list[list[int]], checkpoint output shapes
  • shapes: list[list[int] | list[list[str]]], layer output shapes
  • checkpoints: list[Tensor], cloned values from the network's checkpoint layers
  • config: dict[str, Any], network configuration dictionary
  • net: ModuleList, network construction
  • layer_num: int | None = None, number of layers to use, if None use all layers
  • group: int = 0, which group is the active group if a layer has the group attribute
  • kl_loss: Tensor = 0, KL divergence loss on the latent space, if using a Sample layer

Example decoder.json

{
  "net": {
    "checkpoints": false,
    "Linear": {
      "dropout": 0.1
    }
  },
  "layers": [
    {
      "type": "Linear",
      "features": 120
    },
    {
      "type": "Linear",
      "features": 120
    },
    {
      "type": "Linear",
      "factor": 1,
      "activation": false
    }
  ]
}

Example code

import torch

from netloader.network import Network

# Build the network from decoder.json with input shape [5] and output shape [240]
# (shapes exclude the batch dimension N)
decoder = Network('decoder', '../network_configs/', [5], [240])

x = torch.rand((10, 5))  # batch of 10 inputs of length 5
output = decoder(x)      # forward pass, output shape (10, 240)
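
The attributes listed above can then be inspected on the network object; a brief sketch:

print(decoder.shapes)   # output shape of each layer, excluding the batch dimension N
print(decoder.kl_loss)  # 0 unless a Sample layer is used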

Layer Types

Layers come in several types, each with its own parameters.

All layers can take the optional group parameter, which means that the layer will only be active if the network attribute group is equal to the layer's group.
This is most useful if the head of the network is changed during training.
Skip layers should be used between groups so that the expected input shape is correct, as shown in the sketch below.
See layer_examples.json under network_configs for how to use groups and other layers.
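
As a sketch of the group mechanism, assuming a network whose head is swapped during training: the two heads below are alternatives, only the one whose group matches the network's group attribute is active, and the Skip layer re-fetches the trunk's output (layer 0) when the second group is active:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 64
    },
    {
      "type": "Linear",
      "features": 32,
      "group": 0
    },
    {
      "type": "Skip",
      "layer": 0,
      "group": 1
    },
    {
      "type": "Linear",
      "features": 32,
      "group": 1
    }
  ]
}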

Linear

  • Activation: Activation
    • activation: str = 'ELU', which activation function to use from PyTorch
  • Linear: Linear/fully connected
    • features: optional int, number of output features for the layer; if factor is provided, features will not be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to; if None, the network output will be used
    • factor: optional float, the number of output features equals factor multiplied by the network's output features (or those of the layer given by layer); will be used if provided, else features will be used (see the sketch after this list)
    • batch_norm: bool = False, if batch normalisation should be used
    • flatten_target: bool = False, if the target should be flattened so that features is equal to the product of the target multiplied by factor, if factor is provided
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'SELU', which activation function to use from PyTorch
  • OrderedBottleneck: Information-ordered bottleneck to randomly change the size of the bottleneck in an autoencoder to encode the most important information in the first values of the latent space
    • min_size: int = 0, minimum gate size
  • Sample: Gets the mean and standard deviation of a Gaussian distribution from $C$ in the previous layer, halving $C$, and randomly samples from it, mainly for a variational autoencoder
  • Upsample: Linear interpolation to scale the layer input
    • shape: optional list[int], shape of the output; will be used if provided, else scale will be used
    • scale: float | tuple[float, ...] = 2, factor to upscale all or individual dimensions, first dimension is ignored; won't be used if shape is provided
    • mode: {'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear'}, which interpolation method to use for upsampling
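
A minimal sketch of the factor mechanism for Linear layers: the middle layer is sized to half the network's output length and the final layer to the full output length, so neither hard-codes a feature count:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Linear",
      "features": 64
    },
    {
      "type": "Linear",
      "factor": 0.5
    },
    {
      "type": "Linear",
      "factor": 1,
      "activation": false
    }
  ]
}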

Convolutional

  • Conv: Convolution with padding using replicate
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • groups: int = 1, number of input channel groups, each with its own convolutional filter(s), input and output channels must both be divisible by the number of groups
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDepth: Depth-wise convolution
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDepthDownscale: Reduces $C$ to one; uses a kernel size of 1 and 'same' padding
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvDownscale: Downscales the layer input using strided convolution
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int = 2, stride and size of the kernel, which acts as the downscaling factor
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvTranspose: Transposed convolution, typically for input upscaling
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • kernel: int | list[int] = 3, size of the kernel
    • stride: int | list[int] = 1, stride of the kernel
    • out_padding: int | list[int] = 0, padding applied to the output
    • dilation: int | list[int] = 1, spacing between kernel points
    • padding: int | str | list[int] = 0, inverse of convolutional padding which removes rows from each dimension in the output
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvTransposeUpscale: Scales the layer input using fractional stride
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int | list[int] = 2, stride and size of the kernel, which acts as the upscaling factor
    • out_padding: int | list[int] = 0, padding applied to the output
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • ConvUpscale: Scales the layer input using convolution and pixel shuffle; uses a stride of 1 and 'same' padding
    • filters: optional int, number of convolutional filters, will be used if provided, else factor will be used
    • layer: optional int, if factor is not None, which layer for factor to be relative to, if None, network output will be used
    • factor: optional float, number of convolutional filters equal to the output channels, or if layer is provided, the layer's channels, multiplied by factor, won't be used if filters is provided
    • scale: int = 2, factor to upscale the input by
    • kernel: int | list[int] = 3, size of the kernel
    • dropout: float = 0, probability of dropout
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • norm: {None, 'batch', 'layer'}, if batch or layer normalisation should be used
  • PixelShuffle: Equivalent to torch.nn.PixelShuffle, but for N-dimensions
    • scale: int, upscaling factor
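
A minimal sketch of a 1D convolutional stack using the parameters above: a 'same'-padded Conv, a strided ConvDownscale that halves $L$, and a ConvDepthDownscale that reduces $C$ to one:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Conv",
      "filters": 16,
      "kernel": 3,
      "padding": "same"
    },
    {
      "type": "ConvDownscale",
      "filters": 32,
      "scale": 2
    },
    {
      "type": "ConvDepthDownscale"
    }
  ]
}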

Pooling

  • AdaptivePool: Uses pooling to downscale the layer input to the desired shape
    • shape: int | list[int], output shape of the layer
    • channels: bool = True, if the input includes a channels dimension
    • mode: {'average', 'max'}, whether to use 'max' or 'average' pooling
  • Pool: Performs pooling
    • kernel: int | list[int] = 2, size of the kernel
    • stride: int | list[int] = 2, stride of the kernel
    • padding: int | str | list[int] = 0, input padding; can be an integer, a list of integers, or 'same', where 'same' preserves the input shape
    • mode: {'max', 'average'}, whether to use 'max' or 'average' pooling
  • PoolDownscale: Downscales the input using pooling
    • scale: int, stride and size of the kernel, which acts as the downscaling factor
    • mode: {'max', 'average'}, whether to use 'max' or 'average' pooling
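
A minimal pooling sketch: a stride-2 max Pool followed by an AdaptivePool that forces the output length to 8, whatever the input length:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Pool",
      "kernel": 2,
      "stride": 2,
      "mode": "max"
    },
    {
      "type": "AdaptivePool",
      "shape": 8,
      "mode": "average"
    }
  ]
}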

Recurrent

  • Recurrent: Recurrent layer
    • batch_norm: bool = False, if batch normalisation should be used
    • layers: int = 2, number of stacked recurrent layers
    • filters: int = 1, number of output filters
    • dropout: float = 0, probability of dropout, requires layers > 1
    • mode: {'gru', 'rnn', 'lstm'}, type of recurrent layer
    • activation: str | None = 'ELU', which activation function to use from PyTorch
    • bidirectional: {None, 'sum', 'mean', 'concatenate'}, if a bidirectional GRU should be used and the method for combining the two directions
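
A minimal recurrent sketch, assuming an input of shape $(N,C,L)$ as required above: a two-layer bidirectional GRU whose two directions are summed:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Recurrent",
      "layers": 2,
      "filters": 4,
      "mode": "gru",
      "bidirectional": "sum"
    }
  ]
}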

Normalizing Flow

  • SplineFlow: Neural spline flow
    • transforms: int, number of transforms
    • hidden_features: list[int], number of features in each of the hidden layers
    • context: bool = False, if the output from the previous layer should be used to condition the normalizing flow
    • features: optional int, dimensions of the probability distribution, if factor is provided, features will not be used
    • factor: optional float, output features is equal to the factor of the network's output, will be used if provided, else features will be used
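
A minimal normalizing-flow sketch, assuming a 5-dimensional target distribution; the transforms and hidden_features values are illustrative choices:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "SplineFlow",
      "transforms": 4,
      "hidden_features": [32, 32],
      "features": 5
    }
  ]
}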

Utility

  • Checkpoint: Saves the output from the previous layer for use in future layers
  • Concatenate: Concatenates the previous layer with a specified layer
    • layer: int, layer index to concatenate the previous layer output with
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
    • dim: int = 0, dimension to concatenate along (not including $N$)
  • DropPath: Drop path to drop samples in a batch
    • prob: float, probability of dropout
  • Index: Slices the output from the previous layer
    • number: int, number of values to slice, can be negative
    • greater: bool = True, if the slice should take values greater than or less than number
  • LayerNorm: Layer normalisation with priority from the first dimension after batch dimension
    • dims: optional int, number of dimensions to normalise starting with the first dimension, ignoring batch dimension, won't be used if shape is provided
    • shape: optional list[int], input shape or shape of the first dimension to normalise, will be used if provided, else dims will be used
  • Reshape: Reshapes the dimensions
    • shape: list[int], desired shape of the output tensor, ignoring first dimension
    • layer: optional int, if factor is True, which layer for factor to be relative to, if None, network output will be used
    • factor: bool = False, if reshape should be relative to the network output shape, or if layer is provided, which layer to be relative to
  • Scale: Scales the output by a learnable tensor
    • dims: int, number of dimensions to have individual scales for
    • scale: float, initial scale factor
    • first: bool = True, if dims should count from the first dimension after the batch dimension, or from the final dimension backwards
  • Shortcut: Adds the previous layer with the specified layer
    • layer: int, layer index to add to the previous layer output
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
  • Skip: Passes the output from layer into the next layer
    • layer: int, layer index to get the output from
    • checkpoint: bool = False, if layer should be relative to checkpoints or network layers, if checkpoints in net is True, layer will always be relative to checkpoints
  • Unpack: Enables a list of Tensors as input into the network, then selects which Tensor in the list to output
    • index: int, index of input Tensor list
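
As a sketch of a residual-style connection built from the utility layers: a Checkpoint saves the trunk output, and a Shortcut with checkpoint set to true adds it back after two 'same'-padded convolutions:

{
  "net": {
    "checkpoints": false
  },
  "layers": [
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Checkpoint"
    },
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Conv",
      "filters": 16,
      "padding": "same"
    },
    {
      "type": "Shortcut",
      "layer": 0,
      "checkpoint": true
    }
  ]
}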

Composite Layers

Custom blocks can be made from the layers above and inserted into the network. This is useful for making repetitive blocks such as the Inception block (Szegedy et al. 2015).
The block is created as a .json file in the same way as a network.
In the network's .json file, the block can be inserted by creating a composite layer with the following parameters:

  • Composite: Custom layer that combines multiple layers in a .json file for repetitive use
    • name: str, name of the subnetwork
    • config_dir: str, path to the directory with the network configuration file
    • checkpoint: bool = True, if layer index should be relative to checkpoint layers
    • channels: optional int, number of output channels, won't be used if shape is provided, if channels and shape aren't provided, the input dimensions will be preserved
    • shape: optional list[int], output shape of the block, will be used if provided; otherwise, channels will be used
    • defaults: optional dict[str, Any], default values for the parameters for each type of layer
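
As a sketch, assuming a block saved as inception.json (a hypothetical name) in the network_configs directory, the composite layer entry in the parent network's layers list might look like:

{
  "type": "Composite",
  "name": "inception",
  "config_dir": "../network_configs/",
  "channels": 64
}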
