
5. Transformations

EthanTreg edited this page Jun 12, 2025 · 1 revision

Data Transformation using transforms

As described in section 2. Network Architectures and Training, BaseNetwork objects expect the data returned from the dataset's __getitem__ to already be normalised (if normalisation is required).
To normalise the data, netloader.transforms provides several child classes of BaseTransform that apply transformations, invert them, and propagate uncertainties forwards and backwards.

Currently, the supported transformations are:

  • Index: Slices the input along a given dimension if the input matches the required shape
  • Log: Logarithmic transform
  • MinClamp: Clamps the minimum value to be the smallest positive value
  • MultiTransform: Applies multiple transformations
  • Normalise: Normalises the data to zero mean and unit variance, or between 0 and 1
  • NumpyTensor: Converts Numpy arrays to PyTorch tensors
  • Reshape: Reshapes the data

Example code:

import numpy as np
from netloader import transforms

from src.data import CustomDataset

# Create dataset
dataset = CustomDataset()

# Convert NumPy arrays to tensors, then apply a log10 transformation
transform = transforms.MultiTransform(transforms.NumpyTensor(), transforms.Log())

# Transform dataset and propagate uncertainties
transformed_data, transformed_uncertainties = transform(
    dataset.data,
    uncertainty=dataset.uncertainties,
)

# Untransform data and uncertainties
untransformed_data, untransformed_uncertainties = transform(
    transformed_data,
    back=True,
    uncertainty=transformed_uncertainties,
)

# The float32 log/exp round trip is only approximately exact, so compare with a tolerance
assert np.allclose(untransformed_data, dataset.data)
assert np.allclose(untransformed_uncertainties, dataset.uncertainties)

BaseTransform

BaseTransform is the parent class of all transforms, so its methods define the interface that every transform builds on.

Methods:

  • forward: Forward pass of the transformation
    • x: ArrayLike, input array or tensor of shape (N,...), where N is the number of elements
    • return: ArrayLike, transformed array or tensor of shape (N,...)
  • backward: Backward pass to invert the transformation
    • x: ArrayLike, input array or tensor of shape (N,...), where N is the number of elements
    • return: ArrayLike, untransformed array or tensor of shape (N,...)
  • forward_grad: Forward pass of the transformation with uncertainty propagation
    • x: ArrayLike, input array or tensor of shape (N,...), where N is the number of elements
    • uncertainty: ArrayLike, uncertainty of the input array or tensor of shape (N,...)
    • return: tuple[ArrayLike, ArrayLike], transformed array or tensor of shape (N,...) and transformed uncertainty of shape (N,...)
  • backward_grad: Backward pass to invert the transformation with uncertainty propagation
    • x: ArrayLike, input array or tensor of shape (N,...), where N is the number of elements
    • uncertainty: ArrayLike, uncertainty of the input array or tensor of shape (N,...)
    • return: tuple[ArrayLike, ArrayLike], untransformed array or tensor of shape (N,...) and untransformed uncertainty of shape (N,...)

These methods do not need to be called explicitly: the magic method __call__ dispatches to forward or backward depending on whether back is True, or to forward_grad or backward_grad if uncertainty is not None.
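The dispatch rule can be pictured with the minimal sketch below. SketchTransform and Double are hypothetical stand-ins written for illustration, not netloader's own classes, and the real BaseTransform may implement __call__ differently.

```python
from typing import Any


class SketchTransform:
    """Hypothetical stand-in for BaseTransform, showing only the __call__ dispatch."""

    def forward(self, x: Any) -> Any:
        return x

    def backward(self, x: Any) -> Any:
        return x

    def forward_grad(self, x: Any, uncertainty: Any) -> tuple[Any, Any]:
        return self.forward(x), uncertainty

    def backward_grad(self, x: Any, uncertainty: Any) -> tuple[Any, Any]:
        return self.backward(x), uncertainty

    def __call__(self, x: Any, back: bool = False, uncertainty: Any = None) -> Any:
        # No uncertainty: plain forward or backward pass
        if uncertainty is None:
            return self.backward(x) if back else self.forward(x)
        # Uncertainty given: propagate it in the requested direction
        if back:
            return self.backward_grad(x, uncertainty)
        return self.forward_grad(x, uncertainty)


class Double(SketchTransform):
    """Hypothetical transform f(x) = 2x, so sigma_f = 2 sigma_x."""

    def forward(self, x: Any) -> Any:
        return x * 2

    def backward(self, x: Any) -> Any:
        return x / 2

    def forward_grad(self, x: Any, uncertainty: Any) -> tuple[Any, Any]:
        return self.forward(x), uncertainty * 2

    def backward_grad(self, x: Any, uncertainty: Any) -> tuple[Any, Any]:
        return self.backward(x), uncertainty / 2
```

For example, Double()(3, uncertainty=0.5) returns (6, 1.0), while Double()(6, back=True) returns 3.0.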

Magic Methods:

  • __call__: Applies the forward or backward transformation, with uncertainty propagation if uncertainties are provided
    • x: ArrayLike, input array or tensor of shape (N,...), where N is the number of elements
    • back: bool = False, if the inverse transformation should be applied
    • uncertainty: ArrayLike | None = None, corresponding uncertainties for the input data for uncertainty propagation of shape (N,...)
    • return: ArrayLike | tuple[ArrayLike, ArrayLike], transformed array or tensor of shape (N,...) and propagated uncertainties of shape (N,...) if provided
  • __repr__: Representation of the transformation
    • return: str, representation string
  • __getstate__: Returns a dictionary containing the state of the transformation for pickling
    • return: dict[str, Any], dictionary containing the state of the transformation
  • __setstate__: Sets the state of the transformation when unpickling
    • state: dict[str, Any], dictionary containing the state of the transformation

Index

Slices the input along a given dimension.

Initialisation Arguments:

  • dim: int = -1, dimension to slice over
  • in_shape: tuple[int, ...] | None = None, target shape excluding the batch dimension; the slice is only applied if the input has this shape, which prevents repeated slicing, and any dimension with a size of -1 is ignored when comparing shapes
  • slice_: slice = slice(None), slicing object
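A sketch of how the in_shape check could prevent repeated slicing is shown below. index_sketch is a hypothetical NumPy function, not the library's implementation; it assumes, as described above, that in_shape excludes the batch dimension and that -1 acts as a wildcard.

```python
import numpy as np


def index_sketch(x, dim=-1, in_shape=None, slice_=slice(None)):
    """Hypothetical sketch of Index: slice along dim only if x matches in_shape."""
    if in_shape is not None:
        shape = x.shape[1:]  # ignore the batch dimension
        matches = len(shape) == len(in_shape) and all(
            target in (-1, size) for target, size in zip(in_shape, shape)
        )
        if not matches:
            return x  # already sliced (or a different shape), so leave unchanged

    # Build a full index with the slice placed at the requested dimension
    idx = [slice(None)] * x.ndim
    idx[dim] = slice_
    return x[tuple(idx)]
```

Calling index_sketch a second time on an array that was already sliced from (4, 3, 10) to (4, 3, 5) returns it unchanged, because its last dimension no longer matches in_shape=(-1, 10).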

Log

Applies the logarithmic transform with a given base.

Initialisation Arguments:

  • base: float = 10, base of the logarithm
  • idxs: list[int] | None = None, indices of the last dimension to apply the logarithm to
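The effect of idxs can be sketched with NumPy as follows. log_sketch is a hypothetical helper, not the Log source, and it assumes that idxs=None means every index of the last dimension is transformed.

```python
import numpy as np


def log_sketch(x, base=10.0, idxs=None, back=False):
    """Hypothetical sketch of Log: apply log_base (or its inverse) to selected last-dim indices."""
    x = np.array(x, dtype=float)  # copy so the caller's array is not modified
    cols = slice(None) if idxs is None else idxs  # assumption: None selects every index
    if back:
        x[..., cols] = base ** x[..., cols]
    else:
        x[..., cols] = np.log(x[..., cols]) / np.log(base)
    return x
```

With idxs=[0, 2], only the first and third columns are logged, so a negative value in the second column is passed through untouched.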

MinClamp

Clamps the minimum value to be the smallest positive value, useful before the Log transform as this prevents negative or zero values.

Initialisation Arguments:

  • dim: int | None = None, dimension to take the minimum value over
  • idxs: list[int] | None = None, indices of the last dimension to apply the clamp to
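A NumPy sketch of the clamp is shown below, assuming that "smallest positive value" means the smallest positive value present in the data along dim (or in the whole array if dim is None). min_clamp_sketch is a hypothetical helper, not the MinClamp source, and it ignores idxs.

```python
import numpy as np


def min_clamp_sketch(x, dim=None):
    """Hypothetical sketch of MinClamp: raise values below the smallest positive value."""
    x = np.asarray(x, dtype=float)
    # Replace non-positive values with inf so they are ignored by the minimum
    positive = np.where(x > 0, x, np.inf)
    min_positive = positive.min(axis=dim, keepdims=True)
    # Any value below the smallest positive value is raised to it
    return np.maximum(x, min_positive)
```

Here the zero and negative entries of [[0, 2], [-1, 0.5]] become 0.5, so a following Log transform no longer sees non-positive values. If a slice along dim has no positive values, this sketch returns inf for it.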

MultiTransform

Applies multiple transformations.
It can be indexed to return a single transform, or sliced to return a new MultiTransform containing the sliced transforms.

Attributes:

  • transforms: list[BaseTransform], list of transformations

Initialisation Arguments:

  • *args: BaseTransform, transformations

Additional Methods:

  • append: Appends a transform to the list of transforms
    • transform: BaseTransform, transform to append to the list of transforms
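The ordering that makes a chain of transforms invertible can be sketched with toy classes. MultiSketch, AddOne, and Double are hypothetical; the sketch assumes the backward pass applies the transforms in reverse order, which is what a composition needs in order to invert correctly, and it omits uncertainty handling.

```python
class AddOne:
    """Hypothetical toy transform: f(x) = x + 1."""

    def __call__(self, x, back=False):
        return x - 1 if back else x + 1


class Double:
    """Hypothetical toy transform: f(x) = 2x."""

    def __call__(self, x, back=False):
        return x / 2 if back else x * 2


class MultiSketch:
    """Hypothetical sketch of MultiTransform: forward in order, backward in reverse."""

    def __init__(self, *args):
        self.transforms = list(args)

    def append(self, transform):
        self.transforms.append(transform)

    def __getitem__(self, idx):
        # Slicing returns a new MultiSketch; integer indexing returns one transform
        if isinstance(idx, slice):
            return MultiSketch(*self.transforms[idx])
        return self.transforms[idx]

    def __call__(self, x, back=False):
        transforms = reversed(self.transforms) if back else self.transforms
        for transform in transforms:
            x = transform(x, back=back)
        return x
```

With MultiSketch(AddOne(), Double()), the forward pass maps 3 to (3 + 1) * 2 = 8, and the backward pass maps 8 to 8 / 2 - 1 = 3.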

Normalise

Normalises the data to zero mean and unit variance, or between 0 and 1.

Attributes:

  • offset: ndarray, offset to subtract from the data
  • scale: ndarray, scale to divide the data by

Initialisation Arguments:

  • mean: bool = True, if True, normalises the data to zero mean and unit variance; if False, normalises it to between 0 and 1
  • dim: int | tuple[int, ...] | None = None, dimensions to normalise over, if None, all dimensions will be normalised over
  • offset: ndarray | None = None, offset to subtract from the data if the data argument is None
  • scale: ndarray | None = None, scale to divide the data by if the data argument is None
  • data: ArrayLike | None = None, data to normalise with shape (N,...), where N is the number of elements

If the data argument is provided, the offset and scale arguments are ignored.
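One plausible way of deriving offset and scale from data is sketched below. normalise_params and normalise are hypothetical helpers that assume mean=True uses the mean and standard deviation, while mean=False uses the minimum and range; the library may compute these differently (for example, with another standard deviation estimator). Because the transform is linear, the uncertainty only needs to be divided (or multiplied) by |scale|.

```python
import numpy as np


def normalise_params(data, mean=True, dim=None):
    """Hypothetical sketch of how Normalise could derive offset and scale from data."""
    data = np.asarray(data, dtype=float)
    if mean:
        offset = data.mean(axis=dim, keepdims=True)
        scale = data.std(axis=dim, keepdims=True)
    else:
        offset = data.min(axis=dim, keepdims=True)
        scale = data.max(axis=dim, keepdims=True) - offset
    return offset, scale


def normalise(x, offset, scale, back=False, uncertainty=None):
    """Hypothetical linear normalisation with optional uncertainty propagation."""
    y = x * scale + offset if back else (x - offset) / scale
    if uncertainty is None:
        return y
    # A linear map scales the uncertainty by |scale| (backward) or 1 / |scale| (forward)
    sigma = uncertainty * np.abs(scale) if back else uncertainty / np.abs(scale)
    return y, sigma
```

With dim=0, each feature column is normalised independently, which is usually what a dataset of shape (N, features) needs.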

NumpyTensor

The forward pass converts Numpy arrays to PyTorch tensors, while the backward pass converts tensors to arrays.

Attributes:

  • dtype: dtype = float32, data type of the tensor

Initialisation Arguments:

  • dtype: dtype = float32, data type of the tensor

Reshape

Reshapes the data.

Initialisation Arguments:

  • in_shape: list[int] | None = None, original shape of the data
  • out_shape: list[int] | None = None, output shape of the data
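A minimal sketch of the forward and backward reshape is shown below, assuming in_shape and out_shape both exclude the batch dimension. reshape_sketch is a hypothetical helper, not the Reshape source.

```python
import numpy as np


def reshape_sketch(x, in_shape, out_shape, back=False):
    """Hypothetical sketch of Reshape, keeping the batch dimension N unchanged."""
    # Forward maps (N, *in_shape) to (N, *out_shape); backward does the reverse
    target = in_shape if back else out_shape
    return x.reshape(x.shape[0], *target)
```

For example, an array of shape (2, 3, 4) with in_shape=[3, 4] and out_shape=[12] becomes (2, 12), and the backward pass restores the original array.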

Creating Custom Transforms

If the transforms above do not meet your requirements, you can extend the BaseTransform class.

Creating Log from BaseTransform

First, the new class must inherit from BaseTransform and define the __init__ method:

from netloader.transforms import BaseTransform


class Log(BaseTransform):
    def __init__(self, base: float = 10):
        super().__init__()
        self._base: float = base

Then the forward and backward methods can be defined to work with both tensors and arrays, using the fact that the inverse of $f(x)=\log_b{x}$ is $x=b^{f(x)}$, where $b$ is the base:

from typing import TypeVar
from types import ModuleType

import torch
import numpy as np
from torch import Tensor
from numpy import ndarray

ArrayLike = TypeVar('ArrayLike', ndarray, Tensor)


def forward(self, x: ArrayLike) -> ArrayLike:
    module: ModuleType = torch if isinstance(x, Tensor) else np
    return module.log(x) / np.log(self._base)

def backward(self, x: ArrayLike) -> ArrayLike:
    return self._base ** x

For uncertainty propagation, the forward_grad and backward_grad methods can be defined. The forward uncertainty is $\sigma_f\approx\left|\frac{\sigma_x}{x\ln{b}}\right|$, and rearranging gives the backward uncertainty $\sigma_x\approx\left|\sigma_f x\ln{b}\right|$:

def forward_grad(
        self,
        x: ArrayLike,
        uncertainty: ArrayLike) -> tuple[ArrayLike, ArrayLike]:
    module: ModuleType = torch if isinstance(x, Tensor) else np
    return self(x), module.abs(uncertainty / (x * np.log(self._base)))

def backward_grad(
        self,
        x: ArrayLike,
        uncertainty: ArrayLike) -> tuple[ArrayLike, ArrayLike]:
    module: ModuleType = torch if isinstance(x, Tensor) else np
    x = self(x, back=True)
    return x, module.abs(uncertainty * x * np.log(self._base))
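The uncertainty formulas above can be sanity-checked numerically. This standalone NumPy snippet (not part of netloader) compares the analytic forward uncertainty with a central finite-difference estimate of the derivative of $\log_b{x}$, then checks that the backward formula recovers the original uncertainty.

```python
import numpy as np

base = 10.0
x = np.array([0.5, 2.0, 50.0])
sigma_x = np.array([0.01, 0.05, 1.0])

# Forward: sigma_f = |sigma_x / (x ln b)|
analytic = np.abs(sigma_x / (x * np.log(base)))

# Central finite difference of log_b(x) as an independent estimate of df/dx
eps = 1e-6
derivative = (np.log(x + eps) - np.log(x - eps)) / (2 * eps * np.log(base))
numeric = np.abs(derivative * sigma_x)

# Backward: x = b**y, so sigma_x = |sigma_f x ln b|
y = np.log(x) / np.log(base)
recovered = np.abs(analytic * base ** y * np.log(base))
```

The analytic and finite-difference results agree to well within one part in a million, and recovered matches sigma_x.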

Finally, for safe saving of the transform state, __getstate__ and __setstate__ can be defined so that only tensors, primitive types, and dictionaries are saved, such as the base of the logarithm as a float:

def __getstate__(self):
    return {'base': self._base}

def __setstate__(self, state):
    self._base = state['base']
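The effect of these two methods can be checked with Python's pickle module. LogState is a hypothetical stand-in that copies only the state methods of the Log example, so it runs without netloader installed.

```python
import pickle


class LogState:
    """Hypothetical stand-in reproducing only the state methods of the Log example."""

    def __init__(self, base: float = 10):
        self._base = base

    def __getstate__(self):
        # Only this dictionary is stored alongside a reference to the class
        return {'base': self._base}

    def __setstate__(self, state):
        # Called on loading instead of __init__
        self._base = state['base']


restored = pickle.loads(pickle.dumps(LogState(base=2.0)))
```

pickle saves a reference to the class plus the dictionary returned by __getstate__, and on loading it passes that dictionary to __setstate__ rather than calling __init__, so restored._base is 2.0.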

Therefore, the full class is:

from typing import TypeVar
from types import ModuleType

import torch
import numpy as np
from torch import Tensor
from numpy import ndarray
from netloader.transforms import BaseTransform

ArrayLike = TypeVar('ArrayLike', ndarray, Tensor)


class Log(BaseTransform):
    def __init__(self, base: float = 10):
        super().__init__()
        self._base: float = base
        
    def __getstate__(self):
        return {'base': self._base}
  
    def __setstate__(self, state):
        self._base = state['base']

    def forward(self, x: ArrayLike) -> ArrayLike:
        module: ModuleType = torch if isinstance(x, Tensor) else np
        return module.log(x) / np.log(self._base)

    def backward(self, x: ArrayLike) -> ArrayLike:
        return self._base ** x

    def forward_grad(
            self,
            x: ArrayLike,
            uncertainty: ArrayLike) -> tuple[ArrayLike, ArrayLike]:
        module: ModuleType = torch if isinstance(x, Tensor) else np
        return self(x), module.abs(uncertainty / (x * np.log(self._base)))

    def backward_grad(
            self,
            x: ArrayLike,
            uncertainty: ArrayLike) -> tuple[ArrayLike, ArrayLike]:
        module: ModuleType = torch if isinstance(x, Tensor) else np
        x = self(x, back=True)
        return x, module.abs(uncertainty * x * np.log(self._base))