Vortex Tensor #24

Merged — gatesn merged 1 commit into develop from ct/tensor, Mar 4, 2026

Conversation

@connortsui20 (Contributor) commented Mar 4, 2026

Rendered

We would like to add a Tensor type to Vortex as an extension over FixedSizeList. This RFC proposes the design of a fixed-shape tensor with contiguous backing memory.

Edit: We merged this early so opened a new followup PR: #25

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@danking left a comment

Seems great overall. I'm excited for tensors to land in Vortex.


## Summary

We would like to add a Tensor type to Vortex as an extension over `FixedSizeList`. This RFC proposes

Perhaps worth explicitly calling this FixedShapeTensor, since it's not unreasonable to also want variable-shape tensors (but of fixed dimension). For example, in genetics, we often want to take the ~100M rows of genetic variants, collapse them into ~30K genes, and, for each gene, construct a matrix of genotypes and run a regression. Those matrices always have the same dimensionality (2) but their shape varies: the sample axis is always the same, N_SAMPLES, but the genetic variant axis depends on the size of that gene, which varies from a few hundred base pairs (SRY) to 30,000 base pairs (TITIN).

In the future, I can imagine we'll have both `FixedSizeTensor<f32, (a, b, c)>` and `Tensor<f32, 3>` (names TBD).

Contributor

Agreed, I think we basically want to replicate both of Arrow's fixed and variable size tensors.
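The distinction discussed above can be sketched as two type descriptors: a fixed-shape tensor fixes the full shape for every value in the array, while a variable-shape tensor fixes only the rank. This is a hypothetical sketch for illustration; none of these names are actual Vortex or Arrow APIs.

```rust
/// Illustrative element type; Vortex's real `Primitive` DType is richer.
#[derive(Clone, Copy)]
enum ElementType {
    F32,
    F64,
}

/// Every tensor in the array has exactly this shape, e.g. (2, 3).
struct FixedShapeTensorType {
    element: ElementType,
    shape: Vec<usize>, // identical for all tensors in the array
}

/// Every tensor has `rank` dimensions, but each tensor carries its own
/// shape (stored alongside the data, as in Arrow's VariableShapeTensor).
struct VariableShapeTensorType {
    element: ElementType,
    rank: usize,
}

impl FixedShapeTensorType {
    /// Total elements per tensor: the product of the shape. This is what
    /// lets a fixed-shape tensor back onto a FixedSizeList directly.
    fn elements_per_tensor(&self) -> usize {
        self.shape.iter().product()
    }
}
```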


### Element Type

We restrict tensor element types to `Primitive` and `Decimal`. Tensors are fundamentally about dense

Why Decimal? That seems bizarre to me. Are there any fast implementations of matmul for arrays of Decimals?


Does PyTorch support decimal?


Honestly, support for fast matmul of fixed-point types was also pretty garbage last time I looked. Does anyone need fixed-point matrices?

Contributor

Yeah, if we're going to restrict it, let's just say `Primitive` for now.

#### Tensors in Vortex

In the current version of Vortex, there are two ways to represent fixed-shape tensors using the
`FixedSizeList` `DType`, and neither seems satisfactory.

Am I allowed to implement a SparseTensorArray whose dtype is Tensor but whose layout is not a FixedSizeList of the right size?

### Validity

We define two layers of nullability for tensors: the tensor itself may be null (within a tensor
array), and individual elements within a tensor may be null. However, we do not support nulling out

Why allow the elements to be null?

IMO, the main reason to use a Tensor type is so that you can define operations like matmul, and I worry that we can't efficiently implement matmul on a nullable type like `f32?`.


FWIW: I feel pretty strongly that we shouldn't support nullable elements of a tensor.

Contributor

It's always something we can relax later, so I'm in favor of restricting this now.

integers, ellipses, boolean arrays, etc.). It supports operations like canonicalization, shape
inference, and re-indexing onto array chunks. We will want to implement tensor compute expressions
in Vortex that are similar to the operations ndindex provides — for example, computing the result
shape of a slice or translating a logical index into a physical offset.
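The two operations the excerpt names — computing strides for a shape and translating a logical index into a physical offset — can be sketched for the contiguous row-major case. This is a minimal illustration, not the RFC's actual implementation.

```rust
/// Row-major (C-order) strides for a shape: strides[i] is the number of
/// elements skipped when dimension i's index increases by one.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk the dimensions right-to-left, accumulating the element counts
    // of the trailing dimensions.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Translate a logical multi-dimensional index into a flat physical
/// offset into the contiguous backing buffer: sum of index[i] * strides[i].
fn physical_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For a `(2, 3, 4)` tensor the strides are `[12, 4, 1]`, so logical index `[1, 2, 3]` lands at physical offset `1*12 + 2*4 + 3 = 23`.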

Also worth noting xarray. That was where I first encountered the idea of named dimensions. It also has a notion of "coordinates" which are "marginal" arrays. For example, you might have a matrix of temperature values on the surface of the earth. The rows and columns of that matrix could have coordinate values that indicate the latitudes and longitudes associated with the rows and columns.


### Academic work

- **TACO (Tensor Algebra Compiler)** separates the tensor storage format from the tensor program.

Taco is really great work! I guess I think of it more as a system for generating fast matmul kernels given the physical layout of two arrays.

http://tensor-compiler.org/publications.html

Contributor

Yeah, could be interesting to implement a tensor array that uses these sparse layouts though
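For a sense of what such a sparse layout could look like, here is a minimal COO (coordinate-format) sketch — the simplest of the layout family TACO generalizes. All names here are illustrative, not Vortex APIs.

```rust
/// Minimal COO sparse tensor: only non-zero elements are stored, each
/// tagged with its full multi-dimensional coordinates.
struct CooTensor {
    shape: Vec<usize>,
    coords: Vec<Vec<usize>>, // coords[k] is the index of the k-th non-zero
    values: Vec<f32>,
}

impl CooTensor {
    /// Linear-scan lookup; real formats (CSR, CSF) add per-dimension
    /// indexing so lookups and kernels avoid this O(nnz) scan.
    fn get(&self, index: &[usize]) -> f32 {
        self.coords
            .iter()
            .position(|c| c.as_slice() == index)
            .map(|k| self.values[k])
            .unwrap_or(0.0)
    }
}
```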



array), and individual elements within a tensor may be null. However, we do not support nulling out
entire sub-dimensions of a tensor (e.g., marking a whole row or slice as null).

The validity bitmap is flat (one bit per element) and follows the same contiguous layout as the
Contributor

I'm not sure what this sentence is saying. It sounds like tensors store additional validity on top of FSL — but actually we're just saying a tensor uses FSL as its storage type?

/// Optional names for each dimension. Each name corresponds to a dimension in the `shape`.
///
/// If names exist, there must be an equal number of names to dimensions.
dim_names: Option<Vec<String>>,
Contributor

`Vec<Option<String>>`? Not sure...

Contributor Author

We want to do it this way since it's what Arrow has, and I also personally do not want to deal with some dimensions being named and others not.
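The invariant from the excerpt — if names exist, there must be exactly one per dimension — is easy to enforce in a fallible constructor. A hypothetical sketch (the struct and constructor names are illustrative, not the RFC's actual API):

```rust
/// Illustrative shape-plus-names container mirroring the RFC excerpt's
/// `dim_names: Option<Vec<String>>` field.
struct TensorShape {
    shape: Vec<usize>,
    dim_names: Option<Vec<String>>,
}

impl TensorShape {
    /// Reject construction when names are present but their count does
    /// not match the number of dimensions.
    fn try_new(shape: Vec<usize>, dim_names: Option<Vec<String>>) -> Result<Self, String> {
        if let Some(names) = &dim_names {
            if names.len() != shape.len() {
                return Err(format!(
                    "expected {} dimension names, got {}",
                    shape.len(),
                    names.len()
                ));
            }
        }
        Ok(Self { shape, dim_names })
    }
}
```

With `Option<Vec<String>>`, naming is all-or-nothing, which matches the preference expressed above; `Vec<Option<String>>` would instead allow per-dimension gaps.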



@gatesn merged commit 7a6f9c0 into develop on Mar 4, 2026 (3 checks passed).
@gatesn deleted the ct/tensor branch on Mar 4, 2026 at 22:41.
gatesn pushed a commit that referenced this pull request Mar 5, 2026
[Rendered](https://github.com/vortex-data/rfcs/blob/ct/tensor-revise/accepted/0024-tensor.md)

Some revisions from #24

This also moves the RFC into the `accepted` directory.

I'll just keep this named `tensor` since future RFCs can be called
variable or sparse tensors.

The only change that was not directly because of the comments on the
last PR was a change to the strides section, because some of the
description was incorrect.

---------

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>