Conversation
Still working on a change to the strides section; it's a bit more complicated than it seems at first because Arrow does not have a logical type system, and "shape + permutation" can mean two different things. Edit: done in commit 53ad802
> Nullability exists only at the tensor level: within a tensor array, an individual tensor may be null, but elements within a tensor may not be. This is because tensor operations like matmul cannot be efficiently implemented over nullable elements, and most tensor libraries (e.g., PyTorch) do not support per-element nulls either.
commenting here but maybe it should go on the previous PR?
IDK how Arrow does it, but I don't think that's necessarily true.
Most vectorized compute just runs through null values that are zeroed out. IDK how you'd matmul the validity itself, but I think that's a reasonable thing.
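To sketch what "running through zeroed-out nulls" could look like (a hypothetical illustration in NumPy, not how any particular Arrow kernel is implemented): the values buffer and the validity mask travel separately, and masked slots are forced to zero so they contribute nothing to the dot products.

```python
import numpy as np

# Values buffer plus a separate validity mask; False marks a null slot.
values = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
valid = np.array([[True, False],
                  [True, True]])

# Zero out the null slots so they drop out of the sums in the matmul.
zeroed = np.where(valid, values, 0.0)
result = zeroed @ np.array([1.0, 1.0])  # row sums over valid elements only
```

Whether the result's own validity should be the AND/OR of the input masks is exactly the open question here; this sketch just ignores it.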
I think interpretation of NULLs is context dependent. If NULL means "there was no data observed at this position" and you're doing a weighted sum of the features, treating NULLs as zero is probably the right choice. The result is indeed the count of what you observed. You can't infer anything about things you did not observe.
On the other hand, if NULL means "there is some data here but for technical reasons it was unrecoverable" and you're doing a linear regression, you probably want to replace NULL by a mean value over some dimension(s). I don't have a good linear regression example, but suppose you flip one hundred coins and record heads as 1 and tails as 0. Suppose further that you lose 10 coins before observing them. If you compute the sum of this vector with NULL as zeros you'll conclude the coins are tails-biased! If you compute the sum of this vector with NULL as the sample mean, you'll have an unbiased estimate of the coin's heads/tails probability.
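The coin example is easy to check numerically. The sketch below (my construction, using NaN as a stand-in for NULL) flips 100 coins, "loses" 10 of them, and compares the zero-filled estimate against mean imputation:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 fair coin flips: heads = 1.0, tails = 0.0
flips = rng.integers(0, 2, size=100).astype(float)

# Lose 10 coins before observing them (NaN stands in for NULL).
lost = rng.choice(100, size=10, replace=False)
observed = flips.copy()
observed[lost] = np.nan

# Treating NULL as zero drags the estimate toward tails.
biased_mean = np.nan_to_num(observed, nan=0.0).mean()

# Imputing the sample mean of the observed flips is unbiased.
sample_mean = np.nanmean(observed)
imputed_mean = np.where(np.isnan(observed), sample_mean, observed).mean()
```

Here `biased_mean` is strictly below `imputed_mean` (any observed head makes the zero-fill undercount), while `imputed_mean` equals the mean of what was actually observed.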
IMO, matmul, sum, etc. should only be defined on tensors with non-nullable elements. I suppose null elements themselves are fine, if they're even representable in torch (I think they are not?).
NumPy is able to represent them when you use the catch-all object dtype, but if you request a primitive type it converts them to NaNs:
```python
In [8]: np.array([1., None])
Out[8]: array([1.0, None], dtype=object)

In [9]: np.array([1., None], dtype=float)
Out[9]: array([ 1., nan])

In [10]: np.array([1., None], dtype=np.dtype('f4'))
Out[10]: array([ 1., nan], dtype=float32)
```
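And once a null has decayed to NaN, it is no longer a maskable slot: a single NaN poisons every reduction it touches, which is one concrete reason per-element nulls are awkward for kernels like matmul. A small NumPy sketch (my illustration, not from the RFC):

```python
import numpy as np

# One NaN in the left operand...
a = np.array([[1.0, np.nan],
              [3.0, 4.0]])
b = np.ones((2, 2))

# ...contaminates every output element whose dot product touches it.
c = a @ b  # first row becomes all-NaN; second row is unaffected
```

Unlike a validity bitmap, there is no way to recover "the valid part" of the first row after the fact.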
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
> Physical shape favors Arrow compatibility and simpler stride math. Logical shape favors NumPy/PyTorch compatibility and is arguably more intuitive for our users since Vortex has a logical type system.
FWIW, I think torch/numpy integration matters more for tensors than Arrow compatibility. There's no linear algebra library that natively works on Arrow arrays.
I agree, and the conversion will be cheap regardless
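The "shape + permutation can mean two different things" ambiguity from the strides discussion can be made concrete with a small sketch (a hypothetical helper, not Vortex's or Arrow's actual API): given a logical shape, row-major strides are the running products of the trailing dimensions, and permuting the shape is not the same operation as permuting the strides.

```python
# Row-major (C-order) strides, in elements, for a given shape.
def row_major_strides(shape):
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

logical_shape = (2, 3, 4)
strides = row_major_strides(logical_shape)  # [12, 4, 1]

# The same permutation applied two different ways:
perm = (2, 0, 1)
permuted_shape = tuple(logical_shape[p] for p in perm)    # a new layout
permuted_strides = tuple(strides[p] for p in perm)        # a transposed view
```

"Shape permuted, strides recomputed" describes a physically relaid-out buffer, while "shape and strides permuted together" describes a zero-copy transpose of the original buffer; the two describe different memory, which is why the RFC has to pin down which one "shape + permutation" means.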
Some revisions from #24
This also moves the RFC into the `accepted` directory. I'll just keep this named `tensor` since future RFCs can be called variable or sparse tensors. The only change that was not directly because of the comments on the last PR was a change to the strides section, because some of the description was incorrect.