Skip to content

Conversation

@lukasz-antoniak
Copy link
Member

@lukasz-antoniak lukasz-antoniak commented Oct 17, 2025

The Cassandra Jira

CI link: Job 145

Comment on lines 963 to 975
// Fix for CASSANDRA-20979 altered behaviour of SimpleDateType and TimeType by correctly
// returning the fix-length size in valueLengthIfFixed(). This change impacted logic of
// isSerializationCompatibleWith(), which as a result complains about backward compatibility of:
// 1) [SimpleDateType isSerializationCompatibleWith Int32Type, {}] expected:<[fals]e> but was:<[tru]e>
// 2) [TimeType isSerializationCompatibleWith LongType, {}] expected:<[fals]e> but was:<[tru]e>
// 3) [BytesType isSerializationCompatibleWith SimpleDateType, {}] expected:<[tru]e> but was:<[fals]e>
// 4) [BytesType isSerializationCompatibleWith TimeType, {}] expected:<[tru]e> but was:<[fals]e>
ImmutableMap<AbstractType, Set<AbstractType>> exclusions = ImmutableMap.<AbstractType, Set<AbstractType>>builder()
.put(SimpleDateType.instance, ImmutableSet.of(Int32Type.instance))
.put(TimeType.instance, ImmutableSet.of(LongType.instance))
.put(BytesType.instance, ImmutableSet.of(SimpleDateType.instance, TimeType.instance))
.build();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of excluding the pairs from test, would it be correct if we mark those pairs with the previous compatibility values? The method isSerializationCompatibleWith is used to determine whether a dropped column can be re-added with a different type. If the value is still readable, we should be good with marking the types explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good catch and it triggered me to review complete fix for the issue again.

I have tried changing the isSerializationCompatibleWith method as you suggest, but then the test was failing:

[junit-timeout] 1) [BytesType isSerializationCompatibleWith TimeType, {}] 
[junit-timeout] Expecting code not to raise a throwable but caught
[junit-timeout]   "java.lang.AssertionError: Property falsified after 1 example(s) 
[junit-timeout] Smallest found falsifying value(s) :-
[junit-timeout] 0
[junit-timeout] Cause was :-
[junit-timeout] org.junit.ComparisonFailure: [BytesType .deserialize TimeType, {}] expected:<...yteBuffer[pos=0 lim=[8 cap=8]]> but was:<...yteBuffer[pos=0 lim=[0 cap=0]]>
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[junit-timeout] 	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[junit-timeout] 	at org.apache.cassandra.db.marshal.AbstractTypeTest.verifySerializationCompatibilityForSimpleCells(AbstractTypeTest.java:1285)
[junit-timeout] 	at org.apache.cassandra.db.marshal.AbstractTypeTest.lambda$verifyTypesCompatibility$39(AbstractTypeTest.java:1132)
[junit-timeout] 	at org.quicktheories.dsl.TheoryBuilder.lambda$checkAssert$7(TheoryBuilder.java:145)
[junit-timeout] 	at org.quicktheories.impl.Property.tryFalsification(Property.java:23)

Updating TimeType#valueLengthIfFixed() to return concrete length, changes deserialization behaviour of empty byte array to return the 8 element array with zeros.

I have then tried to return to starting point and look at vector implementation. The choice between fix-length and variable-length serialization is done in two places - VectorCodec class and in VectorType:

public static <E> VectorCodec<E> of(VectorType type, TypeCodec<E> subtypeCodec)
{
    return subtypeCodec.isSerializedSizeFixed()
           ? new FixedLength<>(type, subtypeCodec)
           : new VariableLength<>(type, subtypeCodec);
}
private VectorType(AbstractType<T> elementType, int dimension)
{
    super(ComparisonType.CUSTOM);
    // ...
    this.serializer = elementType.isValueLengthFixed() ?
                      new FixedLengthSerializer() :
                      new VariableLengthSerializer();
}

The TypeCodec#serializedSize() is not matching AbstractType#valueLengthIfFixed() for those two types. The CI run after latest change looks good, but I need to compare the codec and type implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. In trunk (13a3ae9), org.apache.cassandra.db.marshal.TimeType#valueLengthIfFixed returns -1 but org.apache.cassandra.cql3.functions.types.TypeCodec.TimeCodec#serializedSize return 8.

org.apache.cassandra.db.marshal.SimpleDateType#valueLengthIfFixed returns -1 and org.apache.cassandra.cql3.functions.types.TypeCodec.DateCodec#serializedSize returns 8.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To serialize the vector, we use VectorCodec which relies on TypeCodec#serializedSize to decide is subtype is fixed or variable length:

public static <E> VectorCodec<E> of(VectorType type, TypeCodec<E> subtypeCodec)
{
    return subtypeCodec.isSerializedSizeFixed()
           ? new FixedLength<>(type, subtypeCodec)
           : new VariableLength<>(type, subtypeCodec);
}

When constructing VectorType, we actually relay on AbstractType#valueLengthIfFixed:

private VectorType(AbstractType<T> elementType, int dimension)
{
    super(ComparisonType.CUSTOM);
    // ...
    this.serializer = elementType.isValueLengthFixed() ?
                      new FixedLengthSerializer() :
                      new VariableLengthSerializer();
}

Looking at failed stack trace:

org.apache.cassandra.serializers.MarshalException: Expected 8 byte long for time (0)
	at org.apache.cassandra.serializers.TimeSerializer.validate(TimeSerializer.java:74)
	at org.apache.cassandra.db.marshal.VectorType$VariableLengthSerializer.unpack(VectorType.java:613)
	at org.apache.cassandra.db.marshal.VectorType.unpack(VectorType.java:142)
	at org.apache.cassandra.db.marshal.MultiElementType.unpack(MultiElementType.java:81)
	at org.apache.cassandra.cql3.terms.MultiElements$Value.fromSerialized(MultiElements.java:69)
	at org.apache.cassandra.cql3.terms.Marker.bind(Marker.java:85)
	at org.apache.cassandra.cql3.terms.Term$NonTerminal.bindAndGet(Term.java:304)
	at org.apache.cassandra.cql3.terms.Constants$Setter.execute(Constants.java:482)
	at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:126)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:314)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:215)
	at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:185)
	at org.apache.cassandra.io.sstable.CQLSSTableWriterTest.testWritingVectorData(CQLSSTableWriterTest.java:1678)

The unpack operation is performed on MultiElementType, which in the end uses VectorType. If VectorCodec thinks that type is of fixed-length, whereas VectorType assumes it is variable-length, the unpack operation will fail.

Proposed changes do not modify serialization or deserialization of types, but just synchronize behaviour of TypeCodec#serializedSize and AbstractType#valueLengthIfFixed.

Copy link
Contributor

@yifan-c yifan-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks good to me, especially from the perspective of fixing the implementation difference.
I think it is worth double-checking with @adelapena if the patch is doing the right thing. The serializedSize method was introduced in CASSANDRA-18613.

Comment on lines 1873 to 1879
@Override
public int serializedSize()
{
return VARIABLE_LENGTH;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add code comment to explain why it is different from its parent method.

@lukasz-antoniak lukasz-antoniak force-pushed the CASSANDRA-20979 branch 3 times, most recently from c506c0f to 458b6f2 Compare December 15, 2025 09:03
Copy link
Contributor

@adelapena adelapena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes around the time codec to make it match the time type LGTM. I see the ticket and PR titles also mention the date type?

* New release

-- Mick Semb Wever <mck@apache.org> Fri, 01 Sep 2023 11:22:35 +0200
>>>>>>> cassandra-5.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might better do it in a ninja fix.

@lukasz-antoniak
Copy link
Member Author

The changes around the time codec to make it match the time type LGTM. I see the ticket and PR titles also mention the date type?

Correct time codec and date codec. For date codec I have removed overridden serializedSize() (which returned 8), to make it variable-length, analogical to SimpleDateType.

Copy link
Contributor

@adelapena adelapena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, +1.

CHANGES.txt Outdated
@@ -1,4 +1,5 @@
5.1
* Fix CQLSSTableWriter serialization of vector of date and time (CASSANDRA-20979)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to go instead under the "Merged from 5.0:" section

patch by Lukasz Antoniak; reviewed by Andres de la Pena, Yifan Cai for CASSANDRA-20979
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants