-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Fix CQLSSTableWriter serialization of vector of date and time #4431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d9fd446 to
5320b46
Compare
| // Fix for CASSANDRA-20979 altered behaviour of SimpleDateType and TimeType by correctly | ||
| // returning the fix-length size in valueLengthIfFixed(). This change impacted logic of | ||
| // isSerializationCompatibleWith(), which as a result complains about backward compatibility of: | ||
| // 1) [SimpleDateType isSerializationCompatibleWith Int32Type, {}] expected:<[fals]e> but was:<[tru]e> | ||
| // 2) [TimeType isSerializationCompatibleWith LongType, {}] expected:<[fals]e> but was:<[tru]e> | ||
| // 3) [BytesType isSerializationCompatibleWith SimpleDateType, {}] expected:<[tru]e> but was:<[fals]e> | ||
| // 4) [BytesType isSerializationCompatibleWith TimeType, {}] expected:<[tru]e> but was:<[fals]e> | ||
| ImmutableMap<AbstractType, Set<AbstractType>> exclusions = ImmutableMap.<AbstractType, Set<AbstractType>>builder() | ||
| .put(SimpleDateType.instance, ImmutableSet.of(Int32Type.instance)) | ||
| .put(TimeType.instance, ImmutableSet.of(LongType.instance)) | ||
| .put(BytesType.instance, ImmutableSet.of(SimpleDateType.instance, TimeType.instance)) | ||
| .build(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of excluding the pairs from test, would it be correct if we mark those pairs with the previous compatibility values? The method isSerializationCompatibleWith is used to determine whether a dropped column can be re-added with a different type. If the value is still readable, we should be good with marking the types explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good catch and it triggered me to review complete fix for the issue again.
I have tried changing the isSerializationCompatibleWith method as you suggest, but then the test was failing:
[junit-timeout] 1) [BytesType isSerializationCompatibleWith TimeType, {}]
[junit-timeout] Expecting code not to raise a throwable but caught
[junit-timeout] "java.lang.AssertionError: Property falsified after 1 example(s)
[junit-timeout] Smallest found falsifying value(s) :-
[junit-timeout] 0
[junit-timeout] Cause was :-
[junit-timeout] org.junit.ComparisonFailure: [BytesType .deserialize TimeType, {}] expected:<...yteBuffer[pos=0 lim=[8 cap=8]]> but was:<...yteBuffer[pos=0 lim=[0 cap=0]]>
[junit-timeout] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[junit-timeout] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[junit-timeout] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[junit-timeout] at org.apache.cassandra.db.marshal.AbstractTypeTest.verifySerializationCompatibilityForSimpleCells(AbstractTypeTest.java:1285)
[junit-timeout] at org.apache.cassandra.db.marshal.AbstractTypeTest.lambda$verifyTypesCompatibility$39(AbstractTypeTest.java:1132)
[junit-timeout] at org.quicktheories.dsl.TheoryBuilder.lambda$checkAssert$7(TheoryBuilder.java:145)
[junit-timeout] at org.quicktheories.impl.Property.tryFalsification(Property.java:23)
Updating TimeType#valueLengthIfFixed() to return concrete length, changes deserialization behaviour of empty byte array to return the 8 element array with zeros.
I have then tried to return to starting point and look at vector implementation. The choice between fix-length and variable-length serialization is done in two places - VectorCodec class and in VectorType:
public static <E> VectorCodec<E> of(VectorType type, TypeCodec<E> subtypeCodec)
{
return subtypeCodec.isSerializedSizeFixed()
? new FixedLength<>(type, subtypeCodec)
: new VariableLength<>(type, subtypeCodec);
}
private VectorType(AbstractType<T> elementType, int dimension)
{
super(ComparisonType.CUSTOM);
// ...
this.serializer = elementType.isValueLengthFixed() ?
new FixedLengthSerializer() :
new VariableLengthSerializer();
}
The TypeCodec#serializedSize() is not matching AbstractType#valueLengthIfFixed() for those two types. The CI run after latest change looks good, but I need to compare the codec and type implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In trunk (13a3ae9), org.apache.cassandra.db.marshal.TimeType#valueLengthIfFixed returns -1 but org.apache.cassandra.cql3.functions.types.TypeCodec.TimeCodec#serializedSize return 8.
org.apache.cassandra.db.marshal.SimpleDateType#valueLengthIfFixed returns -1 and org.apache.cassandra.cql3.functions.types.TypeCodec.DateCodec#serializedSize returns 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To serialize the vector, we use VectorCodec which relies on TypeCodec#serializedSize to decide is subtype is fixed or variable length:
public static <E> VectorCodec<E> of(VectorType type, TypeCodec<E> subtypeCodec)
{
return subtypeCodec.isSerializedSizeFixed()
? new FixedLength<>(type, subtypeCodec)
: new VariableLength<>(type, subtypeCodec);
}
When constructing VectorType, we actually relay on AbstractType#valueLengthIfFixed:
private VectorType(AbstractType<T> elementType, int dimension)
{
super(ComparisonType.CUSTOM);
// ...
this.serializer = elementType.isValueLengthFixed() ?
new FixedLengthSerializer() :
new VariableLengthSerializer();
}
Looking at failed stack trace:
org.apache.cassandra.serializers.MarshalException: Expected 8 byte long for time (0)
at org.apache.cassandra.serializers.TimeSerializer.validate(TimeSerializer.java:74)
at org.apache.cassandra.db.marshal.VectorType$VariableLengthSerializer.unpack(VectorType.java:613)
at org.apache.cassandra.db.marshal.VectorType.unpack(VectorType.java:142)
at org.apache.cassandra.db.marshal.MultiElementType.unpack(MultiElementType.java:81)
at org.apache.cassandra.cql3.terms.MultiElements$Value.fromSerialized(MultiElements.java:69)
at org.apache.cassandra.cql3.terms.Marker.bind(Marker.java:85)
at org.apache.cassandra.cql3.terms.Term$NonTerminal.bindAndGet(Term.java:304)
at org.apache.cassandra.cql3.terms.Constants$Setter.execute(Constants.java:482)
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:126)
at org.apache.cassandra.io.sstable.CQLSSTableWriter.rawAddRow(CQLSSTableWriter.java:314)
at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:215)
at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:185)
at org.apache.cassandra.io.sstable.CQLSSTableWriterTest.testWritingVectorData(CQLSSTableWriterTest.java:1678)
The unpack operation is performed on MultiElementType, which in the end uses VectorType. If VectorCodec thinks that type is of fixed-length, whereas VectorType assumes it is variable-length, the unpack operation will fail.
Proposed changes do not modify serialization or deserialization of types, but just synchronize behaviour of TypeCodec#serializedSize and AbstractType#valueLengthIfFixed.
yifan-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good to me, especially from the perspective of fixing the implementation difference.
I think it is worth double-checking with @adelapena if the patch is doing the right thing. The serializedSize method was introduced in CASSANDRA-18613.
| @Override | ||
| public int serializedSize() | ||
| { | ||
| return VARIABLE_LENGTH; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add code comment to explain why it is different from its parent method.
c506c0f to
458b6f2
Compare
adelapena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes around the time codec to make it match the time type LGTM. I see the ticket and PR titles also mention the date type?
| * New release | ||
|
|
||
| -- Mick Semb Wever <mck@apache.org> Fri, 01 Sep 2023 11:22:35 +0200 | ||
| >>>>>>> cassandra-5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might better do it in a ninja fix.
Correct time codec and date codec. For date codec I have removed overridden |
adelapena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, +1.
e2b5bf2 to
ea5b194
Compare
CHANGES.txt
Outdated
| @@ -1,4 +1,5 @@ | |||
| 5.1 | |||
| * Fix CQLSSTableWriter serialization of vector of date and time (CASSANDRA-20979) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to go instead under the "Merged from 5.0:" section
patch by Lukasz Antoniak; reviewed by Andres de la Pena, Yifan Cai for CASSANDRA-20979
ea5b194 to
692e3ba
Compare
The Cassandra Jira
CI link: Job 145