Add tests priorization to prevent testing all formats #155
Draft
bolinocroustibat
approved these changes
Aug 28, 2025
Contributor
LGTM.
For the future, as a proposal, we could refactor this in OOP. Here are some organization ideas:
A parent class `TestColumnType`:
- with a class method using `__subclasses__()` to get all the children
- with a class method to get all the parents

Something like:
```python
from typing import Any


class TestColumnType:
    ...

    @classmethod
    def get_all_parents(cls):
        return cls.__bases__  # Or we could return a sorted dict for parents tree?

    @classmethod
    def get_all_childs(cls):
        # Direct subclasses plus, recursively, all of their subclasses.
        return set(cls.__subclasses__()).union(
            [s for c in cls.__subclasses__() for s in c.get_all_childs()]
        )

    @staticmethod  # no `self`: the check only depends on the value
    def value_is(val: Any) -> bool:
        return isinstance(val, str)
```

And children classes like:
```python
# `_code_postal` and `header_score` are assumed to be existing project helpers.
class TestColumnInt(TestColumnType):
    @staticmethod
    def value_is(val: Any) -> bool:
        return isinstance(val, int)


class TestColumnCodePostal(TestColumnInt):
    @staticmethod
    def value_is(val: Any) -> bool:
        return _code_postal.is_valid(val)

    @staticmethod
    def header_is(header: str) -> float:
        words_combinations_list = [
            "code postal",
            "postal code",
            "postcode",
            "post code",
            "cp",
            "codes postaux",
            "location postcode",
        ]
        return header_score(header, words_combinations_list)
```

All child classes have:
- a method `value_is(value: Any) -> bool`
- a method `header_is(header: str) -> float`
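To make the proposal a bit more concrete, here is a minimal, self-contained sketch of how such a hierarchy could be traversed in both directions. The class names (`ColumnType`, `ColumnFloat`, `ColumnLatitude`) and the `get_parent_chain` helper are hypothetical and only stand in for the classes discussed above; the latitude check is a simplified placeholder, not the project's actual test.

```python
from typing import Any


class ColumnType:
    """Hypothetical root of the format hierarchy (stands in for TestColumnType)."""

    @classmethod
    def get_all_childs(cls) -> set:
        # Direct subclasses plus, recursively, their own subclasses.
        direct = cls.__subclasses__()
        return set(direct).union(s for c in direct for s in c.get_all_childs())

    @classmethod
    def get_parent_chain(cls) -> list:
        # Ancestors from the closest parent up to the root, excluding `object`.
        return [c for c in cls.__mro__[1:] if c is not object]

    @staticmethod
    def value_is(val: Any) -> bool:
        return isinstance(val, str)


class ColumnFloat(ColumnType):
    @staticmethod
    def value_is(val: Any) -> bool:
        return isinstance(val, float)


class ColumnLatitude(ColumnFloat):
    @staticmethod
    def value_is(val: Any) -> bool:
        # Simplified placeholder check, for illustration only.
        return isinstance(val, float) and -90 <= val <= 90


child_names = sorted(c.__name__ for c in ColumnType.get_all_childs())
chain_names = [c.__name__ for c in ColumnLatitude.get_parent_chain()]
print(child_names)
print(chain_names)
```

Using `__mro__` for the parent chain gives the whole ancestry in order, whereas `__bases__` only returns the direct parents; which one is preferable depends on how the prioritization ends up consuming it.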
Collaborator
Author
Agreed that a clean refactor using classes should be done in an upcoming PR, let's brainstorm on that in the coming weeks :)
Instead of testing all formats for all columns, we introduce a parent-child relationship between tests, for instance `float` > `latitude_wgs` > `latitude_wgs_fr_metropole`, declared with the `PARENT` attribute in the children's `__init__.py` files. This is then used as follows:

This way, if a column is `latitude_wgs_fr_metropole`, we only perform a float test on the full column once (instead of three times).

This is not perfect: setting the child's score as the parent score can be misleading (there are many float numbers that are invalid latitudes), but it saves a lot of time and processing. A safer approach could be to start with the parent tests, and only run the children if the parent is successful.
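The safer, parent-first approach mentioned at the end could look roughly like the sketch below. Everything here is an assumption made for illustration: the `FORMATS` registry, the `detect` function, the `threshold` parameter and the latitude bounds are hypothetical and do not reflect the project's actual code or values.

```python
from typing import Callable, Dict, List, Optional, Tuple


def _as_float(value: str) -> Optional[float]:
    # Return the parsed float, or None when the value is not a number.
    try:
        return float(value)
    except ValueError:
        return None


def _in_range(low: float, high: float) -> Callable[[str], bool]:
    # Build a per-value test that accepts floats within [low, high].
    def test(value: str) -> bool:
        number = _as_float(value)
        return number is not None and low <= number <= high
    return test


# Hypothetical registry: format name -> (parent name or None, per-value test).
# The metropole bounds are illustrative, not the project's real ones.
FORMATS: Dict[str, Tuple[Optional[str], Callable[[str], bool]]] = {
    "float": (None, lambda v: _as_float(v) is not None),
    "latitude_wgs": ("float", _in_range(-90, 90)),
    "latitude_wgs_fr_metropole": ("latitude_wgs", _in_range(41.3, 51.1)),
}


def detect(values: List[str], threshold: float = 1.0) -> Dict[str, float]:
    scores: Dict[str, float] = {}

    def run(name: str) -> float:
        if name in scores:  # each format is evaluated at most once
            return scores[name]
        parent, test = FORMATS[name]
        if parent is not None and run(parent) < threshold:
            scores[name] = 0.0  # parent failed: skip the child test entirely
        elif not values:
            scores[name] = 0.0
        else:
            scores[name] = sum(test(v) for v in values) / len(values)
        return scores[name]

    for name in FORMATS:
        run(name)
    return scores


print(detect(["48.85", "43.30"]))
print(detect(["120.5", "48.85"]))
```

With this ordering, a column that fails the `float` test never triggers the latitude tests, and a child's score is always computed from its own test rather than inherited, at the cost of still running each ancestor test once on the way down.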