Commit 71dce2c

merge: Merge branch 'main' of https://github.com/NHSDigital/data-validation-engine into feature/gr-dep008-913_uplift_python_version
2 parents e973bf9 + adfc216 commit 71dce2c

75 files changed: +2226 -304 lines changed

CHANGELOG.md

Lines changed: 15 additions & 1 deletion
@@ -1,4 +1,18 @@
-## 1.0.0 (2025-10-09)
+## 0.1.0 (2025-11-10)
+
+### Feat
+
+- Added ability to define custom error codes and templated messages for data contract feedback messages
+- Added new JSON readers
+- Added SparkCSVReader
+- Added PolarsToDuckDBCSVReader and DuckDBCSVRepeatingReader
+- Added quotechar option to DuckDBCSVReader
+
+### Fix
+
+- Fixed issues with refdata loader table implementations
+- Fixed duckdb try_cast statements in data contract phase
+- Allowed use of entity type in file transformation
 
 ### Refactor
 

README.md

Lines changed: 4 additions & 4 deletions
@@ -21,7 +21,7 @@ Additionally, if you'd like to contribute a new backend implementation into the
 
 ## Installation and usage
 
-The DVE is a Python package and can be installed using `pip`. As of release v1.0.0 we currently only supports Python 3.7, with Spark version 3.2.1 and DuckDB version of 1.1.0. We are currently working on upgrading the DVE to work on Python 3.11+ and this will be made available asap with version 2.0.0 release.
+The DVE is a Python package and can be installed using `pip`. As of release v0.1.0 we currently only supports Python 3.7, with Spark version 3.2.1 and DuckDB version of 1.1.0. We are currently working on upgrading the DVE to work on Python 3.11+ and this will be made available asap with version 1.0.0 release.
 
 In addition to a working Python 3.7+ installation you will need OpenJDK 11 installed if you're planning to use the Spark backend implementation.
 
@@ -30,7 +30,7 @@ Python dependencies are listed in `pyproject.toml`.
 To install the DVE package you can simply install using a package manager such as [pip](https://pypi.org/project/pip/).
 
 ```
-pip install git+https://github.com/NHSDigital/data-validation-engine.git@v1.0.0
+pip install git+https://github.com/NHSDigital/data-validation-engine.git@v0.1.0
 ```
 
 Once you have installed the DVE you are ready to use it. For guidance on how to create your dischema json document (configuration), please read the [documentation](./docs/).
@@ -48,8 +48,8 @@ If you have feature request then please follow the same process whilst using the
 Below is a list of features that we would like to implement or have been requested.
 | Feature | Release Version | Released? |
 | ------- | --------------- | --------- |
-| Open source release | 1.0.0 | Yes |
-| Uplift to Python 3.11 | 2.0.0 | No |
+| Open source release | 0.1.0 | Yes |
+| Uplift to Python 3.11 | 1.0.0 | No |
 | Upgrade to Pydantic 2.0 | Not yet confirmed | No |
 | Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |
 

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ DVE configuration can be instantiated from a json (dischema) file which might be
 {
 "contract": {
 "cache_originals": true,
-"contract_error_codes": null,
+"error_details": null,
 "types": {},
 "schemas": {},
 "datasets": {

docs/detailed_guidance/data_contract.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ Lets look at the data contract configuration from [Introduction to DVE](../READM
 {
 "contract": {
 "cache_originals": true,
-"contract_error_codes": null,
+"error_details": null,
 "types": {},
 "schemas": {},
 "datasets": {
@@ -78,7 +78,7 @@ Here we have only filled out datasets. We've added a few more fields such as `Pe
 {
 "contract": {
 "cache_originals": true,
-"contract_error_codes": null,
+"error_details": null,
 "types": {
 "isodate": {
 "description": "an isoformatted date type",
@@ -172,7 +172,7 @@ We can see here that the Activity has a number of fields. `startdate`, `enddate`
 {
 "contract": {
 "cache_originals": true,
-"contract_error_codes": null,
+"error_details": null,
 "types": {
 "isodate": {
 "description": "an isoformatted date type",
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+"$schema": "https://json-schema.org/draft-07/schema",
+"$id": "data-ingest:contract/components/contract_error_details.schema.json",
+"title": "base_entity",
+"description": "A mapping of field names to the custom error code and message required if these fields were to fail validation during the data contract phase. For nested fields, these should be specified using struct '.' notation (eg. fieldA.fieldB.fieldC)",
+"type": "object",
+"additionalProperties": {
+"$ref": "field_error_type.schema.json"
+}
+}
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+{
+"$schema": "https://json-schema.org/draft-07/schema",
+"$id": "data-ingest:contract/components/field_error_detail.schema.json",
+"title": "field_error_detail",
+"description": "The custom details to be used for a field when a validation error is raised during the data contract phase",
+"type": "object",
+"properties": {
+"error_code": {
+"description": "The code to be used for the field and error type specified",
+"type": "string"
+},
+"error_message": {
+"description": "The message to be used for the field and error type specified. This can include templating (specified using jinja2 conventions). During templating, the full record will be available with an additional __error_value to easily obtain nested offending values.",
+"type": "string",
+"enum": [
+"record_rejection",
+"file_rejection",
+"warning"
+]
+}
+},
+"required": [
+"error_code",
+"error_message"
+],
+"additionalProperties": false
+}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{
+"$schema": "https://json-schema.org/draft-07/schema",
+"$id": "data-ingest:contract/components/field_error_type.schema.json",
+"title": "field_error_detail",
+"description": "The error type for a field when a validation error is raised during the data contract phase",
+"type": "object",
+"properties": {
+"error_type": {
+"description": "The type of error the details are for",
+"type": "string",
+"enum": [
+"Blank",
+"Bad value",
+"Wrong format"
+],
+"additionalProperties": {
+"$ref": "field_error_detail.schema.json"
+}
+}
+}
+}
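Read together, the three schema files above describe a mapping from field name (with '.' notation for nested fields) to error type to a custom code and message. The sketch below shows one way such a mapping could be looked up. The exact nesting is an assumption inferred from the schema descriptions, and the field names, codes, messages, and the `lookup_error_detail` helper are all invented for illustration; none of them come from the DVE codebase. The jinja2-style placeholder is left unrendered.

```python
from typing import Dict, Optional

# Assumed shape: field name -> error type -> {"error_code", "error_message"}.
# Field names, codes and messages below are invented for illustration.
ERROR_DETAILS: Dict[str, Dict[str, Dict[str, str]]] = {
    "nhs_number": {
        "Blank": {
            "error_code": "E001",
            "error_message": "nhs_number is required",
        },
    },
    "activity.startdate": {  # nested field in struct '.' notation
        "Wrong format": {
            "error_code": "E002",
            # jinja2-style placeholder, kept as plain text in this sketch
            "error_message": "startdate {{ __error_value }} is not an ISO date",
        },
    },
}


def lookup_error_detail(field: str, error_type: str) -> Optional[Dict[str, str]]:
    """Return the custom code/message for a field and error type, if configured."""
    return ERROR_DETAILS.get(field, {}).get(error_type)


detail = lookup_error_detail("activity.startdate", "Wrong format")
print(detail["error_code"])  # E002
print(lookup_error_detail("activity.startdate", "Blank"))  # None
```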

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "nhs_dve"
-version = "1.0.0"
+version = "0.1.0"
 description = "`nhs data validation engine` is a framework used to validate data"
 authors = ["NHS England <england.contactus@nhs.net>"]
 readme = "README.md"

src/dve/core_engine/backends/base/core.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def __init__(
 raise ValueError(f"Entity name cannot start with 'refdata_', got {entity_name!r}")
 self.entities[entity_name] = entity
 
-self.reference_data = reference_data or {}
+self.reference_data = reference_data if reference_data is not None else {}
 """The reference data mapping."""
 
 @staticmethod
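One effect of replacing `reference_data or {}` with an explicit `is not None` check is that an empty mapping passed by the caller is now kept rather than swapped for a fresh dict, because `or` discards any falsy value. Whether this was the motivation for the change is not stated in the commit. A minimal sketch of the difference, using hypothetical helper names (`with_or`, `with_none_check`) that are not part of the DVE codebase:

```python
from typing import Any, Dict, Optional


def with_or(reference_data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Old behaviour: any falsy value (None or an empty dict) is replaced.
    return reference_data or {}


def with_none_check(reference_data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # New behaviour: only None is replaced; an empty dict is kept as-is.
    return reference_data if reference_data is not None else {}


shared: Dict[str, Any] = {}  # empty mapping that the caller fills in later

kept_by_or = with_or(shared)
kept_by_check = with_none_check(shared)

shared["lookup_table"] = "loaded later"

print(kept_by_or is shared)     # False
print(kept_by_check is shared)  # True
print(kept_by_or)               # {}
print(kept_by_check)            # {'lookup_table': 'loaded later'}
```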
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""Implementation of duckdb backend"""
+from dve.core_engine.backends.implementations.duckdb.readers.json import DuckDBJSONReader
+from dve.core_engine.backends.readers import register_reader
+
+from .contract import DuckDBDataContract
+from .readers import (
+DuckDBCSVReader,
+DuckDBCSVRepeatingHeaderReader,
+DuckDBXMLStreamReader,
+PolarsToDuckDBCSVReader,
+)
+from .reference_data import DuckDBRefDataLoader
+from .rules import DuckDBStepImplementations
+
+register_reader(DuckDBCSVReader)
+register_reader(DuckDBCSVRepeatingHeaderReader)
+register_reader(DuckDBJSONReader)
+register_reader(DuckDBXMLStreamReader)
+register_reader(PolarsToDuckDBCSVReader)
+
+__all__ = [
+"DuckDBDataContract",
+"DuckDBRefDataLoader",
+"DuckDBStepImplementations",
+]
