Commit 8f33717

initial commit
0 parents  commit 8f33717

2 files changed: +77 -0 lines changed

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2025 Crown Copyright NHS England.
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Data Validation Engine
2+
3+
The Data Validation Engine (DVE) is a configuration-driven data validation library built and used by NHS England.
4+
5+
Being configuration-driven means that most of your development work as a user is writing a JSON document that describes how the data will be validated. This JSON document is typically known as a `dischema` file, and example files can be found [here](./tests/testdata/). To learn more about the dischema document and how to build one from scratch, please read the documentation [here](./docs/).
6+
7+
Once a dischema file has been defined, you are ready to use the DVE. The DVE is typically orchestrated as four key "services", run in the following order:
8+
9+
| Order | Service | Purpose |
10+
| ----- | ------- | ------- |
11+
| 1. | File Transformation | This service ingests submitted files and turns them into stringified parquet files so that a consistent data structure can be passed through the DVE. |
12+
| 2. | Data Contract | This service validates and casts a stringified parquet submission against a [Pydantic model](https://docs.pydantic.dev/latest/). |
13+
| 3. | Business Rules | This service performs more complex validations, such as comparisons between fields and tables, aggregations, filters, etc., to generate new entities. |
14+
| 4. | Error Reports | This service takes all the errors raised by the previous services and surfaces them in a readable format for downstream users or services. Currently, the output is an Excel spreadsheet, but this could be reconfigured to meet other requirements or use cases. |
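
To make the flow through the services more concrete, the sketch below models the Data Contract, Business Rules, and Error Reports stages on plain Python dictionaries. It is a simplified illustration using only the standard library, not the DVE API: the field names, the `CONTRACT` mapping, and the functions `apply_contract`, `check_dates`, and `report` are all hypothetical. The real engine works from a dischema file, Pydantic models, and Spark or DuckDB.

```python
from datetime import date

# Hypothetical contract: field name -> callable that casts a string value.
CONTRACT = {
    "patient_id": int,
    "admitted": date.fromisoformat,
    "discharged": date.fromisoformat,
}


def apply_contract(record):
    """Cast a stringified record; return (typed_record, errors)."""
    typed, errors = {}, []
    for field, cast in CONTRACT.items():
        raw = record.get(field)
        try:
            typed[field] = cast(raw)
        except (TypeError, ValueError):
            errors.append(f"{field}: cannot cast {raw!r}")
    return typed, errors


def check_dates(typed):
    """Example cross-field rule: discharge must not precede admission."""
    if typed["discharged"] < typed["admitted"]:
        return ["discharged: earlier than admitted"]
    return []


def report(records):
    """Collect errors from both stages, keyed by 1-based row number."""
    found = {}
    for row, record in enumerate(records, start=1):
        typed, errors = apply_contract(record)
        if not errors:
            # Only run the cross-field rule on records that passed the contract.
            errors = check_dates(typed)
        if errors:
            found[row] = errors
    return found


# Every value starts as a string, mirroring the "stringified" submission.
submission = [
    {"patient_id": "1", "admitted": "2025-01-02", "discharged": "2025-01-05"},
    {"patient_id": "x", "admitted": "2025-01-02", "discharged": "2025-01-05"},
    {"patient_id": "3", "admitted": "2025-01-09", "discharged": "2025-01-05"},
]
print(report(submission))
```

In this sketch the first record passes both stages, the second fails the contract because `patient_id` cannot be cast to an integer, and the third passes the contract but fails the cross-field rule.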
15+
16+
We have more detailed documentation around these services [here](./docs/).
17+
18+
## Installation and usage
19+
20+
The DVE is a Python package and can be installed using `pip`. As of release (version 1+), it only supports Python 3.7, with Spark version 3.2.1 and DuckDB version 1.1.0. We are currently working on upgrading the DVE to work on Python 3.11+, and this will be made available as soon as possible with the version 2.0.0 release.
21+
22+
In addition to a working Python 3.7 installation, you will need OpenJDK 11 installed.
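
Before installing, it can help to confirm these prerequisites. The snippet below is a minimal sketch and not part of the DVE: the helper names `python_ok` and `java_on_path` are hypothetical, the Python check assumes the exact 3.7 requirement stated for version 1, and the Java check only looks for a `java` executable on `PATH` without verifying that it is OpenJDK 11.

```python
import shutil
import sys

# Assumption: version 1.x of the DVE targets Python 3.7 exactly.
REQUIRED_PYTHON = (3, 7)


def python_ok(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True when the interpreter's (major, minor) equals `required`."""
    return tuple(version_info[:2]) == required


def java_on_path():
    """Return True when a `java` executable is found on PATH (version not checked)."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("Python 3.7:", python_ok())
    print("java on PATH:", java_on_path())
```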
23+
24+
Python dependencies are listed in `pyproject.toml`.
25+
26+
To install the DVE package, use a package manager such as [pip](https://pypi.org/project/pip/).
27+
28+
```
29+
pip install git+https://github.com/nhsengland/Data-Validation-Engine
30+
```
31+
32+
Once you have installed the DVE, you are ready to use it. For guidance on how to create your dischema JSON document (configuration), please read the [documentation](/docs/).
33+
34+
The long-term aim is to make the DVE available via PyPI and Conda, but we are not quite there yet. Once it is available, this documentation will be updated to reflect the new installation options.
35+
36+
## Requesting new features and raising bug reports
37+
If you have spotted a bug in the DVE, please raise an issue [here](https://github.com/nhsengland/Data-Validation-Engine/issues). Feature requests can be raised in the same place.
38+
39+
## Upcoming features
40+
Below is a list of features that we would like to implement or that have been requested.
41+
| Feature | Release Version | Released? |
42+
| ------- | --------------- | --------- |
43+
| Open source release | 1.0.0 | Yes |
44+
| Uplift to Python 3.11 | 2.0.0 | No |
45+
| Upgrade to Pydantic 2.0 | Not yet confirmed | No |
46+
| Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |
47+
48+
Beyond the Python upgrade, we cannot confirm when the other features will be made available. If you have the interest and desire to help deliver them, please read the [contributing section](#contributing) and get involved.
49+
50+
## Contributing
51+
Please see guidance [here](./CONTRIBUTE.md).
52+
53+
## Legal
54+
This codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.
55+
56+
Any HTML or Markdown documentation is [© Crown copyright](https://www.nationalarchives.gov.uk/information-management/re-using-public-sector-information/uk-government-licensing-framework/crown-copyright/) and available under the terms of the [Open Government Licence v3.0](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
