This repository contains Resource Description Framework (RDF) functionality and tools for the STRONG AYA project. They are designed for use with the Vantage6 framework for federated analytics and learning, and are intended to facilitate and simplify the development of Vantage6 algorithms. The SPARQL queries and RDF functions are designed to be used in conjunction with the Flyover and Triplifier tools.
The code in this repository is available as a Python library here on GitHub and can be installed by referencing the repository directly with pip.
The functions are organised into the following sections:
- RDF Data Collection: Functions to formulate and execute a SPARQL query on an RDF/SPARQL endpoint;
- Data Processing: Functions to process the output of an RDF/SPARQL endpoint (e.g. determine missing values, extract associated subclasses);
- Query Templates: SPARQL query templates used by the RDF data collection functions.
The library provides functions that can be included in a Vantage6 algorithm as the algorithm developer sees fit. The functions are designed to be modular and can be used independently or in combination with other functions.
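To give a feel for what the RDF data collection section does conceptually, the following is a minimal sketch of formulating and executing a SPARQL query against an RDF/SPARQL endpoint in plain Python. It uses the SPARQLWrapper package and a generic query purely for illustration; the endpoint URL matches the local example used later in this README, and none of this reflects the library's actual query templates or internals.

```python
# Minimal sketch: formulate and execute a SPARQL query on an RDF/SPARQL endpoint.
# SPARQLWrapper, the endpoint URL, and the query itself are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:7200/repositories/userRepo"  # e.g. a local RDF store

# A simple "single column"-style query that fetches one value per subject
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subject ?value
WHERE {
    ?subject rdf:value ?value .
}
LIMIT 100
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
response = sparql.query().convert()

# Flatten the SPARQL JSON results into a plain list of values
values = [binding["value"]["value"] for binding in response["results"]["bindings"]]
print(f"Retrieved {len(values)} values from the endpoint.")
```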
The library can be included in your Vantage6 algorithm by listing it in the requirements.txt and setup.py files of your algorithm.
For the requirements.txt file, you can add the following line:

```text
git+https://github.com/STRONGAYA/v6-tools-rdf.git@v1.0.1
```
For the setup.py file, you can add the following line to the install_requires list:
"vantage6-strongaya-rdf @ git+https://github.com/STRONGAYA/v6-tools-rdf.git@v1.0.1",The algorithm's setup.py, particularly the install_requirements, section file should then look something like this:
```python
from os import path
from codecs import open

from setuptools import setup, find_packages

# We are using a README.md; if you do not have this in your folder, simply replace this with a string.
here = path.abspath(path.dirname(__file__))
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='v6-not-an-actual-algorithm',
    version="1.0.1",
    description='Fictive Vantage6 algorithm that performs general statistics computation.',
    long_description=long_description,
    long_description_content_type='text/markdown',
    url='https://github.com/STRONGAYA/v6-not-an-actual-algorithm',
    packages=find_packages(),
    python_requires='>=3.10',
    install_requires=[
        'vantage6-algorithm-tools',
        'numpy',
        'pandas',
        "vantage6-strongaya-rdf @ git+https://github.com/STRONGAYA/v6-tools-rdf.git@v1.0.1",
        # other dependencies
    ]
)
```

The functions included in this library focus on extracting RDF data from a SPARQL endpoint. It is not recommended to use these functions in the central (aggregating) section of a Vantage6 algorithm.
Example usage of the SPARQL data collection function in a node (participating) section of a Vantage6 algorithm:
```python
# General federated algorithm functions
from vantage6_strongaya_general.miscellaneous import safe_log
# RDF data collection function from this library
from vantage6_strongaya_rdf.collect_sparql_data import collect_sparql_data


def partial_general_statistics(variables_to_analyse: dict) -> dict:
    """
    Execute the partial algorithm for some modelling using RDF data.

    Args:
        variables_to_analyse (dict): Variables to analyse.

    Returns:
        dict: A dictionary containing the computed general statistics.
    """
    safe_log("info", "Executing partial algorithm for some modelling using RDF data.")

    # Collect the relevant data from the local RDF/SPARQL endpoint
    df = collect_sparql_data(variables_to_analyse, query_type="single_column",
                             endpoint="http://localhost:7200/repositories/userRepo")

    # Ensure that the desired privacy measures are applied

    # Do some modelling of the data
    # (placeholder: report the number of collected records as a trivial example statistic)
    result = {"sample_size": len(df)}

    return result
```
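For context, the central (aggregating) part of such an algorithm would typically only combine the partial results returned by the participating nodes and would not query RDF data itself. The sketch below is a hypothetical, simplified central function; its name, the result structure, and the aggregation logic are assumptions for illustration and not part of this library.

```python
def central_general_statistics(partial_results: list) -> dict:
    """
    Hypothetical central part: combine partial results from the nodes
    without touching any RDF/SPARQL endpoint.

    Args:
        partial_results (list): One result dictionary per participating node,
                                e.g. {"sample_size": 123}.

    Returns:
        dict: Aggregated statistics across all nodes.
    """
    # Sum a simple count across nodes as an illustrative aggregation step
    total_sample_size = sum(partial.get("sample_size", 0) for partial in partial_results)

    return {
        "number_of_nodes": len(partial_results),
        "total_sample_size": total_sample_size,
    }
```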
The various functions are available through pip install for debugging and testing purposes. The library can be installed as follows:
```bash
pip install git+https://github.com/STRONGAYA/v6-tools-rdf.git
```

This repository includes a comprehensive testing framework to ensure the reliability and correctness of all functions, especially whether RDF data is queryable when the library is run as a Docker container within a Vantage6 node.
```text
tests/
├── conftest.py                            # Common fixtures and test utilities
├── unit/                                  # Unit tests for individual functions
│   └── test_library_functions.py          # Tests for library functions
├── integration/                           # Integration tests
│   ├── test_vantage6_integration.py       # Data stratification workflows
│   └── test_rdf_algorithm_integration.py  # Vantage6 algorithm integration tests
├── mock_algorithm/                        # Mock Vantage6 algorithm to be used for Vantage6 integration testing
│   └── ...
└── data/                                  # Test data and configurations
    ├── additional_vantage6_*_config.yaml  # Additional Vantage6 component configurations
    ├── *.ttl                              # Triplified datasets for testing
    └── rdf_store.csv                      # RDF-store reference for the Vantage6 node
```
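As a rough illustration of how the triplified .ttl test data could be exercised, the sketch below loads a Turtle file into an in-memory graph and runs a SPARQL query against it. It uses rdflib and a placeholder file name; the actual test suite relies on the fixtures in conftest.py and, for the integration tests, on a real RDF store inside a Vantage6 node, so this is only a conceptual sketch.

```python
# Conceptual sketch: query a triplified test dataset in memory with rdflib.
# The file name 'example_dataset.ttl' is a placeholder, not an actual file in tests/data/.
from rdflib import Graph

graph = Graph()
graph.parse("tests/data/example_dataset.ttl", format="turtle")

# Count the triples that were loaded, as a basic sanity check
query = """
SELECT (COUNT(*) AS ?triple_count)
WHERE { ?subject ?predicate ?object . }
"""
for row in graph.query(query):
    print(f"Loaded {row.triple_count} triples from the test dataset.")
```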
Install test dependencies:
```bash
pip install pytest pytest-mock hypothesis faker
```

The tests can then be run as follows:

```bash
# Run all tests
pytest

# Run unit tests only
pytest tests/unit/

# Run integration tests only
pytest tests/integration/

# Run a specific test module
pytest tests/unit/test_library_functions.py

# Run with verbose output
pytest -v
```

The test suite covers the following categories:
- Unit Tests: Test individual functions in isolation
- Integration Tests: Test complete workflows and component interactions (e.g. whether data can be queried from the RDF store in a Vantage6 node)
- Edge Case Tests: Test behaviour with unusual data inputs
The test suite uses a synthetic dataset that was triplified using the Triplifier tool.
Tests run automatically on every push and pull request via GitHub Actions:
- Testing against multiple Python and Vantage6 versions (starting with Python 3.10, and Vantage6 4.11 and 4.12)
- Code coverage reporting
- Performance benchmarking
- Security scanning
When contributing new functionality:
- Add unit tests for all new functions
- Add integration tests for complete workflows
- Include edge case testing for robustness
- Ensure new query templates have corresponding tests
- Update the test data if needed for new scenarios; ensure that any new data is triplified.
- Ensure that the mock algorithm in tests/mock_algorithm covers the new functionality
- Use descriptive test names that explain what is being tested
- Include both positive and negative test cases and scenarios
- Test edge cases and error conditions
- Use realistic synthetic data
- Validate both structure and values of results (an example test illustrating these points is sketched after this list)
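As a rough illustration of these guidelines, the hypothetical test module below uses descriptive test names, covers a positive and a negative case, and validates both the structure and the values of a result. The helper count_missing_values is invented here purely for the example and is not a function of this library.

```python
# Hypothetical example tests; the helper 'count_missing_values' is defined here
# purely for illustration and is not part of this library.
import pandas as pd
import pytest


def count_missing_values(df: pd.DataFrame, column: str) -> int:
    """Return the number of missing values in a column (illustrative helper)."""
    if column not in df.columns:
        raise KeyError(f"Column '{column}' not present in the data.")
    return int(df[column].isna().sum())


def test_count_missing_values_returns_expected_count_for_column_with_gaps():
    # Positive case: realistic synthetic data with two missing entries
    df = pd.DataFrame({"tumour_stage": ["I", None, "III", None, "II"]})

    result = count_missing_values(df, "tumour_stage")

    # Validate both the type (structure) and the value of the result
    assert isinstance(result, int)
    assert result == 2


def test_count_missing_values_raises_for_unknown_column():
    # Negative case: asking for a column that does not exist should fail loudly
    df = pd.DataFrame({"tumour_stage": ["I", "II"]})

    with pytest.raises(KeyError):
        count_missing_values(df, "does_not_exist")
```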
- J. Hogenboom
- V. Gouthamchand