A small, dependency-light CLI tool that derives SPARQL endpoint candidates for Open Data portals and optionally verifies them via real SPARQL protocol checks.
It is designed for research-grade exports, for example when you want a reproducible list of portals with validated SPARQL endpoints (instead of “/sparql” guesses that lead to HTML pages, redirects, or 404s).
For each portal record, SODPEST can:
- Collect explicit endpoints (if present in the input under `sparql.endpoint`, `sparql.url`, or `sparql.details`)
- Guess common endpoint paths by appending 16 well-known SPARQL endpoint suffixes to the portal base URL
- Optionally verify candidates over the network:
  - `ASK {}` via HTTP GET (`?query=...`)
  - `ASK {}` via HTTP POST (`application/x-www-form-urlencoded`)
  - SPARQL Service Description (RDF plus `sd:` markers)
Candidates are always tagged with their origin:
- `source: "explicit"` for curated input fields
- `source: "guessed"` for derived URL patterns
`ASK {}` is a minimal query that does not depend on specific data content and is typically accepted by SPARQL endpoints. This keeps the validation fast and robust across heterogeneous portals.
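To make the `ASK {}` probe robust, the response has to be classified without trusting the status code alone. The sketch below shows one way to do that; `looksLikeAskResult` is an illustrative helper, not SODPEST's actual internal function.

```javascript
// Hypothetical helper (not SODPEST's actual internals): decide whether an
// HTTP response to `ASK {}` looks like a real SPARQL ASK result rather than
// an HTML landing page, redirect target, or error page.
function looksLikeAskResult(contentType, body) {
  const ct = (contentType || "").toLowerCase();
  // HTML almost always means a portal homepage, not a SPARQL endpoint.
  if (ct.includes("text/html") || /^\s*<!doctype html|^\s*<html/i.test(body)) return false;
  if (ct.includes("json")) {
    try {
      // SPARQL JSON results carry the ASK answer in a top-level "boolean" field.
      return typeof JSON.parse(body).boolean === "boolean";
    } catch {
      return false;
    }
  }
  if (ct.includes("xml")) {
    // SPARQL XML results carry <boolean>true</boolean> or <boolean>false</boolean>.
    return /<boolean>\s*(true|false)\s*<\/boolean>/i.test(body);
  }
  return false;
}
```

The same classifier can serve both the GET and the POST probe, since only the request differs between them.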
- Node.js 18+ (global `fetch` is available)
- No npm dependencies required

Repository contents:

- `filter-sparql-portals.mjs` (CLI script)
- `input.json` (default input file name)
- `sparql_portals.json` (default output file name)
Accepted input roots:

- An array of portal objects
- An object containing an array under one of: `openDataPortals`, `portals`, `items`
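The root handling described above can be sketched as a small normalization step; `extractPortals` is an illustrative name, not the script's actual function.

```javascript
// Sketch of the accepted-input-roots logic: accept either a bare array of
// portal objects or an object wrapping that array under a known key.
const ROOT_KEYS = ["openDataPortals", "portals", "items"];

function extractPortals(root) {
  if (Array.isArray(root)) return root;
  for (const key of ROOT_KEYS) {
    if (root && Array.isArray(root[key])) return root[key];
  }
  throw new Error("Input contains no portal array");
}
```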
Minimal example:
```json
{
  "openDataPortals": [
    {
      "url": "https://opendata.example.org",
      "inCountryEn": "Germany"
    }
  ]
}
```

Optional explicit SPARQL fields (any of these is treated as `source: "explicit"` if it is a valid HTTP(S) URL):
```json
{
  "sparql": {
    "endpoint": "https://opendata.example.org/sparql",
    "url": "https://opendata.example.org/sparql",
    "details": "https://opendata.example.org/sparql"
  }
}
```

Clone the repo and run directly with Node.js:

```
node filter-sparql-portals.mjs
```

General form:

```
node filter-sparql-portals.mjs <input.json> [output.json] [--check] [--strict] [--country Germany] [--timeout 8000] [--concurrency 10]
```

Reads `input.json`, writes `sparql_portals.json`:
```
node filter-sparql-portals.mjs
```

Exports portals that have at least one candidate and selects a preferred endpoint (explicit preferred, otherwise first guess).
```
node filter-sparql-portals.mjs input.json sparql_portals_candidates.json
```

Output includes:

- `sparqlEndpoint`
- `sparqlGuessed`
- `sparqlCandidates` (with `source` tags)
Validates each candidate using `ASK {}` (GET, then POST) and falls back to Service Description checks.
```
node filter-sparql-portals.mjs input.json sparql_portals_checked.json --check
```

Additional output fields include:

- `sparqlVerified: true`
- `sparqlVerifiedBy: "ask_get" | "ask_post" | "service_description"`
- `sparqlEndpointsVerified`
- `sparqlEndpointsVerifiedMeta`
Strict mode requires that a portal has at least one explicit SPARQL URL in the input.
- Without `--check`: portals are only exported if they contain at least one explicit candidate.
- With `--check`: portals are only exported if they contain at least one explicit candidate (verification still runs across all candidates, and an explicit verified endpoint is preferred if available).
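The strict-mode filter boils down to a predicate over each portal's candidates. The sketch below assumes the `sparqlCandidates` array of `{ url, source }` objects from the output format; `hasExplicitCandidate` is an illustrative name.

```javascript
// Sketch of the --strict filter, assuming each portal carries a
// `sparqlCandidates` array of { url, source } objects as in the export.
function hasExplicitCandidate(portal) {
  return (portal.sparqlCandidates ?? []).some((c) => c.source === "explicit");
}

// With --strict, only portals passing this predicate are exported.
const exportable = (portals) => portals.filter(hasExplicitCandidate);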
```
node filter-sparql-portals.mjs input.json sparql_portals_strict.json --check --strict
```

```
node filter-sparql-portals.mjs input.json sparql_portals.json --check --country Germany
```

- `--check`: enables network validation of endpoint candidates.
- `--strict`: only export portals that have at least one explicit SPARQL URL in the input (curated metadata).
- `--timeout <ms>`: per-request timeout in milliseconds (default: `8000`).
- `--concurrency <n>`: number of parallel validation workers (default: `10`).
- `--country <CountryNameEn>`: filters portals by exact case-insensitive match on `inCountryEn` (example: `Germany`).
If the portal has a base URL (e.g., `https://example.org`), SODPEST appends these suffixes as `source: "guessed"`:
- `/sparql`
- `/sparql/`
- `/sparql-endpoint`
- `/sparqlendpoint`
- `/sparqlEndpoint`
- `/sparql/endpoint`
- `/sparql/query`
- `/endpoint/sparql`
- `/api/sparql`
- `/rdf/sparql`
- `/query`
- `/query/sparql`
- `/virtuoso/sparql`
- `/blazegraph/sparql`
- `/bigdata/sparql`
- `/fuseki/sparql`
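The guessing step can be sketched as a simple map over the suffix list; the list below is my reading of the 16 paths above, and `guessCandidates` is an illustrative name.

```javascript
// Sketch of the guessing step: append each well-known suffix to the trimmed
// base URL and tag each result as a guessed candidate.
const GUESS_SUFFIXES = [
  "/sparql", "/sparql/", "/sparql-endpoint", "/sparqlendpoint",
  "/sparqlEndpoint", "/sparql/endpoint", "/sparql/query", "/endpoint/sparql",
  "/api/sparql", "/rdf/sparql", "/query", "/query/sparql",
  "/virtuoso/sparql", "/blazegraph/sparql", "/bigdata/sparql", "/fuseki/sparql",
];

function guessCandidates(baseUrl) {
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes to avoid "//sparql"
  return GUESS_SUFFIXES.map((suffix) => ({ url: base + suffix, source: "guessed" }));
}
```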
A candidate is considered valid if any of the following succeeds:
- `ASK` via GET
  - Sends `?query=ASK {}`
  - Rejects HTML responses
  - Accepts JSON/XML SPARQL results (also tolerates generic JSON/XML if it contains a boolean ASK result)
- `ASK` via POST
  - Sends `query=ASK {}` as `application/x-www-form-urlencoded`
  - Same response checks as GET
- Service Description
  - Requests RDF (Turtle, RDF/XML, JSON-LD, N-Triples, N-Quads, TriG, N3)
  - Requires RDF content type and markers like:
    - `http://www.w3.org/ns/sparql-service-description#`
    - `sd:Service`, `sd:endpoint`
    - `void:sparqlEndpoint`
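The Service Description fallback can be sketched as a second classifier over content type and body; `looksLikeServiceDescription` is an illustrative helper, not SODPEST's actual code.

```javascript
// Hypothetical counterpart to the Service Description check: require an RDF
// content type and at least one sd:/void: marker in the response body.
const RDF_CONTENT_TYPES = [
  "text/turtle", "application/rdf+xml", "application/ld+json",
  "application/n-triples", "application/n-quads", "application/trig", "text/n3",
];

function looksLikeServiceDescription(contentType, body) {
  const ct = (contentType || "").toLowerCase();
  if (!RDF_CONTENT_TYPES.some((t) => ct.includes(t))) return false;
  return (
    body.includes("http://www.w3.org/ns/sparql-service-description#") ||
    /\bsd:(Service|endpoint)\b/.test(body) ||
    body.includes("void:sparqlEndpoint")
  );
}
```

Requiring both an RDF content type and a marker keeps generic RDF documents (e.g. a DCAT catalog served at a guessed path) from being mistaken for endpoints.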
Two modes:

Without `--check`, exports portals with candidates and chooses a preferred endpoint:

- Prefer `source: "explicit"`
- Otherwise the first guessed candidate

With `--check`, exports portals with at least one verified candidate:

- Prefer a verified `source: "explicit"` endpoint
- Otherwise the first verified candidate
Network validation is inherently time-dependent:
- endpoints can rate-limit, go down temporarily, or change behavior
- results can differ between runs and dates
For research exports, store alongside your output:
- run date and timezone
- `--timeout` and `--concurrency` values
- input dataset version or hash
- Some SPARQL services require authentication or custom headers and will be reported as not verified.
- Some endpoints only support specific SPARQL result formats; the validator focuses on common JSON/XML ASK patterns and RDF Service Description.
- Verified endpoints list order may vary across runs due to concurrency and response timing.
- SPARQL 1.1 Protocol: https://www.w3.org/TR/sparql11-protocol/
- SPARQL 1.1 Service Description: https://www.w3.org/TR/sparql11-service-description/
- SPARQL Service Description namespace: http://www.w3.org/ns/sparql-service-description#
```bibtex
@misc{SODPEST2026,
  title        = {SODPEST - SPARQL Open Data Portal Endpoint SPARQL Tester},
  author       = {Florian Hahn in the SODIC Research Group},
  year         = {2026},
  howpublished = {\url{https://github.com/SOIDC-research/SODPEST}},
  note         = {Accessed: January 01, 2026}
}
```
Released under the MIT License. See LICENSE for details.
Florian Hahn
SODIC Research Group, TU Chemnitz
Website — Contact: florian.hahn@informatik.tu-chemnitz.de