Return matching raw values for multi-valued luc:query fields#3823
Closed
recalcitrantsupplant wants to merge 64 commits intoapache:mainfrom
Closed
Return matching raw values for multi-valued luc:query fields#3823recalcitrantsupplant wants to merge 64 commits intoapache:mainfrom
recalcitrantsupplant wants to merge 64 commits intoapache:mainfrom
Conversation
…d comprehensive tests - Add FacetValue.java: Immutable class representing facet value/count pairs - Add FacetedTextResults.java: Container for search results with faceting data - Add TestFacetedResults.java: Comprehensive test suite (6 test methods) - Add faceting_methods.txt: Implementation for queryWithFacets$ method This adds faceting capability to Lucene text indexing in Apache Jena. Code is production-ready but requires build fixes to run tests.
…o jena-text module - Deleted BUILD_FIX_GUIDE.md, PROJECT_STATUS.md, and PROJECT_TESTING.md as they are no longer needed. - Added dependency for lucene-facet in pom.xml to enable native faceting capabilities. - Enhanced TextIndexConfig and TextIndexLucene classes to support faceting, including methods for retrieving facet counts. - Updated TextQuery to register a new property function for facet counts. This commit streamlines the documentation and integrates faceting functionality into the jena-text module.
…xt module - Updated FEAT_FACETS_OUTPUT.md to reflect the new test results and added details about filtered facets. - Enhanced FEAT_FACETS_SPEC.md to include specifications for filtered facets in the text:facetCounts function. - Revised FEAT_FACETS_TESTING.md to add tests for filtered facets, including examples for multi-word and single-word queries. - Modified TextFacetCountsPF.java and TextIndexLucene.java to support detection of search queries in facet counts. - Updated test cases in TestTextFacetCountsPF.java to validate the new filtered facets functionality. This commit improves the documentation and testing framework for the newly implemented filtered facets feature, ensuring clarity and comprehensive coverage.
- Replaced the `text:queryWithFacets` and `text:facetCounts` functions with `text:query` and `text:facet`, streamlining the API for clarity and usability. - Updated documentation in FEAT_FACETS_SPEC.md and FEAT_FACETS_TESTING.md to reflect the new API structure and usage examples. - Improved the implementation in TextIndexLucene and related classes to support the new faceting methods. - Removed obsolete classes and methods related to the previous faceting implementation. This commit enhances the faceting capabilities in the jena-text module, ensuring a more intuitive API and comprehensive documentation.
- Introduced new methods for faceting in the jena-text module, replacing outdated functions for better clarity. - Updated FEAT_FACETS_SPEC.md and FEAT_FACETS_TESTING.md to include new usage examples and specifications. - Improved implementation in TextIndexLucene to align with the updated API. - Removed deprecated classes and methods to streamline the codebase. This commit refines the faceting capabilities, ensuring a more intuitive API and comprehensive documentation for users.
…mentation - Introduced a new section in PHASE2_DESIGN.md detailing how to construct JSON filter arguments dynamically with Jena's Composite Datatype (CDT) extension. - Provided a SPARQL example demonstrating the creation of a JSON filter from `VALUES` clauses, enhancing user understanding of programmatic filter construction. - Clarified that the CDT `FOLD` function is a Jena extension, emphasizing its utility for users seeking to build filters without hardcoding values. This update improves the documentation by offering practical usage patterns for dynamic JSON filter creation in SPARQL queries.
- Deleted outdated documentation files related to previous implementations, including `2026-01-23-david-review.md`, `2026-01-27-david-recommendation.md`, `2026-02-09-next-steps.md`, and others. - Consolidated and updated the faceting API, replacing `text:queryWithFacets` and `text:facetCounts` with `text:query` and `text:facet` for improved clarity and usability. - Enhanced documentation to reflect the new API structure and added examples for the updated faceting functionality. - Ensured that the implementation aligns with the latest design decisions and user requirements. This commit streamlines the documentation and finalizes the faceting API changes, enhancing usability and clarity for users.
David/review
- Revised user guide to clarify the differences between Classic and SHACL modes, including updated examples for `text:query` and `luc:query`. - Enhanced SPARQL API reference to include detailed syntax and examples for `luc:query` and `luc:facet`, reflecting the new functionality. - Improved configuration documentation to outline properties for both indexing modes, emphasizing the use of `text:shapes` and `text:entityMap`. - Added sections on deploying with Fuseki and testing, ensuring comprehensive guidance for users. This commit enhances the clarity and usability of the documentation, aligning it with recent API changes and user needs.
- Updated the Lucene Classic Query Parser documentation link to the latest version. - Corrected the number of pre-existing tests from 327 to 303 in the testing documentation, ensuring accuracy in test coverage reporting. These changes enhance the clarity and accuracy of the documentation, reflecting recent updates and maintaining consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New docs/08-use-cases.md: building-block format showing each feature with mermaid diagrams and practical application examples - Reorder proposed features: inverse/sequence paths, spatial, then deferrable group (DrillSideways, hierarchical, range, grouping, suggest) - Add inverse/sequence paths to feature status table and timeline - Group deferrable extensions in future work with API impact notes - Fix diagram readability: transparent subgraph backgrounds, theme-aware text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explicit instruction to only create issues/PRs on the fork (aiworkerjohns/jena), never on upstream (apache/jena). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keeps README high-level with component overview and roadmap. Detailed sequence and flowchart diagrams now live in 04-architecture.md where they complement the existing technical content. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
update example
When luc:query and luc:facet appear in the same BGP, SPARQL cross-joins the results (N×M rows). Updated docs to recommend separate queries or UNION, and added a design decision documenting the analysis of handle-based alternatives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes #15. Includes Fuseki SHACL-mode config with three shapes (MiningReport, Borehole, Site), 21 hand-crafted entities, 5 demo queries covering luc:query and luc:facet patterns, a Python synthetic data generator, and a go-task Taskfile for build/serve/load/query workflows. All queries verified end-to-end against a running Fuseki instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Quote desc values containing colons to fix YAML parse error - Add Apache License headers to Taskfile.yml and all .rq query files (required by RAT plugin) - Add task stop for clean server shutdown (SIGTERM) - Add git commit style rule to CLAUDE.md
Add Dockerfile, docker-compose.yml, and Taskfile tasks for building and pushing the demo image to GitHub CR or Azure CR. Includes DockerReadme.md documenting the full workflow.
Extend SHACL entity-per-document indexing to support complex SHACL property paths (sequence, inverse, alternative) in addition to simple predicate paths. This enables indexing values that are reachable via multi-hop traversals or reverse relationships in the RDF graph. Implementation: - ShaclIndexAssembler: parse full SHACL path syntax into jena-arq Path objects, extract leaf predicates for change listener compatibility - ShaclTextDocProducer: use PathEval for complex paths, keep fast direct triple match for simple predicates - ShaclIndexMapping: add Path field to FieldDef with backward-compatible constructor Demo: add authors with sequence path (ex:authoredBy/ex:name) and inverse path (^ex:authored) fields, new queries, and README with expected results. Tests: 8 new tests (6 in TestShaclPathSupport, 2 in TestShaclAssembler). Closes #7 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Quote desc values containing colons to fix YAML parse error - Add Apache License headers to Taskfile.yml and all .rq query files (required by RAT plugin) - Add task stop for clean server shutdown (SIGTERM) - Update README with stop step Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge DockerReadme.md content into README.md covering image build, GHCR/ACR push, and docker compose workflows. Add missing Apache license header to .dockerignore.
- Add stats.html page with live dataset statistics from Fuseki - Add project-overview/ interactive presentation (slide deck) - Move mining test data into demo/test/ subfolder - Add drillhole config (GSWA geochemistry shapes) - Update Taskfile with drillhole serve/load and test-* tasks - Add Stats nav tab to all pages - Update .gitignore for new folder structure
New fuseki-loader image for bulk data operations: loads N-Quads/Turtle/NT files into TDB2 via tdb2.tdbloader, then builds the SHACL Lucene index via shacltextindexer. Configured via environment variables and volume mounts, sharing volumes with the fuseki-ai server image.
Loader entrypoint now supports MODE=all|load|index to run steps independently. Added Apache license headers to loader files. Documented GHCR auth account requirement in CLAUDE.md.
Add SHACL bulk reindexer and offline loader Docker image
Replace the flat Map<String, List<String>> filter format in SHACL mode with OGC CQL2-JSON as the single filter contract. This enables arbitrary boolean trees (and/or/not), comparisons, set membership (in), range queries (between), pattern matching (like), and spatial stubs. Key changes: - CQL AST model (sealed interface with records) and JSON parser - CQL-to-Lucene compiler with pushdown/residual split - TextIndexRegistry for named multi-index support - CompositeTextDocProducer for multi-index change routing - SortSpec/SortSpecParser for sort pushdown to Lucene - luc:query and luc:facet now require indexId as first argument - SearchExecution updated for CQL filter + sort + indexId in cache key - TextDatasetAssembler supports text:indexes (RDF list) config - Demo queries and app updated for new CQL2-JSON syntax Closes #22, closes #24. Progress on #21, #8 (spatial stub only). 430 tests pass (0 failures).
Add text:TextIndexShacl RDF type with dedicated ShaclTextIndexAssembler, removing SHACL branching from TextIndexLuceneAssembler. Each assembler now handles exactly one mode — classic entityMap or SHACL shapes.
Move all SHACL entity-per-document logic (faceting, CQL filters, sort pushdown, document building) into ShaclTextIndexLucene subclass. TextIndexLucene now contains only classic triple-per-document code with zero isShaclMode() branching.
Reorganise demo task runners so each dataset has consistent command names (serve, load, app, clean) in its own Taskfile: - demo/Taskfile.yml → mining dataset (port 3031) + shared infra - demo/drillhole/Taskfile.yml → drillhole dataset (port 3030)
… entities Backend: Add LatLonField type to SHACL index with LatLonShape triangle indexing. CqlToLuceneCompiler handles s_intersects for both bbox and GeoJSON Polygon geometries. WKT parsing supports EPSG:4326 and CRS84 axis orders. Frontend: Add interactive bbox and polygon drawing tools on the map panel. Polygon draw uses click-to-place vertices with Done button. Both spatial filters integrate with CQL2-JSON query pipeline. Demo: Scale generate.py to produce 500 entities across 14 weighted Australian mining regions with realistic coordinates, commodity distributions, and mixed point/polygon geometries.
CQL2-JSON filters, multi-index registry, and sort pushdown
Data-driven from tests.json — 34 test cases across 11 groups covering full-text search, faceted filtering, spatial bbox, and combined queries. Generates a Markdown report with embedded screenshots.
Add spatial filtering with bbox and polygon support
* Return typed values and field IRIs from luc:query and luc:facet (#31, #34) - ?field binding returns a URI (auto-generated urn:jena:lucene:index#field/{name} for blank node fields, or the config resource IRI for named fields) - ?literal binding populated from Lucene stored values with type determined by field type: KEYWORD with URI values → IRI node, TEXT → string literal, numeric → typed literal. Non-URI KEYWORD values fall back to string literals. - luc:facet ?value follows the same type mapping - Demo data migrated from string literals to IRIs for KEYWORD fields (commodity, state, operator, status) - Demo app updated to handle URI-typed facet values and field bindings * Fix invalid Turtle syntax in demo data and add parsing tests Use dedicated prefixes (commodity:, state:, operator:, status:) instead of slashes in prefixed local names (ex:commodity/Copper) which is invalid Turtle syntax. Add TestDemoDataParsing to validate demo data files parse correctly as a regression guard. * Change field IRI prefix to urn:jena:lucene:field#{name} The previous prefix urn:jena:lucene:index#field/{name} caused shortName() extraction to return "field/{name}" instead of just "{name}" because # is parsed before /. The new scheme allows standard URI local-name extraction to work correctly. * Add minResults assertions to demo test cases Update tests.json with expected minimum result counts for each test case and IRI-encoded facet values. Playwright spec now asserts resultCount >= minResults when specified. * Fix race condition in search and improve error handling Add AbortController to cancel in-flight SPARQL requests when a new search starts, preventing stale responses from causing errors during rapid facet toggling. Fix error catch to only show "Cannot connect" for actual network errors, not TypeErrors from JS code bugs. * Use CQL filter param for URL state instead of individual facet params Replace per-field URL params (commodity=X&state=Y&bbox=...) with a single ?filter= param containing the CQL2-JSON string. Simplifies URL handling — buildCqlFilter serializes state to URL, parseCqlFilter deserializes back. Also fix WKT map marker parsing (was passing property value object instead of raw string) and remove debug logging. * Fix Leaflet fitBounds animation error on hidden map Disable animation on fitBounds to prevent TypeError when the map container is not fully rendered during search updates. * Add SPARQL editor, CQL viewer, named field IRIs, and UI improvements - SPARQL editor popup: click log entries to open, editable endpoint, table/JSON result views, drag/resize, click-outside-to-close - CQL viewer popup: collapsible JSON tree with expand/collapse all, object/raw toggle - Named field IRIs: config uses named resources instead of blank nodes, field IRI column added to config page, raw TTL view toggle - entityType field changed from sequence path (rdf:type rdfs:label) to direct rdf:type path for IRI-based faceting - Spatial overlays set to interactive:false so map markers stay clickable - Log renamed from "SPARQL Log" to "Log", CQL filter entries added - Active pill color changed to steel blue - Test dropdown URLs properly percent-encoded - Screenshot baselines updated for all UI changes * Hide non-facetable literal values from result card pills Skip properties that are not facet-mapped and contain only literal values (e.g., depth, year) from rendering as tag pills on result cards. * Fix stats page facet breakdown with named field IRIs Extract resolveFieldName as shared utility function and use it in statsApp to resolve field URIs back to field names. * Show short names for facet values on stats page * Add short/full name toggle on stats page facet breakdown * Rename stats toggle buttons to Short Name / Full Name * Use field IRIs instead of field names in luc:query and luc:facet APIs Resolve field IRIs to Lucene field names in ShaclTextIndexLucene so that queries and facet requests can use IRIs as field identifiers. ShaclIndexMapping.findField() now accepts both field names and IRIs, matching by local name for IRI lookups. * Update docs for field IRIs, CQL2-JSON filters, spatial, and paths - Move inverse/sequence paths and spatial filtering from Proposed to Done - Update all filter examples from old JSON format to CQL2-JSON syntax - Add named field resource config examples showing field IRI preservation - Add sequence and inverse path examples to configuration reference - Add LatLonField to field types table with link to spatial docs - Add forward chaining replacement section to use cases - Add field IRI and spatial entries to feature status table and roadmap - Update Classic vs SHACL comparison table * Remove classic mode references from documentation Focus docs on SHACL mode only. Classic mode (text:entityMap / text:query) is upstream Jena — a brief note points readers to Apache Jena docs. Removes duplicate config examples, comparison tables, and dual-mode framing that added noise without value. * Enforce field IRIs as sole identifier in SPARQL APIs and docs findField() now matches exact field IRIs only; findFieldByName() added for internal Lucene field name lookups. All external-facing APIs (luc:query field specs, luc:facet facet arrays, CQL2-JSON property values, sort specs) require field IRIs. Demo config migrated to absolute PREFIX field: <urn:jena:lucene:field#>. All documentation updated to use field IRIs consistently.
* refactor(demo): unify on port 3030 and simplify Docker workflow - Update Taskfile and config to use port 3030 for all mining tasks - Switch Docker build to multi-stage in-container Maven builds - Add docker-start-ghcr and docker-serve-ghcr for remote images - Improve Taskfile dependency management with task dependencies - Update documentation and add .dockerignore for cleaner builds * Add GitHub Actions workflow for Docker image builds Builds and pushes fuseki-ai and fuseki-loader images to GHCR on push to main and on manual dispatch. Builds multi-arch (amd64 + arm64) with GHA build cache. Uses GITHUB_TOKEN for auth. * Make loader image extend fuseki-ai instead of duplicating build Loader Dockerfile now uses fuseki-ai as base image and just adds entrypoint.sh. Eliminates duplicate Maven build — loader-build depends on image-build and layers on top. Updated GH Actions workflow to pass base image args. Updated entrypoint.sh jar path to match fuseki image layout. --------- Co-authored-by: aiworkerjohns <aiworker.johns@gmail.com>
* Support wildcard ["*"] in luc:facet to request all facetable fields When ["*"] is passed as the facet field list, resolveFacetFieldNames() expands it to all configured facet fields, avoiding the need for clients to enumerate every field IRI explicitly. * Add integration tests for demo mining scenarios 22 tests covering multi-entity-type indexing, facet wildcard, CQL filters, sequence paths, and combined query+facet patterns modelled on the demo mining dataset. --------- Co-authored-by: aiworkerjohns <aiworker.johns@gmail.com>
The loader Docker build needs demo/loader/entrypoint.sh but it was excluded by the blanket demo/ ignore rule.
- Make FUSEKI_PORT and APP_PORT configurable (defaults: 3030, 8000) - Add serve_app.py reverse proxy for CORS-free frontend development - Add task refresh: one-command clean/restart/reload/serve workflow - Update task app to use proxy instead of simple HTTP server
* Add per-field query analyzer support (idx:queryAnalyzer) Allow fields to specify separate analyzers for indexing and querying. This enables patterns like edge n-gram indexing with keyword querying for prefix/typeahead search on identifier fields. * Add EdgeNGramAnalyzer and demo identifier field with prefix search - Add text:EdgeNGramAnalyzer assembler type for prefix/typeahead indexing - Add ex:identifier field to demo config with idx:queryAnalyzer example - Add identifier data to all mining.ttl entities (sites, boreholes, reports) - Add assembler test verifying idx:queryAnalyzer wiring * Improve identifier prefix search demo * Fix review issues and add coverage tests - Fix LowerCaseFilter import (Lucene 10 compatibility) - Add log.warn when global text:queryAnalyzer shadows per-field overrides - Revert default ports to 3030/8000 (keep parameterization) - Add 4 tests: normal field alongside prefix field, field isolation, FieldDef without queryAnalyzer fallback, EntityDefinition wiring --------- Co-authored-by: aiworkerjohns <aiworker.johns@gmail.com>
Author
|
apologies, wrong branch again |
Member
|
Would you mind unlinking your repo so that it isn't a fork? You can still have Jena as an upstream repo and pull changes. |
Author
|
the repo has been unlinked now |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
luc:queryraw-value binding for multi-valued SHACL fieldsTesting
TestShaclLucQueryRawValueOnMultiValuedFielddemo/test/queries/09-matchraw-multivalue.rq