This repository was archived by the owner on Sep 28, 2022. It is now read-only.

RDF2RDFStar: Jena throws subject cannot be null on partial reification statements #25

@schivmeister

I am working with split files [1][2] and this is really a showstopper:

Exception
java.lang.UnsupportedOperationException: subject cannot be null
        at org.apache.jena.graph.Triple.<init>(Triple.java:44)
        at se.liu.ida.rdfstar.tools.conversion.RDF2RDFStar.printTriples(RDF2RDFStar.java:167)
        at se.liu.ida.rdfstar.tools.conversion.RDF2RDFStar.convert(RDF2RDFStar.java:95)
        at se.liu.ida.rdfstar.tools.ConverterRDF2RDFStar.exec(ConverterRDF2RDFStar.java:165)
        at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
        at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
        at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
        at se.liu.ida.rdfstar.tools.ConverterRDF2RDFStar.main(ConverterRDF2RDFStar.java:43)

Reproduce with:

_:bnode001 <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://foobar.com/classes#Type> .

_:bnode001 <http://foobar.com/properties#Property> <http://foobar.com/values#Value1> .

But there is seemingly no problem with:

_:bnode001 <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://foobar.com/classes#Type> .

_:bnode002 <http://foobar.com/properties#Property> <http://foobar.com/values#Value1> .

where bnode001 is simply ignored. Why does the converter choke in the first scenario but not in the second?
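To find such cases before running the converter, here is a minimal pre-check sketch in Python. It is not part of RDF2RDFStar. The function name `find_partial_reifications` and the report keys are made up for illustration, and the line-based regex is only a rough N-Triples reader that assumes one triple per line. It lists nodes that carry some but not all of rdf:subject, rdf:predicate and rdf:object, and notes whether the node is also the subject of other triples, which is the only difference between the two inputs above.

```python
import re
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIFICATION_PARTS = {f"<{RDF}subject>", f"<{RDF}predicate>", f"<{RDF}object>"}
RDF_TYPE = f"<{RDF}type>"
RDF_STATEMENT = f"<{RDF}Statement>"

# Rough N-Triples line pattern: subject, predicate, object, final dot.
# Assumption: one triple per line, as in the files described in this issue.
TRIPLE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def find_partial_reifications(lines):
    """Return {node: {"missing": [...], "has_other_triples": bool}}
    for every node whose reification is incomplete."""
    parts_seen = defaultdict(set)  # node -> reification predicates found
    other_subjects = set()         # nodes used as subject of any other triple
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # not a triple this rough pattern understands
        subj, pred, obj = match.groups()
        if pred in REIFICATION_PARTS:
            parts_seen[subj].add(pred)
        elif pred == RDF_TYPE and obj == RDF_STATEMENT:
            continue  # optional typing triple, not counted either way
        else:
            other_subjects.add(subj)

    report = {}
    for node, parts in parts_seen.items():
        missing = REIFICATION_PARTS - parts
        if missing:
            report[node] = {
                "missing": sorted(missing),
                "has_other_triples": node in other_subjects,
            }
    return report
```

Calling `find_partial_reifications(open("part.nt", encoding="utf-8"))` on each split file (the file name is a placeholder) gives a list of nodes to inspect. Entries with `has_other_triples` set to true correspond to the first scenario, entries with it set to false to the second.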

I don't know whether this relates to #18 and #17, as those concern the inverse conversion.

A bit of direction on how to go about solving this the right way would be good. Is there a flag similar to the one in #18? Or is there another pattern we can use other than reification?

Anyway, thanks for your awesome work in this space @hartig!

[1] In N-Triples, because the syntax preserves the boundary of each triple statement, as opposed to Turtle, where breaking up parts of a statement with ; and , and even arbitrary line breaks is common or de facto practice.

[2] Why I'm working with split files is another story and perhaps another issue to be reported, pending some more debugging. Some big (1-3GB) .NT and .TTL files (derived from smaller raw CSV data, 1-300MB) are getting aborted midway in certain situations (I have lost the stack trace for the moment). (edit: this seemingly solves itself if you have a lot more RAM/IOPS/CPU than is necessary to read each file, but an issue with respect to unexpectedly large files has been reported nonetheless in #26)
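If the partial reifications come from the splitting itself (an assumption; the issue does not say how the files were split), one possible way to avoid them is to split by subject rather than by line count, so that every triple about a given node ends up in the same part. The sketch below is a hypothetical helper, not something RDF2RDFStar provides. The names `shard_index` and `assign_shards` are invented for illustration, and it again assumes one triple per line.

```python
import zlib


def shard_index(subject, n_shards):
    """Map a subject term to a shard number that is stable across runs."""
    return zlib.crc32(subject.encode("utf-8")) % n_shards


def assign_shards(lines, n_shards):
    """Yield (shard_number, line) pairs for N-Triples lines.

    All triples with the same subject get the same shard number, so the
    rdf:subject, rdf:predicate and rdf:object triples of one reification
    node are never separated. Works lazily, line by line, so the input
    can be a large file object.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # In N-Triples the subject is the first whitespace-delimited token.
        subject = stripped.split(None, 1)[0]
        yield shard_index(subject, n_shards), stripped
```

A caller would open `n_shards` output files and write each yielded line to the file with the matching number. Shard sizes are only roughly balanced, since they depend on how the subjects hash.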
