RDF2RDFStar: Jena throws subject cannot be null on partial reification statements #25
Description
I am working with split files [1][2] and this is really a showstopper:
Exception
java.lang.UnsupportedOperationException: subject cannot be null
at org.apache.jena.graph.Triple.<init>(Triple.java:44)
at se.liu.ida.rdfstar.tools.conversion.RDF2RDFStar.printTriples(RDF2RDFStar.java:167)
at se.liu.ida.rdfstar.tools.conversion.RDF2RDFStar.convert(RDF2RDFStar.java:95)
at se.liu.ida.rdfstar.tools.ConverterRDF2RDFStar.exec(ConverterRDF2RDFStar.java:165)
at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at se.liu.ida.rdfstar.tools.ConverterRDF2RDFStar.main(ConverterRDF2RDFStar.java:43)
Reproduce with:
_:bnode001 <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://foobar.com/classes#Type> .
_:bnode001 <http://foobar.com/properties#Property> <http://foobar.com/values#Value1> .
But there is seemingly no problem with:
_:bnode001 <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://foobar.com/classes#Type> .
_:bnode002 <http://foobar.com/properties#Property> <http://foobar.com/values#Value1> .
where _:bnode001 is simply ignored. Why does the converter choke in the first scenario but not in the second?
I don't know if this relates to #18 and #17 as that's for the inverse conversion.
A bit of direction on how to go about solving this the right way would be good. Is there a flag similar to the one in #18? Or is there another pattern we can use instead of reification?
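In the meantime, a possible stopgap is to pre-scan each split file for blank nodes that carry only part of a reification (some, but not all, of rdf:subject, rdf:predicate, rdf:object) before handing it to the converter. The following is a minimal sketch using only the Java standard library, not Jena and not anything from RDF2RDFStar. The class and method names are hypothetical, and it assumes well-formed N-Triples with one triple per line. It only sees the lines it is given, so a reification completed in another split file would still be flagged. It also flags _:bnode001 in both examples above, so it does not explain why only the first one throws.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of Jena or RDF2RDFStar.
class PartialReificationCheck {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // The three properties that identify the reified triple.
    static final List<String> REIFICATION_PROPS = List.of(
            "<" + RDF + "subject>",
            "<" + RDF + "predicate>",
            "<" + RDF + "object>");

    /**
     * Returns the subjects that use at least one, but not all three, of
     * rdf:subject, rdf:predicate and rdf:object within the given lines.
     * Assumes N-Triples input: subject and predicate contain no whitespace.
     */
    static Set<String> findPartialReifications(List<String> ntLines) {
        Map<String, Set<String>> seen = new LinkedHashMap<>();
        for (String line : ntLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = trimmed.split("\\s+", 3); // subject, predicate, rest
            if (parts.length < 3) {
                continue; // not a complete triple; leave it to the real parser
            }
            if (REIFICATION_PROPS.contains(parts[1])) {
                seen.computeIfAbsent(parts[0], k -> new HashSet<>()).add(parts[1]);
            }
        }
        Set<String> partial = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            if (e.getValue().size() < REIFICATION_PROPS.size()) {
                partial.add(e.getKey());
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        // The two lines from the failing example in this report.
        List<String> lines = List.of(
                "_:bnode001 <" + RDF + "object> <http://foobar.com/classes#Type> .",
                "_:bnode001 <http://foobar.com/properties#Property> <http://foobar.com/values#Value1> .");
        System.out.println(findPartialReifications(lines)); // prints [_:bnode001]
    }
}
```

For the 1-3 GB files mentioned in [2], the same loop could be fed line by line from a BufferedReader instead of an in-memory list.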
Anyway, thanks for your awesome work in this space @hartig!
[1] The files are in N-Triples because that syntax preserves the boundary of each triple statement, as opposed to Turtle, where breaking up parts of a statement with ; and , (and even arbitrary line breaks) is common or de facto practice.
[2] Why I'm working with split files is another story, and perhaps another issue to report pending some more debugging. Some big (1-3 GB) .NT and .TTL files (derived from smaller raw CSV data, 1-300 MB) are getting aborted midway in certain situations (I no longer have the stack trace at this moment). (edit: this seemingly resolves itself if you have a lot more RAM/IOPS/CPU than is necessary to read each file, but an issue about unexpectedly large files has been reported nonetheless in #26)