How to model "pure" information content? #396
Replies: 4 comments
Toni, thanks for this. It's a great question. The short answer is no: the Entity -> ICE -> IBE -> data value pattern is not intended to be required for every piece of information in a domain, but it is the recommended pattern for handling literal values.

Using CCO doesn't require that you designate a temporal region with a timestamp. For that matter, CCO doesn't require that instance-level events are related to temporal regions, but if they are so related, then, as you describe, it would be accomplished by using an instance of a BFO Temporal Region (or one of its subtypes) as the object of this connection: event_x occurs_on temporal_region_y. This might be enough for some applications. For example, you can search for all the events that occur on the day of the moon landing without using a date literal:

SELECT ?event

If you want to search for the event by date string, or use that string as input to a temporal reasoner, then you have to put it somewhere in your knowledge base.

Let me try to soften the objections you mention about the options that CCO provides (i.e. IBE and is_tokenized_by). It doesn't seem too unrealistic to suggest that days are designated by ICEs, which are in turn carried by IBEs. After all, if you've chosen a day, then you've chosen a calendar (Gregorian, Julian), and calendars have date IBEs as parts. If the number of "hops" is still bothersome, you could create a property chain to shorten it (temporal_region designated_by ICE generically depends on IBE -> temporal_region identified_by IBE).

If none of this is convincing and you still want to use a datatype property, then my advice is to limit yourself to datatype properties that have a minimum of implicit content. So while the use of has_datetime_value as in the following is OK:

person_x participates_in birth_event_x has_datetime_value 1969-07-21

shortening this further by using has_birthdate is not:

person_x has_birthdate 1969-07-21

The reason is that the event has become embedded in the relation and can't be linked to other participants or locations, which makes integration with other data problematic.

I hope this was helpful. If not, please post a follow-up.
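To make the points above a bit more concrete, here is a minimal sketch in plain Python rather than SPARQL or OWL. Triples are modelled as tuples, and every identifier (moon_landing, day_1969_07_21, date_ice_1, date_ibe_1, identified_by, and so on) is a hypothetical placeholder, not an actual CCO IRI. It shows finding events by their day node without a date literal, reaching other participants through the event node, and materialising a two-step chain as a one-hop shortcut.

```python
# Plain-Python sketch: triples are (subject, predicate, object) tuples.
# Every identifier is a hypothetical placeholder, not an actual CCO IRI.
TRIPLES = {
    ("moon_landing", "occurs_on", "day_1969_07_21"),
    ("armstrong", "participates_in", "moon_landing"),
    ("aldrin", "participates_in", "moon_landing"),
    ("day_1969_07_21", "designated_by", "date_ice_1"),
    ("date_ice_1", "generically_depends_on", "date_ibe_1"),
    ("date_ibe_1", "has_datetime_value", "1969-07-21"),
}


def subjects(triples, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def apply_chain(triples, first, second, shortcut):
    """Materialise a two-step property chain: first then second -> shortcut."""
    inferred = {
        (s, shortcut, o2)
        for s, p1, mid in triples if p1 == first
        for s2, p2, o2 in triples if p2 == second and s2 == mid
    }
    return triples | inferred


# Find events by their day node; no date literal is involved.
events = subjects(TRIPLES, "occurs_on", "day_1969_07_21")

# Because the event is its own node, other participants stay reachable.
participants = {
    person
    for event in events
    for person in subjects(TRIPLES, "participates_in", event)
}

# Shorten day -> ICE -> IBE to a single hop, as a property chain would.
closed = apply_chain(
    TRIPLES, "designated_by", "generically_depends_on", "identified_by"
)

print(sorted(events))
print(sorted(participants))
print(("day_1969_07_21", "identified_by", "date_ibe_1") in closed)
```

In a real triple store the chain would be declared in the ontology and left to a reasoner; the function here only imitates that result. A has_birthdate-style shortcut would leave no event node for the participants lookup to go through.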
Thanks for the response @rorudn, it is clearer to me now. Just a couple of questions. When you say

Here I'm thinking more about performance and storage in the triple store, so I'm trying to avoid extra storage or query processing when there is no clear benefit. Even if I can define that there is an IBE that holds value

But then I keep thinking about it, and I see that there is room for more compaction. In the modelling I'm doing here there isn't any "Information Content"; what there is is a specific time for which everyone agrees on what "July 21st, 1969 according to the Gregorian calendar" means. So I could have:

With this modelling I see two benefits, one is that the entity

I understand that by removing indirections I constrain myself to not being able to assert facts about ICEs or IBEs, but in this particular situation, working with "pure" facts decreases storage, makes queries easier to write and understand, and alleviates reasoning pressure on the triple store.

So, long story short, do you see any drawbacks in the last approach I've presented for modelling the date on which Armstrong landed on the Moon?
Yes, there are drawbacks. In the end, your approach gives you:

Please note that this semantic loss is not specifically due to the decision not to use ICEs or IBEs; rather, it is primarily due to the fact that you:

I'm assuming that your plan is to bury some or all of this content in the IRIs of the two entities in your triple store. You can certainly do this if you want, and you can even make it work for a limited use case; however, it is going to become a nightmare if you attempt to significantly scale up the number of entities you put in your triple store.

Based on your concerns about avoiding bloat in the triple store, I'm assuming that you do in fact intend to populate it with a great many such entities, perhaps on the order of billions of triples. If this is not the case, and you construct your triples well, you should have minimal to no performance issues when using the recommended, more robust CCO representations. If, instead, you are working with a huge number of triples (or even just a modest number) and have chosen to implement the bare-bones approach you describe, you are going to have a terrible time getting meaningful content back out. In particular, you will need to rely on string matching to get content out of the entity IRIs (or out of the literals, if you have included them). This can certainly be done in SPARQL, but SPARQL has only limited string matching functionality built in, and using REGEX in SPARQL will definitely cause performance issues in a large triple store.

Perhaps I'm incorrect in my understanding that your proposed representation includes no literals. If so, you must be using custom data properties to link the literals to the two individuals in your example. This approach is taken by many other graph database users and, while it does in your case improve the semantic content and queryability of your triple store, it has significant limitations. As Ron mentioned previously, it buries semantic content, limits the expressiveness of your representation, and impedes data integration.

That being said, the specifics of your use case may allow you to use your proposed minimalist representation schema without any drawbacks, provided of course that your use case does not change.
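As a rough illustration of the difference in query shape, here is a small sketch in plain Python rather than SPARQL. The ex: identifiers, the property names, and the second event are made up for the example. It answers the same question ("which events occurred in July 1969?") once by pattern-matching the date out of the day IRI and once by comparing typed values directly. It says nothing about performance in a real store; it only shows what each representation forces the query to do.

```python
import re
from datetime import date

# Hypothetical data; IRIs and property names are illustrative only.
# Option 1: the date is buried in the IRI of the day individual.
iri_only = {
    ("ex:moon_landing", "ex:occurs_on", "ex:day/1969-07-21"),
    ("ex:some_later_event", "ex:occurs_on", "ex:day/1969-11-19"),
}

# Option 2: each day individual also carries a typed value.
with_literals = iri_only | {
    ("ex:day/1969-07-21", "ex:has_datetime_value", date(1969, 7, 21)),
    ("ex:day/1969-11-19", "ex:has_datetime_value", date(1969, 11, 19)),
}

DAY_IRI = re.compile(r"^ex:day/(\d{4})-(\d{2})-(\d{2})$")


def events_between_by_iri(triples, start, end):
    """Recover each date by pattern-matching the day IRI, then filter."""
    found = set()
    for s, p, o in triples:
        match = DAY_IRI.match(o) if p == "ex:occurs_on" else None
        if match and start <= date(*map(int, match.groups())) <= end:
            found.add(s)
    return found


def events_between_by_literal(triples, start, end):
    """Join event -> day -> typed value and compare the values directly."""
    day_value = {s: o for s, p, o in triples if p == "ex:has_datetime_value"}
    return {
        s
        for s, p, o in triples
        if p == "ex:occurs_on" and o in day_value and start <= day_value[o] <= end
    }


start, end = date(1969, 7, 1), date(1969, 7, 31)
by_iri = events_between_by_iri(iri_only, start, end)
by_literal = events_between_by_literal(with_literals, start, end)
print(sorted(by_iri), sorted(by_literal))
```

Both calls return the same event here, but the IRI version depends on every day IRI following one exact string convention, which is the fragility described above.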
Actually, I plan to use data properties, but when I was presenting the example I noted or if I would be using The actual storage of data properties doesn't matter for the question here. So no, there is no string matching on the IRIs, since that would be worse for performance.

Maybe the word "pure" was poorly chosen here. A longer description would be "concepts in a domain for which everyone in that domain agrees on the unique entity they point to". I would say these are concepts that could be considered "closed world" in a domain: people could only say one thing about them. In the example I'm presenting here

For your concern on:

I do, the event itself is because

Just as a side note, it would be interesting if, as part of the CCO project, a complete dataset using CCO from a complex domain could be published, to see how others are fitting data to the conceptualisation provided by CCO.
This is a general modelling question and I wasn't able to find a proper Slack channel or discussion forum for CCO, so sorry for hijacking.
I understand that one of the goals of CCO is integrating several data sources into a common ontology, but what happens when we are creating data from "within" CCO that has no provenance attached? Let me illustrate with an example. Say we are modelling the landing on the moon event:

So for me the individual :1969-07-21 is a member of class cco:Day and we all agree on that. But then if I need to access associated data I have several alternatives:

1. Use a cco:InformationContentEntity and then a cco:InformationBearingEntity and finally use a cco:has_datetime_value. But for me it is unnatural to have a cco:InformationBearingEntity because in my domain there isn't any material object bearing that information. It is a pure fact.
2. Add cco:is_tokenized_by to the Information Content Entity, but since this is an annotation I lose the fact that this is a datetime and I cannot reason about it. And I still have the problem of two individuals (the instance of cco:Day and the instance of cco:InformationContentEntity) pointing to the exact same concept in my domain.
3. Make :1969-07-21 both an instance of cco:Day and time:ProperInterval. Then I solve the problem by having a canonical point in time described using OWL Time, but then I worry that by deviating from the semantics of CCO I will face problems in the future.

So my question is, what's the best approach for this modelling problem? Or more generally, is the pattern Entity -> InformationContentEntity -> InformationBearingEntity -> data properties just intended for data consolidation from different sources, or is it something meant for modelling every piece of information in my domain? When, or in what situation, should I create a simple owl:DatatypeProperty for storing data related to an individual when working with CCO?