semantic weltbild 2.0: Principles of Boundaries in the Semantic Web

Wednesday, 28. July 2004

Principles of Boundaries in the Semantic Web

Introduction
While hacking here at www.dfki.de we ran into some "knee deep in the dirt" problems of Semantic Web querying and triple transmission.

We are of the opinion that a Semantic Web server should not expose "models", and that you should not have to query by "passing a model URI". That is not feasible in a world that is moving towards a global triplespace. So what we do instead is have one big virtual model that is built internally out of many different models that contain the data. These models can be backed by adapters (like the gnowsis adapter, or think of something like D2RQ).
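To make the "one big virtual model" idea concrete, here is a minimal sketch in Python. It is not the gnowsis code: the class names ListModel and VirtualModel, the plain-string triples and the example URIs are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


class ListModel:
    """A toy backing model: an in-memory list of triples (hypothetical)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples = list(triples)

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the pattern.
        for triple in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), triple)):
                yield triple


class VirtualModel:
    """Looks like one model from outside, built from many models inside."""

    def __init__(self, models: Iterable[ListModel]) -> None:
        self.models = list(models)

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        # The caller never passes a model URI; every backing model is asked.
        for model in self.models:
            yield from model.find(s, p, o)


# Two separate sources; a real adapter would wrap a database or an application.
addressbook = ListModel([("urn:person:leo", "foaf:name", "Leo")])
projects = ListModel([("urn:person:leo", "ex:worksOn", "urn:project:gnowsis")])

server = VirtualModel([addressbook, projects])
results = list(server.find(s="urn:person:leo"))
```

The client only ever talks to server.find(...); which inner model a triple came from is invisible from the outside.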

So from the outside you have a "Semantic Web Server" that answers your queries. The queries are in three different forms:

1. find (s p o) patterns
2. RDQL queries
3. Chatty Bounded Descriptions

Find (s p o) is easy to understand; every hacker has called one of those before. RDQL is also well known: you pass a few patterns and get the result as an RDF subgraph or as variable bindings. The third form, Chatty Bounded Descriptions, is the gnowsis way of handling "Concise Bounded Descriptions". In short, you ask for data about a resource (by passing the URL of the resource) and get back a subgraph of RDF around the resource, mostly literals and links to other resources.
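A bounded description can be sketched roughly like this. The exact rules of the Chatty variant are not spelled out here, so this only follows the basic idea of a Concise Bounded Description (take all statements about the resource and keep following anonymous nodes until named resources or literals are reached). The function names and the "_:" prefix for anonymous ids are assumptions.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def is_anon(node: str) -> bool:
    # Assumption: anonymous nodes are written with a "_:" prefix.
    return node.startswith("_:")


def bounded_description(triples: List[Triple], resource: str) -> List[Triple]:
    """Collect the statements about `resource`, closing over anonymous nodes."""
    result = []
    seen = set()
    todo = [resource]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                # Named resources end the description; anon nodes are expanded.
                if is_anon(o):
                    todo.append(o)
    return result


triples = [
    ("urn:person:leo", "foaf:name", "Leo"),
    ("urn:person:leo", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", "Alice"),
    ("_:b1", "foaf:knows", "urn:person:bob"),
    ("urn:person:bob", "foaf:name", "Bob"),
]

description = bounded_description(triples, "urn:person:leo")
```

The description contains everything about urn:person:leo and about the anonymous node _:b1, but it stops at urn:person:bob, because that resource has a name the client can ask about later.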

The problem is: When you have anonymous nodes in your result, what do you do?

In case (3) it is no problem, as a ChBD returns a subgraph that has a closure around anonymous resources.

But in (2) and (1) you have a problem. Imagine you query a remote store and the store returns an anonymous resource as part of the result, like
find (?person foaf:name "Leo")
and the result is an anon identifier
?person = "234234:234234:243234"

Ok, if the server has just one big model then no problem - but what if the server is an aggregation engine, embedded in an enterprise integration environment?

We have this problem right now: we implemented the search above and it returns an anonymous resource, but just by looking at the anonymous resource it is not possible to guess where to look for more information about it. Sven Schwarz and I thought about writing a buffering system that holds the triples with anon resources, but a buffering system for outgoing triples would be the source of many bugs.
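The problem can be reproduced in a few lines of Python. This is only a sketch with made-up data: two backing stores that happen to use the same anonymous label, and a hypothetical aggregated_find that asks both of them.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def find(triples: List[Triple], s: Optional[str] = None,
         p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    # None acts as a wildcard for that position of the pattern.
    return [t for t in triples
            if all(q is None or q == v for q, v in zip((s, p, o), t))]


def aggregated_find(stores: List[List[Triple]], s: Optional[str] = None,
                    p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    # The aggregator simply concatenates the answers of all backing stores.
    hits = []
    for store in stores:
        hits.extend(find(store, s, p, o))
    return hits


# Both stores use the label "_:b1", but for two different people.
store_a = [("_:b1", "foaf:name", "Leo"),
           ("_:b1", "foaf:mbox", "mailto:leo@example.org")]
store_b = [("_:b1", "foaf:name", "Anna")]
stores = [store_a, store_b]

hits = aggregated_find(stores, p="foaf:name", o="Leo")
person = hits[0][0]  # an anon id with no hint about which store it came from

# Asking again with the anon id mixes up two unrelated resources.
follow_up = aggregated_find(stores, s=person)
```

The follow-up answer contains Anna's name next to Leo's mailbox. The anon id alone does not say which store it belongs to, so the aggregator would have to remember where every outgoing id came from, and that is exactly the buffering we want to avoid.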

So we decided to create the principles of boundaries.

Principles of Boundaries
In the Semantic Web we always talk about models or chunks of RDF to do something with them.

The principle of boundaries is that a Semantic Web server only returns closed boundaries, with no anonymous resources at the end. If you ask a "find (s p o)" question and get an anonymous resource at the end (o=anonid), that is your problem: the server does not have to answer "find (anonid p1 o1)". Instead, you have to ask the question again in RDQL, with "SELECT (s p o) (o p1 o2)".

So anonymous resources are never passed between servers on their own. They may still be passed in models, but only as part of chunks of RDF.
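Here is a rough Python sketch of a server that follows this principle. BoundaryServer, select_chain and the "_:" prefix for anon ids are hypothetical names chosen for the example; select_chain stands in for the two-pattern RDQL query and is not a real RDQL engine.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def is_anon(node: str) -> bool:
    # Assumption: anonymous nodes are written with a "_:" prefix.
    return node.startswith("_:")


class BoundaryServer:
    def __init__(self, triples: List[Triple]) -> None:
        self.triples = list(triples)

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> List[Triple]:
        # The server does not have to answer questions about anon ids.
        if any(term is not None and is_anon(term) for term in (s, p, o)):
            raise ValueError("anonymous ids are not accepted in queries")
        return [t for t in self.triples
                if all(q is None or q == v for q, v in zip((s, p, o), t))]

    def select_chain(self, s: str, p: str, p1: str) -> List[Triple]:
        """Answer the patterns (s p ?o) (?o p1 ?o2) in one request."""
        chunk = []
        for first in self.find(s=s, p=p):
            o = first[2]
            # The join happens inside the server, where the anon id is valid.
            seconds = [t for t in self.triples if t[0] == o and t[1] == p1]
            if seconds:
                chunk.append(first)
                chunk.extend(seconds)
        return chunk


server = BoundaryServer([
    ("urn:person:leo", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", "Alice"),
])

# One request returns a closed chunk; "_:b1" only appears inside it.
chunk = server.select_chain("urn:person:leo", "foaf:knows", "foaf:name")

# A follow-up find on the bare anon id is refused.
try:
    server.find(s="_:b1")
    rejected = False
except ValueError:
    rejected = True
```

The anonymous node still travels over the wire, but only inside a chunk of RDF that already contains what the client wanted to know about it.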

This approach of boundaries helps us a lot in implementing our Semantic Web Service. If you understand what I mean, you are a real hacker.

Semantic Web is alive!

leobard - 28. Jul, 15:28
