semantic weltbild 2.0: SemWeb

Austrian Science Minister publicly announces plans to leave CERN - mountain folk withdraw from the Swiss caves into their own

The Austrian Science Minister Johannes Hahn wants to free up 20 million EUR of budget per year by leaving the CERN consortium. This would allow him to move the money directly to his buddies at local universities to support "European research".

Well, then please also close down your WWW servers at http://www.bmwf.gv.at/ because we are currently celebrating 20 years of this CERN invention.

In their own words: "Durch die frei werdenden CERN-Mittel bieten wir den Universitäten eine europäische Forschungsperspektive" ("By freeing funds from CERN, we offer the universities a European research perspective"). Well, I wonder why the universities are currently cut off from EU funds: because they are not writing enough proposals? And CERN is international, it's not European. All the more reason to stay there.

Here is the press release (7.5.2009).

Here is the protest platform:
http://sos.teilchen.at/

"Der wissenschaftliche Output ist unbestritten, aber die Sichtbarkeit kleiner Staaten in Experimenten mit über 2.000 Mitgliedern eher gering." ("The scientific output is undisputed, but the visibility of small states in experiments with over 2,000 members is rather low.") In other words: with over 2,000 scientists, the Austrians don't always appear in the front row, and this is not great enough publicity for my ministry. What the fuck? It's about science, not about a minister showing off in front of cameras. Be happy that you can send your students to meet the other 2,000 top physicists in the world, in a highly competitive atmosphere.

So, let's leave the underground caves in Switzerland, filled with mysterious magnets, to go back to our own caves at the Austrian Research Centers, a political wonderland of science funding.

leobard - 12. May, 10:34

Wednesday, 22. April 2009

Annotating files - but where to store the metadata?

An interesting thread about file metadata for KDE got my attention: Portable Meta-Information. I waited a month until it cooled down and re-read it to draw my own conclusions.

The author, zwabel, correctly identified the problem that the Semantic Desktop must be compatible with the past - and with the future!

I think, for the future, we need to find a way to keep the user's data together, so that it is as persistent and approachable as the files themselves:
- When the user copies his photo archive or backs it up to a CD, no matter what application he uses, meta-information like ratings, comments, or tags has to move together with the photos
- When the user has a fresh install and copies his photo archive from a CD to the disk, the meta-information for the photos should just be there
- User-generated metadata should _never_ be lost just because a file/directory was renamed, a mount point changed, or whatever
- User-generated metadata should not be lost when a file completely unrelated to the item (such as a database file) is damaged or deleted
- In 20 years, when KDE4 has long been history and I find an old photo backup CD, the metadata should still be readable

zwabel then suggested storing the metadata, in addition to the central store (which NEPOMUK needs for the search engine and which is essential anyway), in a multitude of ".meta" files stored in the same directory as the files. For the file picture1.png, the metadata would be in picture1.png.meta. I think this is a pragmatic idea and would say:

Let's store it in picture1.png.rdf

As serialization, I suggest the W3C RDF standard, which we use in the central NEPOMUK store (the database) anyway and which has well-readable serialization formats, either XML or plain text. To achieve Linux-geek compatibility, I suggest the plain-text format. For example, to add authorship information about picture1.png, it would be:

@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
<>  dc:creator "Dave Beckett";
    dc:date "2002-07-31";
    dc:publisher "ILRT, University of Bristol";
    dc:title "Dave Beckett's Home Page" .

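Note that <> is a known shortcut for "this document": it is the empty relative URI, and the RDF/XML equivalent is rdf:about="". In a sidecar file it resolves against the sidecar's own location, so it denotes picture1.png.rdf itself unless tools agree on a convention that maps it to picture1.png.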
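A minimal sketch of writing such a sidecar file, using only Python's standard library. `write_sidecar` and `turtle_literal` are hypothetical helpers, not part of NEPOMUK or KDE. One assumption differs from the example above: the sketch names the described file explicitly with a relative IRI such as `<picture1.png>`, which resolves against the sidecar's own location, so the statements are about the photo rather than about the sidecar.

```python
import os
from urllib.parse import quote

DC = "http://purl.org/dc/elements/1.1/"

def turtle_literal(value):
    """Escape a Python string as a Turtle double-quoted literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'

def write_sidecar(path, properties, ext=".turtle"):
    """Write Dublin Core properties describing `path` to `path + ext`.

    The subject is a relative IRI (e.g. <picture1.png>) that resolves against
    the sidecar's own location, i.e. the file sitting next to it.
    """
    if not properties:
        raise ValueError("need at least one property to write a statement")
    # Percent-encode the name so spaces, '#' or ':' cannot break the IRI.
    subject = "<" + quote(os.path.basename(path)) + ">"
    predicates = " ;\n    ".join(
        f"dc:{key} {turtle_literal(val)}" for key, val in properties.items())
    text = f"@prefix dc: <{DC}> .\n{subject} {predicates} .\n"
    sidecar = path + ext
    with open(sidecar, "w", encoding="utf-8") as f:
        f.write(text)
    return sidecar
```

For example, `write_sidecar("picture1.png", {"creator": "Dave Beckett", "date": "2002-07-31"})` creates `picture1.png.turtle` containing the `@prefix dc:` line, then `<picture1.png> dc:creator "Dave Beckett" ;` and an indented `dc:date "2002-07-31" .` on the next line.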

Sebastian Trüg also argues in a way that leaves both ways open for the future, database and filesystem:
"you need a database anyway. Thus, in the end, the only solution I see at the moment is a kind of copy wrapper that makes sure metadata is copied with the file. Then one could also send information like a person or a project to a friend and the system would pick up all interesting metadata."
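The copy wrapper Sebastian describes could look roughly like this sketch. It assumes sidecars named `<file>.turtle` that refer to their file through a relative IRI such as `<picture1.png>`; `copy_with_metadata` is a hypothetical helper, and a real wrapper would also have to handle moves, deletes, and updating the central store.

```python
import os
import shutil
from urllib.parse import quote

def copy_with_metadata(src, dst, ext=".turtle"):
    """Copy `src` to `dst` and bring its sidecar `src + ext` along, if present."""
    # Like `cp`, accept a directory as destination and keep the file name.
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    shutil.copy2(src, dst)  # copies content plus timestamps/permissions

    sidecar = src + ext
    if os.path.exists(sidecar):
        with open(sidecar, encoding="utf-8") as f:
            text = f.read()
        # The sidecar names its file with a relative IRI like <picture1.png>.
        # If the copy gets a new name, rewrite that IRI so the metadata still
        # describes the right file (a naive textual replace, fine for a sketch).
        old_iri = "<" + quote(os.path.basename(src)) + ">"
        new_iri = "<" + quote(os.path.basename(dst)) + ">"
        with open(dst + ext, "w", encoding="utf-8") as f:
            f.write(text.replace(old_iri, new_iri))
    return dst
```

Copying `picture1.png` to `holiday.png` this way also produces `holiday.png.turtle`, whose statements now start with `<holiday.png>`.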

So, how do we format the metadata inside the files? The same way we do in the RDF repository of NEPOMUK, where we use the NIE and NAO ontologies. Pushing Dublin Core is also a good way to go, but do it the W3C way, standardized.

Using the RDF encoding of Dublin Core and, for example, Turtle/N3 as the serialization format gives a rock-solid way that is W3C-standardized (or at least well implemented).

Because the world is not perfect and needs many possible ways to evolve, we can store the metadata redundantly, in as many places as possible, but in one format. For freedesktop and NEPOMUK, RDF is the best choice, in my (not so humble) opinion. It is serializable, it can be stored in a database, and it can be hosted on the web. No other standard has all of this. It is already embedded in PDF, in the XMP format.
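XMP packets are RDF/XML, so reading them back out needs nothing RDF-specific beyond an XML parser. A minimal sketch, assuming the packet sits uncompressed in the file bytes (packets inside compressed PDF streams will not be found by this simple byte scan); `xmp_descriptions` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL)

def xmp_descriptions(data):
    """Yield the rdf:Description elements of every XMP packet found in `data`.

    Assumes uncompressed packets; a packet inside a Flate-compressed
    stream is invisible to this scan.
    """
    for match in PACKET.finditer(data):
        # Strip the padding whitespace XMP writers leave around the XML.
        root = ET.fromstring(match.group(1).strip())
        yield from root.iter(f"{{{RDF_NS}}}Description")
```

Each yielded element is an `rdf:Description` whose children are the properties, for example `dc:format`.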

I propose ".turtle" files to indicate that it's an RDF/Turtle serialization, but if you insist, ".rdf" is also fine with me (though it implies RDF/XML storage, which is a bit sluggish), and ".meta" is also fine with me if you store RDF/Turtle inside. Making up a new microformat would be stupid.

My summary:

- Storing it in the filesystem is nice, but not a killer argument. Just storing it in the central NEPOMUK repository works (tm) for 90% of all use cases, so start hacking applications that help users save time and improve their user experience with what is there today.
- Do not store it in .meta, but in .turtle, which is the rock-solid industry standard by the W3C, human-readable, and a simple microformat-like text format (smoother than XML).
- Also store it wherever possible in the files themselves, so as not to shut others out. Use EXIF fields, use XMP fields in PDF, use ID3v2 fields: use that metadata!
- Also index it in the central search engine, be it NEPOMUK or beagle++ (beagle++ is the RDF-enabled Beagle; check it out if you are not aware of it).
- Storing it in file metadata attributes (xattr/channels/...) is the goal, but I propose extending these standards with RDF to achieve cross-system compatibility. What worked for the web may also work here.

leobard - 22. Apr, 16:52

Tuesday, 21. April 2009

OrganiK project: working on testdata collection

As blogged in January, Gunnar, Remzi and I are working for DFKI on the OrganiK project. As true hard-bloggin' scientists, we keep on reporting.

In the next two weeks, I will gather an exhaustive test-data collection of texts that we use for ontology learning. I hope to gather around 10,000 documents from various sources that have a topic overlap. We need e-mails, office documents (contracts, etc.) and news documents. There are a lot of test data sets out there; the question now is which one to pick. Also, in OrganiK we have SME partners who could provide some data.

After this, the next step will be to create a taxonomy learning module that analyses the documents and semi-automatically (or fully automatically) creates a taxonomy out of them for future classification. If it is fully automatic, I expect that the taxonomy will have probabilistic elements in it ("it thinks that this is a customer, but only 60%"). If we work with a probabilistic model throughout the whole project, we can rank everything all the time; maybe this will reduce human work. We will see.
Does anyone have experience with taxonomies that have weights attached? It's similar to a TF/IDF rank.
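The weighting idea can be sketched in a few lines. This is a minimal illustration, not the OrganiK module: it computes plain TF-IDF weights and then turns the summed weights of hand-picked seed terms per category into a normalized score, so a document can come out as, say, 60% "customer". The seed-term taxonomy and both function names are made up for the example, and the normalized score is a simple proportion, not a calibrated probability.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def category_weights(doc_weights, taxonomy):
    """Sum the TF-IDF weights of each category's (hypothetical) seed terms
    and normalize, so the scores of one document add up to 1."""
    scores = {cat: sum(doc_weights.get(t, 0.0) for t in terms)
              for cat, terms in taxonomy.items()}
    total = sum(scores.values())
    if total == 0:
        return {cat: 0.0 for cat in scores}  # no evidence for any category
    return {cat: s / total for cat, s in scores.items()}
```

Terms that occur in every document get weight zero (log 1), so only distinguishing words contribute to a category; ranking documents by their score for one category gives the "rank everything all the time" behaviour.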

leobard - 21. Apr, 13:37

Tuesday, 7. April 2009

See me speak at webinale 2009

Webinale 09 is the premier German conference about Web 2.0: 70 speakers over two days, on all relevant topics: web technology, scaling, running services, marketing, business, future trends, RIA, mobile web, social networks and communities. Various hands-on sessions teach you how to build iPhone apps, AIR/Flex, etc. A startup day lets you see and meet the next Xing or Facebook. And there are Facebook and Xing groups to do some boo-haa already today.

Two years ago I spoke about the state of the semantic web; this year I will again speak for 40 minutes on the current state of the semantic web, or, as we call it, "the web of linked data".

See me speak on 26.5.2009 from 10:30 to 11:30 at the UFO-lookalike congress center in Berlin about the fact that even Barack Obama's new administration is doing the semantic web now, and other bits. Free your data!

The UFO-lookalike congress center:
Berlin Congress Center

leobard - 7. Apr, 20:05

Tuesday, 31. March 2009

deadline surfing

A colleague from a related research institute just expressed the pressure we all experience when facing EU proposal deadlines:
"Sorry, can we reschedule to later? I am currently deadline surfing for the call deadline tomorrow".

Deadline surfing, of course, means having around 20 man-days of work build up behind you, and 5 workdays in front of you. While you wade through the doable tasks in front of you, more work piles up behind you faster and faster, pushing you towards the deadline. Then the wave breaks: either you surf straight out of it (unbelievable), or you crash and fall into the whitewater (which experienced deadline surfers call the "stuck inside a washing machine mayhem"). The deadline arrives and washes every crashed surfer ashore, while the experienced riders swim out to catch the next set. Once the debris is washed from the beach, the wildlife of scientific work continues.

Let me illustrate the process:

sources, cc-by

In the graph, we compare two typical people being approached by a deadline which they are going to surf. Orange is the prepared and experienced surfer: when he sees the work coming, he gets on top of it early and then rides it at the bottom of the curve, gaining momentum and keeping the work well behind him. Finally, he elegantly finishes before the deadline and turns his board around before the whitewater of accusations and last-minute panic crushes him. Not so the blue surfer. He waits a bit too long at the beginning and is taken by the work too fast, which tips him over. Unable to stay in front of the work, he ends up in the whitewater of accusations and last-minute panic.

Further illustrations:

A knowledge worker riding the perfect deadline, excellent sports:

(c) dude crush, flickr

Waited too long to start working, now trying to get away from the deadline, clearly visible to everyone still working (not a good exit; you should dive underwater so that they don't notice your wipeout):

(c) vaguely artistic

Even a small deadline can trip you (the wave is about the size of a local government funding contract, or an NoE):

(c) coast guard bm

A team of two knowledge workers stuck right on the deadline. Bob, the lower one, is tripped by the tasks slipping away under him; David, the upper one, is crashing over him because he depended on Bob's input for the cost calculation:

(c) localsurfer

A sole knowledge worker, the project manager, writing the final deliverable for a 15mio EUR IP project that is already under close surveillance by the PO. The double-tripping wave means that half the project members invested their money in stocks and expensive Mediterranean "research visits", which makes it impossible to meet the cost statements (and all accounts receivable):

(c) soulsurfer3 on flickr

I conclude:
"I love deadlines. I love the whooshing noise they make as they go by."
Douglas Adams

P.S.: this is, of course, related to tomorrow's IST call deadline.

leobard - 31. Mar, 16:58

leobard - 1. Apr, 19:03

and on urbandictionary

Deadline surfing is now also on Urban Dictionary:
http://www.urbandictionary.com/define.php?term=deadline%20surfing

Thursday, 26. March 2009

PhD step5: burning the last draft

After submitting my PhD in January, I continue my long-term effort to blog about my PhD.

When you have submitted your PhD, according to an old Scottish tradition, you burn the last printed draft in the woods. Gunnar Grimnes and I adhered to that Scottish tradition on the 10th of January 2009; it also includes drinking a lot of alcoholic beverages.

The tradition also includes a quick defense of the thesis: you say something like "I made a PhD on helping people remember, and it is great.", and the attackers (your dudes) then shout "It is shit - burn it!". Do that, and drink alcohol.

Please comment below, blog it, or contact me if you also burnt your PhD. Use the Flickr tag phdburning.

leobard - 26. Mar, 11:28

Wednesday, 25. March 2009

SemVox DFKI Startup combines Ontologies with voice interaction

"So, computer, please find me all documents that contain research information about a drug that can cure cancer, developed anywhere in the world" - this is a classic question we would like to ask a computer. Actually, it's so classic that it is defined as an example in the 1992 edition of the TREC test data.

The DFKI spin-off SemVox may provide something that helps realize this. They are combining ontologies with speech interaction:
The SemVox technology enables the user to employ various applications without having to resort to traditional operating concepts such as keyboards or remote controls. Using our technology the user is free to choose between a number of modalities such as speech, gestures, keyboard or mouse or a combination thereof.

Their technology incorporates a heterogeneous set of modules that can be remixed to allow different application scenarios. One of their demos is telling the computer to "find me an action film". Nice side-effect: using the speech-synthesis module offered by SVOX, the computer will talk back to you (press release in German).

So this is a next step towards the semantic web, as Vint Cerf has put it:
I’m almost certain you’ll see products emerging that will allow you to orally interact with the network
Sure, it is nearly here, and you can buy the tools for it off-the-shelf. And I guess SemVox is open for investors :-)

What is really funny is that today we are very close to actually answering the questions defined as scientific goals in 1992 (for example here, page 64; I was not able to find the original TREC-1 set).

I have seen the SemVox system live at CeBIT: I was demoing NEPOMUK (and advertising my gnowsis.com startup) 5 meters away from them, and we had great fun demoing our products to each other. Here is a picture of Jan, one of the founders:

leobard - 25. Mar, 19:29

Sunday, 8. March 2009

The IJCAI-09 workshop on Identity and Reference in web-based Knowledge Representation (IR-KR2009) deadline extended: 16.3.

Here is the CFP from the IR-KR workshop. Everyone who thinks there are cooler URIs than those: submit something.

=======================
CALL FOR PAPERS

IR-KR2009 at IJCAI-09

July 11-13, 2009
Pasadena, CA, USA
=======================

The IJCAI-09 Workshop on
"Identity and Reference in web-based Knowledge Representation" (IR-KR2009)
http://ir-kr.okkam.org/

July 11-13, 2009
Pasadena, California, USA

held at the International Joint Conference on Artificial Intelligence
(IJCAI-09)
http://ijcai-09.org

-- IR-KR2009 goals --

The goal of this workshop, which in past years was mainly organised
within the Web and Semantic Web (SW) communities (see past editions at
WWW2006, WWW2007, ESWC2008), is to widen debate on the impact and the
challenges that the notions of *identity* and *reference* in
web-oriented KR pose to some of the core concepts of AI.

-- Background & Description of the workshop --

The Semantic Web initiative advances the idea that the web may become
a space not only for publishing and interlinking documents (through
HTML hyperlinks), but also knowledge bases (e.g. in the form of RDF
graphs) in an open and fully decentralized environment.

Even though models and languages used to implement the nascent
Semantic Web have been taken from long-standing research in AI, SW and
AI have different priorities. While traditionally a strong focus
within AI has been developing theories and code to support sound and
complete reasoning, web-oriented KR has a primary concern of web-wide
information interoperability and integration.

Perhaps the most central issue in reconciling these concerns is the
Principle of Global Identifiers: "global naming leads to global
network effects" (see Architecture of the World Wide Web, Volume One,
2004, at http://www.w3.org/TR/2004/REC-webarch-20041215/). As for the
web of documents, the overall value of such open and distributed
network of truly interlinked knowledge sources, based on global names,
would be immensely bigger than the sum of the value of the components.

This central role of identity and reference for a web-scale KR poses
new challenges to traditional KR, and many researchers have suggested
that the concept of URI may deeply affect the notions of language
(e.g. the semantics of using the "same" URI in different models),
reference (e.g. rigid vs. non rigid designation), interpretation
(e.g. the meaning of "links" across knowledge bases) & reasoning
(e.g. distributed reasoning across theories) in traditional
logic-based KR in AI. This workshop addresses these challenges.

-- Expected outcome --

The anticipated outcome of the workshop is to assess the state of the
art in the area of Identity and Reference in AI and the SW, and to go
beyond the limited scope of the current Semantic Web, as well as to
discuss and critically evaluate approach and next steps in
implementing and reasoning about identity and reference. It is
expected that the workshop will provide a valuable opportunity for
cross-fertilization across different research communities.

-- Workshop format --

Based on the successful experience in the past workshops on this
topic, the format of IR-KR2009 will be the following:

* a keynote talk that illustrates the importance of the topic
* very short presentations of the accepted papers, to give
participants an overview of the research work of the main
workshop contributors
* presentation of a detailed list of topics to discuss, by the
workshop chair
* extensive, moderated plenary discussion
* collaborative write-up of conclusions and next steps

-- Submissions --

The workshop aims at collecting contributions which can roughly be
grouped as follows:

* Foundations: formal and conceptual theories of identity and
reference for web-oriented KR
* Formal theories: semantics for KR on the web, soundness and
completeness of web-oriented reasoning, semantics of interlinked
data
* Vision papers: visionary solutions to the problems of identity
and reference in KR
* Project papers: descriptions of research & development projects
in this area
* Experiences: contributions from research and industry that
illustrate case studies or approaches to deal with the issues of
identity and reference on a web-scale
* Critical viewpoints: discussions of advantages and disadvantages
of the proposed approaches

We especially encourage contributions from groups or organizations
which are working on assembling large knowledge-based data collections
in order to compare the different practical solutions which were found
for integrating semantic data from multiple sources.

-- Submission Requirements and Dates --

IR-KR2009 will accept submissions for full papers, posters and
demonstrations. The selection will be based on the significance and
the quality of submissions as well as oriented towards fostering
cross-pollination and discussions during the event. All selected
abstracts will be included in the IJCAI-09 Working Notes. Authors are
kindly requested to provide keywords upon submission. The format for
submissions is the same as that of IJCAI-09. Please check
http://ijcai-09.org for the style files. Submissions should be no
longer than 5 pages.

- Submission deadline (papers, posters, demos): March 6, 2009
- Notification to authors: April 17, 2009
- Camera-ready version: May 8, 2009
- Workshop dates: July 11-13, 2009

Submissions will be managed through EasyChair.org at:

https://www.easychair.org/login.cgi?conf=irkr2009

-- Attendance --

Following IJCAI-09 policy, the total number of participants in
IR-KR2009 will be limited to 75 people. This includes organizers, PC
members, invited speakers, authors and attendees. Authors will be
selected based on the significance of their submission and will be
preferred during registration to non-presenting
attendees. Non-presenting attendees will be selected on a
first-come-first-served basis. Please refer to http://ijcai-09.org for
the application procedure and fees.

-- Workshop Chair --

Paolo Bouquet, University of Trento [PRIMARY CONTACT]
bouquet@disi.unitn.it

-- Workshop Organizers --

Marko Grobelnik, IJS, Slovenia
marko.grobelnik@ijs.si

Harry Halpin, University of Edinburgh
hhalpin@ibiblio.org

Frank van Harmelen, VU Amsterdam
Frank.van.Harmelen@cs.vu.nl

Heiko Stoermer, University of Trento
stoermer@dit.unitn.it

Giovanni Tummarello, DERI Galway
giovanni.tummarello@deri.org

Michael Witbrock, Cycorp Inc
witbrock@cycorp.eu

-- Program Committee --

Confirmed members:

Bo Andersson
Karl Aberer
Michael K. Bergman
Dan Brickley
Werner Ceusters
Kendall Clark
Richard Cyganiak
Hugh Glaser
Nicola Guarino
Gregor Hackenbroich
Tom Heath
Alexander Löser
Antonio Maña
Larry Masinter
Bijan Parsia
Peter F. Patel-Schneider
Valentina Presutti
Marta Sabou
Leo Sauermann
Luciano Serafini
Dagobert Soergel
Andraz Tori
Bernard Vatant

leobard - 8. Mar, 09:24

Friday, 6. March 2009

Visit me at CeBIT 2009

I am at the DFKI booth, presenting ALOE, NEPOMUK, and EyeBook, until Sunday, 8th March. We also present facts about our new Semantic Desktop spin-off gnowsis.com.

Hall 9, Booth B45

http://www.dfki.de/web/aktuelles/cebit2009

leobard - 6. Mar, 14:26