Data re-use

Examples of data re-use and semantic web

Illustration credit: Florian Lissot

The first thing we can notice in data re-use is that there are two points of view:

  • The tools' point of view
  • The users' point of view

Users/Tools in data re-use

While users make their own selection among search results, the goal on the tools side is to allow more efficient application implementations. The first limitation is that applications must adapt to each search engine: any change to a search engine's API forces an application update. Seen this way, there is no possibility of re-use, because search engine APIs do not follow any standard.

Unlike search engine APIs, RSS is standardized, which makes it a re-usable tool. RSS is based on XML (Extensible Markup Language), which makes it flexible and re-usable. However, there is a limitation in how RSS feeds can be re-used. Sites like Wolfram Alpha that implement RSS require feeds to be formatted in a certain way: any RSS request that doesn't meet the formatting rules won't be supported by the website.

Query example in Wolfram Alpha
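
To make the re-use argument about RSS concrete, here is a minimal sketch of reading a feed with Python's standard XML parser. The feed content, the example.org URLs and the helper name read_items are assumptions made up for illustration; the point is only that any application can read any RSS 2.0 feed with the same few lines, because the element names are standardized.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs offline.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.org/</link>
    <description>Illustrative feed, not a real one</description>
    <item>
      <title>First post</title>
      <link>https://example.org/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second</link>
    </item>
  </channel>
</rss>
"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    if root.tag != "rss":
        # A consumer can reject documents that do not follow the format.
        raise ValueError("not an RSS document")
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in read_items(RSS_SAMPLE):
    print(f"{title}: {link}")
```

The same function works on any feed that respects the RSS structure, and it refuses a document whose root element is not rss, which mirrors the formatting constraint described above.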

Furthermore, a single request returns many answers. On Google, the query words aren't related to each other, which leads to a polysemy problem (polysemy is the capacity of a word to have multiple meanings). That's why answers may be unintelligible or unreasonable.

To resolve that problem, there is semantics. Its main goal is to assign meaning to the query keywords. Semantics makes it possible to link the subject, the verb and the complement. We then have a single meaning for the triple, which yields a more relevant answer. However, it has a cost.
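
The difference between a bare keyword and a subject/verb/complement triple can be sketched in a few lines. Everything here is hypothetical: the facts, the identifiers such as jaguar_animal and the two search helpers are invented for illustration and do not come from any real knowledge base.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical facts: the bare keyword "jaguar" is polysemous,
# it can refer to an animal or to a car.
FACTS = [
    Triple("jaguar_animal", "is_a", "animal"),
    Triple("jaguar_animal", "lives_in", "rainforest"),
    Triple("jaguar_car", "is_a", "car"),
    Triple("jaguar_car", "made_in", "united_kingdom"),
]


def keyword_search(facts, keyword):
    """Naive keyword match: every fact whose subject contains the keyword."""
    return [t for t in facts if keyword in t.subject]


def triple_search(facts, subject=None, predicate=None, obj=None):
    """Match a triple pattern; None acts as a wildcard for that position."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# The keyword alone mixes both meanings.
print(len(keyword_search(FACTS, "jaguar")))
# Constraining the verb and the complement leaves a single meaning.
print(triple_search(FACTS, predicate="is_a", obj="animal"))
```

The keyword query returns facts about both meanings, while the triple pattern keeps only the one about the animal.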

How to assign meaning to data

  • Knowledge in specific areas to identify data
  • Knowledge shared by communities

Improve the formalism of the request: DBpedia is the equivalent of Wikipedia for the semantic web.
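
As a rough illustration of what a more formal request can look like, the sketch below builds a SPARQL query and the corresponding request URL, without sending anything over the network. It assumes that DBpedia's public endpoint is https://dbpedia.org/sparql and that the resource http://dbpedia.org/resource/Semantic_Web exists; the helper names label_query and request_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public SPARQL endpoint of DBpedia.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def label_query(resource_uri, lang="en"):
    """Build a SPARQL query asking for the label of one resource."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def request_url(endpoint, sparql):
    """Encode the query in the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": sparql})


# Assumption: this resource URI is used only as an example.
query = label_query("http://dbpedia.org/resource/Semantic_Web")
url = request_url(DBPEDIA_ENDPOINT, query)
print(query)
print(url)
```

Unlike a list of keywords, the query states explicitly which resource and which property are requested, so the answer does not depend on how a search engine interprets the words.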

Semantic and Web

In a first phase, information was accessed through the HTTP and HTML standards. The URL standard makes it possible to retrieve information regardless of its geographic location, which separates presentation from location.

In a second phase, XML, a metalanguage, organized the data and content. It also allows documents to be typed through a DTD, which ensures interoperability. XML separated structure and presentation from content.

Finally, in response to the semantics problem presented above, the third phase separated semantics from structure with RDF/RDFS. RDF is a relational model built on binary relations. OWL, a newer language, extends RDF to address new needs.
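
To show how RDFS adds semantics on top of plain RDF triples, here is a small sketch of a triple set with class inheritance. It is a toy, in-memory approximation rather than a real RDF library: the ex: identifiers and the graph content are hypothetical, and only two RDFS entailment rules are applied (transitivity of rdfs:subClassOf and type inheritance).

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical graph: each entry is a binary relation (subject, predicate, object).
GRAPH = {
    ("ex:Article", RDFS_SUBCLASS_OF, "ex:Document"),
    ("ex:BlogPost", RDFS_SUBCLASS_OF, "ex:Article"),
    ("ex:data-re-use", RDF_TYPE, "ex:BlogPost"),
}


def infer_types(graph):
    """Apply two RDFS rules until no new triple appears."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in closed:
            if p != RDFS_SUBCLASS_OF:
                continue
            for (s2, p2, o2) in closed:
                # s subClassOf o and o subClassOf o2 => s subClassOf o2
                if p2 == RDFS_SUBCLASS_OF and s2 == o:
                    new.add((s, RDFS_SUBCLASS_OF, o2))
                # s2 type s and s subClassOf o => s2 type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


inferred = infer_types(GRAPH)
types = sorted(o for (s, p, o) in inferred
               if s == "ex:data-re-use" and p == RDF_TYPE)
print(types)
```

The graph only states that the resource is a blog post, yet the schema lets a program conclude that it is also an article and a document, which is the kind of meaning that plain XML structure does not carry.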

This post was written in French with @y3ty and two others, during a “Mobile, social, semantic, and pervasive Web” course at Telecom Bretagne.