|
- title: Linked Data - Design Issues
- url: https://www.w3.org/DesignIssues/LinkedData.html
- hash_url: 8df6b7af9ac944275a13f0d0e97ad7d7
-
-
- <a href="http://www.cafepress.com/w3c_shop"><img alt="Get a 5* mug" border="none" src="https://www.w3.org/DesignIssues/diagrams/lod/597992118v2_350x350_Back.jpg" align="right"/></a>
-
- <h1>Linked Data</h1>
-
- <p>The Semantic Web isn't just about putting data on the web. It
- is about making links, so that a person or machine can explore
- the web of data. With linked data, when you have some of
- it, you can find other, related, data.</p>
-
- <p>Like the web of hypertext, the web of data is constructed with
- documents on the web. However, unlike the web of hypertext,
- where links are relationship anchors in
- hypertext documents written in <small>HTML</small>, for data the
- links are between arbitrary things described by
- <small>RDF</small>. The <small>URI</small>s identify any
- kind of object or concept. But for
- <small>HTML</small> or <small>RDF</small>, the same expectations
- apply to make the web grow:</p>
-
- <ol>
- <li>
- <p>Use <small>URI</small>s as names for things</p>
- </li>
-
- <li>
- <p>Use <small>HTTP</small> <small>URI</small>s so that people
- can look up those names.</p>
- </li>
-
- <li>
- <p>When someone looks up a <small>URI</small>, provide useful
- information, using the standards (RDF*, SPARQL)</p>
- </li>
-
- <li>
- <p>Include links to other <small>URI</small>s, so that they
- can discover more things.</p>
- </li>
- </ol>
-
- <p>Simple. In fact, though, a surprising amount of data
- isn't linked in 2006, because of problems with one or more of the
- steps. This article discusses solutions to these problems,
- details of implementation, and factors affecting choices about
- how you publish your data.</p>
-
- <h2>The four rules</h2>
-
- <p>I'll refer to the steps above as rules, but they are
- expectations of behavior. Breaking them does not destroy
- anything, but misses an opportunity to make data
- interconnected. This in turn limits the ways it can later
- be reused. It is the unexpected re-use
- of information which is the value added by the web.</p>
-
- <p>The first rule, to identify things with
- <small>URI</small>s, is pretty much understood by most
- people doing semantic web technology. If it doesn't use the
- universal <small>URI</small> set of symbols, we don't call it
- Semantic Web.<br/>
- <br/>
- The second rule, to use <small>HTTP</small>
- <small>URI</small>s, is also widely understood. The
- only deviation has been, since the web started, a constant
- tendency for people to invent new <small>URI</small> schemes (and
- sub-schemes within the <span>urn:</span> scheme) such as
- <small>LSID</small>s and handles and <small>XRI</small>s and
- <small>DOI</small>s and so on, for various reasons.
- Typically, these involve not wanting to commit to the
- established Domain Name System (<small>DNS</small>) for
- delegation of authority but to construct something under separate
- control. Sometimes it has to do with not understanding
- that <small>HTTP</small> <small>URI</small>s are names (not
- addresses) and that <small>HTTP</small> name lookup is a complex,
- powerful and evolving set of standards. This issue is discussed at
- length elsewhere, and time does not allow us to delve into it
- here. [@@ref TAG finding, etc]</p>
-
- <p>The third rule, that one should serve information on the web
- against a <small>URI</small>, is, in 2006, well followed for most
- ontologies, but, for some reason, not for some major datasets.
- One can, in general, look up the properties and
- classes one finds in data, and get information from the
- <small>RDF</small>, <small>RDFS</small>, and <small>OWL</small>
- ontologies including the relationships between the terms in the
- ontology.</p>
-
- <p>The basic format here is RDF/XML, with its popular
- alternative serialization N3 (or Turtle). Large datasets provide
- a SPARQL query service, but the basic linked data should be
- provided as well.</p>
-
- <p>Many research and evaluation projects in the first few years of
- Semantic Web technologies produced ontologies and significant
- data stores, but the data, if available at all, is buried in a
- zip archive somewhere, rather than being accessible on the web as
- linked data. The Biopax project and the CSAktive data on
- computer science research people and projects were two examples.
- [The CSAktive data is now (2007) available as linked data.]</p>
-
- <p>There is also a large and increasing number of
- <small>URI</small>s of non-ontology data which can be looked up.
- <a href="http://ontoworld.org/wiki/Semantic_wiki">Semantic
- wikis</a> are one example. The "Friend of a friend"
- (<small>FOAF</small>) and <span>Description of a Project</span>
- (<small>DOAP</small>) ontologies are used to build social
- networks across the web. Typical <a href="http://en.wikipedia.org/wiki/List_of_social_networking_websites">
- social network portals</a> do not provide links to other sites,
- nor expose their data in a standard form.</p>
-
- <p>LiveJournal and Opera Community are two portal web sites which
- do in fact publish their data in <small>RDF</small> on the web.
- (Plaxo has a trial scheme, and I'm not sure
- whether they support <span>knows</span> links.) This means that I can
- write in my <small>FOAF</small> file that I know Håkon Lie by
- using his <small>URI</small> in the Opera Community data, and a
- person or machine browsing that data can then follow that link
- and find all his friends. <i>[Update:]</i> Also, the Opera
- Community site allows you to register the RDF URI for yourself on
- another site. This means that public data about you from
- different sites can be linked together into one web, and a person
- or machine starting with your Opera identity can find the others.
- </p>
-
- <p>The fourth rule, to make links elsewhere, is necessary
- to connect the data we have into a web, a serious, unbounded web
- in which one can find all kinds of things, just as on the
- hypertext web we have managed to build.</p>
-
- <p>In hypertext web sites it is considered generally rather bad
- etiquette not to link to related external material. The
- value of your own information is very much a function of what it
- links to, as well as the inherent value of the information within
- the web page. So it is also in the Semantic Web.</p>
-
- <p>So let's look at the ways of linking data, starting with the
- simplest way of making a link.</p>
-
- <h3>Basic web look-up</h3>
-
- <p>The simplest way to make linked data is to use, in one file, a
- <small>URI</small> which points into another.</p>
-
- <p>When you write an <small>RDF</small> file, say
- <http://example.org/smith>, then you can use local
- identifiers within the file, say #albert, #brian and
- #carol. In N3 you might say</p>
- <pre>
- <#albert> fam:child <#brian>, <#carol>.
- </pre>
-
- <p>or in <small>RDF/XML</small></p>
- <pre>
- <rdf:Description rdf:about="#albert">
-   <fam:child rdf:resource="#brian"/>
-   <fam:child rdf:resource="#carol"/>
- </rdf:Description>
- </pre>
-
- <p>The <small>WWW</small> architecture now gives a global
- identifier "http://example.org/smith#albert" to Albert.
- This is a valuable thing to do, as anyone on the planet can
- now use that global identifier to refer to Albert and give more
- information. </p>
-
- <p>For example, in the
- document <http://example.org/jones> someone might
- write:</p>
- <pre>
- <#denise> fam:child <#edwin>, <smith#carol>.
- </pre>
-
- <p>or in <small>RDF/XML</small></p>
- <pre>
- <rdf:Description rdf:about="#denise">
-   <fam:child rdf:resource="#edwin"/>
-   <fam:child rdf:resource="http://example.org/smith#carol"/>
- </rdf:Description>
- </pre>
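-
- <p>How the short names above turn into global identifiers can be
- checked with a few lines of Python. This is only a sketch: it uses the
- standard library's <span>urllib.parse.urljoin</span>, and the helper name
- <span>global_id</span> is made up for illustration.</p>

```python
from urllib.parse import urljoin

def global_id(document_uri: str, reference: str) -> str:
    """Resolve a (possibly relative) URI reference against the URI of
    the document it appears in, as a web client would."""
    return urljoin(document_uri, reference)

# References as written inside <http://example.org/jones>
jones = "http://example.org/jones"
print(global_id(jones, "#denise"))      # a local name in the jones document
print(global_id(jones, "smith#carol"))  # a name in the smith document
```

- <p>The second call gives the same identifier that the RDF/XML version
- spells out in full, which is why the two documents end up talking about
- the same Carol.</p>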
-
- <p><br/>
- Clearly it is reasonable for anyone who comes across the
- identifier "http://example.org/smith#carol" to:</p>
-
- <ol>
- <li>Form the <small>URI</small> of the document by truncating
- before the hash</li>
-
- <li>Access the document to obtain information about #carol</li>
- </ol>
-
- <p>We call this dereferencing the <small>URI</small>. This
- is basic semantic web. </p>
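-
- <p>The first of those two steps is mechanical. A minimal sketch in
- Python, using the standard library's <span>urllib.parse.urldefrag</span>
- (the function name <span>split_identifier</span> is an assumption of this
- example, and the fetch itself is left out so that the code runs offline):</p>

```python
from typing import Tuple
from urllib.parse import urldefrag

def split_identifier(uri: str) -> Tuple[str, str]:
    """Return (document URI, local name) for a hash URI."""
    document, fragment = urldefrag(uri)
    return document, fragment

doc, name = split_identifier("http://example.org/smith#carol")
print(doc)   # the document a client would fetch
print(name)  # the thing described within that document
```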
-
- <p>There are several variations.</p>
-
- <h3>Variation: URIs without Hashes and HTTP 303</h3>
-
- <p>There are some circumstances in which dividing identifiers
- into documents doesn't work very well. There may logically
- be one global symbol per document, and there is a
- reluctance to include a # in the <small>URI</small> such
- as </p>
-
- <p>
- http://wordnet.example.net/antidisestablishmentarianism#word</p>
-
- <p>Historically,
- the early Dublin Core and <small>FOAF</small> vocabularies did
- not have # in their URIs. In any event when
- <small>HTTP</small> <small>URI</small>s without hashes are used
- for abstract concepts, and there is a document that carries
- information about them, then:</p>
-
- <ol>
- <li>An <small>HTTP</small> <small>GET</small> request on
- the <small>URI</small> of the concept returns <span>303 See Other</span> and gives, in the
- Location: header, the <small>URI</small> of the
- document. </li>
-
- <li>The document is retrieved as normal</li>
- </ol>
-
- <p>This method has the advantage that <small>URI</small>s can be
- made up of all forms. It has the disadvantage that an
- <small>HTTP</small> request must be made for every
- single one. In the case of Dublin Core, for example,
- dc:title and dc:creator etc. are in fact served by the same
- ontology document, but one does not know that until they have
- each been fetched and returned HTTP redirections.</p>
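-
- <p>The cost described above can be seen in a small simulation. The
- sketch below does not touch the network: the dictionary
- <span>FAKE_WEB</span>, the vocab.example.org URIs and the function
- <span>look_up</span> are all hypothetical stand-ins for a server that
- answers concept URIs with a 303.</p>

```python
# A stand-in for the web: URI -> (status, Location header or body).
# Every URI and response here is invented for illustration.
FAKE_WEB = {
    "http://vocab.example.org/title":     (303, "http://vocab.example.org/terms.rdf"),
    "http://vocab.example.org/creator":   (303, "http://vocab.example.org/terms.rdf"),
    "http://vocab.example.org/terms.rdf": (200, "RDF describing title and creator"),
}

def look_up(uri, web=FAKE_WEB, max_hops=5):
    """Follow 303 responses from a concept URI to the document that
    describes it. Returns (document URI, body, number of requests)."""
    requests = 0
    for _ in range(max_hops):
        status, payload = web[uri]
        requests += 1
        if status == 303:
            uri = payload            # the Location: header names the document
        elif status == 200:
            return uri, payload, requests
        else:
            raise ValueError(f"unexpected status {status} for {uri}")
    raise RuntimeError("too many redirects")

for concept in ("http://vocab.example.org/title",
                "http://vocab.example.org/creator"):
    document, _, count = look_up(concept)
    print(concept, "->", document, f"({count} requests)")
```

- <p>Both terms land on the same document, but the client only learns
- that after asking about each term separately.</p>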
-
- <h3>Variation: FOAF and rdfs:seeAlso</h3>
-
- <p>The <a href="http://foaf-project.org/">Friend-Of-A-Friend</a> convention
- uses a form of data link, but not using either of the two
- forms mentioned above. To refer to another person in a
- <small>FOAF</small> file, the convention was to give two
- properties, one pointing to the document they are described in,
- and the other for identifying them within that document.</p>
- <pre>
- <#i> foaf:knows [
-     foaf:mbox <mailto:joe@example.com>;
-     rdfs:seeAlso <http://example.com/foaf/joe> ].
- </pre>
-
- <p>Read, "I know that which has email joe@example.com and
- about which more information is in
- <http://example.com/foaf/joe>".</p>
-
- <p>In fact, for privacy, people often don't put their email
- addresses on the web directly, but instead give a one-way hash
- (<small>SHA-1</small>) of their email address. This
- clever trick allows people who already know the email address
- to work out that it is the same person, without giving the email
- away to others.</p>
- <pre>
- <#i> foaf:knows [
-     foaf:mbox_sha1sum "2738167846123764823647"; # @@ dummy
-     rdfs:seeAlso <http://example.com/foaf/joe> ].
- </pre>
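-
- <p>A sketch of how such a value might be computed and checked, using
- Python's <span>hashlib</span>. It assumes the hash is taken over the full
- mailbox URI including the <span>mailto:</span> prefix; the function name
- is made up for this example.</p>

```python
import hashlib

def mbox_sha1sum(mailbox_uri: str) -> str:
    """Hex SHA-1 digest of a mailbox URI such as 'mailto:joe@example.com'.
    Assumption: the digest covers the whole URI, 'mailto:' included."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()

published = mbox_sha1sum("mailto:joe@example.com")  # what goes in the file

# Someone who already knows the address can confirm the match...
print(mbox_sha1sum("mailto:joe@example.com") == published)
# ...while a different address gives an unrelated digest.
print(mbox_sha1sum("mailto:jo@example.com") == published)
```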
-
- <p>This linking system was very successful, forming a
- growing social network, and dominating, in 2006, the linked
- data available on the web.</p>
-
- <p>However, the system has the snag that it does not give
- <small>URI</small>s to people, and so basic links to them cannot
- be made.</p>
-
- <p>I recommend (e.g. in weblogs on <a href="http://dig.csail.mit.edu/breadcrumbs/node/62">Links on the
- Semantic Web</a>, <a href="http://dig.csail.mit.edu/breadcrumbs/node/71">Give yourself a
- URI</a>, and <a href="http://dig.csail.mit.edu/breadcrumbs/node/72">Backward and
- Forward links in RDF just as important</a>) that those making a
- <small>FOAF</small> file give themselves a <small>URI</small> as
- well as using the <small>FOAF</small> convention.
- Similarly, when you refer to a <small>FOAF</small>
- file which gives a <small>URI</small> to a person,
- use it in your reference to that person, so that clients which
- just use <small>URI</small>s and don't know about the
- <small>FOAF</small> convention can follow the link.</p>
-
- <p>So now that we have looked at ways of making a link,
- let's look at the choices of when to make a link.</p>
-
- <p>One important pattern is a set of data which you can explore
- as you go link by link by fetching data. Whenever one
- looks up the URI for a node in the RDF graph, the server returns
- information about the arcs out of that node, and the arcs in.
- In other words, it returns any RDF statements in which the
- term appears as either subject or object.</p>
-
- <p>Formally, call a graph G <span>browsable</span> if, for the URI of
- any node in G, looking up that URI returns
- information which describes the node, where describing a node
- means:</p>
-
- <ol>
- <li>Returning all statements where the node is a subject or
- object; and</li>
-
- <li>Describing all blank nodes attached to the node by one
- arc.</li>
- </ol><br/>
-
- <p class="detail">(The subgraph returned has been referred to as
- a "Minimum Spanning Graph" (MSG [@@ref]) or RDF molecule
- [@@ref], depending on whether nodes are considered identified if
- they can be expressed as a path of functional, or reverse inverse
- functional, properties. A concise bounded description, which only
- follows links from subject to object, does not work.)</p>
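-
- <p>The definition can be sketched over a small in-memory graph. In the
- Python below, triples are plain tuples, names starting with
- <span>_:</span> stand for blank nodes, and the second rule is read as
- going one level deep. The data and the function names are invented for
- illustration; this is not the Tabulator's or any server's actual code.</p>

```python
# Triples are (subject, predicate, object) tuples of strings.
def is_blank(term):
    return term.startswith("_:")

def statements_about(node, graph):
    """All statements in which the node is subject or object."""
    return {t for t in graph if node in (t[0], t[2])}

def describe(node, graph):
    """Statements about the node, plus statements about any blank node
    attached to it by one arc."""
    result = statements_about(node, graph)
    for s, _, o in list(result):
        for neighbour in (s, o):
            if neighbour != node and is_blank(neighbour):
                result |= statements_about(neighbour, graph)
    return result

GRAPH = {
    ("ex:dig", "foaf:member", "ex:me"),
    ("ex:dig", "foaf:member", "ex:amy"),
    ("ex:me", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:mbox", "mailto:joe@example.com"),
    ("ex:amy", "foaf:name", "Amy"),
}

for triple in sorted(describe("ex:me", GRAPH)):
    print(triple)
```

- <p>Looking up <span>ex:dig</span> in the same graph returns both
- membership statements, which is what lets a reader move from one member
- to the group and on to the other member.</p>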
-
- <p>In practice, when data is stored in two documents, this means
- that any <small>RDF</small> statements which relate things in the
- two files must be repeated in each. So, for example, in my
- <small>FOAF</small> page I mention that I am a member of the
- <small>DIG</small> group, and that information is repeated on the
- <small>DIG</small> group data. Thus, someone starting from the
- concept of the group can also find out that I am a member.
- In fact, someone who starts off with my <small>URI</small>
- can find all the people who are in the same group.</p>
-
- <h3>Limitations on browsable data</h3>
-
- <p>So statements which relate things in the two documents must be
- repeated in each. This clearly is against the first rule of data
- storage: don't store the same data in two different places: you
- will have problems keeping it consistent. This is indeed an
- issue with browsable data. A set of completely
- browsable data with links in both directions has to be completely
- consistent, and that takes coordination, especially if different
- authors or different programs are involved.</p>
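-
- <p>One way to picture the coordination problem is a check that two
- documents agree on the statements that link them. The following Python
- is a sketch under simple assumptions: each document is a set of triples,
- and the caller says which things each document is responsible for. All
- names and data are hypothetical.</p>

```python
def missing_backlinks(doc_a, doc_b, things_a, things_b):
    """Statements relating a thing described in A to a thing described
    in B that appear in only one of the two documents."""
    def crosses(triple):
        s, _, o = triple
        return ((s in things_a and o in things_b) or
                (s in things_b and o in things_a))
    cross_a = {t for t in doc_a if crosses(t)}
    cross_b = {t for t in doc_b if crosses(t)}
    return cross_a ^ cross_b   # in one document but not the other

my_foaf = {("ex:dig", "foaf:member", "ex:me"),
           ("ex:me", "foaf:name", "Me")}
dig_data = {("ex:dig", "foaf:name", "DIG")}

# The membership statement has not been repeated in the group data yet.
print(missing_backlinks(my_foaf, dig_data, {"ex:me"}, {"ex:dig"}))
```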
-
- <p>We can have completely browsable data, however, where it is
- automatically generated. The <a href="http://dig.csail.mit.edu/2006/dbview/dbview.py">dbview</a>
- server, for example, provides browsable virtual
- documents containing the data from any arbitrary relational
- database.</p>
-
- <p>When we have data from multiple sources, then we have
- compromises. These are often settled by common sense,
- asking the question,</p>
-
- <blockquote>
- <p>"If someone has the URI of that thing, what relationships to
- what other objects is it useful to know about?"</p>
- </blockquote>
-
- <p>Sometimes, social questions determine the answer.
- I have links in my <small>FOAF</small> file that I know
- various people. They don't generally repeat that
- information in their <small>FOAF</small> files. Someone may say
- that they know me, which is an assertion which, in the
- <small>FOAF</small> convention, is theirs to assert, and the
- reader's to trust or not. </p>
-
- <p>Other times, the number of arcs makes it impractical. A
- <small>GPS</small> track gives thousands of times at which my
- latitude and longitude are known. Every person loading my
- <small>FOAF</small> file can expect to get my business card
- information, but not all those trackpoints. It is reasonable to
- have a pointer from the track (or even each point) to the person
- whose position is represented, but not the other way. </p>
-
- <p>One pattern is to have links of a certain property in a
- separate document. A person's homepage doesn't list all
- their publications, but instead links to a separate
- document listing them. There is an understanding
- that <span>foaf:made</span>
- gives a work of some sort, but <span>foaf:pubs</span> points to a document
- giving a list of works. Thus, someone searching for
- a <span>foaf:made</span>
- link would do well to follow a <span>foaf:pubs</span> link. It might
- be useful to formalize the notion with a statement like</p>
- <pre>
- foaf:made link:listDocumentProperty foaf:pubs.
- </pre>
-
- <p>in one of the ontologies.</p>
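-
- <p>A client could act on such a statement roughly as follows. This is a
- sketch only: the documents are held in a Python dictionary instead of
- being fetched, the example.org URIs are invented, and
- <span>LIST_DOCUMENT_PROPERTY</span> is a hypothetical table standing in
- for what the client would have read from the ontology.</p>

```python
# Hypothetical in-memory "web": document URI -> set of triples.
ME = "http://example.org/alice#me"
DOCS = {
    "http://example.org/alice": {
        (ME, "foaf:made", "http://example.org/notes#n1"),
        (ME, "foaf:pubs", "http://example.org/alice/pubs"),
    },
    "http://example.org/alice/pubs": {
        (ME, "foaf:made", "http://example.org/papers#p1"),
        (ME, "foaf:made", "http://example.org/papers#p2"),
    },
}

# property -> property whose value is a document listing more such links
LIST_DOCUMENT_PROPERTY = {"foaf:made": "foaf:pubs"}

def find_values(subject, prop, start_doc, docs=DOCS):
    """Values of `prop` for `subject` in the starting document, plus
    those found by following the list-document property, if any."""
    triples = set(docs[start_doc])
    list_prop = LIST_DOCUMENT_PROPERTY.get(prop)
    if list_prop:
        for s, p, o in docs[start_doc]:
            if s == subject and p == list_prop and o in docs:
                triples |= docs[o]
    return sorted(o for s, p, o in triples if s == subject and p == prop)

for work in find_values(ME, "foaf:made", "http://example.org/alice"):
    print(work)
```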
-
- <h3>Query services</h3>
-
- <p>Sometimes the sheer volume of data makes serving it as lots of
- files possible, but cumbersome for efficient remote queries over
- the dataset. In this case, it seems reasonable to provide a
- <small>SPARQL</small> query service. To make the data
- effectively linked, someone who only has the
- <small>URI</small> of something must be able to find their
- way to the <small>SPARQL</small> endpoint. </p>
-
- <p>Here again the <small>HTTP</small> 303 response can be used,
- to refer the enquirer to a document with metadata about which
- query service endpoints can provide what information about which
- classes of <small>URI</small>s. Vocabularies for doing
- this have not yet been standardized.</p>
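-
- <p>Once a client has found an endpoint, asking it about a
- <small>URI</small> is straightforward. The Python sketch below only
- builds the request URL, passing the query in the <span>query</span>
- parameter of a GET request as the SPARQL protocol does; the endpoint and
- the data URI are hypothetical, and nothing is sent over the network.</p>

```python
from urllib.parse import urlencode

def describe_query_url(endpoint: str, uri: str) -> str:
    """Build a GET URL asking a SPARQL endpoint to DESCRIBE the thing
    identified by `uri`."""
    query = f"DESCRIBE <{uri}>"
    return endpoint + "?" + urlencode({"query": query})

print(describe_query_url("http://data.example.org/sparql",
                         "http://data.example.org/thing/42"))
```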
-
- <p>(Added 2010). This year, in order to encourage
- people -- especially government data owners -- along the road to
- good linked data, I have developed this star rating system.</p>
-
- <p>Linked Data is defined above. Linked <em>Open</em> Data (LOD)
- is Linked Data which is released under an open licence, which
- does not impede its reuse for free. Creative Commons CC-BY is an
- example open licence, as is the UK's <a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/">
- Open Government Licence</a>. Linked Data does not of course in
- general have to be open -- there is a lot of important use of
- linked data internally, and for personal and group-wide data. You
- can have 5-star Linked Data without it being open. However, if it
- claims to be Linked Open Data then it does have to be open, to
- get any star at all.</p>
-
- <p>Under the star scheme, you get one (big!)
- star if the information has been made public at all, even if it
- is a photo of a scan of a fax of a table -- if it has an open
- licence. Then you get more stars as you make it progressively more
- powerful, easier for people to use.</p>
-
- <table>
- <tr>
- <td class="stars">★</td>
-
- <td>Available on the web (whatever format) <i>but with an
- open licence, to be Open Data</i></td>
- </tr>
-
- <tr>
- <td class="stars">★★</td>
-
- <td>Available as machine-readable structured data (e.g. Excel
- instead of image scan of a table)</td>
- </tr>
-
- <tr>
- <td class="stars">★★★</td>
-
- <td>As (2), plus non-proprietary format (e.g. CSV instead of
- Excel)</td>
- </tr>
-
- <tr>
- <td class="stars">★★★★</td>
-
- <td>All the above, plus: Use open standards from W3C (RDF and
- SPARQL) to identify things, so that people can point at your
- stuff</td>
- </tr>
-
- <tr>
- <td class="stars">★★★★★</td>
-
- <td>All the above, plus: Link your data to other people’s
- data to provide context</td>
- </tr>
- </table>
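-
- <p>Because each star builds on the ones before it, the table can be
- read as a simple counting rule. A small Python sketch, with parameter
- names chosen for this example only:</p>

```python
def stars(on_web_open_licence, machine_readable, non_proprietary,
          uses_rdf_uris, links_to_other_data):
    """Number of stars (0 to 5); each star requires all earlier ones."""
    criteria = [on_web_open_licence, machine_readable, non_proprietary,
                uses_rdf_uris, links_to_other_data]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count

print(stars(True, True, False, False, False))  # an Excel sheet, open licence
print(stars(True, True, True, False, False))   # the same data as CSV
```

- <p>The first argument is the gate: without an open licence the count
- stays at zero however well the data is structured.</p>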
-
- <p>How well does your data do? You can buy <a href="http://www.cafepress.co.uk/w3c_shop.480759174">5 star data
- mugs</a>, T-shirts and bumper stickers from the W3C shop at
- cafepress: use them to get your colleagues and fellow
- conference-goers thinking 5 star linked data. (Profits also help
- W3C :-).</p>
-
- <p>Now in 2010, people have been pressing me, for government data,
- to add a new requirement, and that is there should be metadata
- about the data itself, and that that metadata should be available
- from a major catalog. Any open dataset (or even datasets which
- are not but should be open) can be registered at ckan.net.
- Government datasets from the UK and US should be registered at
- data.gov.uk or data.gov respectively. Other countries I expect
- to develop their own registries. Yes, there should be metadata
- about your dataset. That may be the subject of a new note in this
- series.</p>
-
- <br/>
-
- <p>Linked data is essential to actually connect the semantic web.
- It is quite easy to do with a little thought, and becomes
- second nature. Various common sense considerations
- determine when to make a link and when not to.</p>
-
- <p>The <a href="http://dig.csail.mit.edu/2005/ajar/ajaw/tab">Tabulator</a>
- client (running in a suitable browser) allows you to browse
- linked data using the above conventions, and can be used to check
- that your linked data works.</p>
-
- <p>References</p>
-
- <p>[Ding2005] Li Ding, et al., <a href="http://ebiquity.umbc.edu/paper/html/id/240/"><span>Tracking RDF Graph Provenance using RDF
- Molecules</span></a>, UMBC Tech Report TR-CS-05-06</p>
- <hr/>
-
- <h2>Followup</h2>
-
- <p>2006-02 Rob Crowell adapts Dan Connolly's DBView (2004) which
- maps SQL data into linked RDF, adding backlinks.</p>
-
- <p>2006-09-05 Chris Bizer et al adapt <a href="http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/">D2R
- Server</a> to provide a linked data view of a database.</p>
-
- <p>2006-10-10 Chris Bizer et al produce the <a href="http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/">Semantic
- Web Client Library</a>, "Technically, the library represents the
- Semantic Web as a single Jena RDF graph or Jena Model." The code
- fetches web documents as needed to answer queries.</p>
-
- <p>2007-01-15 Yves Raimond has produced a <a href="http://moustaki.org/swic/">Semantic Web client for SWI
- prolog</a> with similar functionality.</p>
-
- <p>I gave a talk at the 2009 O'Reilly eGovernment 2.0 conference
- in Washington DC, talking about "Just a Bag of Chips" @@ref, and
- talking about the 5 star scheme. Following that, InkDroid
- blogged a summary of my 5 star scheme; that summary (and its CSS) is adapted here.</p>
|