[redland-dev] Entities in RDFa

Richard Smith richard at ex-parrot.com
Thu Jan 5 13:06:26 EST 2012


There was a thread last June about XML entities in RDF:

   http://lists.librdf.org/pipermail/redland-dev/2011-June/002306.html

The conclusion seemed to be that they were rarely used and 
not high on the list of things to support.

However, I've come across a situation in RDFa (rather than
RDF) where the use of entities is much more common. Here is
a complete test case:

   <?xml version="1.0" encoding="ASCII"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
       "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"
         version="XHTML+RDFa 1.0" xml:lang="en">
     <head>
       <title>Test</title>
     </head>
     <body>
       <p>This page was written by
         <span xmlns:dc="http://purl.org/dc/elements/1.1/"
               property="dc:creator">Jos&eacute;</span>.</p>
     </body>
   </html>

Other than the declaration of the dc namespace, this is 
entirely valid XML and similar examples are found all over 
the web.  However, rapper parses it incorrectly because of
the entity in the RDF literal:

   richard at nevis:~$ rapper --version
   2.0.6

   richard at nevis:~$ rapper -q -i rdfa test.html
   rapper: Error -  - XML parser error: Entity 'eacute' not defined
   <file:///home/richard/test.html>
     <http://purl.org/dc/elements/1.1/creator> "Jos"@en .

Note the missing e-acute in the name.
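Some background on why the entity is undefined: XML itself
predefines only five named entities (&lt; &gt; &amp; &apos;
&quot;). &eacute; is declared in the XHTML DTD's Latin-1
entity set, so a parser that does not read the DTD has no
definition for it. A minimal sketch using Python's standard
expat binding (a stand-in, not raptor or libxml2) shows the
difference on tiny documents without a DOCTYPE; the function
name text_or_error is purely illustrative.

```python
import xml.parsers.expat as expat


def text_or_error(xml_bytes):
    """Illustrative helper: return the character data of a tiny
    XML document, or "error: ..." if expat rejects it."""
    parser = expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(xml_bytes, True)
    except expat.ExpatError as err:
        return "error: " + expat.ErrorString(err.code)
    return "".join(chunks)


# &amp; is one of XML's five predefined entities:
#   text_or_error(b"<p>Fish &amp; chips</p>")  -> 'Fish & chips'
# A numeric character reference needs no DTD either:
#   text_or_error(b"<p>Jos&#233;</p>")         -> 'José'
# &eacute; is only declared by the XHTML DTD, so here it is unknown:
#   text_or_error(b"<p>Jos&eacute;</p>")       -> 'error: undefined entity'
```

Writing the character as a numeric reference such as &#233;
avoids the problem in the document itself, but that does not
help with pages already written using named entities.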

I think fixing this (at least for builds using libxml2) is
as simple as adding the XML_PARSE_DTDLOAD flag to
libxml_options in raptor_grddl.c and raptor_sax2.c. It
should probably be controlled by a new raptor option that
is disabled by default, much like RAPTOR_OPTION_NO_NET is.
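The effect of loading the DTD can be sketched with Python's
expat binding as a stand-in for libxml2 (this is not the
raptor patch itself). With a DOCTYPE present but its external
subset not read, and no skipped-entity handler installed,
expat drops the unknown &eacute;, giving the same truncated
"Jos" as the rapper output above; with DTD loading switched
on, the entity is defined and expands. LOCAL_DTD is a
hypothetical one-line stand-in for the real XHTML+RDFa DTD,
served from memory so nothing is fetched over the network,
which is one reason to keep such an option off by default.
The load_dtd parameter plays the role of the proposed raptor
option; its name is illustrative.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the external DTD. The real XHTML+RDFa DTD
# gets eacute from its Latin-1 entity set; here it is declared directly.
LOCAL_DTD = b'<!ENTITY eacute "&#233;">'

# Cut-down test case: a DOCTYPE with an external subset, and an entity
# that only that subset declares.
DOC = b"""<!DOCTYPE html SYSTEM "local.dtd">
<html><span>Jos&eacute;</span></html>"""


def parse_text(doc, load_dtd):
    """Return the character data of doc, optionally reading the DTD.

    load_dtd is an illustrative switch standing in for the proposed
    (default-off) raptor option.
    """
    parser = expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append

    if load_dtd:
        # Ask expat to read the external DTD subset (it does not by default).
        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

        def load_external(context, base, system_id, public_id):
            # Serve the DTD from memory instead of fetching system_id,
            # so no network access is needed.
            sub = parser.ExternalEntityParserCreate(context)
            sub.Parse(LOCAL_DTD, True)
            return 1  # non-zero tells expat the entity was handled

        parser.ExternalEntityRefHandler = load_external

    parser.Parse(doc, True)
    return "".join(chunks)
```

Here parse_text(DOC, load_dtd=False) returns "Jos" and
parse_text(DOC, load_dtd=True) returns "José".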

Does this seem a worthwhile change?  And would it help if I 
knocked up a patch for it?



On an unrelated issue, the property attribute in RDFa is 
defined as a CURIE rather than a QName.  In other words

   <span property="http://purl.org/dc/elements/1.1/creator">

ought to be equivalent to

   <span xmlns:dc="http://purl.org/dc/elements/1.1/"
         property="dc:creator">

but it seems that full URIs are not supported, only QNames. 
I'm not necessarily volunteering to write a patch for that 
as it's not inconveniencing me too much, but I thought I'd 
report it anyway.  (The advantage of full URIs is that they 
can make the document valid against the DTD.)
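The equivalence described above can be sketched as a small
resolver: a token whose prefix is declared is expanded as a
CURIE, and any other token containing a colon is assumed to
be a full URI already. This is a hypothetical illustration of
the requested behaviour, not raptor's code; the function names
and the PREFIXES table are assumptions (a real processor would
take the mappings from the in-scope xmlns: declarations).
Because @property may hold several whitespace-separated
values, the sketch splits on whitespace.

```python
def expand_property(token, prefixes):
    """Hypothetical: expand one @property token to a full URI, or None.

    Rule assumed for this sketch: a declared prefix wins (CURIE
    expansion); otherwise a token containing ':' is taken as a full URI.
    """
    prefix, sep, reference = token.partition(":")
    if not sep:
        return None  # no colon: neither a CURIE nor an absolute URI
    if prefix in prefixes:
        return prefixes[prefix] + reference
    return token


def expand_properties(value, prefixes):
    """Hypothetical: expand a whitespace-separated @property value,
    dropping tokens that cannot be resolved."""
    expanded = (expand_property(t, prefixes) for t in value.split())
    return [uri for uri in expanded if uri is not None]


# Mapping as it would come from xmlns:dc="..." in the test case.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}
# expand_properties("dc:creator", PREFIXES)
#   -> ['http://purl.org/dc/elements/1.1/creator']
# expand_properties("http://purl.org/dc/elements/1.1/creator", PREFIXES)
#   -> ['http://purl.org/dc/elements/1.1/creator']
```

Checking the prefix table first means a document that happens
to declare a prefix named "http" would still get CURIE
expansion; that ordering is a choice made for this sketch.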

Richard

