[redland-dev] Entities in RDFa
Richard Smith
richard at ex-parrot.com
Thu Jan 5 13:06:26 EST 2012
There was a thread last June about XML entities in RDF:
http://lists.librdf.org/pipermail/redland-dev/2011-June/002306.html
The conclusion seemed to be that they were rarely used and
not high on the list of things to support.
However, I've come across a situation in RDFa, rather than RDF,
where the use of entities is much more common practice.
Here is a complete test case:
<?xml version="1.0" encoding="ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
version="XHTML+RDFa 1.0" xml:lang="en">
<head>
<title>Test</title>
</head>
<body>
<p>This page was written by
<span xmlns:dc="http://purl.org/dc/elements/1.1/"
property="dc:creator">Jos&eacute;</span>.</p>
</body>
</html>
Other than the declaration of the dc namespace, this is
entirely valid XML and similar examples are found all over
the web. However, rapper parses it incorrectly because of
the entity in the RDF literal:
richard@nevis:~$ rapper --version
2.0.6
richard@nevis:~$ rapper -q -i rdfa test.html
rapper: Error - - XML parser error: Entity 'eacute' not defined
<file:///home/richard/test.html>
<http://purl.org/dc/elements/1.1/creator> "Jos"@en .
Note the missing e-acute in the name.
I think fixing this (at least for builds using libxml2) is as
simple as adding the XML_PARSE_DTDLOAD flag to
libxml_options in raptor_grddl.c and raptor_sax2.c.
Probably it should be done by way of a new raptor option
that by default is disabled, much like RAPTOR_OPTION_NO_NET
is.
Does this seem a worthwhile change? And would it help if I
knocked up a patch for it?
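To illustrate the failure mode outside raptor, here is a rough
sketch using only Python's standard library. It is an analogy, not
raptor or libxml2 code: the expat-based parser never reads the
external DTD, so the entity is undefined unless its definition is
supplied some other way. The creator() helper and the trimmed-down
document are made up for this example, and filling parser.entity
from html.entities is only a stand-in for what loading the DTD
would provide.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Trimmed-down version of the test case; the DOCTYPE is kept so the
# parser knows an external DTD exists, even though it is never fetched.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body><p><span property="dc:creator">Jos&eacute;</span></p></body>
</html>"""

XHTML_SPAN = "{http://www.w3.org/1999/xhtml}span"


def creator(doc, load_entities):
    """Return the text of the first span (hypothetical helper)."""
    parser = ET.XMLParser()
    if load_entities:
        # Stand-in for reading the DTD: register the HTML entity
        # names with the parser before feeding it the document.
        for name, codepoint in name2codepoint.items():
            parser.entity[name] = chr(codepoint)
    parser.feed(doc)
    root = parser.close()
    return next(root.iter(XHTML_SPAN)).text


if __name__ == "__main__":
    try:
        creator(DOC, load_entities=False)
    except ET.ParseError as err:
        print("without entity definitions:", err)
    print("with entity definitions:", creator(DOC, load_entities=True))
```

Unlike rapper, this parser stops at the undefined entity rather
than carrying on with a truncated literal, but the cause is the
same: the entity definitions live in a DTD that was never loaded.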
On an unrelated issue, the property attribute in RDFa is
defined as a CURIE rather than a QName. In other words,
<span property="http://purl.org/dc/elements/1.1/creator">
ought to be equivalent to
<span xmlns:dc="http://purl.org/dc/elements/1.1/"
property="dc:creator">
but it seems that full URIs are not supported, only QNames.
I'm not necessarily volunteering to write a patch for that
as it's not inconveniencing me too much, but I thought I'd
report it anyway. (The advantage of full URIs is that they
can make the document valid against the DTD.)
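As a sketch of the behaviour being asked for (not how raptor
actually resolves the attribute), the expansion could look roughly
like this in Python. The expand_property() name and the "//" test
for spotting an absolute URI are assumptions made for illustration;
a real implementation would follow the RDFa processing rules more
carefully.

```python
def expand_property(value, prefixes):
    """Expand a property value to a full URI, or return None.

    prefixes maps declared namespace prefixes to namespace URIs.
    """
    prefix, sep, reference = value.partition(":")
    if not sep:
        return None  # no colon, so neither prefix:name nor a URI
    if prefix in prefixes:
        # Ordinary prefixed form, e.g. dc:creator
        return prefixes[prefix] + reference
    if reference.startswith("//"):
        # Assumed fallback: treat scheme://... as an absolute URI
        return value
    return None  # undeclared prefix


if __name__ == "__main__":
    ns = {"dc": "http://purl.org/dc/elements/1.1/"}
    print(expand_property("dc:creator", ns))
    print(expand_property("http://purl.org/dc/elements/1.1/creator", {}))
```

With this fallback both forms of the span above yield the same
predicate, and the second needs no xmlns:dc declaration.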
Richard