[redland-dev] [Raptor RDF Syntax Library 0000402]: Parser does not respect Content-Location header when parsing RDFa from web

Mantis Bug Tracker mantis-bug-sender at librdf.org
Tue Nov 30 21:20:24 CET 2010


The following issue has been SUBMITTED. 
====================================================================== 
http://bugs.librdf.org/mantis/view.php?id=402 
====================================================================== 
Reported By:                normang
Assigned To:                
====================================================================== 
Project:                    Raptor RDF Syntax Library
Issue ID:                   402
Category:                   api
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
Syntax Name:                RDFa 
====================================================================== 
Date Submitted:             2010-11-30 20:20
Last Modified:              2010-11-30 20:20
====================================================================== 
Summary:                    Parser does not respect Content-Location header when
parsing RDFa from web
Description: 
RFC 2616 section 14.14 says: "The value of Content-Location also defines the
base URI for the entity" (this is the second of only two mentions of "base URI"
in the document).

HTML 4 section 12.4.1
<http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#h-12.4.1> says
that "The base URI is given by meta data discovered during a protocol
interaction, such as an HTTP header (see [RFC2616])" (this puts it in priority
below the <base> element, and above the document's URI).
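The ordering above can be written as a small function. This is a minimal Python sketch, not Raptor code. The name `effective_base_uri` and its parameters are hypothetical. The rule that a relative Content-Location is read relative to the Request-URI comes from RFC 2616 sect. 14.14. Resolving a relative embedded base against the protocol-level base is an assumption.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base_uri(
    retrieval_uri: str,
    content_location: Optional[str] = None,
    embedded_base: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick a base URI using the HTML 4 sect. 12.4.1 order.

    1. a base URI embedded in the document (<base href>, or xml:base)
    2. protocol metadata, here the HTTP Content-Location header
    3. the URI the document was actually retrieved from
    """
    protocol_base = retrieval_uri
    if content_location:
        # RFC 2616 sect. 14.14: a relative Content-Location is interpreted
        # relative to the Request-URI.
        protocol_base = urljoin(retrieval_uri, content_location)
    if embedded_base:
        # Assumption: resolve an embedded base against the protocol-level
        # base; urljoin returns it unchanged when it is already absolute.
        return urljoin(protocol_base, embedded_base)
    return protocol_base
```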

I've listed this as a 'major' bug because of my prejudices about standards
conformance (I'm getting therapy), but I won't be offended (!) if you classify
it as 'minor' instead....

Steps to Reproduce: 
1. Configure a web document to include RDFa, and to send a Content-Location
header on retrieval.  For example (not a persistent URI):

% curl -i http://text.nxg.me.uk/temp/test.html
HTTP/1.1 200 OK
Date: Tue, 30 Nov 2010 20:16:49 GMT
Server: Apache/1.3.41
content-location: http://text.nxg.me.uk/elsewhere/foo.html
Last-Modified: Tue, 30 Nov 2010 19:41:44 GMT
ETag: "3c1a097-182-4cf55378"
Content-Length: 386
Connection: close
Content-Type: text/html; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dcterms="http://purl.org/dc/terms/">
<head>
<title>Test document</title>
</head>
<body>
<div about='wibble.html'>
<h1 property='dcterms:title'>Test number one</h1>
</div>
</body>
</html>

2. Use rapper to parse it:

% rapper --version
1.9.0
% rapper -irdfa -oturtle http://text.nxg.me.uk/temp/test.html
rapper: Parsing URI http://text.nxg.me.uk/temp/test.html with parser rdfa
rapper: Serializing with serializer turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/1999/xhtml> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://text.nxg.me.uk/temp/wibble.html>
    dcterms:title "Test number one" .

rapper: Parsing returned 1 triple
%

Because of the Content-Location header, about='wibble.html' should have been
resolved relative to <http://text.nxg.me.uk/elsewhere/foo.html>, so the
subject of the single RDFa statement should have been
<http://text.nxg.me.uk/elsewhere/wibble.html>.
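The observed and expected subjects can be reproduced with Python's standard library. This sketch uses only `http.client.parse_headers` and `urllib.parse.urljoin`. The header bytes are an abridged copy of the curl transcript above, and the variable names are illustrative.

```python
import io
from http.client import parse_headers
from urllib.parse import urljoin

# Abridged header block from the curl -i transcript (status line omitted).
RAW_HEADERS = (
    b"Date: Tue, 30 Nov 2010 20:16:49 GMT\r\n"
    b"Server: Apache/1.3.41\r\n"
    b"content-location: http://text.nxg.me.uk/elsewhere/foo.html\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
)

RETRIEVAL_URI = "http://text.nxg.me.uk/temp/test.html"
ABOUT = "wibble.html"  # value of the RDFa about='wibble.html' attribute

headers = parse_headers(io.BytesIO(RAW_HEADERS))
# Header field names are case-insensitive, so the lowercase
# "content-location" sent by the server still matches this lookup.
content_location = headers.get("Content-Location")

# What rapper 1.9.0 produced: resolution against the retrieval URI.
observed = urljoin(RETRIEVAL_URI, ABOUT)
# What the report expects: resolution against Content-Location, which is
# itself resolved against the retrieval URI in case it is relative.
expected = urljoin(urljoin(RETRIEVAL_URI, content_location), ABOUT)

print(observed)  # http://text.nxg.me.uk/temp/wibble.html
print(expected)  # http://text.nxg.me.uk/elsewhere/wibble.html
```

The only input that differs between the two results is the base URI passed to `urljoin`. The relative reference `wibble.html` is resolved the same way in both cases.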

Additional Information: 
It gets slightly more complicated with HTML5.

For the definition of the base URI, HTML5 defers to the XML Base specification
(sect. 2.6.1, step 4 <http://www.w3.org/TR/html5/urls.html#document-base-url>). 
The XML Base specification <http://www.w3.org/TR/xmlbase/> sect. 4.1 defers to
RFC 3986.  Section 5.1.2 of that says that "If no base URI is embedded, the base
URI is defined by the representation's retrieval context", and goes on to give
an example involving MIME.

It's not completely clear what this means, but I believe it is most naturally
interpreted as referring to a mechanism like that in sect. 14.14 of RFC 2616,
meaning that Content-Location still trumps the retrieval URI (and is trumped by
xml:base).

It's not a knock-down case, but all the above does strongly suggest to me that
the RFC 2616 intention for the Content-Location header is clear: downstream
processors should regard the Content-Location header's URI as the effective base
URI for the document, irrespective of the URI it was actually retrieved from.

And raptor doesn't.
====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2010-11-30 20:20 normang        New Issue                                    
======================================================================