[redland-dev] [Raptor RDF Syntax Library 0000495]: RDFa parser produces unexpected results with CDATA sections and entity references

Mantis Bug Tracker mantis-bug-sender at librdf.org
Sun Feb 19 16:21:58 EST 2012


The following issue has been SUBMITTED. 
====================================================================== 
http://bugs.librdf.org/mantis/view.php?id=495 
====================================================================== 
Reported By:                normang
Assigned To:                
====================================================================== 
Project:                    Raptor RDF Syntax Library
Issue ID:                   495
Category:                   api
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
Syntax Name:                RDFa & Turtle 
====================================================================== 
Date Submitted:             2012-02-19 21:21
Last Modified:              2012-02-19 21:21
====================================================================== 
Summary:                    RDFa parser produces unexpected results with CDATA
sections and entity references
Description: 
Consider the examples below.

The results for tests content1, 2, 4 and 5 are, I think, wrong.

For content1, 2, 4 and 5, the CDATA marked section is simply omitted.  Although
http://www.w3.org/TR/rdfa-syntax/ doesn't mention CDATA marked sections, there's
nothing there that seems to warrant ignoring them.
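
For comparison, here is a minimal Python sketch (not Raptor code) of how a
generic XML parser handles the content4 case. It assumes the trailing
characters are written in the input as the entity references &amp;&lt;&gt;,
and it uses the standard-library xml.etree.ElementTree; the variable names are
illustrative only. The CDATA text is treated as ordinary character data, so it
stays in the element's text content:

```python
import xml.etree.ElementTree as ET

# The content4 paragraph from the test file, without the RDFa attributes.
# Assumption: the trailing characters are entity references in the input.
doc = "<p>content4: <![CDATA[cdata<>&]]> <span>not</span>&amp;&lt;&gt;</p>"

p = ET.fromstring(doc)

# Concatenate all character data under <p>, in document order.
text = "".join(p.itertext())
print(text)  # content4: cdata<>& not&<>
```

This only shows what one generic parser does with a CDATA section; it says
nothing about what the RDFa specification requires.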

Tests content1, 2 and 5 produce XMLLiteral data which includes both elements and
entities.  However, in each of the three cases, the Turtle output has the
characters denoted by entities (the &<>) appearing literally in the
rdf:XMLLiteral, so it is not well-formed XML; i.e. they're not escaped in any
way.  I can't find anything, in either http://www.w3.org/TR/REC-rdf-syntax/
(which I suppose is the definition of rdf:XMLLiteral) or
http://www.w3.org/TeamSubmission/turtle/, which spells out what the content of
an rdf:XMLLiteral should be, but I would be surprised if invalid XML is allowed.
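
The well-formedness point can be checked with a short Python sketch. This is
an illustration under assumptions, not part of Raptor: the helper
is_well_formed_fragment and the dummy <wrapper> element are hypothetical names,
and the literal is the ns:content1 value from the output below with the
mail-wrapped line breaks between attributes replaced by spaces. The escaped
variant is only one plausible form of a well-formed literal:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if `fragment` parses as XML content inside a dummy root."""
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        return False


# Element part of the ns:content1 literal, as printed by rapper (unwrapped).
span = (
    '<span xmlns="http://www.w3.org/1999/xhtml" xmlns:ns="urn:ns#" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">not</span>'
)

reported = "content1:  " + span + "&<>"         # characters appear literally
escaped = "content1:  " + span + escape("&<>")  # ends with &amp;&lt;&gt;

print(is_well_formed_fragment(reported))  # False
print(is_well_formed_fragment(escaped))   # True
```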

I don't know whether this is an RDFa parsing error or a Turtle serialisation
error.

Steps to Reproduce: 
% cat /tmp/try.xml
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML+RDFa 1.0//EN'
'http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml' xmlns:ns='urn:ns#'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<head>
<title property='ns:title'>T</title>
<meta about='' property='ns:abstract' content='Abstract &lt;&gt;&amp;%' />
</head>
<body>
<!-- for cases below, see http://www.w3.org/TR/rdfa-syntax/ Sect. 6.3.1.3 -->
<!-- explicit XMLLiteral @datatype -->
<p property='ns:content1'
   datatype='rdf:XMLLiteral'
   >content1: <![CDATA[cdata<>&]]> <span>not</span>&amp;&lt;&gt;</p>
<!-- no @datatype, presence of elements implies it -->
<p property='ns:content2'
   >content2: <![CDATA[cdata<>&]]> <span>not</span>&amp;&lt;&gt;</p>
<!-- no @datatype, but no XML elements, so plain literal -->
<p property='ns:content3'
   >content3: plain content</p>
<!-- explicit empty @datatype, so interpreted as a plain literal -->
<p property='ns:content4'
   datatype=''
   >content4: <![CDATA[cdata<>&]]> <span>not</span>&amp;&lt;&gt;</p>
<!-- basically same as content2 above -->
<div property='ns:content5'
     ><p>content5: <![CDATA[cdata<>&]]>
<span>not</span>&amp;&lt;&gt;</p></div>
</body></html>
% rapper -irdfa -oturtle /tmp/try.xml
rapper: Parsing URI file:///tmp/try.xml with parser rdfa
rapper: Serializing with serializer turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/1999/xhtml> .
@prefix ns: <urn:ns#> .

<file:///tmp/try.xml>
    ns:abstract "Abstract <>&%" ;
    ns:content1 "content1:  <span xmlns=\"http://www.w3.org/1999/xhtml\"
xmlns:ns=\"urn:ns#\"
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">not</span>&<>"^^rdf:XMLLiteral
;
    ns:content2 "content2:  <span xmlns=\"http://www.w3.org/1999/xhtml\"
xmlns:ns=\"urn:ns#\"
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">not</span>&<>"^^rdf:XMLLiteral
;
    ns:content3 "content3: plain content" ;
    ns:content4 "content4:  not&<>" ;
    ns:content5 "<p xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:ns=\"urn:ns#\"
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">content5: 
<span>not</span>&<></p>"^^rdf:XMLLiteral ;
    ns:title "T" .

rapper: Parsing returned 7 triples
% rapper --version
2.0.4
% 

====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2012-02-19 21:21 normang        New Issue                                    
======================================================================


