[redland-dev] Raptor Turtle Parse Error

Dave Beckett dave.beckett at bristol.ac.uk
Sat Apr 24 23:02:36 BST 2004


On Thu, 22 Apr 2004 00:52:29 +0100
Ian Davis <lists at internetalchemy.org> wrote:

> Hi,
> 
> I'm implementing a Turtle parser for an application and came across an
> edge case in one of my unit tests. UriRefs are delimited by angle
> brackets yet the delimited string may contain unescaped angle
> brackets (any Unicode character in the range U+0 to U+10FFFF).

Sort of.  More below.

> 
> I checked this with rapper (freshly built from raptor-1.3.0.tar.gz)
> using the following as input:
> 
> [ian at leif raptor-1.3.0]$ cat test.ttl
> <http://foo.example.com/?q=>&p=1> <http://foo.example.com/prop>
> <http://foo.example.com/obj> .
> 
> [ian at leif raptor-1.3.0]$ ./rapper -i turtle -o ntriples test.ttl
> lt-rapper: Parsing file test.ttl
> lt-rapper: Error - URI file:///home/ian/temp/raptor/raptor-1.3.0/test.ttl:1 - syntax error at '&'
> lt-rapper: Error - URI file:///home/ian/temp/raptor/raptor-1.3.0/test.ttl:1 - syntax error
> lt-rapper: Error - URI file:///home/ian/temp/raptor/raptor-1.3.0/test.ttl:1 - syntax error at '='
> lt-rapper: Error - URI file:///home/ian/temp/raptor/raptor-1.3.0/test.ttl:1 - syntax error at '>'
> lt-rapper: Parsing returned 0 statements
> 
> I think the relativeUri production in the Turtle grammar is ambiguous
> here. A possible solution is to add a further string escape of \> in
> the relativeUri production. I think this is also a problem with
> NTriples but I haven't seen a negative test for this.

The relativeUri production:
  http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/#relativeURI
refers to encoding as in N-Triples 3.3:
  http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding
which encodes an RDF URI reference:
  http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference

This section goes on to explain the sets of allowed characters in an
encoded RDF URI Reference, which it does by reference to the URI
specification, RFC 2396, as amended by RFC 2732.

& is one of the reserved characters in 2.2 of
  http://www.isi.edu/in-notes/rfc2396.txt
while < and > are among the excluded characters (delims, in 2.4.3),
which is why the <URI> form works.
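
Because < and > are excluded delimiters, a URI that contains one has to
have it percent-encoded before it is written between angle brackets.
A minimal sketch in Python, not anything Raptor provides: the helper
name escape_uriref and the exact character set are assumptions made
for illustration, taken from the excluded characters in RFC 2396
section 2.4.3 (with [ and ] left out, following RFC 2732).

```python
# Excluded ASCII characters from RFC 2396 section 2.4.3: space, delims, unwise.
# "%" and "#" are deliberately left alone because they already carry meaning
# in a URI reference (escape prefix and fragment separator).
EXCLUDED = set(' <>"{}|\\^`')


def escape_uriref(uri: str) -> str:
    """Percent-encode excluded ASCII characters (hypothetical helper)."""
    out = []
    for ch in uri:
        code = ord(ch)
        # Control characters (below 0x20, and 0x7F) are excluded as well.
        if ch in EXCLUDED or code < 0x20 or code == 0x7F:
            out.append("%{:02X}".format(code))
        else:
            out.append(ch)
    return "".join(out)


# The subject URI from test.ttl, with the stray ">" escaped.
print("<" + escape_uriref("http://foo.example.com/?q=>&p=1") + ">")
```

With the > written as %3E, the only unescaped > left is the one that
closes the URI. Non-ASCII characters are passed through untouched in
this sketch.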

> 
> The equivalent RDF/XML parses OK:
> 
> [ian at leif raptor-1.3.0]$ cat test.rdf
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:ex="http://foo.example.com/">
>   <rdf:Description rdf:about="http://foo.example.com/?q=>&amp;p=1">
>     <ex:prop rdf:resource="http://foo3.example.com/obj"/>
>   </rdf:Description>
> </rdf:RDF>
> 
> [ian at leif raptor-1.3.0]$ ./rapper -i rdfxml -o ntriples test.rdf
> lt-rapper: Parsing file test.rdf
> <http://foo.example.com/?q=>&p=1> <http://foo.example.com/prop> <http://foo3.example.com/obj> .
> lt-rapper: Parsing returned 1 statements

Yeah.  It's nice to get back to well-defined XML formats, eh? :):)

Dave


