[redland-dev] Possible bug: parsing with 'guess' parser and supplying a base URI always parses as RDF/XML?

Arjan Wekking a.wekking at synantics.nl
Fri May 5 13:48:58 BST 2006


Hi Redland developers (Dave ;),

I've run into something in Redland that may or may not be a bug. Since
I'm not 100% sure, I thought I'd send a mail to the dev list first
before opening an actual bug report.

What happens is that I want to parse a file with the 'guess' parser, a
Turtle (.ttl) file in this case. That works fine as long as I do not
supply a base URI. When I do supply one, the parser assumes the file is
RDF/XML (or XML at least) and the 'guess' step seems to be skipped
entirely. A workaround is to do the guessing myself (filename ends
with .ttl, assume 'turtle', etc.), but that makes the 'guess' parser
rather useless.
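
A minimal sketch of that workaround, in case it helps anyone hitting the
same thing. The function name and the extension table are my own
inventions, not part of Redland; only the parser names ('turtle',
'rdfxml', 'ntriples') are meant to match what rapper's -i option accepts.

```python
import os.path

# Hypothetical extension-to-parser table; extend it as needed.
PARSER_BY_EXTENSION = {
    ".ttl": "turtle",
    ".rdf": "rdfxml",
    ".nt": "ntriples",
}

def parser_name_for(filename, default="rdfxml"):
    """Pick a parser name from the filename extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    return PARSER_BY_EXTENSION.get(ext, default)

# The result would be passed on as `rapper -i <name>` or RDF.Parser(name=<name>).
print(parser_name_for("something.ttl"))
```

This only looks at the filename, so it is much cruder than a real
content-based guess.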

What I'm not sure of is whether this is intended behaviour (when a
base URI is supplied, assume RDF/XML, or something like that). There
might be some arcane reason for it, and I don't have the time to
investigate further.

Example file (something.ttl):

> @prefix ex: <http://www.example.org/rdf/> .
> _:foo a ex:Something .
> _:bar ex:baseURI <.> .

When parsing without supplying a base URI things work as expected:

> rapper -g file:something.ttl

> rapper: Parsing URI file:something.ttl
> rapper: Guessed parser name 'turtle'
> _:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/rdf/Something> .
> _:bar <http://www.example.org/rdf/baseURI> <file:///Users/awekking/test/> .
> rapper: Parsing returned 2 statements

But when I supply a base URI, things do not work as expected (by me,  
anyway):

> rapper -g file:something.ttl http://www.example.org/base/

> rapper: Parsing URI file:something.ttl with base URI http://www.example.org/base/
> rapper: Error - URI http://www.example.org/base/:1 - XML parser error - Document is empty
> rapper: Failed to parse URI file:something.ttl guess content
> rapper: Parsing returned 0 statements

When I supply the format explicitly, things work fine again:

> rapper -i turtle file:something.ttl http://www.example.org/base/

> rapper: Parsing URI file:something.ttl with base URI http://www.example.org/base/
> _:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/rdf/Something> .
> _:bar <http://www.example.org/rdf/baseURI> <http://www.example.org/base/> .
> rapper: Parsing returned 2 statements
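
For what it's worth, the two different objects reported for ex:baseURI
are what I would expect from ordinary relative-URI resolution of <.>
against the effective base. A small sketch using Python's standard
urljoin (not Redland itself); the absolute file path is taken from the
first rapper output, and I am assuming the default base is simply the
document's own location.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

RELATIVE_REF = "."  # the <.> object in something.ttl

# Assumption: with no base URI supplied, the document's own URI is the base.
default_base = "file:///Users/awekking/test/something.ttl"
# The base URI passed explicitly on the command line.
explicit_base = "http://www.example.org/base/"

resolved_default = urljoin(default_base, RELATIVE_REF)
resolved_explicit = urljoin(explicit_base, RELATIVE_REF)

print(resolved_default)
print(resolved_explicit)
```

So the base URI itself seems to be applied correctly once the right
parser is chosen; the problem looks limited to the guessing step.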

The same thing happens when I parse from within Python, where I'm now
observing even stranger behaviour:

> >>> import RDF
> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl',
> ...     base_uri=RDF.Uri('http://www.example.org/base/'))]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/RDF.py", line 1702, in parse_as_stream
>     uri._reduri, base_uri._reduri)
> RDF.RedlandError: 'XML parser error - Document is empty'
> >>>

If I first parse the same file without a base URI, and then re-use the
same parser to parse *with* a base URI, things suddenly do work:

> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl')]
> ['{(r1146832436r19414r12), [http://www.example.org/rdf/baseURI], [file:s.]}',
>  '{(r1146832436r19414r11), [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.example.org/rdf/Something]}']
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl',
> ...     base_uri=RDF.Uri('http://www.example.org/base/'))]
> ['{(r1146832436r19414r12), [http://www.example.org/rdf/baseURI], [http://www.example.org/base/]}',
>  '{(r1146832436r19414r11), [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.example.org/rdf/Something]}']

But if I try the reverse (parse with a base URI first, then without),
the (wrongly) guessed format seems to 'stick' in the parser. That makes
sense if you name the format when constructing the Parser() object in
the first place, but it caught me by surprise when using the 'guess'
parser:

> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl',
> ...     base_uri=RDF.Uri('http://www.example.org/base/'))]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/RDF.py", line 1702, in parse_as_stream
>     uri._reduri, base_uri._reduri)
> RDF.RedlandError: 'XML parser error - Document is empty'
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl')]
> []

Anyway, I'm not sure whether we are dealing with two bugs here, or just
one, or none. I'm a tad confused at the moment. Can anyone correct me
if I'm making some faulty assumptions here? :]

Regards,
-Arjan

