[redland-dev] Possible bug: parsing with 'guess' parser and supplying a base URI always parses as RDF/XML?
Arjan Wekking
a.wekking at synantics.nl
Fri May 5 13:48:58 BST 2006
Hi Redland developers (Dave ;),
I've found something in Redland that might be a bug. Since I'm not 100%
sure that it is, I thought I'd mail the dev list first before opening
an actual bug report.
What happens is this: I want to 'guess' parse a file, a Turtle (.ttl)
file in this case, which works fine as long as I do not supply a base
URI. When I do supply one, the parser assumes the input is RDF/XML (or
at least XML), and the whole 'guess' parser seems to be ignored. A
workaround is to do the guessing myself (if the filename ends with
.ttl, assume 'turtle', etc.), but that kind of makes the 'guess' parser
rather useless.
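The do-it-yourself workaround (pick the parser from the filename) could look roughly like this minimal sketch. The extension table and the helper `parser_name_for` are my own hypothetical choices, not part of Redland; 'turtle', 'ntriples', 'rdfxml' and 'guess' are Raptor parser names, and any unrecognised extension falls back to 'guess'.

```python
import os

# Hypothetical extension -> Raptor parser name table (an assumption; extend as needed).
EXTENSION_TO_PARSER = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".rdf": "rdfxml",
    ".xml": "rdfxml",
}


def parser_name_for(uri):
    """Hypothetical helper: choose a parser name from the URI's file extension.

    Falls back to 'guess' when the extension is not in the table.
    """
    path = uri.split(":", 1)[-1]  # drop a leading scheme such as 'file:'
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_PARSER.get(extension, "guess")


# Intended use with the Redland Python bindings (not executed here):
#   import RDF
#   parser = RDF.Parser(name=parser_name_for('file:something.ttl'))
#   stream = parser.parse_as_stream('file:something.ttl',
#                                   base_uri=RDF.Uri('http://www.example.org/base/'))
```

This works, but it duplicates exactly what 'guess' is supposed to do for me.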
The thing I'm not sure of is whether this is normal behaviour (when a
base URI is supplied, assume RDF/XML, or something like that), because
there might be some arcane reason for doing this. I don't have the time
to investigate further.
Example file (something.ttl):
> @prefix ex: <http://www.example.org/rdf/> .
> _:foo a ex:Something .
> _:bar ex:baseURI <.> .
When parsing without supplying a base URI, things work as expected:
> rapper -g file:something.ttl
> rapper: Parsing URI file:something.ttl
> rapper: Guessed parser name 'turtle'
> _:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/rdf/Something> .
> _:bar <http://www.example.org/rdf/baseURI> <file:///Users/awekking/test/> .
> rapper: Parsing returned 2 statements
But when I supply a base URI, things do not work as expected (by me,
anyway):
> rapper -g file:something.ttl http://www.example.org/base/
> rapper: Parsing URI file:something.ttl with base URI http://www.example.org/base/
> rapper: Error - URI http://www.example.org/base/:1 - XML parser error - Document is empty
> rapper: Failed to parse URI file:something.ttl guess content
> rapper: Parsing returned 0 statements
When I supply the format explicitly, things work fine again:
> rapper -i turtle file:something.ttl http://www.example.org/base/
> rapper: Parsing URI file:something.ttl with base URI http://www.example.org/base/
> _:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/rdf/Something> .
> _:bar <http://www.example.org/rdf/baseURI> <http://www.example.org/base/> .
> rapper: Parsing returned 2 statements
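For what it's worth, the only difference between the two successful runs is the base against which the relative reference `<.>` in the example file is resolved. Plain RFC 3986 reference resolution reproduces both objects; the sketch below uses Python's `urllib.parse.urljoin` purely as an illustration (it is not what Redland uses internally), and the full path of something.ttl is an assumption read off the rapper output above.

```python
from urllib.parse import urljoin


def resolve_dot(base_uri):
    """Resolve the relative reference '.' (as in `_:bar ex:baseURI <.>`) against base_uri."""
    return urljoin(base_uri, ".")


# Base taken from the file's own location (path assumed from the rapper output above).
print(resolve_dot("file:///Users/awekking/test/something.ttl"))
# Base supplied explicitly on the rapper command line.
print(resolve_dot("http://www.example.org/base/"))
```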
The same thing happens when I parse from within Python, and I'm now
observing even stranger behaviour there:
> >>> import RDF
> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl', base_uri=RDF.Uri('http://www.example.org/base/'))]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/RDF.py", line 1702, in parse_as_stream
>     uri._reduri, base_uri._reduri)
> RDF.RedlandError: 'XML parser error - Document is empty'
> >>>
Stranger still: if I parse the same file without a base URI first, and
then re-use the same parser to parse *with* a base URI, things suddenly
do work:
> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl')]
> ['{(r1146832436r19414r12), [http://www.example.org/rdf/baseURI], [file:s.]}', '{(r1146832436r19414r11), [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.example.org/rdf/Something]}']
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl', base_uri=RDF.Uri('http://www.example.org/base/'))]
> ['{(r1146832436r19414r12), [http://www.example.org/rdf/baseURI], [http://www.example.org/base/]}', '{(r1146832436r19414r11), [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.example.org/rdf/Something]}']
But if I try the reverse (parse with a base URI first, then without),
the (wrongly) guessed format seems to 'stick' in the parser. That would
make sense if I had supplied the format when constructing the Parser()
object in the first place, but it caught me by surprise when using the
'guess' parser:
> >>> parser = RDF.Parser(name='guess')
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl', base_uri=RDF.Uri('http://www.example.org/base/'))]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/RDF.py", line 1702, in parse_as_stream
>     uri._reduri, base_uri._reduri)
> RDF.RedlandError: 'XML parser error - Document is empty'
> >>> [str(t) for t in parser.parse_as_stream('file:something.ttl')]
> []
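Until this is sorted out, one way to keep a (mis)guessed format from leaking between calls is to build a new parser for every parse instead of re-using one. This is a minimal sketch: `parse_with_fresh_parser` is a hypothetical helper, and `make_parser` is assumed to be any zero-argument callable returning an object with `parse_as_stream(uri[, base_uri=...])`, such as `lambda: RDF.Parser(name='guess')`. It only avoids the carry-over; it does not fix the wrong guess itself when a base URI is given.

```python
def parse_with_fresh_parser(make_parser, uri, base_uri=None):
    """Hypothetical helper: parse `uri` with a newly constructed parser.

    A new parser per call means any format guessed on an earlier call
    cannot affect this one. Returns the parsed statements as a list.
    """
    parser = make_parser()  # e.g. lambda: RDF.Parser(name='guess')
    if base_uri is None:
        stream = parser.parse_as_stream(uri)
    else:
        stream = parser.parse_as_stream(uri, base_uri=base_uri)
    if stream is None:
        # Defensive assumption: treat a missing stream as "no statements".
        return []
    return [statement for statement in stream]


# Intended use with the Redland Python bindings (not executed here):
#   import RDF
#   make = lambda: RDF.Parser(name='guess')
#   parse_with_fresh_parser(make, 'file:something.ttl')
```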
Anyway, I'm not sure whether we are dealing with two bugs here, just
one, or none. I'm just a tad confused at the moment. Can anyone
correct me if I'm making some faulty assumptions here? :]
Regards,
-Arjan