[redland-dev] Possible bug: parsing with 'guess' parser and supplying a base URI always parses as RDF/XML?

Arjan Wekking a.wekking at synantics.nl
Sat May 6 10:45:06 BST 2006


On 5 May 2006, at 14:48, Arjan Wekking wrote:

> Hi Redland developers (Dave ;),
>
> I've found something in Redland that might be a bug, but since I'm
> not 100% sure, I thought I'd send a mail to the dev list before I
> open an actual bug report.
>
> What happens is that I want to 'guess' parse a file, a Turtle
> (.ttl) file in this case, which works fine as long as I do not
> supply a base URI. When I do supply one, the parser assumes it is
> RDF/XML (or at least XML) and the whole 'guess' parser seems to be
> ignored. A workaround is to do the guessing myself (filename ends
> with .ttl, assume 'turtle', etc.), but that kinda makes the whole
> 'guess' parser rather useless.
>
> The thing I'm not sure of is whether this is normal behaviour
> (when a base URI is supplied, assume RDF/XML, or something like
> that), because there might be some arcane reason for doing this (I
> don't have the time to investigate further).
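The manual workaround mentioned above (choosing the parser from the file extension yourself instead of using 'guess') can be sketched roughly as follows. This is a hypothetical Python helper, not part of the Redland or Raptor API; the extension table and the 'rdfxml' fallback are assumptions made for illustration. The key point is that it looks only at the source URI and ignores any base URI.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension table for illustration; not Raptor's real rules.
EXTENSION_TO_PARSER = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".rdf": "rdfxml",
    ".owl": "rdfxml",
}


def parser_name_for(source_uri, default="rdfxml"):
    """Pick a parser name from the *source* URI's extension.

    Any base URI is deliberately ignored: it only affects how relative
    URIs are resolved, not which syntax the source is written in.
    """
    path = urlparse(source_uri).path
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_PARSER.get(ext, default)


if __name__ == "__main__":
    print(parser_name_for("file:./test.ttl"))              # turtle
    print(parser_name_for("http://example.org/data.rdf"))  # rdfxml
```

The returned name would then be used as the parser name in place of 'guess', which is exactly the duplication of effort the 'guess' parser is meant to avoid.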

Well, I looked around a bit in the guess parser's guts, and it appears
that the URI from which the format is guessed (one that ends in .ttl
gets a turtle parser) is replaced by the base URI when one is present;
otherwise the original source URI is used. Apparently this is by
design, since the description of raptor_parse_uri() implies the same
thing:

> Parse the URI according to the base URI base_uri, or NULL if not  
> needed. If no base URI is given, the uri is used. This method  
> depends on the raptor_www subsystem (see WWW Class section below)  
> and an existing underlying URI retrieval implementation such as  
> libcurl, libxml or BSD libfetch to retrieve the content.
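To make the described selection concrete, here is a simplified Python model of it. It is a hypothetical sketch, not Raptor code: it looks only at the file extension of whichever URI is chosen (Raptor's own guessing may take other information into account as well), and the extension table and the 'rdfxml' fallback are assumptions chosen to mirror the reported symptom. It shows how a base URI without a recognisable extension changes the guess even though the source is the same Turtle file.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension table; only extensions are considered in this model.
EXTENSION_TO_PARSER = {".ttl": "turtle", ".nt": "ntriples", ".rdf": "rdfxml"}


def guess_parser_name(source_uri, base_uri=None, default="rdfxml"):
    """Model of the behaviour described: the base URI, when present,
    replaces the source URI as the identifier used for guessing."""
    identifier = base_uri if base_uri is not None else source_uri
    ext = os.path.splitext(urlparse(identifier).path)[1].lower()
    return EXTENSION_TO_PARSER.get(ext, default)


if __name__ == "__main__":
    # No base URI: the source's .ttl extension decides.
    print(guess_parser_name("file:./test.ttl"))  # turtle
    # Base URI without an extension: the guess falls back to the default.
    print(guess_parser_name("file:./test.ttl", "http://example.org/doc"))  # rdfxml
    # Base URI ending in .ttl, as in the rapper test.
    print(guess_parser_name("file:./test.ttl",
                            "file:./this/doesnt/really/exist.ttl"))  # turtle
```

The last call corresponds to the rapper run that follows, where the base URI also ends in .ttl and 'turtle' is guessed.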

A simple test with rapper confirmed this:

> rapper -g file:./test.ttl file:./this/doesnt/really/exist.ttl

> rapper: Parsing URI file:./test.ttl with base URI file:./this/doesnt/really/exist.ttl
> rapper: Guessed parser name 'turtle'
> <file:this/doesnt/really/foo.txt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:this/doesnt/barbar.txt> .
> rapper: Parsing returned 1 statements

I guess I should have looked through the documentation more carefully
:] .. still, it's a bit strange behaviour, since you don't expect the
base URI to be indicative of the format at all, especially since the
source URI doesn't change (and neither does its format). Probably I'm
missing something essential in my understanding of base URIs ;)

> But if I try the reverse (parse with a base URI first, then try
> without), it seems that the (wrongly) guessed format 'sticks' in the
> parser, which kinda makes sense if you supply the format when
> constructing the Parser() object in the first place, but it caught
> me by surprise when using the 'guess' parser:

This is apparently caused by the parser keeping state with respect to
the guessed format. Whether the guess parser should reset this after
parsing one source URI is a design issue, since you can just as well
create a new guess parser for each URI you parse. Still, it makes
reusing Parser objects a bit dubious in some cases.
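The 'sticky' behaviour can be modelled with a small hypothetical Python class that caches the first guessed parser name. The class name, the reset_after_parse option and the extension-based guess are all assumptions for illustration, not Redland's actual Parser implementation. It contrasts a parser that keeps its guess with one that forgets it after each parse, which is the design choice discussed above.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension table; not Raptor's real guessing rules.
EXTENSION_TO_PARSER = {".ttl": "turtle", ".nt": "ntriples", ".rdf": "rdfxml"}


def guess_parser_name(source_uri, base_uri=None, default="rdfxml"):
    # The base URI, if given, is the identifier used for guessing.
    identifier = base_uri if base_uri is not None else source_uri
    ext = os.path.splitext(urlparse(identifier).path)[1].lower()
    return EXTENSION_TO_PARSER.get(ext, default)


class StickyGuessParser:
    """Hypothetical model of a reusable 'guess' parser that caches its guess."""

    def __init__(self, reset_after_parse=False):
        self.reset_after_parse = reset_after_parse
        self.guessed_name = None  # survives between parse() calls unless reset

    def parse(self, source_uri, base_uri=None):
        """Return the parser name that would be used for this call."""
        if self.guessed_name is None:
            self.guessed_name = guess_parser_name(source_uri, base_uri)
        name = self.guessed_name
        if self.reset_after_parse:
            self.guessed_name = None  # forget the guess so the next call re-guesses
        return name


if __name__ == "__main__":
    sticky = StickyGuessParser()
    print(sticky.parse("file:./test.ttl", "http://example.org/doc"))  # rdfxml
    print(sticky.parse("file:./test.ttl"))  # rdfxml: the first guess sticks

    fresh = StickyGuessParser(reset_after_parse=True)
    print(fresh.parse("file:./test.ttl", "http://example.org/doc"))  # rdfxml
    print(fresh.parse("file:./test.ttl"))  # turtle
```

Creating a new guess parser per URI has the same effect as the reset option in this model, at the cost of constructing a fresh object each time.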

Not sure whether these issues need further action; I'd like to hear
other people's opinions about this, I guess.

Regards,
-Arjan

