[redland-dev] Possible bug: parsing with 'guess' parser and
supplying a base URI always parses as RDF/XML?
Dave Beckett
dave at dajobe.org
Mon May 22 21:44:12 BST 2006
Arjan Wekking wrote:
> On 5-mei-2006, at 14:48, Arjan Wekking wrote:
>
>> Hi Redland developers (Dave ;),
>>
>> I've found something when using Redland that might be a bug. Since I'm
>> not 100% sure whether it is, I thought I'd send a mail to the dev list
>> first before I open an actual bug report.
>>
>> What happens is that I want to 'guess' parse a file, a turtle (.ttl)
>> file in this case, which works fine, as long as I do not supply a base
>> URI. When I do supply one, the parser assumes it is RDF/XML (or XML at
>> least) and the whole 'guess' parser seems to be ignored. A workaround
>> is for me to guess for the parser (filename ends with .ttl, assume
>> 'turtle', etc) but that kinda makes the whole 'guess' parser rather
>> useless.
>>
>> The thing I'm not sure of is whether this is normal behaviour or not
>> (when supplying a base URI, assume RDF/XML, or something like that),
>> because there might be some arcane reason for doing this (I don't have
>> the time to investigate further).
>
> Well, I looked around a bit in the guess parser's guts, and it appears
> that the URI on which the format is guessed (one that ends in .ttl to
> get a turtle parser) is replaced by the base URI when there is one
> present, otherwise the original source URI is used. Apparently this is
> by design since raptor_parse_uri() has a description that implies the
> same thing:
>
>> Parse the URI according to the base URI base_uri, or NULL if not
>> needed. If no base URI is given, the uri is used. This method depends
>> on the raptor_www subsystem (see WWW Class section below) and an
>> existing underlying URI retrieval implementation such as libcurl,
>> libxml or BSD libfetch to retrieve the content.
>
> A simple test with rapper confirmed this:
>
>> rapper -g file:./test.ttl file:./this/doesnt/really/exist.ttl
>
>> rapper: Parsing URI file:./test.ttl with base URI
>> file:./this/doesnt/really/exist.ttl
>> rapper: Guessed parser name 'turtle'
>> <file:this/doesnt/really/foo.txt>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <file:this/doesnt/barbar.txt> .
>> rapper: Parsing returned 1 statements
>
> I guess I should have looked around in the documentation better :] ..
> still it's a bit strange behaviour, since you don't expect the base URI
> to be indicative of the format at all, especially since the source URI
> doesn't change (and neither does its format). Probably I'm missing
> something essential here in my understanding of base URIs ;)

At least in raptor 1.4.9, rapper does as you'd expect [On OSX]. I confirm
that in 1.4.8 it does as you reported. You didn't mention a version number
in your analysis - naughty!

$ utils/rapper --version
1.4.8
$ utils/rapper -g file:./something.ttl http://www.example.org/base/
rapper: Parsing URI file:./something.ttl with base URI
http://www.example.org/base/
rapper: Error - URI http://www.example.org/base/:1 - XML parser error -
Document is empty
rapper: Failed to parse URI file:./something.ttl guess content
rapper: Parsing returned 0 statements

And with SVN raptor (released as 1.4.9):

$ utils/rapper -g file:./something.ttl http://www.example.org/base/
rapper: Parsing URI file:./something.ttl with base URI
http://www.example.org/base/
rapper: Guessed parser name 'turtle'
_:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.example.org/rdf/Something> .
_:bar <http://www.example.org/rdf/baseURI> <file:///Users/awekking/test/> .
rapper: Parsing returned 2 statements
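
The difference between those two runs can be modelled roughly as follows.
This is a toy sketch in Python, not Raptor's actual code: the function
names, the suffix table and the RDF/XML fallback are all assumptions made
for illustration, chosen only so that the model matches the output shown
above. The real guesser may weigh more evidence than a filename suffix.

```python
# Toy model of the two behaviours discussed in this thread.
# NOT Raptor's implementation; every name here is hypothetical.

SUFFIX_TO_PARSER = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".rdf": "rdfxml",
}

# Assumption: fall back to RDF/XML when nothing matches, which is
# consistent with the "XML parser error" seen in the 1.4.8 run.
DEFAULT_PARSER = "rdfxml"


def guess_from_identifier(identifier):
    """Pick a parser name from the suffix of a URI string."""
    for suffix, name in SUFFIX_TO_PARSER.items():
        if identifier.endswith(suffix):
            return name
    return DEFAULT_PARSER


def guess_using_base(source_uri, base_uri=None):
    """Reported 1.4.8 behaviour: the base URI, when given, replaces the
    source URI as the identifier the guess is made from."""
    identifier = base_uri if base_uri is not None else source_uri
    return guess_from_identifier(identifier)


def guess_using_source(source_uri, base_uri=None):
    """What you'd expect: the base URI plays no part in the guess."""
    return guess_from_identifier(source_uri)
```

With source file:./something.ttl and base http://www.example.org/base/,
guess_using_base() returns 'rdfxml' and guess_using_source() returns
'turtle'. With a base URI that itself ends in .ttl, as in the earlier
test, both return 'turtle', which is why that test did not show the
problem.
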
>> But if I try the reverse (parse with base URI first, then try without)
>> it seems that the (wrongly) guessed format 'sticks' in the parser,
>> which kinda makes sense if you supply it when constructing the
>> Parser() object in the first place, but kinda caught me by surprise
>> when using the 'guess' parser:
>
> This apparently is caused by a parser keeping 'state' w.r.t. the guessed
> format. Whether or not the guess parser should reset this after parsing
> one source URI is a design issue, since you can create a new guess
> parser for each URI that you parse as well. Still it makes the
> reusability of Parser objects a bit dubious in some cases.
>
> Not sure if these issues need further action or not, I'd like to see
> other people's opinions about this I guess.

The guess parser makes one guess, and only one, the first time it is run
with some content; it then turns into the parser it guessed. Maybe that is
unexpected, so each time you run it, it should make a new guess?

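
To make the two options concrete, here is a minimal sketch in Python. It is
a toy model only, not the Raptor or Redland code: the class names, the
guess_by_suffix() helper and its RDF/XML fallback are hypothetical, and
parse() just reports which parser would be used instead of parsing anything.

```python
# Toy model of "guess once and stick" versus "guess on every parse".
# All names are hypothetical; nothing here calls Raptor or Redland.

def guess_by_suffix(uri):
    """Stand-in guesser: choose a parser name from the URI suffix."""
    if uri.endswith(".ttl"):
        return "turtle"
    if uri.endswith(".nt"):
        return "ntriples"
    return "rdfxml"  # assumed fallback


class StickyGuessParser:
    """Guesses the first time it is run, then becomes that parser."""

    def __init__(self):
        self.name = None  # no guess made yet

    def parse(self, uri):
        if self.name is None:
            self.name = guess_by_suffix(uri)  # one guess, kept for good
        return self.name


class FreshGuessParser:
    """Makes a new guess for every URI it is asked to parse."""

    def parse(self, uri):
        return guess_by_suffix(uri)


sticky = StickyGuessParser()
fresh = FreshGuessParser()
uris = ["data.rdf", "data.ttl"]
sticky_names = [sticky.parse(u) for u in uris]
fresh_names = [fresh.parse(u) for u in uris]
```

In this model the sticky parser treats both files as 'rdfxml', because
the first guess is reused, while the fresh one picks 'rdfxml' and then
'turtle'. Creating a new guess parser per URI gives the second behaviour
with the current design.
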
Dave