[redland-dev] Possible bug: parsing with 'guess' parser and
supplying a base URI always parses as RDF/XML?
Arjan Wekking
a.wekking at synantics.nl
Tue May 23 10:30:27 BST 2006
On 22-May-2006, at 22:44, Dave Beckett wrote:
> Arjan Wekking wrote:
>> On 5-May-2006, at 14:48, Arjan Wekking wrote:
>>
>>> Hi Redland developers (Dave ;),
>>>
>>> I've found something when using Redland that might be a bug. I'm not
>>> 100% sure that it is, so I thought I'd send a mail to the dev list
>>> first, before I open an actual bug report.
>>>
>>> What happens is that I want to 'guess' parse a file, a Turtle (.ttl)
>>> file in this case, which works fine as long as I do not supply a base
>>> URI. When I do supply one, the parser assumes the file is RDF/XML (or
>>> XML at least) and the whole 'guess' parser seems to be ignored. A
>>> workaround is for me to do the guessing for the parser (filename ends
>>> with .ttl, assume 'turtle', etc.), but that kinda makes the whole
>>> 'guess' parser rather useless.
>>>
>>> The thing I'm not sure of is whether this is normal behaviour or not
>>> (when supplying a base URI, assume RDF/XML, or something like that),
>>> because there might be some arcane reason for doing this (I don't have
>>> the time to investigate further).
>>
>> Well, I looked around a bit in the guess parser's guts, and it appears
>> that the URI on which the format is guessed (one that ends in .ttl to
>> get a turtle parser) is replaced by the base URI when one is present;
>> otherwise the original source URI is used. Apparently this is by
>> design, since raptor_parse_uri() has a description that implies the
>> same thing:
>>
>>> Parse the URI according to the base URI base_uri, or NULL if not
>>> needed. If no base URI is given, the uri is used. This method depends
>>> on the raptor_www subsystem (see WWW Class section below) and an
>>> existing underlying URI retrieval implementation such as libcurl,
>>> libxml or BSD libfetch to retrieve the content.
>>
>> A simple test with rapper confirmed this:
>>
>>> rapper -g file:./test.ttl file:./this/doesnt/really/exist.ttl
>>
>>> rapper: Parsing URI file:./test.ttl with base URI file:./this/doesnt/really/exist.ttl
>>> rapper: Guessed parser name 'turtle'
>>> <file:this/doesnt/really/foo.txt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:this/doesnt/barbar.txt> .
>>> rapper: Parsing returned 1 statements
>>
>> I guess I should have read the documentation more carefully :] ..
>> Still, it's a bit strange behaviour, since you don't expect the base
>> URI to be indicative of the format at all, especially since the source
>> URI doesn't change (and neither does its format). Probably I'm missing
>> something essential here in my understanding of base URIs ;)
>
> At least in raptor 1.4.9, rapper does as you'd expect [on OSX]. I
> confirm that in 1.4.8 it does as you reported. You didn't mention a
> version number in your analysis - naughty!
Oops. (*shamed*)
> $ utils/rapper --version
> 1.4.8
> $ utils/rapper -g file:./something.ttl http://www.example.org/base/
> rapper: Parsing URI file:./something.ttl with base URI http://www.example.org/base/
> rapper: Error - URI http://www.example.org/base/:1 - XML parser error - Document is empty
> rapper: Failed to parse URI file:./something.ttl guess content
> rapper: Parsing returned 0 statements
>
> and with SVN raptor (aka 1.4.9 released):
>
> $ utils/rapper -g file:./something.ttl http://www.example.org/base/
> rapper: Parsing URI file:./something.ttl with base URI http://www.example.org/base/
> rapper: Guessed parser name 'turtle'
> _:foo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/rdf/Something> .
> _:bar <http://www.example.org/rdf/baseURI> <file:///Users/awekking/test/> .
> rapper: Parsing returned 2 statements
Yeah, I am indeed using 1.4.8 here. Time to upgrade!
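
For anyone stuck on 1.4.8, the suffix-based workaround mentioned earlier in
the thread (guess the format from the source URI, never from the base URI)
could look roughly like the sketch below. The function name
guess_parser_name and the suffix table are hypothetical, made up for
illustration; they are not part of Redland or Raptor. Only the values
'turtle', 'ntriples' and 'rdfxml' are meant to correspond to Raptor parser
names.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical suffix table; extend it for whatever formats you handle.
SUFFIX_TO_PARSER = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".rdf": "rdfxml",
}


def guess_parser_name(source_uri, default=None):
    """Pick a parser name from the *source* URI's suffix.

    The base URI is deliberately not consulted, since it says nothing
    about the syntax of the document being parsed.
    """
    path = urlparse(source_uri).path
    _, ext = posixpath.splitext(path)
    return SUFFIX_TO_PARSER.get(ext.lower(), default)
```

The returned name can then be used in place of 'guess' when constructing
the parser; a None result leaves the decision to the caller.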
>>> But if I try the reverse (parse with base URI first, then try without)
>>> it seems that the (wrongly) guessed format 'sticks' in the parser,
>>> which kinda makes sense if you supply it when constructing the
>>> Parser() object in the first place, but kinda caught me by surprise
>>> when using the 'guess' parser:
>>
>> This apparently is caused by a parser keeping 'state' w.r.t. the
>> guessed format. Whether or not the guess parser should reset this after
>> parsing one source URI is a design issue, since you can create a new
>> guess parser for each URI that you parse as well. Still it makes the
>> reusability of Parser objects a bit dubious in some cases.
>>
>> Not sure if these issues need further action or not, I'd like to see
>> other people's opinions about this I guess.
>
> The guess parser makes a once, and once only, guess the first time it
> is run with some content, then it turns into the parser it guessed.
> Maybe that is unexpected, so each time you run it, it should do a new
> guess?
Well, I guess the issue here is that of reusing parser objects; when you
know the format beforehand, it makes sense to create a parser of that
format and reuse it for multiple resources of the same format. When using
a guess parser though, you are implying that you do not know the format of
the resource, and it is possible that every resource is of a different
format (after all, I don't know what the format is ;).
Of course, this is not really an issue because you can create a new
'guess' parser for each resource that you want to parse, but it makes
reusing the 'guess' parser rather.. awkward. It would make sense if the
guessed parser was reset after each run, for me anyway ;) ... I suppose
the only issue with doing this is that it could break things for people
who rely on the guess parser sticking with its 'guess'; I'm not sure
there's anyone out there doing that (seems unlikely).
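
To make the difference concrete, here is a toy model of the behaviour
discussed above. It is not Redland code: the StickyGuessParser class, its
suffix table and the fallback to 'rdfxml' are all assumptions invented for
illustration. It only mimics a parser that guesses once and then keeps
that guess, and compares reusing one instance with creating a fresh one
per resource.

```python
class StickyGuessParser:
    """Toy stand-in for a parser that guesses once and keeps the guess."""

    # Hypothetical suffix table, for illustration only.
    SUFFIXES = {".ttl": "turtle", ".nt": "ntriples", ".rdf": "rdfxml"}

    def __init__(self):
        self.name = "guess"  # replaced by a concrete format on first parse

    def parse(self, uri):
        if self.name == "guess":
            # First run: guess from the URI suffix, then stick with it.
            for suffix, fmt in self.SUFFIXES.items():
                if uri.endswith(suffix):
                    self.name = fmt
                    break
            else:
                self.name = "rdfxml"  # assumed fallback for this sketch
        return self.name  # the format this call would be parsed with


uris = ["file:./a.ttl", "file:./b.rdf"]

# Reusing one instance: the first guess sticks for every later URI.
reused = StickyGuessParser()
reused_formats = [reused.parse(u) for u in uris]

# One fresh instance per resource: each URI gets its own guess.
fresh_formats = [StickyGuessParser().parse(u) for u in uris]
```

In this model the reused parser treats b.rdf as Turtle, while the
per-resource parsers pick a format for each URI separately.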
Anyway, not sure if I ever mentioned this, but I just wanted you to know
that I've grown rather fond of the whole Redland suite (I mostly use the
Python bindings, but the tools are extremely useful as well) and that it
has become the RDF library I use and recommend for practically all things
RDF related.

I'm impressed by and definitely value the work you have put into it,
especially since you are more or less its only core developer. I've been
trying to help left and right with things I am confident in helping with
(the Python binding bugs in the bug-tracker) and I hope that further
uptake of RDF and the SW will draw more developers to keep Redland one of
the better RDF libraries out there.
Ok, back to work for me ;)
Regards,
- Arjan