[redland-dev] Parsing ntriples from a string

Dave Beckett dave at dajobe.org
Mon Aug 21 04:24:12 UTC 2006


DeeJay-G615 wrote:
> After a fair bit of digging through redland and raptor I found the
> source of my issues.
> 
> ntriples_parse.c :: raptor_ntriples_parse_chunk
> 
>   if(is_end) {
>       if(ntriples_parser->offset != ntriples_parser->line_length)
>           raptor_parser_error(rdf_parser, "Junk at end of input.\"");
>       return 0;
>   }
> 
> I guess I should have read the grammar for N-Triples
> (http://www.w3.org/TR/rdf-testcases/#ntriples) as I would have
> discovered that each triple must be terminated by an end of line character.
> 
> So
> 
> "<http://example/q?abc=1&def=2>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> \"xxx\" ."
> 
> should be
> 
> "<http://example/q?abc=1&def=2>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> \"xxx\" .\n"
> 
> Going back to the code excerpt above: surely it should be
> 
>   if(is_end) {
>       if(ntriples_parser->offset != ntriples_parser->line_length) {
>           raptor_parser_error(rdf_parser, "Junk at end of input.\"");
>           return 1;
>       } else {
>           return 0;
>       }
>   }
> 
> As the procedure should return non zero on failure.

Yes, I've made this change to raptor.
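
For illustration, a minimal standalone sketch of the corrected end-of-input
check. It is not raptor's actual code: the struct and function names
(sketch_ntriples_state, sketch_parse_chunk) are hypothetical, and "parsing"
is reduced to finding the last newline, so that any unterminated trailing
text counts as junk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the N-Triples parser state (not raptor's struct). */
typedef struct {
  size_t offset;       /* bytes consumed as complete lines */
  size_t line_length;  /* bytes available in the buffer */
  int errors;          /* number of errors reported */
} sketch_ntriples_state;

/* Returns 0 on success and non-zero on failure.
 * Assumption: a line is "consumed" only when it ends with '\n'. */
static int sketch_parse_chunk(sketch_ntriples_state *st,
                              const char *buf, int is_end)
{
  size_t len = strlen(buf);
  size_t i;

  st->line_length = len;
  st->offset = 0;

  /* Advance offset past every newline-terminated line. */
  for(i = 0; i < len; i++) {
    if(buf[i] == '\n')
      st->offset = i + 1;
  }

  if(is_end) {
    if(st->offset != st->line_length) {
      /* Leftover bytes at end of input: report it AND fail. */
      st->errors++;
      return 1;
    }
    return 0;
  }

  /* More input may follow; unterminated text is not an error yet. */
  return 0;
}
```

With this sketch, a triple ending in " .\n" at end of input returns 0, while
the same triple without the trailing newline returns 1.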

> 
> Something else I wanted to bring up as a result of my travels through
> the parsing parts of the API...
> 
> rdf_parser_raptor.c :: librdf_parser_raptor_parse_into_model_common

in redland

> which encapsulates the functionality of parsing from a uri, from a
> string and from a counted string.
> 
> the parameter list is
> (void *context,
>  librdf_uri *uri,
>  const unsigned char *string,
>  size_t length,
>  librdf_uri *base_uri,
>  librdf_model* model)
> 
> Procedures that use this to parse from a uri, pass the URI, null for the
> string and 0 for the length. Functions that parse from a string, pass
> null for the uri, pass the string and then 0 if they don't have the length.

and the number of parameters shown is a bit of a mess, which is why it's
internal only.

> How that calling convention interacts with the following code is my
> point of interest.
> 
>   if(!base_uri)
>     base_uri=uri;
> 
>   /* No base URI given, cannot proceed */
>   if(!base_uri)
>     return 1;
> 
> If you don't pass a base uri when you are parsing from a URI you are ok,
> as it sets the base_uri to the uri as a default.
> However if you are parsing from a string, and you don't supply
> base_uri... base_uri (which is already NULL) gets set to NULL, and then
> the second if block executes and it returns a failure.
> 
> Was this the intended behaviour? I'm guessing yes, as back up in
> rdf_parse.c, the 'parse from string' procedures check for the base URI
> being null, however the 'parse from uri' procedures don't.
> 
> What is the rhyme or reason to this?

It's called a bug; software has them.

The underlying problem is that some syntaxes require a base URI and some
don't.  The safe way is to always pass one in, however you call a parser.
That will never fail.  If you want to be lazy and hope that the parser
doesn't need it, some of the tests above - which I agree are too strict -
will fail.

I'll add a function to raptor to test when a parser needs a base URI,
so that the test above can be more specific.
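
A rough sketch of what a more specific test could look like. Everything
here is hypothetical: the function names are invented for illustration and
are not raptor or redland API, URIs are plain C strings rather than
librdf_uri objects, and the list of syntaxes that can do without a base URI
is an assumption (only N-Triples, which uses absolute URIs throughout).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: does the named syntax need a base URI?
 * Assumption: N-Triples does not; anything else is treated as needing
 * one, which is the safe default. */
static int sketch_syntax_needs_base_uri(const char *syntax_name)
{
  if(strcmp(syntax_name, "ntriples") == 0)
    return 0;
  return 1;
}

/* Hypothetical replacement for the strict check quoted above.
 * Defaults the base URI to the source URI when there is one, and fails
 * (returns non-zero) only if the syntax needs a base URI and none is
 * available. */
static int sketch_resolve_base_uri(const char *syntax_name,
                                   const char *uri,
                                   const char **base_uri_p)
{
  if(!*base_uri_p)
    *base_uri_p = uri;

  /* No base URI given and the syntax needs one: cannot proceed */
  if(!*base_uri_p && sketch_syntax_needs_base_uri(syntax_name))
    return 1;

  return 0;
}
```

Under these assumptions, parsing N-Triples from a string with no base URI
succeeds, while a syntax that needs one still fails early when neither a
base URI nor a source URI is supplied.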

Dave

