[redland-dev] Querying using SPARQL - misc questions

Dave Beckett dave at dajobe.org
Tue Feb 7 17:41:40 GMT 2006


Anahide Tchertchian wrote:
> Hi,
> 
> I've been trying some queries using the SPARQL language, using the W3C
> paper to get information, and ran into a few problems.
> I'm using the Python binding, and version 1.2.0.1.
> 
> 1. Querying trying to match literal nodes, I cannot find a way to match
> literals with language and datatype set:
> - matching Node(literal='test') works using "test" in the query
> - matching Node(literal='test', language='en') works using "test"@en
> - matching Node(literal='test',
> datatype=Uri("<http://cps-project.org/2005/data/>")) works using
> "test"^^<http://cps-project.org/2005/data/>
> - now matching Node(literal='test', language='en',
> datatype=Uri("<http://cps-project.org/2005/data/>")) does not work using
> "test"@en^^<http://cps-project.org/2005/data/> for instance, and I ended
> up using a regexp to find nodes with a language AND a datatype set.

RDF literals cannot have both a language and a datatype; it's
either/or.

Having tried it out, it seems that Redland's APIs allow both when they
shouldn't:

$ python2.4
Python 2.4.2 (#2, Nov 21 2005, 02:24:28)
[GCC 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import RDF
>>> a=RDF.Node(literal='test', language='en', datatype=RDF.Uri("<http://cps-project.org/2005/data/>"))
>>> a
<RDF.Node object at 0x2aaaab866dd0>
>>> str(a)
'test@en^^<<http://cps-project.org/2005/data/>>'
>>>

which is a bug.  It should fail to create such a Node when both params
are given.

The failing query is probably a consequence of this: somewhere in the
system, code that assumes a literal never has both a datatype and a
language is probably being hit.
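
Until the API rejects such nodes itself, a guard on the calling side can
enforce the either/or rule. This is only a minimal sketch: sparql_literal
is a hypothetical helper, not part of the Redland bindings, and its
escaping handles nothing beyond backslashes and double quotes.

```python
def sparql_literal(value, language=None, datatype=None):
    """Render a literal as a SPARQL term (hypothetical helper, not Redland API).

    Refuses a language and a datatype together, since an RDF literal
    may carry one or the other but not both.
    """
    if language is not None and datatype is not None:
        raise ValueError("an RDF literal takes a language or a datatype, not both")
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    term = '"%s"' % escaped
    if language is not None:
        return term + "@" + language
    if datatype is not None:
        return term + "^^<" + datatype + ">"
    return term
```

The three accepted forms correspond to the three query patterns that
already work in the report above; the fourth combination raises
ValueError instead of producing a term.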

>
> 2. I have a use case where I "clean" the graph deleting literal nodes
> that were found thanks to a query. How come I cannot use nodes from the
> result directly? (I have to create them again cloning the result nodes,
> otherwise they're not found in the graph).

Can you give a code example of what you mean by reuse?  The nodes
returned are new nodes and should be OK to use in any way you like.

> 3. I noticed that the way the query is written has effects on the
> answering time. For instance, writing:
> 
> SELECT DISTINCT ?node
> WHERE {
>     ?node rdf:type <MyRDFType>
>     ?art dcterms:references ?node
> }
> 
> will be much more efficient than writing:
> 
> SELECT DISTINCT ?node
> WHERE {
>     ?art dcterms:references ?node
>     ?node rdf:type <MyRDFType>
> }
> 
> I benched it on a graph containing more than 100,000 statements: the
> first query was processed in 1 second, and the second one took more than
> 15 min (did not wait till the end). This was done using a graph in
> memory, so I suppose that's why it may have been so slow.

Anything with DISTINCT is suspect, as rasqal 0.9.10 (in redland 1.0.2)
had bugs in its DISTINCT handling, which I fixed in 0.9.11.

> Are there any tricks to know about writing efficient queries?

There is no query optimising phase at the triple pattern level, so the
order of evaluation is primarily the order in which you give the triple
patterns.  In this case, the number of MyRDFType-typed nodes is likely
much smaller than the number of answers to the dcterms:references
triple pattern.
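
To see why the order matters when nothing reorders the patterns, here is
a toy model in Python. It is an assumption-laden sketch, not how rasqal
is implemented: it scans every triple for every partial solution, uses no
indexes, and the graph (200 dcterms:references triples, 2 typed nodes) is
made up for illustration.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on mismatch."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Naive left-to-right nested-loop join.

    Returns (solutions, number of triples examined).
    """
    solutions = [{}]
    examined = 0
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                examined += 1
                result = match(pattern, triple, bindings)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return solutions, examined


# Made-up graph: many references triples, very few typed nodes.
triples = [("art%d" % i, "dcterms:references", "node%d" % i) for i in range(200)]
triples += [("node0", "rdf:type", "MyRDFType"), ("node1", "rdf:type", "MyRDFType")]

type_pattern = ("?node", "rdf:type", "MyRDFType")
refs_pattern = ("?art", "dcterms:references", "?node")

answers_a, selective_first = evaluate([type_pattern, refs_pattern], triples)
answers_b, broad_first = evaluate([refs_pattern, type_pattern], triples)
```

Both orders find the same two answers, but broad_first is far larger than
selective_first, because every dcterms:references match becomes a partial
solution that has to be checked against the whole graph again.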

Another rule of thumb is to put the triple patterns with the fewest
variables first.  That's mostly good advice, but the rule above is
better: put the triple patterns with the fewest expected *answers*
first.  Patterns of the form ?x rdf:type <constant> are good choices.
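
If you generate queries from code, that rule can be applied before the
query text is built. The sketch below assumes you can supply a rough
answer count per triple pattern yourself; the counts shown are invented,
both function names are hypothetical, and PREFIX declarations are left
out as in the queries quoted above.

```python
def order_by_expected_answers(estimates):
    """Sort (triple_pattern, expected_answer_count) pairs, fewest answers first."""
    return [pattern for pattern, count in sorted(estimates, key=lambda item: item[1])]


def build_query(variable, patterns):
    """Join triple patterns with '.' separators inside a SELECT DISTINCT query."""
    body = " .\n    ".join(patterns)
    return "SELECT DISTINCT %s\nWHERE {\n    %s\n}" % (variable, body)


estimates = [
    ("?art dcterms:references ?node", 100000),  # invented count
    ("?node rdf:type <MyRDFType>", 50),         # invented count
]
ordered = order_by_expected_answers(estimates)
query = build_query("?node", ordered)
```

The resulting query lists the rdf:type pattern first, which is the order
that ran quickly in the benchmark described above.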

> I'd be very grateful if someone could give me any indication about one
> of these problems. I've got other questions but I'm saving them for
> later :)

Dave

