[redland-dev] What is the expected performance of a sparql Query?

Lauri Aalto laalto at gmail.com
Thu Aug 21 08:39:07 BST 2008


Hi all,

some thoughts regarding Rasqal query performance.

The query engine is not particularly optimized. Some queries have
horrible performance, some don't.

The triple storage abstraction works only by matching (s,p,o,c)
patterns, where any component may be null, meaning it matches anything.
There's currently no way to push further constraints down to the
storage layer. For example, for a range query such as { ?s ?p ?o
filter(?o > a && ?o < b) } (e.g. Lou's birthday ranges), it pulls all
?s ?p ?o triples from the storage and discards those for which the
filter expression evaluates to false. (The filter expression is also
always evaluated in full for each triple; there's no optimization
there yet.) For some queries, this may mean that all triples in the
storage have to be fetched and checked, even if the result set
contains only a handful of bindings or triples.
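
To make the cost concrete, here is a minimal Python sketch of that
behaviour. It is a toy in-memory store, not the Redland or Rasqal API;
the names TRIPLES, match and query_with_filter and the sample data are
made up for illustration. The store only understands (s,p,o,c) patterns
with None as the wildcard, so the range filter has to run on every
triple after it has been fetched.

```python
# Toy in-memory store of (subject, predicate, object, context) tuples.
# Illustration only: this is NOT the Redland/Rasqal API.
TRIPLES = [
    ("alice", "birthday", 1970, None),
    ("bob", "birthday", 1985, None),
    ("carol", "birthday", 1992, None),
    ("alice", "knows", "bob", None),
]


def match(s=None, p=None, o=None, c=None):
    """Yield stored tuples matching the pattern; None matches anything."""
    pattern = (s, p, o, c)
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple


def query_with_filter(low, high):
    """Emulate { ?s ?p ?o filter(?o > low && ?o < high) }.

    The store cannot evaluate the range, so every triple is fetched
    and the filter is applied afterwards, one triple at a time.
    """
    fetched = 0
    results = []
    for triple in match():  # all-wildcard pattern: the whole store
        fetched += 1
        obj = triple[2]
        if isinstance(obj, int) and low < obj < high:
            results.append(triple)
    return results, fetched


results, fetched = query_with_filter(1980, 1990)
print(len(results), "result(s) after fetching", fetched, "triples")
```

In this toy data set, one matching triple costs a scan of all four
stored triples; the ratio only gets worse as the store grows.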

Graph patterns are matched left-to-right. It's usually best to put
the graph patterns that restrict the result set the most first, so
that there are fewer triples to match the remaining graph patterns
against.
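
The effect of pattern order can be sketched with a small left-to-right
evaluator in Python. This is a simplified model under my own
assumptions (a nested-loop join over a toy store, with made-up data and
the hypothetical helpers match and run), not how Rasqal is actually
implemented. It counts how many storage lookups each ordering needs.

```python
# Toy store: 100 people, only one of whom has an email address.
# Illustration only: this is NOT the Redland/Rasqal API.
TRIPLES = [("person%d" % i, "type", "Person") for i in range(100)]
TRIPLES.append(("person7", "email", "p7@example.org"))


def match(s=None, p=None, o=None):
    """Return stored triples matching the pattern; None matches anything."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def run(patterns):
    """Evaluate triple patterns left to right.

    Terms starting with '?' are variables. Returns (solutions, lookups),
    where lookups is the number of calls made to the store. A variable
    repeated inside a single pattern is not handled in this sketch.
    """
    lookups = 0
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Substitute already-bound variables; unbound ones become None.
            concrete = [binding.get(term) if term.startswith("?") else term
                        for term in pattern]
            lookups += 1
            for triple in match(*concrete):
                extended = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        extended[term] = value
                next_solutions.append(extended)
        solutions = next_solutions
    return solutions, lookups


broad_first = [("?x", "type", "Person"), ("?x", "email", "?e")]
selective_first = [("?x", "email", "?e"), ("?x", "type", "Person")]

slow_solutions, slow_lookups = run(broad_first)
fast_solutions, fast_lookups = run(selective_first)
print(slow_lookups, "lookups vs", fast_lookups, "lookups")
```

Both orderings produce the same single solution, but in this model the
broad pattern first costs 101 lookups and the selective pattern first
costs 2.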

Usually, for storages that are not memory-based, most of the query
time is spent waiting for I/O. For that reason, among others, you
should aim to minimize the number of storage lookups and the number
of triples returned by each lookup.

And as with all performance issues, run a profiler to see where the
time is spent and whether your changes actually improved
performance.
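
As a rough illustration of that workflow, here is a sketch using
Python's standard cProfile and pstats modules on a stand-in function.
The function slow_scan, the helper profile_call and the sample data are
hypothetical. For Rasqal itself, which is written in C, you would use a
native profiler instead; the idea of measuring before and after a
change is the same.

```python
import cProfile
import io
import pstats


def slow_scan(triples, low, high):
    """Stand-in for a query: scan everything, filter afterwards."""
    return [t for t in triples
            if isinstance(t[2], int) and low < t[2] < high]


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


# Made-up data: 200 birthday triples with years 1900..2099.
triples = [("s%d" % i, "birthday", 1900 + i) for i in range(200)]
result, report = profile_call(slow_scan, triples, 1950, 1960)
print(report)
```

Run the same profile again after each change and compare the reports,
rather than trusting that the change helped.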

Lauri

