[redland-dev] raptor turtle serializer scalability patch

Chris Cannam cannam at all-day-breakfast.com
Mon Nov 9 15:36:51 CET 2009


Attached is a patch for raptor svn r15635 (also works with the 1.4.19
release) which addresses a scalability problem in Turtle serialization
by replacing raptor_sequence with raptor_avltree for the subject and
blanks containers passed to raptor_abbrev_node_lookup.  This attempts
to fix the "this should really be a hash, not a list" FIXME previously
noted in that function.
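
As a rough illustration of the idea (not code from the patch), here is
a find-or-create lookup backed by an ordered tree rather than a linear
scan.  It uses POSIX tfind/tsearch from <search.h> as a stand-in for
raptor_avltree; POSIX does not promise a balanced tree, although glibc
provides one.  The node struct, node_compare and node_lookup are
hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an abbreviated node record. */
typedef struct {
  const char *uri;   /* lookup key; the real code compares RDF nodes */
  int ref_count;     /* how many times this node has been looked up */
} node;

/* Ordering function for the tree: compare nodes by key. */
static int node_compare(const void *a, const void *b)
{
  return strcmp(((const node *)a)->uri, ((const node *)b)->uri);
}

/* Return the node for uri, creating it on first sight.
 * *created is set to 1 if a new node was inserted, else 0.
 * Returns NULL only if allocation fails. */
static node *node_lookup(void **root, const char *uri, int *created)
{
  node key;
  node **slot;
  node *fresh;

  assert(root != NULL && uri != NULL && created != NULL);
  key.uri = uri;
  key.ref_count = 0;

  /* Tree search instead of walking a whole sequence. */
  slot = tfind(&key, root, node_compare);
  if (slot != NULL) {
    *created = 0;
    (*slot)->ref_count++;
    return *slot;
  }

  fresh = malloc(sizeof(*fresh));
  if (fresh == NULL)
    return NULL;
  fresh->uri = uri;        /* caller must keep the string alive */
  fresh->ref_count = 1;

  if (tsearch(fresh, root, node_compare) == NULL) {
    free(fresh);
    return NULL;
  }
  *created = 1;
  return fresh;
}
```

With a list, every lookup may walk every existing entry, so building
n nodes costs on the order of n^2 comparisons; with a balanced tree
each lookup is logarithmic, which is what makes the difference at
several hundred thousand triples.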

The tricky bit is handling the way blank nodes were removed from the
sequence when written, if they were not going to be needed again.
Because you can't just replace an item in the tree with NULL (as was
done in the sequence) without breaking tree ordering, and you can't
remove an item from the tree while iterating over it, I've instead
added a "valid" flag to the subject struct itself, which is cleared when
the subject is written and tested afterwards to prevent duplicate
writes.  I'm not hugely keen on this, so better ideas are welcome.
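
A minimal sketch of the flag approach, assuming a much simplified
subject record (the struct, write_subject and write_all are
hypothetical and not taken from the patch).  A sorted array stands in
for an in-order walk of the tree, and nested_of[i] names a blank node
that gets written inline under subject i, or -1 if there is none.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the serializer's subject record. */
typedef struct {
  int id;      /* ordering key */
  int valid;   /* nonzero until the subject has been written */
} subject;

/* Write a subject at most once.  Returns 1 if written, 0 if skipped. */
static int write_subject(subject *s, int *out, size_t *n_out)
{
  assert(s != NULL && out != NULL && n_out != NULL);
  if (!s->valid)
    return 0;              /* already emitted, possibly inline */
  out[(*n_out)++] = s->id; /* "serialize" by recording the id */
  s->valid = 0;            /* mark as done; the container is untouched */
  return 1;
}

/* Walk all subjects in key order.  Nothing is removed or replaced
 * during the walk, so the ordering of the container stays intact.
 * Returns the number of subjects written. */
static size_t write_all(subject *items, size_t n,
                        const int *nested_of, int *out)
{
  size_t n_out = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    if (!write_subject(&items[i], out, &n_out))
      continue;
    if (nested_of[i] >= 0)
      write_subject(&items[nested_of[i]], out, &n_out);
  }
  return n_out;
}
```

The cost is one extra field per subject and a check on every write; in
exchange the container is never modified while it is being iterated.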

This patch reduces the runtime of my own test case (c. 400K triples
constructed and serialized) from about 25 minutes to about 14 seconds.

It also reduces the runtime for the Turtle test suite from about 12.1
to 10.4 seconds on this machine in a release build, and it passes
valgrind --leak-check=full with no errors or leaks.

The bad news is that it causes a number of unit tests to fail because
of changes to the ordering in output.  One test (ex-38) in the rdfxml
and turtle test suites fails (I think because rdfdiff is wrongly
seeing a difference that isn't there), and the entire feeds test suite
fails (I think because it doesn't use rdfdiff at all).  I haven't
spotted anything that looks like a "real" failure, but I might be
missing something.  Thoughts welcome.


Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raptor-subject-avltree.diff
Type: text/x-diff
Size: 21907 bytes
Desc: not available
Url : http://lists.librdf.org/pipermail/redland-dev/attachments/20091109/686813b0/attachment-0001.diff 

