[redland-dev] turtle serializer isn't scalable at all
Alexander Gordeev
lasaine at lvk.cs.msu.su
Wed May 13 23:44:20 CEST 2009
Hi All!
A while ago I tried to use raptor-utils to convert very big files from
N-Triples format to Turtle, but it was too slow. Reading the code, I found
that the Turtle serializer collects all triples in memory and writes them out
only once, at the end. (If this is not true, please correct me.) I think this
is done so that each subject appears only once. IMO you shouldn't do this,
because memory use grows with the size of the input and the performance on
big files is really BAD!
Turtle is really meant to be a streaming format, i.e. the serializer should
not collect lots of triples. Collect triples while the subject stays the same
and write them out as soon as the subject changes. This is IMO the right way
to do it. If you want to optimize the output, you can just run 'sort' on the
N-Triples file before the conversion. Each N-Triples line starts with its
subject, so sorting puts triples with the same subject next to each other.
sort does this job MUCH better.
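To make the idea concrete, here is a minimal sketch in Python. It is not
raptor's code or API: the function name stream_turtle and the tuple-based
input are assumptions made only for illustration, and the terms are assumed
to be already written in N-Triples syntax, which can be emitted as Turtle
unchanged. It buffers only the current run of triples that share a subject.

```python
from itertools import groupby
from operator import itemgetter


def stream_turtle(triples):
    """Yield one Turtle block per run of consecutive same-subject triples.

    `triples` is any iterable of (subject, predicate, object) tuples whose
    terms are already serialized in N-Triples syntax (hypothetical input
    shape, chosen for this sketch).
    """
    for subject, run in groupby(triples, key=itemgetter(0)):
        # Only the current run is buffered, never the whole input.
        pairs = [f"{p} {o}" for _, p, o in run]
        yield f"{subject} " + " ;\n    ".join(pairs) + " .\n"


if __name__ == "__main__":
    unsorted = [
        ("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/b>"),
        ("<http://example.org/c>", "<http://example.org/p>", "<http://example.org/d>"),
        ("<http://example.org/a>", "<http://example.org/q>", '"x"'),
    ]
    # Sorting first (as 'sort' would do on the file) merges the two runs
    # for subject a into a single block.
    for block in stream_turtle(sorted(unsorted)):
        print(block, end="")
```

On unsorted input a subject can show up in more than one block, which is still
valid Turtle; sorting beforehand only makes the output more compact. A subject
with a huge number of triples is still buffered as one run in this sketch.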
Sorry, I don't have a patch and I'm not going to write one because I don't use
rapper anymore. But I decided to report this issue because it was the only
shortcoming I've noticed. Thanks for the great software!
--
Alexander