[redland-dev] turtle serializer isn't scalable at all

Alexander Gordeev lasaine at lvk.cs.msu.su
Wed May 13 23:44:20 CEST 2009


Hi All!

I tried to use raptor-utils a while ago to convert very big files in ntriples 
format to turtle but it was too slow. I've discovered in the code that the 
turtle serializer collects all triples in memory and outputs them once at the 
end. (If this is not true please correct me.) I think this is done to make 
subjects appear only once. You shouldn't do this IMO because memory use then 
grows with the size of the input and the performance is really BAD!

Turtle is really meant to be a stream format, i.e. the serializer should not 
collect lots of triples. Collect triples while the subject is the same and 
write them out as soon as the subject changes. This is IMO the right way to 
do it. If you want to optimize the output you can just run 'sort' on the 
ntriples file before the conversion, so that triples with the same subject 
end up next to each other. sort does this job MUCH better.
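
To illustrate the approach described above, here is a minimal sketch in 
Python. It is not Raptor code: the function name stream_turtle is 
hypothetical, and it assumes the triples arrive as (subject, predicate, 
object) tuples whose terms are already written in N-Triples syntax, which is 
also valid Turtle syntax.

```python
from itertools import groupby
from operator import itemgetter


def stream_turtle(triples):
    """Yield one Turtle block per run of consecutive triples with the same subject.

    `triples` is any iterable of (subject, predicate, object) tuples whose
    terms are already in N-Triples syntax (an assumption of this sketch).
    Only the triples of the current subject are held in memory.
    """
    for subject, group in groupby(triples, key=itemgetter(0)):
        # Join the predicate-object pairs of this subject with ';'.
        pairs = [f"{p} {o}" for _, p, o in group]
        yield subject + " " + " ;\n    ".join(pairs) + " .\n"


if __name__ == "__main__":
    demo = [
        ("<http://example.org/a>", "<http://example.org/p>", '"1"'),
        ("<http://example.org/a>", "<http://example.org/q>", '"2"'),
        ("<http://example.org/b>", "<http://example.org/p>", '"3"'),
    ]
    for block in stream_turtle(demo):
        print(block, end="")
```

With unsorted input a subject that comes back later simply starts a new 
block, which is still valid Turtle. With sorted input each subject appears 
exactly once.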

Sorry, I don't have a patch and I'm not going to write one because I don't use 
rapper anymore. But I decided to write about this issue because it was the 
only shortcoming I've noticed. Thanks for the great software!

-- 
  Alexander

