[redland-dev] slow model creation with ruby bindings
John Fieber
jfieber at adobe.com
Wed Sep 8 20:09:47 CEST 2010
On Sep 8, 2010, at 6:57 AM, Martin Guetlein wrote:
> thanks for your answer. I had a look at http://rdf.rubyforge.org,
> looks very nice, I am going to give it a try.
>
[...]
> Can I use Raptor somehow to avoid building the graph, and just print
> an rdf-file with my triples?
If you don't need to filter/manipulate the statements as they go through, I'd suggest simply calling out to the rapper command line utility.
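As a rough sketch of the rapper route: write your triples out as N-Triples lines from Ruby, then let rapper re-serialize the file. The helper names below are hypothetical. The only rapper options used are -q (quiet), -i (input syntax) and -o (output syntax), and the sketch assumes rapper is on your PATH.

```ruby
require "open3"

# Build the argument vector for rapper (hypothetical helper).
# -i names the input parser, -o names the output serializer.
def rapper_command(input_path, from: "ntriples", to: "rdfxml")
  ["rapper", "-q", "-i", from, "-o", to, input_path]
end

# Run rapper and return whatever it wrote to standard output.
# Raises if rapper exits with a non-zero status.
def convert_with_rapper(input_path, **opts)
  out, err, status = Open3.capture3(*rapper_command(input_path, **opts))
  raise "rapper failed: #{err}" unless status.success?
  out
end
```

Calling convert_with_rapper("triples.nt") would then hand back RDF/XML as a string; the file name is only a placeholder.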
If you want to manipulate the data in Ruby, the RDF.rb stuff will be the least painful and most Ruby-like, though you can accomplish it with the SWIG bindings that come with Redland. The rdf-raptor gem uses Raptor directly and can be used to stream from the parser one statement at a time. Of course, some serializers will queue up an entire model before emitting any serialized data, but ntriples will shove out statements as soon as you provide them.
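To illustrate why ntriples suits streaming: each statement is one self-contained line, so a writer never has to hold more than the current statement. The following is a hand-rolled sketch in plain Ruby, not the rdf-raptor or RDF.rb API. The helper names are made up, and it only covers URI subjects and predicates plus plain literals with basic escaping (no blank nodes, datatypes, language tags or non-ASCII escaping).

```ruby
require "stringio"

# Characters that must be backslash-escaped inside an N-Triples literal.
NT_ESCAPES = {
  "\\" => "\\\\",
  "\"" => "\\\"",
  "\n" => "\\n",
  "\r" => "\\r",
  "\t" => "\\t"
}.freeze

# Format a plain literal, escaping the characters listed above.
def nt_literal(text)
  '"' + text.gsub(/[\\"\n\r\t]/) { |c| NT_ESCAPES[c] } + '"'
end

# Format a URI reference term.
def nt_uri(uri)
  "<#{uri}>"
end

# Emit one statement as a single line; nothing is buffered here.
# object_term must already be formatted with nt_uri or nt_literal.
def write_statement(io, subject, predicate, object_term)
  io.puts "#{nt_uri(subject)} #{nt_uri(predicate)} #{object_term} ."
end
```

Passing $stdout or an open File as io writes each triple the moment it is produced, so memory use stays flat no matter how many statements go through.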
I'd advise against using the Ruby classes in the Redland bindings distribution that build upon the SWIG bindings, due to a variety of memory management problems and consequent leaks and/or crashes.[1] Use the SWIG bindings directly, grit your teeth and firmly fix your mind in C-style memory management when you do, and keep the C API documentation handy. You will be setting up a parser, parsing as a stream, and then looping to pull statements out of the stream and output them.
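The parse-as-stream loop could look roughly like the sketch below. The librdf_* names are the Redland C API functions, and I am assuming the SWIG module exposes them under the same names (as Redland.librdf_new_world and so on after require "redland"); check that against your build. The method name each_parsed_statement and the api parameter are hypothetical, there only so the module is passed in explicitly. Everything is freed by hand in an ensure block rather than left to finalizers.

```ruby
# api: the module exposing the librdf_* functions (assumed to be the
# SWIG-generated Redland module). source_uri: URI string of the input.
def each_parsed_statement(api, source_uri, parser_name = "rdfxml")
  world = api.librdf_new_world
  api.librdf_world_open(world)
  parser = uri = stream = nil
  begin
    # nil stands in for the NULL mime type and type URI arguments.
    parser = api.librdf_new_parser(world, parser_name, nil, nil)
    uri    = api.librdf_new_uri(world, source_uri)
    # The source URI doubles as the base URI here.
    stream = api.librdf_parser_parse_as_stream(parser, uri, uri)

    # librdf_stream_end returns non-zero once the stream is exhausted.
    while api.librdf_stream_end(stream) == 0
      # The statement belongs to the stream and is only valid until
      # the next call to librdf_stream_next; do not keep it around.
      yield api.librdf_stream_get_object(stream)
      api.librdf_stream_next(stream)
    end
  ensure
    # C-style cleanup, in reverse order of creation.
    api.librdf_free_stream(stream) if stream
    api.librdf_free_uri(uri) if uri
    api.librdf_free_parser(parser) if parser
    api.librdf_free_world(world)
  end
end
```

Inside the block you would serialize or print the statement straight away, since the pointer is not yours to hold on to.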
To the extent that you can keep the data path almost entirely in the C realm, directly driving the SWIG bindings could be faster than rdf-raptor. The rdf-raptor gem will bring all parsed data across the C-Ruby bridge into Ruby objects, even if you only turn around and pass them back to the C code for serializing.
-john
[1] The Ruby classes in the bindings distribution depend on finalizers for cleaning up C resources, but the order in which the finalizers run is unpredictable. That causes problems when a finalizer is responsible for a C object that is part of a C-land object graph which the Ruby GC either doesn't know about or, worse, doesn't care about because of when in an object's lifecycle the finalizer is called.