[redland-dev] Raptor Turtle parser memory usage

Dave Beckett dave at dajobe.org
Mon Jul 9 12:01:21 EDT 2012


Nick is correct about the serializer, but the question was about the Turtle
parser, and the memory concern is valid there too.

The Raptor Turtle (N3, TriG) parser relies on flex and bison (aka lex+yacc),
of which bison:
a) has to have the entire input in memory in one block in order to parse
b) uses 32-bit unsigned int offsets

So Raptor has to assemble the input in memory (lots of alloc / realloc) and
ends up with a maximum size of 2G.  A 5G file is not going to parse.
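
To illustrate the pattern described above, here is a minimal sketch in C of
accumulating input into one contiguous block with realloc while tracking the
size in a 32-bit offset. It is not Raptor's actual code: the names
input_buffer, input_buffer_init, input_buffer_append and input_buffer_free
are hypothetical, and the size cap is passed in as a parameter rather than
taken from any real Raptor or bison constant.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; NOT Raptor's real data structure. */
typedef struct {
    char    *data;      /* one contiguous block holding all input so far */
    uint32_t length;    /* bytes used, kept in a 32-bit offset */
    uint32_t capacity;  /* bytes currently allocated */
    uint32_t limit;     /* assumed hard cap on total input size */
} input_buffer;

static void input_buffer_init(input_buffer *b, uint32_t limit)
{
    b->data = NULL;
    b->length = 0;
    b->capacity = 0;
    b->limit = limit;
}

/* Append one chunk of input.
 * Returns 0 on success, -1 if the cap would be exceeded or realloc fails. */
static int input_buffer_append(input_buffer *b, const char *chunk, uint32_t n)
{
    if (n == 0)
        return 0;
    /* length <= limit always holds, so this subtraction cannot wrap. */
    if (n > b->limit - b->length)
        return -1;

    uint32_t needed = b->length + n;
    if (needed > b->capacity) {
        /* Grow geometrically, clamping at the cap to avoid 32-bit overflow. */
        uint32_t new_cap = b->capacity ? b->capacity : 4096;
        while (new_cap < needed) {
            if (new_cap > b->limit / 2) {
                new_cap = b->limit;
                break;
            }
            new_cap *= 2;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->capacity = new_cap;
    }

    memcpy(b->data + b->length, chunk, n);
    b->length = needed;
    return 0;
}

static void input_buffer_free(input_buffer *b)
{
    free(b->data);
    input_buffer_init(b, b->limit);
}
```

A caller would read the input in chunks and call input_buffer_append for
each one. Once the append returns -1 because the cap is reached, nothing
more can be added, which is why an input larger than the cap cannot be
parsed with this design no matter how much RAM is available.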

I have looked at fixing this several times but writing a streaming lexer
and parser is damn hard - months of work.  Using ANTLR and other things
that do the same job looks like it would make things a lot more complex
(it's C++).  I've also tried looking at sqlite's lemon but it doesn't stream
so it seems the only road to this is a lot of work.

Dave


On 7/9/12 1:30 AM, Nicholas Humfrey wrote:
> Hello,
> 
> Yes, the Turtle serialiser puts everything into RAM, in order to build a tree of the data and output a nice pretty file, with all the triples with the same subject next to each other.
> 
> If you output as ntriples, then output will be much faster and it won't try and load everything into RAM.
> 
> nick.
> 
> 
> On 9 Jul 2012, at 02:15, Medha Atre wrote:
> 
>> Hello,
>>
>> I am trying to use the Raptor RDF parser library to parse a very large RDF/XML file of the LUBM dataset (synthetically generated) and convert it into Turtle representation. The gzipped RDF/XML file itself is 5.1 GB (I am reading its input through a fifo and "rapper" reads from this fifo).
>>
>> When I run "rapper" command to convert RDF/XML into Turtle on this file, the memory utilization shoots up very high (it consumes almost all of my RAM leaving me unable to do anything else on the computer).
>>
>> I was wondering if there is any option to restrict the memory used by the "rapper" tool? I checked "configure" and "rapper --help", but didn't find any such option.
>>
>> Can someone please let me know what the best and easiest workaround for this is?
>>
>> Thanks.
>>
>> Medha
>>
>> _______________________________________________
>> redland-dev mailing list
>> redland-dev at lists.librdf.org
>> http://lists.librdf.org/mailman/listinfo/redland-dev
> 


