[redland-dev] Raptor Turtle parser memory usage
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Jul 10 02:06:54 EDT 2012
Hello Dave,
On 2012/07/10 1:01, Dave Beckett wrote:
> Nick is correct about the serializer, but the question was about the Turtle
> parser, and the same issue applies there.
>
> The Raptor turtle (n3, trig) parser relies on flex and bison (aka lex+yacc)
> of which bison:
> a) has to have the entire input in memory in one block in order to parse
This is really the first time I have heard something like this about bison.
flex definitely doesn't need all its input in memory; it has a
well-organized buffer mechanism (see YY_BUFFER_STATE, yyin,
yy_scan_string, YY_INPUT, ...). Since bison pulls tokens from the lexer
one at a time, bison itself cannot require the whole input to be in
memory. There may be an application/implementation-specific reason for
having everything in memory in raptor, but that would be a different story.
Regards, Martin.
> b) uses 32 bit unsigned int offsets
>
> So Raptor has to assemble the input in memory (lots of alloc / realloc) and
> end up with a max 2G size. A 5G file is not going to parse.
>
> I have looked at fixing this several times but writing a streaming lexer
> and parser is damn hard - months of work. Using ANTLR and other things
> that do the same job looks like it would make things a lot more complex
> (it's C++). I've also tried looking at sqlite's lemon but it doesn't stream
> so it seems the only road to this is a lot of work.
>
> Dave
>
>
> On 7/9/12 1:30 AM, Nicholas Humfrey wrote:
>> Hello,
>>
>> Yes, the Turtle serialiser puts everything into RAM in order to build a tree of the data and output a nice pretty file, with all the triples that share a subject next to each other.
>>
>> If you output as ntriples, then output will be much faster and it won't try to load everything into RAM.
>>
>> nick.
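For reference, the N-Triples conversion Nick suggests can be done directly from the command line. The option names below are how rapper's input/output syntax flags are commonly invoked, but verify them against your version with `rapper --help`:

```shell
# Convert RDF/XML to N-Triples, streaming triples to stdout as they
# are produced (-i selects input syntax, -o selects output syntax).
rapper -i rdfxml -o ntriples huge.rdf > huge.nt

# Reading gzipped input through a pipe; "-" reads from stdin and
# rapper then expects an explicit base URI argument.
gzip -dc huge.rdf.gz | rapper -i rdfxml -o ntriples - http://example.org/base > huge.nt
```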
>>
>>
>> On 9 Jul 2012, at 02:15, Medha Atre wrote:
>>
>>> Hello,
>>>
>>> I am trying to use the Raptor RDF parser library to parse a very large RDF/XML file of the LUBM dataset (synthetically generated) and convert it into Turtle representation. The gzipped RDF/XML file itself is 5.1 GB (I am reading it through a fifo, and "rapper" reads from this fifo).
>>>
>>> When I run "rapper" command to convert RDF/XML into Turtle on this file, the memory utilization shoots up very high (it consumes almost all of my RAM leaving me unable to do anything else on the computer).
>>>
>>> I was wondering if there is any option to restrict the memory used by "rapper" tool? I checked "configure" and "rapper --help", but didn't find any such option.
>>>
>>> Can someone please let me know what the best and easiest workaround for this is?
>>>
>>> Thanks.
>>>
>>> Medha
>>>
>>> _______________________________________________
>>> redland-dev mailing list
>>> redland-dev at lists.librdf.org
>>> http://lists.librdf.org/mailman/listinfo/redland-dev
>>
>