[redland-dev] size of bdb database

Dave Beckett dave at dajobe.org
Fri Apr 7 04:55:22 BST 2006


Sébastien Pierre wrote:
> Hi Ethan,
> 
> On 06-04-04 at 16:09, Ethan Aubin wrote:
> 
>> I'm experimenting with redland today and having some problems 
>> creating a store in Berkeley DB.  The owl file I'm trying to import 
>> is 46 M, but creating a new store has taken about 40 minutes so far 
>> and the -po2s.db file has gotten over 5.3 gigs.
> 
> 
> Funny, I have been running some experiments of my own with Redland
> and the BDB storage.
> 
>> This is on cygwin using redland 1.0.3 and bdb 4.4.  Does anyone have
>> any idea what's going on? What's the normal size expansion when
>> storing rdf in bdb?
> 
> 
> My findings were that the storage scaled quite well (both in db size
> and in memory consumption), but that each statement was taking *2 KB*,
> which is really too much. Some time ago I ran a series of tests, which
> you can find here
> (http://wiki.type-z.org/index.php?n=Notes.RedlandStorageImpact),
> but sadly I lost the associated images and code.
> 
> It would be really good to have performance measurements and 
> comparisons for Redland, between each storage, and compared to 
> traditional relational databases.

That's way more testing than I've done on how it scales.  I'm guessing
the overhead is due to these factors:
* the bdb backend indexes each statement 3 times, storing the entire
  statement in each index (+contexts)
* every part of the statement is stored in full, including URIs, rather
  than as pointers to short identifiers
* statements vary in size

The assumption was that it was better to make fewer I/O requests than to
do lots of reads/writes to intern URIs.  So it's a disk space vs time
thing.
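
To put a rough shape on that trade-off, here is a minimal sketch in
Python.  It is not Redland's actual on-disk format: the function names,
the 8-byte identifier width, and the example.org URIs are all
assumptions made up for illustration, and Berkeley DB's own page and
key overhead is ignored.  It only compares the raw bytes needed when
three indexes each hold the full statement with the bytes needed when
each term is interned once and the indexes hold short identifiers.

```python
# Illustrative model only; NOT Redland's real storage layout.
# Assumptions: 3 indexes, 8-byte integer IDs, no database page overhead.

def term_bytes(term):
    """Size of one statement term (URI or literal) encoded as UTF-8."""
    return len(term.encode("utf-8"))


def full_copy_size(statements, n_indexes=3):
    """Bytes used when every index stores all three terms in full."""
    per_copy = sum(term_bytes(t) for st in statements for t in st)
    return n_indexes * per_copy


def interned_size(statements, n_indexes=3, id_bytes=8):
    """Bytes used when terms are stored once and indexes hold IDs."""
    unique_terms = {t for st in statements for t in st}
    # One dictionary entry per distinct term: the term plus its ID.
    dictionary = sum(term_bytes(t) + id_bytes for t in unique_terms)
    # Each index row holds three IDs (subject, predicate, object).
    index_rows = n_indexes * len(statements) * 3 * id_bytes
    return dictionary + index_rows


def make_statements(n):
    """Hypothetical data set with long URIs and repeated terms."""
    base = "http://example.org/ontology/some/long/path#"
    return [
        (f"{base}subject{i}", f"{base}predicate{i % 5}", f"{base}object{i % 50}")
        for i in range(n)
    ]


if __name__ == "__main__":
    statements = make_statements(1000)
    print("full copies:", full_copy_size(statements), "bytes")
    print("interned:   ", interned_size(statements), "bytes")
```

The catch is the one described above: the interned layout needs an
extra dictionary lookup for every term read or written, which is the
additional I/O the full-copy design was trying to avoid.  For very
short terms with little repetition, interning can even come out larger.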

It might be interesting to compare with the sqlite backend, which does
intern URIs and probably works better for this size of data.  I'm
speculating...

Dave


