[redland-dev] Triple storage overhead

Sébastien Pierre sebastien-lists at type-z.org
Fri May 28 17:35:15 BST 2004


Hi all,

I am a Redland newbie and use it through the Python interface. I am 
currently using Redland BDB storage as a backend for a software 
configuration management system I am working on.

I was surprised to see that storing triples made up of roughly two 
40-character strings for subject and object, plus a 10-character 
predicate, consumes 3 to 6 times the total size of these strings on disk.

I ran some further tests, and I have the impression that Redland does 
not "aggregate" subjects, objects or predicates with the same content. 
For instance, storing:

( "John", "likes", "Sushi")
( "John", "dislikes", "Hamburger")
...
( "John", "lives in", "Antarctica")

takes approximately the same space as:

( "John", "likes", "Sushi")
( "Joe", "dislikes", "Hamburger")
...
( "Bob", "lives in", "Antarctica")

[of course, this matters more with very long subjects]

What I mean is that "John" seems to be repeated in each triple, instead 
of being stored once and shared by all triples. I am not sure I am being 
clear enough, so there is a more detailed explanation here: 
<http://wiki.type-z.org/index.php/Notes/RedlandStorageImpact>.
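
To make the distinction concrete, here is a minimal sketch in plain 
Python. It does not use Redland and says nothing about how the BDB 
storage is actually laid out; the function names, the 4-byte ID size and 
the 40-character subject are all assumptions made up for illustration. 
It only compares the byte count when every triple carries its own copy 
of each string with the byte count when each distinct string is stored 
once and triples refer to it by a fixed-size ID:

```python
def repeated_size(triples):
    """Bytes used if every triple stores its own copy of each string."""
    return sum(len(term.encode("utf-8")) for triple in triples for term in triple)


def shared_size(triples, id_bytes=4):
    """Bytes used if each distinct string is stored once and every triple
    holds three fixed-size IDs (id_bytes is an assumed, hypothetical width)."""
    distinct = {term for triple in triples for term in triple}
    dictionary = sum(len(term.encode("utf-8")) for term in distinct)
    return dictionary + len(triples) * 3 * id_bytes


# A 40-character subject, standing in for the long strings mentioned above.
LONG_SUBJECT = "John" * 10

same_subject = [
    (LONG_SUBJECT, "likes", "Sushi"),
    (LONG_SUBJECT, "dislikes", "Hamburger"),
    (LONG_SUBJECT, "lives in", "Antarctica"),
]

if __name__ == "__main__":
    print("repeated:", repeated_size(same_subject))
    print("shared:  ", shared_size(same_subject))
```

With short strings such as "John" the IDs cost about as much as the 
strings themselves, so the saving only shows up once the repeated terms 
are long.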

TIA,

  -- Sébastien

--
«And never can a man be more disastrously in death than when death
itself shall be deathless.»
<http://www.type-z.org>                     -- St. Augustine


