[redland-dev] Creating additional storage hashes

Mon Jun 23 17:09:26 BST 2003

On Sun, 15 Jun 2003 15:14:31 -0600
Jason Johnston <redland at lojjic.net> wrote:

> Dave Beckett wrote:
>  >>So... is there a simple way to create a s2po hash?  And, if not, is
>  >>there a difficult way to create it? ;-)  I'm not opposed to modifying
>  >>the C source to accomplish what I need.
>  >
>  > Oh yes :)
>  >
>  > Let me enter tutorial mode.
>  >
>  > [snip fantastic tutorial]
> 
> Thanks so much for your detailed instructions, they were a great help! 

Phew.  My brain dump had enough structure/clues :)

>  From them, I was able to successfully create a s2po indexed hash.  It 
> did take some extra work than you gave in your walkthrough to make 
> find_statements use the new hash (there are several places where it 
> forks the codepath depending on what nodes are provided so I needed to 
> add a fork), but luckily there was already code for the optional p2so 
> hash so I copied that everywhere it appeared.  ...

I expected there would be something more, but enough similar code was
there.

> ... It appeared to work when 
> I tested find_statements with only the subject node known on the new set 
> of BDB hashes.
>
> However, when I tried repeating the find_statements call over a list of 
> several subject nodes, it fails.  Sometimes I just get "Segmentation 
> fault", and the most detailed error I've gotten is:
>    rdf_node.c:381:librdf_node_from_node: fatal error: Do not know how to 
> copy node type 0

That's a node copying/reference issue I'm expecting.  You have
to take care when you are returning a pointer to a shared node
(in streams, iterators) and when you need to return a new one.
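
To make the shared-versus-new distinction concrete, here is a minimal sketch in plain C.  It does not use librdf at all: toy_node, toy_stream and every toy_* function are made-up names for illustration, and the live-node counter is just an assumed way of checking that allocations and frees balance.  The point is only the convention: a shared pointer is never freed by whoever receives it, and a new copy always is.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an RDF node; this is NOT the librdf node type. */
typedef struct toy_node {
  int type;
} toy_node;

/* Number of toy nodes currently allocated; 0 means allocs and frees balance. */
static int toy_live_nodes = 0;

static toy_node *toy_new_node(int type) {
  toy_node *n = malloc(sizeof(*n));
  if (!n)
    return NULL;
  n->type = type;
  toy_live_nodes++;
  return n;
}

/* New copy: the caller owns the result and must free it. */
static toy_node *toy_new_node_from_node(const toy_node *old) {
  return toy_new_node(old->type);
}

static void toy_free_node(toy_node *n) {
  if (!n)
    return;
  toy_live_nodes--;
  /* A negative count would mean some node was freed more than once. */
  assert(toy_live_nodes >= 0);
  free(n);
}

/* Toy stream context that owns the predicate and object it holds. */
typedef struct {
  toy_node *predicate;
  toy_node *object;
} toy_stream;

/* Shared return: hands out the stream's own pointer; caller must NOT free. */
static toy_node *toy_stream_get_predicate_shared(toy_stream *s) {
  return s->predicate;
}

/* New return: hands out a fresh copy; caller MUST free it. */
static toy_node *toy_stream_get_predicate_owned(toy_stream *s) {
  return toy_new_node_from_node(s->predicate);
}

/* The stream frees exactly the nodes it owns, once each. */
static void toy_stream_finish(toy_stream *s) {
  toy_free_node(s->predicate);
  toy_free_node(s->object);
  s->predicate = NULL;
  s->object = NULL;
}

/* Returns the number of live nodes after a balanced run, or -1 on
 * allocation failure. */
static int toy_balanced_run(void) {
  toy_stream s = { toy_new_node(1), toy_new_node(2) };
  if (!s.predicate || !s.object) {
    toy_stream_finish(&s);
    return -1;
  }
  toy_node *shared = toy_stream_get_predicate_shared(&s); /* borrowed */
  toy_node *owned = toy_stream_get_predicate_owned(&s);   /* ours */
  (void)shared;          /* not freed here: the stream still owns it */
  toy_free_node(owned);  /* freed here: we own this copy */
  toy_stream_finish(&s);
  return toy_live_nodes;
}
```

Calling toy_balanced_run() from a main() should leave the counter at zero.  Freeing the shared pointer as well, or storing it somewhere that later frees it, is the kind of imbalance that tends to show up as an intermittent crash rather than an immediate one.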

> I'm at a loss where to begin tracking down what's going wrong, perhaps 
> I'm missing something or there's another part of the library that 
> expects one of the three normal hashes?  It just seems strange that it 
> can work once or twice and fail after that.  I can send you the 
> particular files and scripts I'm using via your personal email if it 
> would help track it down.

An unformatted patch would be better.

> Thanks again for your assistance.
> --Jason Johnston
> 
> --------------------------------------------------
> 
> Here's the patch I'm using (sorry if it wraps, I can send it to you 
> separately if you need):
> 
> diff -u -r1.49 rdf_storage_hashes.c
> --- librdf/rdf_storage_hashes.c	15 Apr 2003 22:26:13 -0000	1.49
> +++ librdf/rdf_storage_hashes.c	16 May 2003 02:19:04 -0000
> @@ -52,10 +52,13 @@
>     {"po2s",
>      LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT,
>      LIBRDF_STATEMENT_SUBJECT},  /* For 'get sources' */
> -  {"so2p",
> +  {"so2p",
>      LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT,
>      LIBRDF_STATEMENT_PREDICATE},  /* For 'get arcs' */
> -  {"p2so",
> +  {"s2po",
> +   LIBRDF_STATEMENT_SUBJECT,
> +   LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT},  /* For '(s, ?, 
> ?)' */
> +  {"p2so",
>      LIBRDF_STATEMENT_PREDICATE,
>      LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT},  /* For '(?, p, 
> ?)' */
>     {"contexts",
> @@ -107,6 +110,7 @@
>     int targets_index;
> 
>     int p2so_index;
> +  int s2po_index;
> 
>     /* If this is non-0, contexts are being used */
>     int index_contexts;
> @@ -244,7 +248,7 @@
>     context->options=options;
> 
>     /* Work out the number of hashes for allocating stuff below */
> -  hash_count=3;
> +  hash_count=4;
> 
>     if((index_contexts=librdf_hash_get_as_boolean(options, "contexts"))<0)
>       index_contexts=0; /* default is no contexts */
> @@ -278,7 +282,7 @@
>       return 1;
>     }
> 
> -  for(i=0; i<3; i++) {
> +  for(i=0; i<4; i++) {
>       status=librdf_storage_hashes_register(storage, name,
>  
> &librdf_storage_hashes_descriptions[i]);
>       if(status)
> @@ -299,6 +303,7 @@
>     context->arcs_index= -1;
>     context->targets_index= -1;
>     context->p2so_index= -1;
> +  context->s2po_index= -1;
>     /* and index for contexts (no key or value fields) */
>     context->contexts_index= -1;
> 
> @@ -325,6 +330,9 @@
>       } else if(key_fields == LIBRDF_STATEMENT_PREDICATE &&
>                 value_fields == 
> (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT)) {
>         context->p2so_index=i;
> +    } else if(key_fields == LIBRDF_STATEMENT_SUBJECT &&
> +              value_fields == 
> (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT)) {
> +      context->s2po_index=i;
>       } else if(!key_fields || !value_fields) {
>          context->contexts_index=i;
>       }
> @@ -984,6 +992,16 @@
>                                                     context->p2so_index,
>  
> librdf_statement_get_predicate(statement),
>  
> LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT);
> +  } else if(librdf_statement_get_subject(statement) &&
> +            !librdf_statement_get_predicate(statement) &&
> +            !librdf_statement_get_object(statement) &&
> +            context->s2po_index >= 0) {
> +    /* (s ? ?) -> (s p o) wanted */
> +    stream=librdf_storage_hashes_serialise_common(storage,
> +                                                  context->s2po_index,
> + 
> librdf_statement_get_subject(statement),
> + 
> LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT);
> +
>     } else {
>       statement=librdf_new_statement_from_statement(statement);
>       if(!statement)
> @@ -1089,13 +1107,20 @@
>            librdf_free_node(node);
>         break;
> 
> +    case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> +      if((node=librdf_statement_get_predicate(&context->statement)))
> +         librdf_free_node(node);
> +      if((node=librdf_statement_get_object(&context->statement)))
> +         librdf_free_node(node);

So here you free 2 nodes....

> +      break;
> +
>       case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
>         if((node=librdf_statement_get_subject(&context->statement)))
>            librdf_free_node(node);
> @@ -1124,6 +1149,17 @@
>         node=librdf_statement_get_object(&context->statement);
>         break;
> 
> +    case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> +      /* fill in the only blank from the node stored in our context */
> +      node=librdf_new_node_from_node(context->search_node);
> +      if(!node)
> +        return NULL;

here you allocate 1 new one

> +      librdf_statement_set_subject(&context->statement2, node);
> +      librdf_statement_set_predicate(&context->statement2, 
> librdf_statement_get_predicate(&context->statement));
> +      librdf_statement_set_object(&context->statement2, 
> librdf_statement_get_object(&context->statement));

here you copy 2 (probably shared)

> +      return (void*)&context->statement2;
> +      break;
> +
>       case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
>         librdf_statement_set_subject(&context->statement2, 
> librdf_statement_get_subject(&context->statement));
>         /* fill in the only blank from the node stored in our context */
> @@ -1237,6 +1273,11 @@
>         librdf_statement_set_predicate(&icontext->statement, node2);
>         break;
> 
> +    case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> +      icontext->search_node=librdf_new_node_from_node(node1);
> +      librdf_statement_set_subject(&icontext->statement, node1);
> +      break;
> +
>       case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
>         icontext->search_node=librdf_new_node_from_node(node1);
>         librdf_statement_set_predicate(&icontext->statement, node1);

That's my initial skim.  I can't go into much detail right now.
These things need a debugger and maybe a memory checker such as
valgrind or the dmalloc library.

Dave