[redland-dev] Creating additional storage hashes
Dave Beckett
dave.beckett at bristol.ac.uk
Mon Jun 23 17:09:26 BST 2003
On Sun, 15 Jun 2003 15:14:31 -0600
Jason Johnston <redland at lojjic.net> wrote:
> Dave Beckett wrote:
> >>So... is there a simple way to create a s2po hash? And, if not, is
> >>there a difficult way to create it? ;-) I'm not opposed to modifying
> >>the C source to accomplish what I need.
> >
> > Oh yes :)
> >
> > Let me enter tutorial mode.
> >
> > [snip fantastic tutorial]
>
> Thanks so much for your detailed instructions, they were a great help!
Phew. My brain dump had enough structure/clues :)
> From them, I was able to successfully create an s2po indexed hash. It
> did take more work than your walkthrough covered to make
> find_statements use the new hash (there are several places where it
> forks the codepath depending on what nodes are provided, so I needed to
> add a fork), but luckily there was already code for the optional p2so
> hash, so I copied that everywhere it appeared. ...
I expected there would be more to it, but there was enough similar
code already there to copy.
> ... It appeared to work when
> I tested find_statements with only the subject node known on the new set
> of BDB hashes.
>
> However, when I tried repeating the find_statements call over a list of
> several subject nodes, it fails. Sometimes I just get "Segmentation
> fault", and the most detailed error I've gotten is:
> rdf_node.c:381:librdf_node_from_node: fatal error: Do not know how to
> copy node type 0
That looks like the kind of node copying/reference issue I was
expecting. You have to take care over when you are returning a pointer
to a shared node (in streams, iterators) and when you need to return a
new one.
> I'm at a loss where to begin tracking down what's going wrong, perhaps
> I'm missing something or there's another part of the library that
> expects one of the three normal hashes? It just seems strange that it
> can work once or twice and fail after that. I can send you the
> particular files and scripts I'm using via your personal email if it
> would help track it down.
An unformatted patch would be better.
> Thanks again for your assistance.
> --Jason Johnston
>
> --------------------------------------------------
>
> Here's the patch I'm using (sorry if it wraps, I can send it to you
> separately if you need):
>
> diff -u -r1.49 rdf_storage_hashes.c
> --- librdf/rdf_storage_hashes.c 15 Apr 2003 22:26:13 -0000 1.49
> +++ librdf/rdf_storage_hashes.c 16 May 2003 02:19:04 -0000
> @@ -52,10 +52,13 @@
> {"po2s",
> LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT,
> LIBRDF_STATEMENT_SUBJECT}, /* For 'get sources' */
> - {"so2p",
> + {"so2p",
> LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT,
> LIBRDF_STATEMENT_PREDICATE}, /* For 'get arcs' */
> - {"p2so",
> + {"s2po",
> + LIBRDF_STATEMENT_SUBJECT,
> + LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT}, /* For '(s, ?, ?)' */
> + {"p2so",
> LIBRDF_STATEMENT_PREDICATE,
> LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT}, /* For '(?, p, ?)' */
> {"contexts",
> @@ -107,6 +110,7 @@
> int targets_index;
>
> int p2so_index;
> + int s2po_index;
>
> /* If this is non-0, contexts are being used */
> int index_contexts;
> @@ -244,7 +248,7 @@
> context->options=options;
>
> /* Work out the number of hashes for allocating stuff below */
> - hash_count=3;
> + hash_count=4;
>
> if((index_contexts=librdf_hash_get_as_boolean(options, "contexts"))<0)
> index_contexts=0; /* default is no contexts */
> @@ -278,7 +282,7 @@
> return 1;
> }
>
> - for(i=0; i<3; i++) {
> + for(i=0; i<4; i++) {
> status=librdf_storage_hashes_register(storage, name,
> &librdf_storage_hashes_descriptions[i]);
> if(status)
> @@ -299,6 +303,7 @@
> context->arcs_index= -1;
> context->targets_index= -1;
> context->p2so_index= -1;
> + context->s2po_index= -1;
> /* and index for contexts (no key or value fields) */
> context->contexts_index= -1;
>
> @@ -325,6 +330,9 @@
> } else if(key_fields == LIBRDF_STATEMENT_PREDICATE &&
> value_fields == (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT)) {
> context->p2so_index=i;
> + } else if(key_fields == LIBRDF_STATEMENT_SUBJECT &&
> + value_fields == (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT)) {
> + context->s2po_index=i;
> } else if(!key_fields || !value_fields) {
> context->contexts_index=i;
> }
> @@ -984,6 +992,16 @@
> context->p2so_index,
> librdf_statement_get_predicate(statement),
> LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT);
> + } else if(librdf_statement_get_subject(statement) &&
> + !librdf_statement_get_predicate(statement) &&
> + !librdf_statement_get_object(statement) &&
> + context->s2po_index >= 0) {
> + /* (s ? ?) -> (s p o) wanted */
> + stream=librdf_storage_hashes_serialise_common(storage,
> + context->s2po_index,
> + librdf_statement_get_subject(statement),
> + LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT);
> +
> } else {
> statement=librdf_new_statement_from_statement(statement);
> if(!statement)
> @@ -1089,13 +1107,20 @@
> librdf_free_node(node);
> break;
>
> + case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> + if((node=librdf_statement_get_predicate(&context->statement)))
> + librdf_free_node(node);
> + if((node=librdf_statement_get_object(&context->statement)))
> + librdf_free_node(node);
So here you free two nodes...
> + break;
> +
> case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
> if((node=librdf_statement_get_subject(&context->statement)))
> librdf_free_node(node);
> @@ -1124,6 +1149,17 @@
> node=librdf_statement_get_object(&context->statement);
> break;
>
> + case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> + /* fill in the only blank from the node stored in our context */
> + node=librdf_new_node_from_node(context->search_node);
> + if(!node)
> + return NULL;
...here you allocate one new node...
> + librdf_statement_set_subject(&context->statement2, node);
> + librdf_statement_set_predicate(&context->statement2, librdf_statement_get_predicate(&context->statement));
> + librdf_statement_set_object(&context->statement2, librdf_statement_get_object(&context->statement));
...and here you copy two node pointers (probably shared ones).
> + return (void*)&context->statement2;
> + break;
> +
> case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
> librdf_statement_set_subject(&context->statement2,
> librdf_statement_get_subject(&context->statement));
> /* fill in the only blank from the node stored in our context */
> @@ -1237,6 +1273,11 @@
> librdf_statement_set_predicate(&icontext->statement, node2);
> break;
>
> + case (LIBRDF_STATEMENT_PREDICATE|LIBRDF_STATEMENT_OBJECT): /* s2po */
> + icontext->search_node=librdf_new_node_from_node(node1);
> + librdf_statement_set_subject(&icontext->statement, node1);
> + break;
> +
> case (LIBRDF_STATEMENT_SUBJECT|LIBRDF_STATEMENT_OBJECT): /* p2so */
> icontext->search_node=librdf_new_node_from_node(node1);
> librdf_statement_set_predicate(&icontext->statement, node1);
That's my initial skim. I can't go into much detail right now. These things need a debugger
and maybe a memory checker such as valgrind or the dmalloc library.
Dave