[redland-dev] Corrupting the mysql store with illegal statements
Dave Beckett
dave at dajobe.org
Mon Jun 19 07:19:28 BST 2006
Stian Soiland wrote way back on 24th May 2006:
> I am quite new to both RDF and Redland, and so I managed to find a bug
> straight away.
>
> I have installed redland-1.0.4 using redland-bindings-1.0.4.1 and
> rasqal-0.9.12.
> I am using redland through the Python bindings, which seems great.
>
> With a mysql store, the following script "corrupts" the backend:
>
>
>
> import RDF
>
> _options_string="""host='localhost',
> user='repository', password='fishme',
> database='repository'"""
>
> storage = RDF.Storage(storage_name="mysql", name="fish",
> options_string=_options_string + ", new='true'")
>
> model = RDF.Model(storage)
> #model = RDF.Model()
>
> # A normal thing
> subject = RDF.Uri("mailto:stian at soiland.no")
> predicate = RDF.Uri("http://soiland.no/likes")
> object = RDF.Node("Gaby")
> model.append(RDF.Statement(subject, predicate, object))
>
> # A weird one, using a Literal as a subject!
> subject = object # ie. "Gaby"
> object = RDF.Node("Stian")
> model.append(RDF.Statement(subject, predicate, object))
>
> # And this is a normal one again
> subject = RDF.Uri("mailto:stian at soiland.no")
> predicate = RDF.Uri("http://soiland.no/kisses")
> object = RDF.Node("Gaby")
> model.append(RDF.Statement(subject, predicate, object))
>
>
> # Should now print 3 lines, or at least 2
> for s in model:
>     print s
Yes, I confirmed this in the current sources.
> This will print out just *1* statement. As a fresh user it was kind of
> difficult to understand what was wrong at first, but then I realized I
> had added a Literal as a subject. Both subjects and predicates should
> be RDF.Uri objects, not literals.
Close: subjects can be blank nodes too. But literals should never be
subjects, and something is going wrong here.
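To make the rule concrete, here is a minimal sketch in plain Python of the per-position check: subjects may be URIs or blank nodes, predicates must be URIs, and objects may be any of the three. It assumes a hypothetical `Term` type that just records the node kind; it is not the Redland `RDF.Node` API, and the `mailto:` URI is a placeholder.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an RDF node; this is NOT Redland's RDF.Node class.
@dataclass(frozen=True)
class Term:
    kind: str   # "uri", "blank" or "literal"
    value: str

def is_legal_triple(subject: Term, predicate: Term, obj: Term) -> bool:
    """Check the RDF position rules for one statement."""
    if subject.kind not in ("uri", "blank"):   # subjects: URI or blank node
        return False
    if predicate.kind != "uri":                # predicates: URI only
        return False
    return obj.kind in ("uri", "blank", "literal")  # objects: any node kind

# Placeholder URI instead of the address used in the report.
me = Term("uri", "mailto:user@example.org")
likes = Term("uri", "http://soiland.no/likes")
gaby = Term("literal", "Gaby")
stian = Term("literal", "Stian")

print(is_legal_triple(me, likes, gaby))     # True
print(is_legal_triple(gaby, likes, stian))  # False: literal subject
```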
> It seems that adding the illegal statement works fine, inspecting the
> database manually yields all statements and literals, even #3.
Yes, I saw that.
> However, when fetching things back, librdf will stop when it reaches
> the illegal statement #2. I confirmed that this is a problem in the C
> backend by calling the functions Redland.librdf_stream_get_object and
> librdf_stream_next manually: iteration stops at the second statement.
>
> Note that if you switch to the memory model (commented out in this code)
> everything works, even adding the illegal statement.
>
> If you switch to the sqlite backend you'll get a SQL error instead:
>
> : stain at rpc268 ~;python problem.py
> Traceback (most recent call last):
>   File "problem.py", line 22, in ?
>     model.append(RDF.Statement(subject, predicate, object))
>   File "/usr/lib/python2.4/site-packages/RDF.py", line 748, in append
>     self.add_statement(statement, context)
>   File "/usr/lib/python2.4/site-packages/RDF.py", line 732, in add_statement
>     statement._statement)
> RDF.RedlandError: 'SQLite database fish SQL exec \'SELECT COUNT(*)
> FROM triples WHERE =1 AND predicateUri=2 AND objectLiteral=2;\' failed
> - near "=": syntax error (1)'
>
>
> I guess if predicates and subjects can't be literals, this should be
> enforced when creating or adding the statement?
Yes it should be enforced.
I've added enforcement to librdf_add_statement and
librdf_contains_statement so that illegal statements are never stored or
found (they never reach the storage layer), and illegal statements are
skipped when adding streams of statements. That should stop internal
errors in the sqlite backend like the one above.
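The behaviour described above (refuse an illegal statement on a single add or lookup, skip it when adding a stream) can be sketched in Python with a hypothetical in-memory store. The class, method names and the `(kind, value)` node tuples are made up for illustration; they are not the librdf C API, where the checks live in the C functions named above. Feeding it the three statements from the report keeps two of them.

```python
from typing import Iterable, List, Tuple

# Assumed representation: a node is a (kind, value) pair where kind is
# "uri", "blank" or "literal"; a triple is (subject, predicate, object).
Node = Tuple[str, str]
Triple = Tuple[Node, Node, Node]

def is_legal(t: Triple) -> bool:
    subject, predicate, _ = t
    return subject[0] in ("uri", "blank") and predicate[0] == "uri"

class ToyStore:
    """Hypothetical store that validates before anything reaches storage."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add_statement(self, t: Triple) -> bool:
        # Illegal statements are refused and never stored.
        if not is_legal(t):
            return False
        self._triples.append(t)
        return True

    def contains_statement(self, t: Triple) -> bool:
        # An illegal statement can never be found, so don't even look it up.
        return is_legal(t) and t in self._triples

    def add_statements(self, stream: Iterable[Triple]) -> int:
        # When adding a stream, illegal statements are skipped, not fatal.
        return sum(1 for t in stream if self.add_statement(t))

    def __len__(self) -> int:
        return len(self._triples)

me = ("uri", "mailto:user@example.org")   # placeholder URI
likes = ("uri", "http://soiland.no/likes")
kisses = ("uri", "http://soiland.no/kisses")
gaby = ("literal", "Gaby")
stian = ("literal", "Stian")

store = ToyStore()
added = store.add_statements([
    (me, likes, gaby),
    (gaby, likes, stian),   # literal subject: skipped
    (me, kisses, gaby),
])
print(added, len(store))  # 2 2
```

Skipping rather than aborting means one bad statement in a parsed stream does not lose the good ones around it.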
The changes were made in SVN revisions 10988 (rdf_model.c), 10989
(rdf_statement.c) and 10990 (rdf_storage.c) in the directory
http://svn.librdf.org/view/librdf/trunk/librdf/
> But as this has worked for memory models and for adding to mysql
> backends, maybe the storage should be able to read back such stuff?
These now all give the same answer: 2 triples, with the illegal one
discarded.
> Btw, the _options_string is a bit weird, is there a nice way to use a
> normal dict()?
If somebody writes the patch :) it shouldn't be hard to construct such
a string in Python. It's a string because that works across every
language binding, but I expected people would want to make it more
natural in languages with hashes/dicts/assoc arrays/better parameter
functionality.
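A minimal sketch of building the options string from a normal dict, assuming the `name='value'` comma-separated form used in the script above. The helper name `options_string_from_dict` is hypothetical, booleans are mapped to `'true'`/`'false'` as in `new='true'`, and since the escaping rules of the options parser are not covered here, values containing a single quote are simply rejected.

```python
from typing import Mapping, Union

def options_string_from_dict(options: Mapping[str, Union[str, int, bool]]) -> str:
    """Build an options string such as host='localhost', new='true'.

    Assumes values never need quote escaping; a value containing a
    single quote raises ValueError instead of guessing an escape rule.
    """
    parts = []
    for name, value in options.items():
        if isinstance(value, bool):           # check bool before str()
            value = "true" if value else "false"
        text = str(value)
        if "'" in text:
            raise ValueError(f"cannot quote value for {name!r}: {text!r}")
        parts.append(f"{name}='{text}'")
    return ", ".join(parts)

opts = options_string_from_dict({
    "host": "localhost",
    "user": "repository",
    "password": "fishme",
    "database": "repository",
    "new": True,
})
print(opts)
# host='localhost', user='repository', password='fishme', database='repository', new='true'
```

The result would then be passed as `options_string=opts` to `RDF.Storage(storage_name="mysql", name="fish", ...)`, just as the hand-written string is in the script above.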
Thanks for the report.
Dave