[redland-dev] rapper/raptor/redland/rdflib behavior

Wed May 6 21:18:13 CEST 2009

Hi folks,

I'm quite new to using/parsing/querying RDF.  I have a few questions:

(1) I have a 30 MB RDF/XML file.  When I parse it with rapper and do
a triple count, it comes back quite quickly:

time rapper -i rdfxml -c my-rdf.xml
~1.5 seconds

Incidentally, the file has about 467000 triples.

(2) If I convert (parse + serialize) that file to ntriples, it is also
very fast:
time rapper -i rdfxml -o ntriples my-rdf.xml > my-rdf.ntrip
~2 seconds
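
As a cross-check on the count from (1), here is a minimal sketch that
counts the triples in the N-Triples output from plain Python.  It
assumes the file holds one triple per line, with blank lines and "#"
comment lines carrying no triple; count_ntriples is just an
illustrative name, not part of any library.

```python
def count_ntriples(lines):
    """Count triples in an iterable of N-Triples lines.

    Assumes one triple per line; blank lines and lines starting
    with '#' (comments) are skipped.
    """
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical usage against the converted file from (2):
#
#     with open("my-rdf.ntrip") as handle:
#         print(count_ntriples(handle))
```

If the one-triple-per-line assumption holds, the result should agree
with what "rapper -c" reports for the same data.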

(3) If I convert that file to dot format, it takes a long time:
15 minutes and counting.

(4) If I use the python redland binding and attempt to count the
triples, it takes a *long* time (2.5 hours):
import RDF

redinitstore = RDF.Storage(storage_name="file",
                           name="my-rdf.xml",
                           options_string="contexts='yes'")
redinitmodel = RDF.Model(redinitstore)

print len(redinitmodel)

(5) If I try to use a Redland model with SQL storage (PostgreSQL)
through rdflib (a separate Python RDF library, used here with its
Redland store plugin), I get a segmentation fault after quite a
while.  [Note: the goal here was to read in an RDF/XML file and store
it in PostgreSQL for convenient access.]

import RDF   # redland
import rdflib
from rdflib.store import Store
from rdflib.Graph import Graph

g = Graph()
g.parse(filename)
redstore = RDF.Storage(storage_name="postgresql",
                                    name="db1",
                                    options_string="new='yes',"
                                      "host='xxx',"
                                      "database='xxx',"
                                      "user='xxx',"
                                      "password='xxx'")
redmodel = RDF.Model(redstore)
store = rdflib.plugin.get("Redland", Store)(redmodel)
store.open("",create=True)
graph = Graph(store)
graph += g
graph.commit()
store.close()

That leads me to a couple of questions:

(1) From within Python, what is the "right" programmatic way to
convert (parse + serialize) an RDF document?
(2) For question 1:  can it be done reliably with redland-bindings and
with rdflib?  (Side note: has anyone tried using RDFAlchemy for
Pythonic access to RDF?)
(3) What happens when we go from RDF parsing to building the RDF graph
structure that "takes a long time"?  I assume this same problem is
affecting my Python code (scenario 5 above) and also the dot
serialization (scenario 3)?

Thanks,
Mark