RDBMS, Virtuoso, libb, etc (was: Re: best db schema)
Kjetil Kjernsmo
kjetil at kjernsmo.net
Thu Feb 5 21:47:36 CET 2009
On Thursday 05 February 2009, Robin Berjon wrote:
> I was wondering if anyone here had ideas about the best DB schema one
> can use to store RDF in an RDBMS? A lot of those I've seen seem to
> be crazy with the self-joining, so I was wondering if there were
> better ideas.
We've been thinking quite a lot about this and decided it is the wrong
question. :-) Actually, Professor Ian Horrocks told me this a while
ago, but who trusts an academic to have an informed opinion about such
practical matters anyway? ;-) I once asked him whether the rigorous
definition of RDF would help researchers create database engines
optimized specifically for RDF, and he said that no new research was
needed, only a deeper understanding of database theory beyond the
relational model.
I've been living in Java hell for a while now, but at least I've gotten
hands-on experience with a variety of database schemas. Jena can use
several RDBMSes natively, and we have used Postgres. It is lightning
fast for some applications, but for many others we have suffered
serious performance problems. Jena also has the SDB concept, where you
can choose among different database schemas based on your intended
application. I'm sure these are lightning fast for other things, but
for our application we found no overall speedup over the classic RDB
backend. So I've tried those four different schemas, in addition to
Redland's, and none were really good. Prof Horrocks was probably right.
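To make the self-joining concrete: in a single generic triple table, every triple pattern in a query becomes one more copy of the same table, joined on the shared variables. The sketch below is an illustration only, using Python's built-in sqlite3 and made-up example data (the ex: and foaf: terms are plain strings, not real IRIs). It is not the actual schema used by Jena, SDB or Redland, just the simplest triple-table layout.

```python
import sqlite3

# Simplest possible RDF-in-RDBMS layout: one generic triple table.
# The data below is hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
)

# SPARQL: SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }
# Each triple pattern becomes one alias of the same table (t1, t2),
# and the shared variable ?x becomes the self-join condition t1.s = t2.s.
query = """
SELECT t1.o AS name, t2.o AS mbox
FROM triples AS t1
JOIN triples AS t2 ON t1.s = t2.s
WHERE t1.p = 'foaf:name' AND t2.p = 'foaf:mbox'
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alice', 'mailto:alice@example.org')]
```

A query with n triple patterns needs n - 1 such self-joins over the same (usually very large) table, which is where the performance trouble tends to come from.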
However, there is some potential anyway. One strategy is to throw
overboard the flexibility of the RDF model, design conventional
relational schemas for the application and map SPARQL queries onto
them. For some applications, this is reported to work very well.
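A minimal sketch of the "conventional schema" strategy, assuming a hypothetical application that only cares about people with a name and a mailbox. Each predicate the application knows about becomes a column of a person table, so a two-pattern query turns into a single-table scan with no self-join. The table, column and mapping names (person, column_for) are illustrative assumptions; real RDF-to-relational mapping tools are far more elaborate.

```python
import sqlite3

# Hypothetical fixed schema: one column per predicate the application uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (iri TEXT PRIMARY KEY, name TEXT, mbox TEXT)")

triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]

# Hypothetical predicate-to-column mapping. Values come only from this dict,
# so interpolating them into the UPDATE statement below is safe.
column_for = {"foaf:name": "name", "foaf:mbox": "mbox"}

for s, p, o in triples:
    col = column_for[p]  # unknown predicate -> KeyError: the price of a fixed schema
    conn.execute("INSERT OR IGNORE INTO person (iri) VALUES (?)", (s,))
    conn.execute(f"UPDATE person SET {col} = ? WHERE iri = ?", (o, s))

# SPARQL: SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }
# maps onto a plain single-table query; both patterns must match, hence NOT NULL.
rows = conn.execute(
    "SELECT name, mbox FROM person WHERE name IS NOT NULL AND mbox IS NOT NULL"
).fetchall()
print(rows)  # [('Alice', 'mailto:alice@example.org')]
```

The lost flexibility is visible in column_for: a predicate outside the mapping raises a KeyError, and a second foaf:mbox for the same person would silently overwrite the first.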
What also seems to work pretty well is to keep what is essentially an
RDBMS, but optimise the database engine heavily for RDF. This is what
Virtuoso does, and our preliminary tests show roughly a factor of 5
speedup over the Jena+Postgres solution for important queries in our
application.
Another option is to hack the database engine itself, which, as far as
I can see, is what Owlgres is about. I haven't seen it benchmarked, but
I've heard good things about it.
Then, one could heed Horrocks' advice and look outside the relational
model altogether. Greg is working on Hexastore support in RDF::Query,
which is one such idea: http://i.cs.hku.hk/~pkarras/hexastore.pdf
I don't know how well this idea carries over to a persistent model,
though. Greg has also tried Andrea Marchesini's libb, which I'm sure is
much more familiar to some here than to me. Greg can probably fill in
the details, but as far as I understood, it didn't get very far.
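The core Hexastore idea in the paper linked above is to index every triple in all six orderings of subject, predicate and object, so that any triple pattern can be answered by a prefix lookup in one index instead of a join. The following is a minimal in-memory sketch of that idea under my own simplifying assumptions; it is not RDF::Query's implementation, and it omits the space optimisations the paper discusses (for example, encoding terms as integer IDs). The class and method names (Hexastore, add, match) are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Store each triple under all six orderings (spo, sop, pso, pos, osp, ops)."""

    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        # index[order][first][second] -> set of third terms
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        terms = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            a, b, c = (terms[k] for k in order)
            self.index[order][a][b].add(c)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None is a wildcard."""
        terms = {"s": s, "p": p, "o": o}
        # Sort bound positions first (stable sort keeps spo order among them),
        # so the bound terms form a prefix of exactly one of the six indices.
        order = "".join(sorted("spo", key=lambda k: terms[k] is None))
        idx = self.index[order]
        k1, k2, k3 = order
        firsts = [terms[k1]] if terms[k1] is not None else list(idx)
        for a in firsts:
            second = idx.get(a, {})  # .get does not create entries in a defaultdict
            seconds = [terms[k2]] if terms[k2] is not None else list(second)
            for b in seconds:
                thirds = second.get(b, set())
                if terms[k3] is not None:
                    thirds = thirds & {terms[k3]}  # fully bound: membership test
                for c in thirds:
                    found = {k1: a, k2: b, k3: c}
                    yield (found["s"], found["p"], found["o"])


store = Hexastore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:bob", "foaf:knows", "ex:carol")
store.add("ex:alice", "foaf:name", "Alice")

# (?x foaf:knows ex:carol): p and o are bound, so the "pos" index is used.
print(sorted(store.match(p="foaf:knows", o="ex:carol")))
# [('ex:bob', 'foaf:knows', 'ex:carol')]
```

The price is storing every triple six times (the paper reduces this by sharing structures between indices), which is one reason it is not obvious how well the approach carries over to a persistent store.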
Anyway, I think we have three interesting directions here: Hexastore for
a very efficient memory model, libb for a lightweight persistent model
and Virtuoso for a heavyweight database solution.
With these implemented, I think we'd have the right tools for most RDF
jobs, but then, who has the time...?
Cheers,
Kjetil
--
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil at kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/ OpenPGP KeyID: 6A6A0BBC
More information about the Dev
mailing list