RDBMS, Virtuoso, libb, etc (was: Re: best db schema)

Kjetil Kjernsmo kjetil at kjernsmo.net
Thu Feb 5 21:47:36 CET 2009


On Thursday 05 February 2009, Robin Berjon wrote:
> I was wondering if anyone here had ideas about the best DB schema one
> can use to store RDF in an RDBMS? A lot of those I've seen seem to
> be crazy with the self-joining, so I was wondering if there were
> better ideas.

We've been thinking quite a lot about this and decided it is the wrong 
question. :-) Actually, professor Ian Horrocks told me this a while 
ago, but who trusts an academic to have any informed opinions about such 
practical matters anyway? ;-) I once asked if the rigorous definition 
of RDF would help researchers create more optimized database engines 
specifically for RDF and he said that no new research was needed, only 
a deeper understanding of database theory beyond the relational model. 

I've been living in Java hell for a while now, but at least I've gotten 
hands-on experience with a variety of database schemas. Jena can use 
several RDBMSes natively, and we have used Postgres. It is lightning 
fast for some applications, but for many others we have suffered 
performance problems. Jena also has an SDB concept, where you can choose 
different database schemas based on your intended application. I'm sure 
these are lightning fast for other things, but we found no overall 
speedup over the classical RDB for our application. So, I've tried 
those four different schemas, in addition to Redland's, and none were 
really good. Prof Horrocks was probably right.
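
For reference, the self-joining Robin mentions falls out of the plain 
triple-table layout: every extra triple pattern in a query becomes one 
more join of the table against itself. Here is a minimal sketch of that, 
using SQLite from Python with a made-up table name and made-up data. It 
is only an illustration and is not the schema Jena, SDB or Redland 
actually use.

```python
import sqlite3

# Hypothetical single-table layout: one row per triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
        ("ex:alice", "ex:name", "Alice"),
        ("ex:bob", "ex:name", "Bob"),
        ("ex:carol", "ex:name", "Carol"),
    ],
)

# Roughly: SELECT ?name WHERE { ?x ex:knows ?y . ?y ex:name ?name }
# Two triple patterns means the table is joined with itself once;
# a query with N patterns needs N-1 such self-joins.
rows = conn.execute(
    """
    SELECT t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t1.o = t2.s
    WHERE t1.p = 'ex:knows' AND t2.p = 'ex:name'
    ORDER BY t2.o
    """
).fetchall()
names = [row[0] for row in rows]
print(names)
```

With realistic queries of five or ten patterns, the join count grows 
accordingly, which is where the trouble tends to start.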

However, there is some potential anyway. One strategy is to throw 
overboard the flexibility of the RDF model, create conventional 
schemas and map them to SPARQL. For some applications, this is reported 
to work very well. 

What also seems to work pretty OK is to retain essentially the RDBMS, 
but optimise the database engine a lot for RDF. This is what Virtuoso 
does, and our preliminary tests seem to give us a factor of 5 speedup 
over the Jena+Pg solution for important queries in the application. 
Then another option is to hack the engine, which is, as far as I can 
see, what Owlgres is about. I haven't seen it benchmarked, but I've 
heard good things about it.

Then, one could heed Horrocks' advice and look outside of the relational 
model altogether. Greg is working on Hexastore support in RDF::Query, 
which is one such idea: http://i.cs.hku.hk/~pkarras/hexastore.pdf
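
As I understand the paper, the core idea is to index every triple under 
all six orderings of subject, predicate and object, so that any triple 
pattern can be answered from an index whose leading keys are the bound 
terms. Below is a rough in-memory sketch of that in Python. The class 
and method names are my own invention, it skips the paper's sharing of 
terminal lists between indexes, and it has nothing to do with how 
RDF::Query does it.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Toy six-index triple store (hypothetical API, illustration only)."""

    def __init__(self):
        # One nested dict per ordering: spo, sop, pso, pos, osp, ops.
        self.indexes = {
            order: defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order, index in self.indexes.items():
            a, b, c = (triple[k] for k in order)
            index[a][b].add(c)

    def match(self, s=None, p=None, o=None):
        """Return sorted (s, p, o) tuples matching the pattern; None is a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        order = bound + free  # bound terms first, so they are direct lookups
        index = self.indexes[tuple(order)]
        k1, k2, k3 = order
        results = []
        firsts = [pattern[k1]] if k1 in bound else list(index)
        for a in firsts:
            level2 = index.get(a, {})
            seconds = [pattern[k2]] if k2 in bound else list(level2)
            for b in seconds:
                level3 = level2.get(b, set())
                thirds = [pattern[k3]] if k3 in bound else list(level3)
                for c in thirds:
                    if c in level3:
                        found = dict(zip(order, (a, b, c)))
                        results.append((found["s"], found["p"], found["o"]))
        return sorted(results)


store = Hexastore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:bob", "ex:knows", "ex:carol")
store.add("ex:bob", "ex:name", "Bob")
print(store.match(p="ex:knows"))
print(store.match(s="ex:bob", o="Bob"))
```

The price is obvious from the constructor: each triple is stored six 
times over, which is why this looks like a memory model first.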

I don't know how this idea goes with a persistent model, though. Then, 
Greg also tried Andrea Marchesini's libb, which I'm sure is much more 
familiar to some here than to me. Greg can probably fill in the details 
about this, but it didn't get very far, as far as I understood.

Anyway, I think we have three interesting directions here: Hexastore for 
a very efficient memory model, libb for a lightweight persistent model, 
and Virtuoso for a heavyweight database solution. 

With these implemented, I think we'd have the right tools for most RDF 
jobs, but then, who has the time...?

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil at kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC

