RDBMS, Virtuoso, libb, etc (was: Re: best db schema)

Gregory Williams greg at evilfunhouse.com
Thu Feb 5 23:24:18 CET 2009


I would tend to agree with Kjetil on "it is the wrong question". It
all depends on the type of data and queries you expect. All of the schema
options have their pros and cons, but here are the big ones from my perspective:

(1) one big triples table
   pro: simple, and doesn't have to jump through any hoops for  
multiple values, or queries over unbound predicates.
   con: many self-joins, big indices (compared to vertical  
partitioning, for example)

(2) vertical partitioning (one table per predicate; Abadi et al.
claim this to be the best, but there are some huge caveats)
   pro: can be very fast for queries with bound predicates, but needs
extra indices to compete on certain query types (cf. hexastore)
   con: horrible for query answering with unbound predicates

(3) property tables (roughly one table per class -- like  
Person(uri,name,homepage, ...))
   pro: few self-joins
   con: has trouble with multiple or missing values for any of the  
properties, and with schema-less data

(4) hexastore (non-relational)
   pro: very fast querying on any query type
   con: huge memory footprint, slow loading time, currently only
in-memory (I believe it can be generalized to disk-based, but don't
believe there's any work on this yet)

Anyone please correct me if I've misrepresented any of the approaches  
above -- I only have serious experience with (1) and (4).
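
To make the trade-off between (1) and (2) concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (triples, p_name, p_knows), the sample data and the helper functions are all made up for illustration; a real store would add indices and map terms to integer IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data, used for both layouts.
data = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

# Layout (1): one big triples table.
cur.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", data)

def names_known_by_triples(person):
    # Two triple patterns mean one self-join on the same table.
    rows = cur.execute(
        """SELECT t2.o FROM triples t1 JOIN triples t2 ON t1.o = t2.s
           WHERE t1.s = ? AND t1.p = 'foaf:knows' AND t2.p = 'foaf:name'""",
        (person,),
    ).fetchall()
    return [r[0] for r in rows]

def describe_triples(subject):
    # Unbound predicate: a single scan, no hoops to jump through.
    return sorted(cur.execute(
        "SELECT p, o FROM triples WHERE s = ?", (subject,)).fetchall())

# Layout (2): vertical partitioning, one two-column table per predicate.
tables = {"foaf:name": "p_name", "foaf:knows": "p_knows"}
for t in tables.values():
    cur.execute(f"CREATE TABLE {t} (s TEXT, o TEXT)")
for s, p, o in data:
    cur.execute(f"INSERT INTO {tables[p]} VALUES (?, ?)", (s, o))

def names_known_by_vp(person):
    # Bound predicates: the join touches only the two small tables involved.
    rows = cur.execute(
        "SELECT n.o FROM p_knows k JOIN p_name n ON k.o = n.s WHERE k.s = ?",
        (person,),
    ).fetchall()
    return [r[0] for r in rows]

def describe_vp(subject):
    # Unbound predicate: the query has to union over every predicate table.
    sql = " UNION ALL ".join(
        f"SELECT '{p}' AS p, o FROM {t} WHERE s = ?" for p, t in tables.items())
    return sorted(cur.execute(sql, [subject] * len(tables)).fetchall())
```

Both layouts return the same answers; the difference is that describe_vp grows by one UNION branch per predicate in the data, which is where the "horrible for unbound predicates" complaint comes from.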

More below...

On Feb 5, 2009, at 3:47 PM, Kjetil Kjernsmo wrote:

> We've been thinking quite a lot about this and decided it is the wrong
> question. :-) Actually, professor Ian Horrocks told me this a while
> ago, but who trusts an academic to have any informed opinions about such
> practical matters anyway? ;-) I once asked if the rigorous definition
> of RDF would help researchers create more optimized database engines
> specifically for RDF and he said that no new research was needed, only
> a deeper understanding of database theory beyond the relational model.

...

> Then, one could heed Horrocks' advice and look outside of the relational
> model altogether. Greg is working on Hexastore support in RDF::Query,
> which is one such idea: http://i.cs.hku.hk/~pkarras/hexastore.pdf

I've got a pure-perl implementation that works, and am very close to  
finishing a C implementation that should blow it away in terms of  
speed and memory footprint (although since it's a hexastore, the  
memory is still high relative to many other triple stores). Both are  
available in the perlrdf github repo, but the C version doesn't have  
any working perl bindings yet.
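
For anyone who hasn't read the paper, the core idea can be sketched in a few lines of Python. This is only an illustration, not the code in the perlrdf repo: the class name TinyHexastore and its methods are made up here, and it uses plain nested dicts, whereas the paper describes sorted vectors and shares the terminal lists between pairs of indices. It does show why any triple pattern can be answered from an index, and why the memory footprint is so high.

```python
from collections import defaultdict
from itertools import permutations

class TinyHexastore:
    """Toy in-memory store keeping one index per ordering of (s, p, o)."""

    ORDERS = ["".join(perm) for perm in permutations("spo")]  # six orderings

    def __init__(self):
        # For each ordering: first term -> second term -> set of third terms.
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        # Every triple is stored six times, once per ordering.
        for order in self.ORDERS:
            a, b, c = (triple[k] for k in order)
            self.index[order][a][b].add(c)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None means unbound."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        # Pick the ordering that puts the bound terms first.
        k1, k2, k3 = order = "".join(bound + free)
        level1 = self.index[order]
        firsts = [pattern[k1]] if pattern[k1] is not None else list(level1)
        for a in firsts:
            level2 = level1.get(a, {})  # .get avoids creating empty entries
            seconds = [pattern[k2]] if pattern[k2] is not None else list(level2)
            for b in seconds:
                level3 = level2.get(b, set())
                thirds = [pattern[k3]] if pattern[k3] is not None else list(level3)
                for c in thirds:
                    if c in level3:
                        found = {k1: a, k2: b, k3: c}
                        yield (found["s"], found["p"], found["o"])
```

Usage would look something like this:

```python
store = TinyHexastore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", "Alice")
print(list(store.match(p="foaf:name")))   # bound predicate
print(list(store.match(s="ex:alice")))    # unbound predicate, same cost model
```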

I've been told that Weiss et al. are going to be releasing their
hexastore code (discussed in the paper), but this hasn't happened yet,
and I believe it's only in Python.

> I don't know how this idea goes with a persistent model, though.

As mentioned above, I believe it can be extended, but it needs some more
thinking.

> Then, Greg also tried Andrea Marchesini's libb, which I'm sure is much more
> familiar to some here than to me. Greg can probably fill in the details
> about this, but it didn't get very far, as far as I understood.

Couldn't get it to work on simple tests, so didn't spend the time to  
dig deep into investigating where the problem was (probably with my  
code, but hard to manage without good documentation). Had planned to  
wrap libb up with perl wrappers, but now am spending most of my time  
on the hexastore stuff.


> Anyway, I think we have three interesting directions here, Hexastore for
> a very efficient memory model, libb for a lightweight persistent model
> and Virtuoso for a heavyweight database solution.

Mostly agree with this, but not sure how realistic the libb stuff is.

> With these implemented, I think we'd have the right tools for most RDF
> jobs, but then, who has the time...?

libb is probably a ways off, but Virtuoso can be used now, and  
hexastore will be usable very shortly (or, is usable now if you just  
want the perl version, but the memory requirements make it more of a  
toy than a real in-memory triplestore).


.greg

-- 
"Writing code on one line is like playing
  the trumpet without breathing!"
     - Adam Pisoni


