RDBMS, Virtuoso, libb, etc (was: Re: best db schema)
Gregory Williams
greg at evilfunhouse.com
Thu Feb 5 23:24:18 CET 2009
I would tend to agree with Kjetil on "it is the wrong question". It
all depends on the type of data and queries you expect. All of them
have their pros and cons, but here are the big ones from my perspective:
(1) one big triples table
pro: simple, and doesn't have to jump through any hoops for
multiple values, or queries over unbound predicates.
con: many self-joins, big indices (compared to vertical
partitioning, for example)
(2) vertical partitioning (one table per predicate; Abadi et al.
claim this to be the best, but there are some huge caveats)
pro: can be very fast for queries with bound predicates, but needs
extra indices to compete on certain query types (cf. hexastore)
con: horrible for query answering with unbound predicates
(3) property tables (roughly one table per class -- like
Person(uri,name,homepage, ...))
pro: few self-joins
con: has trouble with multiple or missing values for any of the
properties, and with schema-less data
(4) hexastore (non-relational)
pro: very fast querying on any query type
con: huge memory footprint, slow loading time, currently only
in-memory (I believe it can be generalized to disk-based, but I don't
believe there's any work on this yet)
Anyone please correct me if I've misrepresented any of the approaches
above -- I only have serious experience with (1) and (4).
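To make the trade-off between (1) and (2) concrete, here is a minimal sketch using Python's sqlite3 module. The table names, column names and sample data are assumptions made up for illustration; they are not taken from any of the systems discussed here. It shows the self-join that a triples table needs for a two-property lookup, and the UNION over every predicate table that vertical partitioning needs once the predicate is unbound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data.
data = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:homepage", "http://example.org/alice"),
    ("ex:bob", "foaf:name", "Bob"),
]

# (1) One big triples table.
cur.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", data)

# Fetching name + homepage requires joining the table with itself.
triples_rows = cur.execute(
    """
    SELECT n.o, h.o
    FROM triples AS n
    JOIN triples AS h ON n.s = h.s
    WHERE n.p = 'foaf:name' AND h.p = 'foaf:homepage'
    """
).fetchall()

# (2) Vertical partitioning: one two-column table per predicate.
tables = {"foaf:name": "foaf_name", "foaf:homepage": "foaf_homepage"}
for table in tables.values():
    cur.execute(f"CREATE TABLE {table} (s TEXT, o TEXT)")
for s, p, o in data:
    cur.execute(f"INSERT INTO {tables[p]} VALUES (?, ?)", (s, o))

# Same query with bound predicates: a join of two small tables.
vp_rows = cur.execute(
    "SELECT n.o, h.o FROM foaf_name AS n JOIN foaf_homepage AS h ON n.s = h.s"
).fetchall()

# Unbound predicate ("everything about alice"): trivial on the triples table ...
unbound_triples = cur.execute(
    "SELECT p, o FROM triples WHERE s = 'ex:alice' ORDER BY p"
).fetchall()

# ... but needs one branch per predicate table under vertical partitioning.
unbound_vp = cur.execute(
    """
    SELECT 'foaf:name' AS p, o FROM foaf_name WHERE s = 'ex:alice'
    UNION ALL
    SELECT 'foaf:homepage' AS p, o FROM foaf_homepage WHERE s = 'ex:alice'
    ORDER BY p
    """
).fetchall()
```

With only two predicates the UNION is short, but it grows by one branch for every predicate in the data, which is where the "horrible for unbound predicates" con comes from.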
More below...
On Feb 5, 2009, at 3:47 PM, Kjetil Kjernsmo wrote:
> We've been thinking quite a lot about this and decided it is the wrong
> question. :-) Actually, professor Ian Horrocks told me this a while
> ago, but who trusts an academic to have any informed opinions about such
> practical matters anyway? ;-) I once asked if the rigorous definition
> of RDF would help researchers create more optimized database engines
> specifically for RDF and he said that no new research was needed, only
> a deeper understanding of database theory beyond the relational model.
...
> Then, one could heed Horrocks' advice and look outside of the relational
> model altogether. Greg is working on Hexastore support in RDF::Query,
> which is one such idea: http://i.cs.hku.hk/~pkarras/hexastore.pdf
I've got a pure-perl implementation that works, and am very close to
finishing a C implementation that should blow it away in terms of
speed and memory footprint (although since it's a hexastore, the
memory is still high relative to many other triple stores). Both are
available in the perlrdf github repo, but the C version doesn't have
any working perl bindings yet.
I've been told that Weiss et al. are going to be releasing their
hexastore code (discussed in the paper), but this hasn't happened yet,
and I believe it's only in Python.
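For readers who haven't seen the paper, the core idea is to index every triple under all six orderings of subject, predicate and object, so that any pattern of bound and unbound positions can be answered from an index whose prefix is the bound part. The following is a rough Python sketch of that idea only; it is not the perlrdf code or the code from the paper, and the class and method names (TinyHexastore, add, match) are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


class TinyHexastore:
    """Toy in-memory store: every triple is indexed under all six orderings."""

    def __init__(self):
        # One nested index per ordering: spo, sop, pso, pos, osp, ops.
        self.indexes = {
            "".join(order): defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order, index in self.indexes.items():
            a, b, c = (triple[k] for k in order)
            index[a][b].add(c)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None means unbound."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        # Choose the ordering that puts the bound positions first.
        k1, k2, k3 = order = "".join(bound + free)
        index = self.indexes[order]
        firsts = [pattern[k1]] if pattern[k1] is not None else list(index)
        for a in firsts:
            if a not in index:
                continue
            level2 = index[a]
            seconds = [pattern[k2]] if pattern[k2] is not None else list(level2)
            for b in seconds:
                if b not in level2:
                    continue
                for c in level2[b]:
                    if pattern[k3] is not None and c != pattern[k3]:
                        continue
                    found = {k1: a, k2: b, k3: c}
                    yield (found["s"], found["p"], found["o"])


# Hypothetical sample data.
store = TinyHexastore()
store.add("ex:alice", "foaf:name", "Alice")
store.add("ex:alice", "foaf:homepage", "http://example.org/alice")
store.add("ex:bob", "foaf:name", "Bob")

about_alice = sorted(store.match(s="ex:alice"))       # unbound predicate
named_bob = list(store.match(p="foaf:name", o="Bob"))  # unbound subject
```

Every triple is stored six times here, which is the source of the memory cost mentioned above; the paper shares the terminal lists between pairs of indices, which this sketch does not attempt.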
> I don't know how this idea goes with a persistent model, though.
As mentioned above, I believe it can be extended, but I need to do some
more thinking on this.
> Then, Greg also tried Andrea Marchesini's libb, which I'm sure is much more
> familiar to some here than to me. Greg can probably fill in the details
> about this, but it didn't get very far, as far as I understood.
Couldn't get it to work on simple tests, so didn't spend the time to
dig deep into investigating where the problem was (probably with my
code, but hard to manage without good documentation). Had planned to
wrap libb up with perl wrappers, but now am spending most of my time
on the hexastore stuff.
> Anyway, I think we have three interesting directions here, Hexastore for
> a very efficient memory model, libb for a lightweight persistent model
> and Virtuoso for a heavyweight database solution.
Mostly agree with this, but not sure how realistic the libb stuff is.
> With these implemented, I think we'd have the right tools for most RDF
> jobs, but then, who has the time...?
libb is probably a ways off, but Virtuoso can be used now, and
hexastore will be usable very shortly (or, is usable now if you just
want the perl version, but the memory requirements make it more of a
toy than a real in-memory triplestore).
.greg
--
"Writing code on one line is like playing
the trumpet without breathing!"
- Adam Pisoni