RDFa linting/validation [Re: Tomorrow: Social Web XG Meeting cancelled again]
Toby Inkster
tai at g5n.co.uk
Wed May 5 15:30:59 CEST 2010
On Tue, 4 May 2010 21:53:23 +0200
Dan Brickley <danbri at danbri.org> wrote:
> Suggested 'homework' for the week off: think about rdfa validators
> and how they could be made modular to help publishers check which
> consumer sites will understand their data...
This is an interesting idea, and one I'd be interested in working on. I
think that the Perl RDF modules on CPAN cover this area quite well, and
it would not be too difficult to get a pretty useful tool working. Basic
technique:
1. We need RDF::RDFa::Linter::Google, RDF::RDFa::Linter::Facebook, etc.
These would provide the following methods:
  * RDF::RDFa::Linter::Foo->usual_prefixes
    A list of the vocabs used by the service, with the prefixes
    they're "normally" bound to, as recommended by the service's
    documentation.
  * RDF::RDFa::Linter::Foo->filter
    Given a stream of triples, should filter some out to leave only
    triples that are thought to be understood by the service. e.g.
    RDF::RDFa::Linter::Facebook would leave only the OGP triples.
  * RDF::RDFa::Linter::Foo->required_predicates($class)
    For any given class URI, returns a list of URIs of predicates
    that the service considers to be required. e.g. some services
    might insist that all foaf:Person instances have a foaf:name.
2. We need RDF::RDFa::Linter, which uses RDF::RDFa::Parser to parse an
RDFa document, auto-correcting missing CURIE bindings (but remembering
the error) using the information from a service's 'usual_prefixes'
method; filters the triples using the service's 'filter' method; and
generates warnings based on its 'required_predicates' method.
3. We need RDF::RDFa::Writer::Pretty to take an RDF graph and write it
out as pretty, human-readable RDFa. As well as the graph, it should be
able to take a collection of warnings (each of which has a subject
resource URI or bnode identifier associated with it), and include them
in the output.
4. Lastly, we'd need to wrap it up in a web form that asks for a URI
and then presents the results of the various linters in a nice, tabbed
interface.
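
As a rough illustration of steps 1 and 2 (a sketch only, written in Python
rather than against the Perl modules named above), the per-service
interface and the required-predicate check could look something like the
following. ExampleService, lint() and the tuple-of-strings triple
representation are all assumptions made for the example; the foaf:Person /
foaf:name rule is the one suggested above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


class ExampleService:
    """Hypothetical stand-in for an RDF::RDFa::Linter::Foo module."""

    # Prefixes the (imaginary) service's documentation recommends.
    usual_prefixes = {"foaf": FOAF}

    # Class URI -> predicate URIs the service treats as required.
    _required = {FOAF + "Person": [FOAF + "name"]}

    def filter(self, triples):
        """Yield only the triples this service is thought to understand."""
        namespaces = tuple(self.usual_prefixes.values())
        for s, p, o in triples:
            # For rdf:type triples, judge by the class; otherwise by the predicate.
            term = o if p == RDF_TYPE else p
            if term.startswith(namespaces):
                yield (s, p, o)

    def required_predicates(self, class_uri):
        return self._required.get(class_uri, [])


def lint(triples, service):
    """Return (kept_triples, warnings); each warning is (subject, message)."""
    kept = list(service.filter(triples))

    # Index the predicates seen for each subject.
    predicates_by_subject = {}
    for s, p, o in kept:
        predicates_by_subject.setdefault(s, set()).add(p)

    warnings = []
    for s, p, o in kept:
        if p != RDF_TYPE:
            continue
        for required in service.required_predicates(o):
            if required not in predicates_by_subject[s]:
                warnings.append((s, f"{o} is missing required predicate {required}"))
    return kept, warnings
```

Keeping the subject alongside each warning is what would let a pretty
writer (step 3) print the warning next to the resource it concerns.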
This seems like a lot of "we need" rather than "we have" given that
I've already said that the Perl RDF modules cover a lot of what we
need. What we already have that should prove useful:
  - An RDFa parser (RDF::RDFa::Parser) that supports tag soup
    HTML, and provides the onprefix, oncurie and ontriple
    callbacks that would be needed for this; and
  - A decent framework for querying the resulting
    graph for the "required predicates".
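
To make the "auto-correct but remember the error" idea concrete, here is a
small sketch of what a CURIE callback might do with a service's usual
prefixes. It is written in Python and does not use the real callback
signatures of the Perl parser; expand_curie() and its arguments are
hypothetical names chosen for the example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def expand_curie(curie, declared, usual, warnings):
    """Expand a CURIE, falling back to the service's usual prefixes.

    declared: prefix -> namespace URI bindings found in the document
    usual:    prefix -> namespace URI bindings the service normally expects
    warnings: list that collects a message for each correction or failure
    Returns the full URI, or None if the CURIE cannot be expanded.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None  # no colon, so not a prefixed CURIE

    if prefix in declared:
        return declared[prefix] + reference

    if prefix in usual:
        # Auto-correct the missing binding, but remember the error.
        warnings.append(
            f"prefix '{prefix}' is not declared; assuming {usual[prefix]}"
        )
        return usual[prefix] + reference

    warnings.append(f"prefix '{prefix}' is not declared and cannot be guessed")
    return None
```

A publisher who forgot xmlns:foaf would then still see their foaf:name
triples in the results, along with a warning saying the binding was guessed.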
I'm going to have a go at #3 this evening on my train journey, because
I've been wanting something similar for other purposes anyway.
--
Toby A Inkster
<mailto:mail at tobyinkster.co.uk>
<http://tobyinkster.co.uk>